
NRF9160 - BSD lib crash - UDP recv() - potential security issue

I am using nrfconnect 1.5.1 (zephyr/command line) with modem firmware 1.2.3 and believe I have found a fairly serious bug/behavior in the UDP API. It is pretty easy to "crash" the NRF9160 with small amounts of UDP data.

Details:

In my application, I send a small 100-byte UDP packet once a minute. The server on the receiving end responds with two UDP messages sent back to back (40 bytes followed by 96 bytes). I use a standard open/connect/send/recv pattern. The key piece of information is that I am using recv() in a non-blocking fashion.

recv(client_fd, (uint8_t *)&RcvBuf[0], sizeof(RcvBuf), MSG_DONTWAIT)

I periodically call recv() and check whether data is returned. The UDP recv() gets called in a Zephyr delayed work queue handler. I noticed that after a day or two, the system work queue thread would crash/exit while the shell thread stayed active. What I found was that if UDP data is received faster than the poll rate of my recv() function, bsd_lib would hang.

To trigger the issue more quickly, I have the server return 50 96-byte messages when it receives a report from the NRF9160. If I disable the recv() call in my logic, the NRF9160 crashes quickly. It appears that if data is not read from a socket, bsd_lib will simply crash. I think the proper behavior is for bsd_lib to drop unread packets if its internal buffers are full. Calling recv() more frequently (every 10 ms) helps but doesn't solve the issue.
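To illustrate the "drop when full" behavior I would expect, here is a minimal sketch of a bounded datagram queue in plain C. It is not how bsd_lib/modem_lib is implemented (that source is not available to me); the names pkt_queue, PKT_SLOTS and PKT_MAX_LEN are made up for the example, and the sizes are arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PKT_MAX_LEN 128 /* hypothetical maximum datagram size */
#define PKT_SLOTS   4   /* hypothetical number of buffered datagrams */

struct pkt_queue {
    uint8_t data[PKT_SLOTS][PKT_MAX_LEN];
    size_t  len[PKT_SLOTS];
    size_t  head;    /* index of the oldest stored datagram */
    size_t  count;   /* number of datagrams currently stored */
    size_t  dropped; /* datagrams discarded because there was no room */
};

/* Store one datagram. When the queue is full (or the datagram is too
 * large) the datagram is counted as dropped and the caller never blocks. */
bool pkt_queue_push(struct pkt_queue *q, const uint8_t *buf, size_t len)
{
    if (q->count == PKT_SLOTS || len > PKT_MAX_LEN) {
        q->dropped++;
        return false;
    }
    size_t tail = (q->head + q->count) % PKT_SLOTS;
    memcpy(q->data[tail], buf, len);
    q->len[tail] = len;
    q->count++;
    return true;
}

/* Copy out the oldest datagram (truncated to out_len).
 * Returns the number of bytes copied, or 0 if the queue is empty. */
size_t pkt_queue_pop(struct pkt_queue *q, uint8_t *out, size_t out_len)
{
    if (q->count == 0) {
        return 0;
    }
    size_t n = q->len[q->head] < out_len ? q->len[q->head] : out_len;
    memcpy(out, q->data[q->head], n);
    q->head = (q->head + 1) % PKT_SLOTS;
    q->count--;
    return n;
}
```

With a policy like this, a burst of 50 unread messages would cost some dropped datagrams and a counter increment, and the queue would accept new data again as soon as the application reads something.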

CONFIG_NRF_MODEM_LIB_HEAP_SIZE=2048

CONFIG_NRF_MODEM_LIB_SHMEM_RX_SIZE=16384

"Crash":
In my application, all of the logic runs on the Zephyr system work queue thread. I also have a shell configured. I am using the term "crash" to indicate that the system work queue thread appears to exit/terminate. When I trigger the condition, I notice that the log messages from my logic cease. However, the shell is still active. I attempted to use a debugger but could not find any information about the state of the work queue thread. It simply behaves as if the work queue thread has exited.
 
To recreate:
Simply open a socket for UDP datagrams and never read from it. Send data to the socket/port and the NRF9160 will eventually freeze. Note that I always run the shell, and its thread keeps running. It also seems like the main thread exits. I have tried to attach a debugger and cannot detect where bsd_lib hangs.
I do believe this is all on the receive side. If I disable messages being sent back from my server, I never see an issue.
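For anyone who wants to reproduce this, the server-side burst could be sketched roughly as below. This is a POSIX sockets sketch for the machine that answers the NRF9160, not my actual server code; send_udp_burst is a made-up helper name and the payload contents are arbitrary.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send `count` datagrams of `payload_len` bytes to `dest`, back to back.
 * Returns the number of datagrams handed to the network stack, or -1 if
 * payload_len does not fit the local buffer. */
int send_udp_burst(int fd, const struct sockaddr_in *dest,
                   int count, size_t payload_len)
{
    uint8_t payload[512];

    if (payload_len > sizeof(payload)) {
        return -1;
    }
    memset(payload, 0xA5, payload_len); /* arbitrary filler bytes */

    int sent = 0;
    for (int i = 0; i < count; i++) {
        ssize_t n = sendto(fd, payload, payload_len, 0,
                           (const struct sockaddr *)dest, sizeof(*dest));
        if (n != (ssize_t)payload_len) {
            break; /* stop at the first failed or short send */
        }
        sent++;
    }
    return sent;
}
```

Calling send_udp_burst(fd, &device_addr, 50, 96) when a report arrives matches the 50 x 96-byte burst I described, where device_addr is the source address of the report as returned by recvfrom().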
I am also going to run a test where I send UDP data to unbound sockets. I hope this doesn't cause issues, as the NRF9160 would then be a security risk on a public IP network, where there is a potential for lots of unsolicited UDP frames.
Questions:
  • Are there known issues with UDP on the NRF9160 bsd_lib?
  • Has the bsd_lib been fuzz tested for UDP traffic? I am concerned that a small amount of unread packets can bring down the system.
  • Can you look into the bsd_lib/modem_lib as to the behavior with unread packets?
  • Is the source for nrfxlib/nrf_modem available?    It looks like this is where the issue may lie.

 
  • Hello Emdi:

     If you read the data after the Heap has become full, you should see both the Heap being emptied and your send() call unblocking.

    Please read the problem report more carefully. Once this issue occurs, it is unrecoverable. I initially discovered it in my application, which does call recv(). Disabling recv() just allows us to demonstrate the error more quickly (seconds instead of hours/days). Even with recv() being called frequently, several packets coming in quickly can lock up the modem library.

    Once the allocator fails,  it looks like the modem library blocks forever (verified w/ debugger).     It is not possible to perform any network operations at that point.     The library is closed source so the developers need to look into it.

  • Hi,

    if you do not read incoming data, it is expected that the Heap becomes full and allocations start to fail so I don't see that as an indication of a leak.

    To sum up what I gathered so far:

    - you observe that when you stop reading, the send() call will block; this is a known issue.

    - you observe that when you stop reading, the heap becomes full and allocation fails; this is expected behavior

    - you observe that when you do read, the send() function will block eventually; this needs investigation

    Are you able to observe the heap filling up and allocations failing when you keep reading from the socket? That would be an indication of a leak.

  • Hello:

    To clarify, the issue happens if you read "slowly", for example if my thread sleeps for 250 ms in between read calls. If several packets come in quickly, send() will block forever. The condition appears to be unrecoverable other than by a watchdog reset.

    Having separate threads for send and recv() may help but makes the code/logic  messy.

    The most stable workaround is 

    1.)   Implement a task_wdt system that will reset the system if there is a lockup.

    2.)   When calling recv(),   do something like this in a while loop:

        while((rcvd = recv(krusty_client_fd, (uint8_t *)&krusty_rx_buf, sizeof(krusty_rx_buf), MSG_DONTWAIT) ) >0)

    This will help drain the queue.      In my case,  I have send/recv in the same thread/state machine.    If I do not use the while() loop,  the system can lock up in the time in between the send() and recv() call.

    3.)  Minimize any delay/sleep time between a send() and recv() call. Initially my network thread had some sleep calls in the state machine, and I had to make sure I call recv() frequently.

    4.)  Make the packet heap as large as possible.
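The drain loop from step 2 can be wrapped up roughly like this. It is a sketch written against plain POSIX sockets, so treat the errno handling as an assumption when porting it to the nRF socket layer; drain_udp_socket and datagram_handler_t are names I made up for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical callback type for handing each datagram to the application. */
typedef void (*datagram_handler_t)(const uint8_t *data, size_t len, void *ctx);

/* Read every datagram currently queued on `fd` without blocking.
 * Returns the number of datagrams read, or -1 on a real socket error. */
int drain_udp_socket(int fd, uint8_t *buf, size_t buf_len,
                     datagram_handler_t handler, void *ctx)
{
    int drained = 0;

    for (;;) {
        ssize_t rcvd = recv(fd, buf, buf_len, MSG_DONTWAIT);

        if (rcvd < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                return drained; /* nothing left to read right now */
            }
            if (errno == EINTR) {
                continue;       /* interrupted, try again */
            }
            return -1;          /* real error, let the caller decide */
        }
        if (handler != NULL) {
            handler(buf, (size_t)rcvd, ctx);
        }
        drained++;
    }
}
```

Unlike the one-line while() above, this version keeps going after a zero-length datagram and stops only when the socket reports that it would block, so an empty packet in the middle of a burst does not leave the rest of the queue unread. Call it right after every send() and on every pass through the state machine.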

    It is pretty easy to send a bunch of packets from a server quickly to cause the lockup.

    It would be nice if the modem library had its source code in the NCS/Zephyr repository.     Simply having an option to never have send() block would be beneficial (at the expense of dropping some incoming packets).  Applications based upon UDP expect  potential packet loss anyway.

     
