
NRF9160 - BSD lib crash - UDP recv() - potential security issue

I am using nrfconnect 1.5.1 (zephyr/command line) with modem firmware 1.2.3 and believe I have found a fairly serious bug/behavior in the UDP API. It is pretty easy to "crash" the NRF9160 with small amounts of UDP data.

Details:

In my application, I am sending a small 100-byte UDP packet once a minute. The server on the receiving end responds with two UDP messages sent back to back (40 bytes followed by 96 bytes). I use a standard open/connect/send/recv pattern. The key piece of information is that I am using recv() in a non-blocking fashion:

recv(client_fd, (uint8_t *)&RcvBuf[0], sizeof(RcvBuf), MSG_DONTWAIT)

I periodically call recv() and check whether any data is returned. The UDP recv() is called from a Zephyr delayed work queue handler. After a day or two, I would notice that the system work queue thread had crashed/exited while the shell thread was still active. What I found was that if UDP data is received faster than the poll rate of my recv() function, bsd lib hangs.

To trigger the issue more quickly, I had the server return 50 messages of 96 bytes each whenever it receives a report from the NRF9160. If I disable the recv() call in my logic, the NRF9160 crashes quickly. It appears that if data is not read from a socket, bsd_lib will simply crash. I think the proper behavior is for bsd_lib to drop unread packets if its internal buffers are full. Calling recv() more often (every 10 ms) helps but doesn't solve the issue.
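
For anyone polling the same way, one thing worth trying is to empty the socket on every poll rather than reading a single datagram per call. The following is only a minimal sketch written against plain POSIX sockets so it can be tried on a host machine; `drain_udp_socket` is a hypothetical helper name, not part of Zephyr or nrf_modem, and I am assuming the nRF9160 socket layer reports an empty queue with EAGAIN/EWOULDBLOCK the same way. As noted above, faster reading only reduced the problem for me, so this is a mitigation to experiment with, not a fix.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not a Zephyr or nrf_modem API): read every datagram
 * currently queued on fd without blocking.
 * Returns the number of datagrams consumed, or -1 on a real socket error. */
static int drain_udp_socket(int fd, uint8_t *buf, size_t buf_len)
{
    int count = 0;

    assert(buf != NULL && buf_len > 0);

    for (;;) {
        ssize_t n = recv(fd, buf, buf_len, MSG_DONTWAIT);

        if (n >= 0) {
            /* One datagram (possibly zero-length) was consumed.
             * Application-specific handling of buf[0..n) would go here. */
            count++;
            continue;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Queue is empty: nothing more to read on this poll. */
            return count;
        }
        if (errno == EINTR) {
            /* Interrupted by a signal: retry the read. */
            continue;
        }
        return -1;
    }
}
```

In a work queue handler, this would replace the single recv() call, so a burst of back-to-back replies is consumed in one pass instead of one message per poll interval.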

CONFIG_NRF_MODEM_LIB_HEAP_SIZE=2048

CONFIG_NRF_MODEM_LIB_SHMEM_RX_SIZE=16384

"Crash":
In my application, all of the logic runs on the Zephyr system work queue thread. I also have a shell configured. I am using the term "crash" to indicate that the system work queue thread appears to exit/terminate. When I trigger the condition, the log messages from my logic stop. However, the shell is still active. I attempted to use a debugger but could not find any information about the state of the work queue thread. It simply behaves as if the work queue thread has exited.
 
To recreate:
Simply open a socket for UDP datagrams and never read from it. Send data to the socket/port and the NRF9160 will eventually freeze. Note that I always run the shell, and its thread keeps running. It also seems like the main thread exits. I have tried to attach a debugger and cannot detect where bsd_lib hangs.
I do believe this is all on the receive side. If I disable messages being sent back from my server, I never see an issue.
I am also going to run a test where I send UDP data to unbound sockets. I hope this doesn't cause issues, as the NRF9160 would be a security risk on a public IP network, where there is a potential for lots of unsolicited UDP frames.
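
For reference, the server side of the reproduction can be approximated with a small burst sender. This is a rough sketch, not my actual server code: `send_udp_burst` is a hypothetical helper, the 0xA5 payload is filler, and the 512-byte limit is arbitrary. On a real cellular network the datagrams have to go back to the source address and port of the device's report so that they pass through the operator's NAT; here the destination is simply passed in.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical test helper: send `count` datagrams of `len` bytes each to
 * the IPv4 address `ip` and UDP port `port`.
 * Returns the number of datagrams sent, or -1 on a setup error. */
static int send_udp_burst(const char *ip, uint16_t port, size_t len, int count)
{
    uint8_t payload[512];
    struct sockaddr_in dst;
    int sent = 0;

    if (len > sizeof(payload)) {
        return -1;
    }
    memset(payload, 0xA5, len); /* filler bytes, content is irrelevant */

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1) {
        return -1; /* not a valid dotted-quad IPv4 address */
    }

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        return -1;
    }

    for (int i = 0; i < count; i++) {
        ssize_t n = sendto(fd, payload, len, 0,
                           (struct sockaddr *)&dst, sizeof(dst));
        if (n != (ssize_t)len) {
            break; /* stop at the first failed or short send */
        }
        sent++;
    }

    close(fd);
    return sent;
}
```

Calling `send_udp_burst(addr, port, 96, 50)` each time a report arrives corresponds to the 50 x 96-byte case described earlier.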
Questions:
  • Are there known issues with UDP on the NRF9160 bsd_lib?
  • Has the bsd_lib been fuzz tested for UDP traffic? I am concerned that a small amount of unread packets can bring down the system.
  • Can you look into the bsd_lib/modem_lib as to the behavior with unread packets?
  • Is the source for nrfxlib/nrf_modem available?    It looks like this is where the issue may lie.

 
  • Hi, I am a little curious about this issue. Does it only happen if the nrf9160 receives data?

    I have an application which sends UDP data every 5-7 seconds, but I never call recv() because I do not expect to receive any data (it sends UDP data to a server which does not respond, similar to your udp-sample-test). The application seems to crash after 1 hour to 1 hour 30 minutes.

    I haven't debugged it yet, but I am curious whether this might be related to this issue :)

  • This issue became apparent in the read case for me. Note that it affects TCP as well. I have not tested "write only". I would turn on:

    CONFIG_NRF_MODEM_LIB_DEBUG_ALLOC=y
    CONFIG_NRF_MODEM_LIB_DEBUG_SHM_TX_ALLOC=y
    CONFIG_NRF_MODEM_LIB_HEAP_DUMP_PERIODIC=y
    CONFIG_NRF_MODEM_LIB_SHM_TX_DUMP_PERIODIC=y
    CONFIG_NRF_MODEM_LIB_LOG_LEVEL_DBG=y

    If it can happen in the write-only case, then it should be apparent in the UDP example in nrf_connect as well (it only writes data).
    The "read" side of things makes it much easier to reproduce (within seconds).