NB-IoT packet loss

I'm testing a simple application based on the UDP/CoAP example. It samples some peripherals and ADC channels, and these values are then packed into a CoAP message. This CoAP message is sent through the socket API (UDP), and the application waits for a response. I've run some tests using different power saving settings. The current version uses PSM in between samples and uses RAI to avoid spending a long time in the 'active timer'.

An observation we made after analyzing the logs is the high rate of timeouts/packet loss. We declare a packet lost if the uplink CoAP packet or the downlink CoAP ACK message is lost. We got a packet loss of around 3% over around 1000 messages (multiple devices, connected to the same cell). Questions:

  • I know UDP is an unreliable protocol by design, but I'm wondering what packet loss numbers I should expect? What numbers are other users experiencing?
  • Is there some way to identify the cause of these lost packets, using the PCAP/modem log files for example?

The devices all have a Taoglas MFX3 antenna connected, the RSSI is around -85dBm (%CESQ: 54,2,21,3).
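For readers who want to put the `%CESQ` line into the same log as the timeouts, here is a minimal sketch of decoding it. It assumes the notification layout `%CESQ: <rsrp>,<rsrp_threshold_index>,<rsrq>,<rsrq_threshold_index>` and the 3GPP reporting bins (1 dB wide for RSRP, 0.5 dB wide for RSRQ), so the first field is an RSRP index rather than RSSI. The struct and function names are made up for this example and are not part of any SDK.

```c
#include <stdio.h>

/* Hypothetical helper types for this sketch, not an SDK API. */
struct cesq_reading {
    int rsrp_idx;        /* raw RSRP index from the notification */
    int rsrq_idx;        /* raw RSRQ index from the notification */
    int rsrp_dbm_min;    /* lower edge of the 1 dB wide RSRP bin */
    double rsrq_db_min;  /* lower edge of the 0.5 dB wide RSRQ bin */
};

/*
 * Parse "%CESQ: <rsrp>,<rsrp_thr>,<rsrq>,<rsrq_thr>".
 * Returns 0 on success, -1 if the line does not match or a value has no
 * lower bin edge (index 0 means "below range", 255 means "not known").
 */
int cesq_parse(const char *line, struct cesq_reading *out)
{
    int rsrp, rsrp_thr, rsrq, rsrq_thr;

    if (sscanf(line, "%%CESQ: %d,%d,%d,%d",
               &rsrp, &rsrp_thr, &rsrq, &rsrq_thr) != 4) {
        return -1;
    }
    if (rsrp < 1 || rsrp > 97 || rsrq < 1 || rsrq > 34) {
        return -1;
    }

    out->rsrp_idx = rsrp;
    out->rsrq_idx = rsrq;
    /* Assumed mapping: index n covers [n - 141, n - 140) dBm. */
    out->rsrp_dbm_min = rsrp - 141;
    /* Assumed mapping: index n covers [n / 2 - 20, n / 2 - 19.5) dB. */
    out->rsrq_db_min = rsrq * 0.5 - 20.0;
    return 0;
}
```

Under these assumptions the value quoted above decodes to an RSRP between -87 and -86 dBm and an RSRQ between -9.5 and -9 dB. Storing both next to each timeout makes it possible to check whether losses line up with dips in signal quality.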

  • 3% packet loss sounds on the high side with good signal quality. Is this consistent across requests and responses? What is the connection density: are they operating in an urban environment, how many devices are connected to the same cell, and how often do they transmit? When they transmit, do they all transmit simultaneously? Latency may be a factor. It would be interesting to log the round-trip time and see whether it correlates with packet loss or time of day.

    From what I've seen, NB-IoT uplinks had a higher rate of success. CoAP is flexible with retransmission schemes, so this may not be a problem. 

    sdk-nrf v1.9 and the nRF Connect for Desktop app support PCAP trace capture.
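The round-trip-time logging suggested above could look roughly like the following. This is only a sketch: it keeps per-hour counters in RAM, and the type and function names (`loss_log`, `exchange_record`, and so on) are invented for the example. How the hour of day and the RTT are obtained on the device is left open.

```c
#include <stdbool.h>
#include <stdint.h>

#define HOURS_PER_DAY 24

/* Per-hour counters; hypothetical layout for this sketch. */
struct hour_stats {
    uint32_t sent;        /* requests handed to the socket */
    uint32_t lost;        /* requests that timed out without a response */
    uint64_t rtt_sum_ms;  /* sum of RTTs of the answered requests */
};

struct loss_log {
    struct hour_stats hour[HOURS_PER_DAY];
};

/* Record one request/response exchange for the given hour of day (0..23). */
void exchange_record(struct loss_log *lg, unsigned int hour,
                     bool answered, uint32_t rtt_ms)
{
    if (hour >= HOURS_PER_DAY) {
        return; /* ignore invalid timestamps */
    }
    lg->hour[hour].sent++;
    if (answered) {
        lg->hour[hour].rtt_sum_ms += rtt_ms;
    } else {
        lg->hour[hour].lost++;
    }
}

/* Loss rate in percent for one hour; 0 if nothing was sent. */
double loss_percent(const struct loss_log *lg, unsigned int hour)
{
    const struct hour_stats *s = &lg->hour[hour % HOURS_PER_DAY];

    return s->sent ? 100.0 * s->lost / s->sent : 0.0;
}

/* Mean RTT in ms over the answered requests of one hour; 0 if none. */
double mean_rtt_ms(const struct loss_log *lg, unsigned int hour)
{
    const struct hour_stats *s = &lg->hour[hour % HOURS_PER_DAY];
    uint32_t answered = s->sent - s->lost;

    return answered ? (double)s->rtt_sum_ms / answered : 0.0;
}
```

Comparing `loss_percent` and `mean_rtt_ms` hour by hour gives a first hint whether the losses cluster at busy times of day or are spread evenly.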

  • If you write "timeouts/packet loss.", do you use CON requests? Or NON? (And of course, which client implementation?)

    In my experience with NB-IoT, 3% seems quite large, so maybe there are some other reasons for that.

    One common cause would be some "timing details" when using RAI and/or PSM. When the CoAP layer sends a message, the modem only starts the sending process. It sometimes takes a couple of seconds until that message is really sent. I therefore use LTE_LC_EVT_RRC_UPDATE with LTE_LC_RRC_MODE_CONNECTED to start the timer for retransmission or response timeout. Also, if the timeout gets larger and your device goes into LTE_LC_RRC_MODE_IDLE, you may need some extra time (I use 3s) in order to really receive the response.

    With that, using CON, my retransmission rate is less than 1% and the failure rate is close to 0.
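The timing approach described above can be sketched as a small state machine. This is not the poster's code: the enum below is a stand-in for the RRC mode reported with LTE_LC_EVT_RRC_UPDATE, the function names are hypothetical, and extending the deadline by the grace period on idle is only one possible reading of "extra time".

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the RRC mode delivered with LTE_LC_EVT_RRC_UPDATE. */
enum rrc_mode { RRC_MODE_IDLE, RRC_MODE_CONNECTED };

/* Hypothetical response timer that only runs once RRC is connected. */
struct response_timer {
    bool waiting;          /* a request was sent and no response seen yet */
    bool armed;            /* the deadline is running */
    int64_t deadline_ms;   /* absolute time at which the wait expires */
    int64_t timeout_ms;    /* response timeout, counted from RRC connected */
    int64_t idle_grace_ms; /* extra time granted when RRC goes idle, e.g. 3000 */
};

/* Call right after handing the CoAP message to the socket. */
void rt_on_send(struct response_timer *t)
{
    t->waiting = true;
    t->armed = false; /* do not count the connection setup time */
}

/* Call from the handler that receives the RRC mode updates. */
void rt_on_rrc_update(struct response_timer *t, enum rrc_mode mode,
                      int64_t now_ms)
{
    if (!t->waiting) {
        return;
    }
    if (mode == RRC_MODE_CONNECTED && !t->armed) {
        t->armed = true;
        t->deadline_ms = now_ms + t->timeout_ms;
    } else if (mode == RRC_MODE_IDLE && t->armed) {
        /* Assumption: grant extra time so a late response still counts. */
        t->deadline_ms += t->idle_grace_ms;
    }
}

/* Call when the matching response arrives. */
void rt_on_response(struct response_timer *t)
{
    t->waiting = false;
    t->armed = false;
}

/* True once the armed deadline has passed without a response. */
bool rt_expired(const struct response_timer *t, int64_t now_ms)
{
    return t->waiting && t->armed && now_ms >= t->deadline_ms;
}
```

With a 10 s timeout, a request sent at t=0 whose RRC connection comes up at t=4 s expires at t=14 s instead of t=10 s. A real application would probably also want an upper bound on the connection setup itself, since this timer never fires if RRC connected is never reported.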

  • The 3% seems to be consistent between requests and responses. The devices are indoors, in an industrial/office area. At most 3 of our devices are connected to the cell at the moment. I ran some tests at a high rate (an upload every minute) and also tests with an interval of 10 minutes or more than 30 minutes; all tests result in a loss of around 3%.

    I know CoAP has retransmissions, but I want to avoid them. To optimize power usage, the packet loss needs to be lower before I enable retransmissions. I will take a look at the timing. Linking the timeouts to the RRC modes is a good idea, although my timeout is currently longer than the RRC connected time.

    I will also set up a test in another area. Maybe this cell is just busy; I think the number of NB-IoT devices here in the Netherlands is already quite high.

  • Though my modems usually wake up from PSM, the time to get connected is sometimes long. In fast cases it takes 2-3s, but 10-15s also occurs (not too often), and sometimes even longer (luckily rare; I usually give up at 60s).

    How long do you wait for your response?

    Note that a long waiting time consumes more power than a timely retransmission.

  • > I will also set up a test in another area. Maybe this cell is just busy; I think the number of NB-IoT devices here in the Netherlands is already quite high.

    Any update? Does it work better in the other area? Did you try to start your communication timers with RRC connected?
