NB-IoT TCP packet loss and retransmissions

Hardware: nRF9160 DK
Modem FW: v1.3.1

We have been evaluating and configuring our proprietary application, which was originally designed for LTE-M (Cat-M1), for use on NB-IoT. The application exchanges data with the cloud using MQTT over TLS and works fine on LTE-M.
However, when it is explicitly configured to connect to an NB-IoT-only network, many TLS handshakes time out, even after increasing the TLS timeout from 20 seconds (LTE-M) to 90 seconds (NB-IoT).
In the network capture files we observed many TCP retransmissions, especially when a large amount of data is exchanged over several TCP segments (for example, when the server and client exchange certificates and keys during a TLS handshake).

To rule out our own application and test only MQTT over TLS on the NB-IoT network, we switched to the nRF Connect SDK sample application "asset_tracker_v2".
The "asset_tracker_v2" application was configured to connect to an NB-IoT-only network, with modem trace enabled.
After letting the application run for a few hours, we analysed the application logs, modem trace and network capture.
We still see the same behaviour (TCP retransmissions) when large amounts of data are exchanged over several TCP segments.


The typical behaviour is that, after receiving the first TCP segment of a large data packet, the modem does not seem to accept any further segments for approximately 20 seconds (see the attached screenshot comparing the modem trace (left) with the network capture (right)).
Are we missing any configuration, or can you explain this behaviour?

Attached are the application log, modem trace and network capture for further investigation: attachments.zip

  • Hi, 

    Thanks for your response.

     
    The device being tested is in the UK, and we are using a test NB-IoT network (using Nutaq).
    The problem is not with registering to a network. The problem is that the modem does not accept the TCP segments: the server transmits them (as seen in the network capture), but they are not received on the client side (as seen in the modem trace).
    Have a look at the image in attachments.zip, which shows a period of approximately 20 seconds during which the modem doesn't log any TCP packets, although the server is sending them at the same time.

  • We have our own test network, which is configured for NB-IoT (NB1).
    I agree that the logs may show problems registering to the network: these are complete logs, and at times we may have been in the middle of changing some configuration or setting up the test network.
    That is why I am not concerned about the device failing to register. What concerns me more is that, once it has successfully registered to the network, it has trouble transferring data over TCP (the modem seems not to receive any packets for around 20 seconds, as highlighted in the screenshot).

  • Please see the answer from our nRF91 team:

    There are two things causing the delays and re-transmissions:

    1. The network has not enabled the enableStatusReportSN-Gap configuration in the eNB. This means that the UE can't report a missing RLC PDU as soon as it notices the gap. Instead, the report of the missing RLC PDU is sent only once there is nothing left to transmit and the network polls the device. This causes the delay shown in the attached picture that the customer is asking about: the RLC PDU from the first TCP segment is lost, and the UE must wait for the re-transmission from the network.
    2. TCP re-transmissions. Because of the missing TCP ACKs, the server starts to re-transmit TCP packets. This makes things even worse in the lower layers, as RLC PDUs are re-transmitted only when there is nothing else to send. The TCP re-transmissions make the send queue even longer, so the missing RLC PDUs are sent with increasing delay.

    If it is difficult to enable enableStatusReportSN-Gap in the network, the only option would be to try increasing the TCP re-transmission timeout on the server side. With fewer TCP re-transmissions added to the send queue, this may shorten the RLC PDU re-transmission delay and make the overall transmission faster.

    Kind regards,
    Øyvind
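
The effect of a longer server-side re-transmission timeout can be illustrated with a rough sketch. It is not a model of any particular TCP stack: it assumes plain exponential backoff (the timeout doubles after every expiry) and a hypothetical 60-second cap, whereas real stacks also adapt the timeout to the measured round-trip time. The function name and the candidate timeout values are illustrative only; the 20-second stall is the figure reported in this thread.

```python
def retransmission_times(initial_rto_s, stall_s, max_rto_s=60.0):
    """Times (seconds after the first send) at which an unacknowledged
    segment would be re-sent while the receiver is stalled for stall_s.

    Assumes simple exponential backoff; max_rto_s is an assumed cap.
    """
    times = []
    rto = initial_rto_s
    elapsed = rto  # first timer expiry
    while elapsed < stall_s:
        times.append(elapsed)
        rto = min(rto * 2, max_rto_s)  # double the timeout after each expiry
        elapsed += rto
    return times


STALL_S = 20.0  # stall duration observed in the modem trace

for initial_rto in (1.0, 3.0, 10.0, 30.0):
    times = retransmission_times(initial_rto, STALL_S)
    print(f"initial RTO {initial_rto:4.0f} s: "
          f"{len(times)} re-transmissions at {times}")
```

Under these assumptions, a 1-second initial timeout puts four extra copies of the segment into the send queue during the stall, while a timeout longer than the stall adds none. Whether and how the timeout can be changed depends on the server's operating system and TCP stack.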
