NB-IoT TCP packet loss and retransmissions

Hardware: nRF9160 DK
Modem FW: v1.3.1

We have been evaluating and configuring our proprietary application for NB-IoT. It was initially designed for LTE-M (Cat-M1). The application exchanges data with the cloud using MQTT over TLS and works fine on LTE-M.
However, when it is explicitly configured to connect to an NB-IoT-only network, we see many TLS handshakes timing out (even after increasing the TLS timeout from 20 seconds for LTE-M to 90 seconds for NB-IoT).
Evaluating the network capture files, we observed many TCP retransmissions, especially when a large amount of data is exchanged over several TCP segments (as when the server and client exchange certificates and keys during a TLS handshake).
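
For readers who want to reproduce this kind of check on their own capture, here is a minimal sketch of how retransmitted data segments can be flagged once the capture has been exported to (time, sequence number, payload length) records. The `Segment` type, the `find_retransmissions` helper and the sample values are hypothetical and for illustration only. They are not taken from the attached capture. The sketch assumes a single one-directional flow and ignores reordering and sequence number wrap-around.

```python
from collections import namedtuple

# Hypothetical, simplified view of one captured TCP segment.
Segment = namedtuple("Segment", "time seq length")


def find_retransmissions(segments):
    """Return data segments whose byte range was already sent earlier."""
    highest_end = None  # highest sequence number (exclusive) seen so far
    retransmitted = []
    for seg in segments:
        if seg.length == 0:
            continue  # pure ACKs carry no data
        end = seg.seq + seg.length
        if highest_end is not None and end <= highest_end:
            # No new bytes beyond what was already sent: a retransmission.
            retransmitted.append(seg)
        else:
            highest_end = end
    return retransmitted


# Illustrative values only (not from the real capture).
capture = [
    Segment(0.00, 1, 536),
    Segment(0.05, 537, 536),
    Segment(1.10, 537, 536),   # same bytes sent again
    Segment(3.30, 537, 536),   # and again
    Segment(20.50, 1073, 536),
]

for seg in find_retransmissions(capture):
    print(f"retransmission at t={seg.time:.2f}s seq={seg.seq}")
```

Dedicated tools such as Wireshark do this analysis far more thoroughly. The sketch only shows the basic idea of spotting repeated sequence ranges.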

To rule out our own application and test only MQTT over TLS on the NB-IoT network, we switched to the nRF Connect SDK sample application "asset_tracker_v2".
The "asset_tracker_v2" application was configured to connect to an NB-IoT-only network, with modem trace enabled.
After letting the application run for a few hours, we analysed the application logs, modem trace and network capture.
We still see the same behaviour (TCP retransmissions) with large data exchanges spanning several TCP segments.


The typical behaviour is that, after receiving the first TCP segment of a large data packet, the modem appears not to accept any further segments for approximately 20 seconds (see the attached screenshot comparing the modem trace (left) with the network capture (right)).
Are we missing any configuration or can you give an explanation of the behaviour?

Also attached are the application log, modem trace and network capture for your further investigation: attachments.zip

  • Please see the answer from our nRF91 team:

    Two things are causing the delays and re-transmissions:

    1. The network has not enabled the enableStatusReportSN-Gap configuration in the eNB. This means that the UE cannot report a missing RLC PDU as soon as it notices the gap. Instead, the report is sent only after there is nothing left to transmit and the network polls the device. This causes the delay shown in the attached picture that the customer is asking about: the RLC PDU from the first TCP segment is lost, and the UE must wait for the re-transmission from the network.
    2. TCP re-transmissions. Because of the missing TCP ACKs, the server starts to re-transmit TCP packets. This makes things even worse in the lower layers, because RLC PDUs are re-transmitted only when there is nothing else to be sent. The TCP re-transmissions make the send queue even longer, so the missing RLC PDUs are re-sent with increasing delay.

    Since it may be difficult to add support for enableStatusReportSN-Gap in the network, the only option would be to try increasing the TCP re-transmission timeout on the server side. This may shorten the RLC PDU re-transmission delay and make the overall transmission faster.

    Kind regards,
    Øyvind
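
To illustrate why a longer server-side TCP re-transmission timeout reduces the extra traffic queued during the stall, here is a rough model. It assumes the server doubles its timeout after every unacknowledged re-transmission (exponential backoff as described in RFC 6298) and that the stall lasts about 20 seconds, as reported in the question. The function name `retransmit_times` and the initial timeout values are hypothetical. A real server derives its timeout from measured round-trip times, so treat this as a sketch, not a prediction.

```python
def retransmit_times(initial_rto, stall, max_rto=120.0):
    """Times (seconds after the original send) at which an unacknowledged
    segment would be re-sent during a stall, doubling the RTO each time."""
    times = []
    rto = initial_rto
    t = rto  # first re-transmission fires one RTO after the original send
    while t < stall:
        times.append(t)
        rto = min(rto * 2, max_rto)  # exponential backoff, capped (assumed cap)
        t += rto
    return times


STALL_SECONDS = 20.0  # approximate stall reported in the question

for initial_rto in (1.0, 3.0, 10.0, 30.0):
    times = retransmit_times(initial_rto, STALL_SECONDS)
    print(f"initial RTO {initial_rto:>4}s -> {len(times)} re-transmissions at {times}")
```

Under these assumptions, a 1 second initial timeout produces four re-transmissions of the same segment within the 20 second stall, while a 10 second timeout produces one and a 30 second timeout produces none. Each avoided re-transmission is one less packet in the downlink queue ahead of the missing RLC PDU.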
