Diagnosing AWS IoT Communication failure

Hello,

I'm tinkering (mostly with motion detection) with a Thingy:91 unit (I don't have a DK board), and in general it works fine, reporting sensor data to AWS IoT Core.

Last night, at some point, the communication suddenly stopped. Today I opened nRF Connect's Serial Terminal and saw this:


A bit more context:

  1. Faulty behaviour was first noticed last Thursday (a week ago). While searching this forum for hints, I found that some of the error messages (mostly -119 on that previous occasion) from this ticket match my case.
  2. I was just about to start gathering the traces mentioned there when the communication was suddenly restored. The peculiar part was that in the AWS MQTT Test Client I could only see the Shadow Updates (none of the other service topics).
  3. The situation remained like this up until last night when the next outage appeared, hence the output in the screenshot.

I spoke to my colleague who closely monitors the AWS activity, and he said that everything on that side looks fine and unchanged (the lambdas ingesting the MQTT messages were behaving normally). It also can't be Amazon, because we have other types of devices that successfully report MQTT messages over Wi-Fi.

This leaves me to wonder if the mobile operator can be blamed for this outage. In my country currently there is only one operator supporting IoT data transfers and they support only NB-IoT (LTE-M is not an option).

The Question:

Should I try to get some modem traces, and if the problem is with the mobile operator, can these traces narrow down the exact issue?

Thanks and regards!

  • Ivan Popov said:
    Here, in the archive there are some traces from today, including the required .bin file:

    Thanks, I will take a look.

    Ivan Popov said:
    Do you think it might give me some hints about what's going on if I go and compare both samples?

    They use entirely different libraries, so I doubt it is very useful.

  • Comment from the modem team regarding the modem trace:

    The attached picture points to a problem in the DNS query, but the modem log doesn’t reveal any issues with DNS.

    However, there seem to be problems in TLS data transmission. In all TLS connections the device (modem) closes the TLS connection with the error cause “Too long SSL fragment has been received from a network”, meaning that the server has sent a TLS packet that does not fit into the device's TLS buffer. For the TLS buffer sizes, the following limitation is mentioned in the modem release notes:
    Limitations
    - TLS/DTLS Secure socket buffer size is 2kB

    The actual TLS record size is not visible in the log. Is the customer able to check and reconfigure the TLS record size the server is using?

  • > The actual TLS record size is not visible in the log. Is customer able to check and reconfigure the TLS record size server is using.

    Searching for that issue shows that other tickets already mentioned the solution years ago. AFAIK mbedtls supports that extension, so it may be more a question of configuring the modem build to use it in a matching way.
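
Since the modem log does not show the actual TLS record size, one way to check it is to look at the record headers in a raw capture of the server-to-device TLS byte stream. The following is only a minimal sketch, not a tool from Nordic or AWS. It assumes you already have the reassembled TLS bytes (how you capture them is not covered here), and the function names are hypothetical. It relies only on the standard TLS record header layout (1 byte content type, 2 bytes version, 2 bytes length) and on the 2 kB secure socket buffer limit quoted from the modem release notes. Whether the modem counts the 5 header bytes against that limit is not stated in the thread, so the sketch compares the declared payload length only.

```python
import struct

# Assumption: 2 kB secure socket buffer, as quoted from the modem release notes.
MODEM_TLS_BUFFER_BYTES = 2048
# TLS record header: content type (1 byte) + version (2 bytes) + length (2 bytes).
TLS_HEADER_LEN = 5


def tls_record_lengths(stream):
    """Return the declared payload length of each TLS record header in `stream`.

    `stream` is a bytes object holding consecutive TLS records, starting at a
    record boundary (hypothetical input, e.g. exported from a packet capture).
    """
    lengths = []
    offset = 0
    while offset + TLS_HEADER_LEN <= len(stream):
        _content_type, _version, length = struct.unpack_from("!BHH", stream, offset)
        lengths.append(length)
        # Skip the header and the payload to reach the next record header.
        offset += TLS_HEADER_LEN + length
    return lengths


def oversized_records(stream, limit=MODEM_TLS_BUFFER_BYTES):
    """Return the payload lengths that would not fit into `limit` bytes."""
    return [n for n in tls_record_lengths(stream) if n > limit]


if __name__ == "__main__":
    # Two synthetic records: a 100-byte handshake record and a 4096-byte data record.
    small = b"\x16\x03\x03" + struct.pack("!H", 100) + bytes(100)
    large = b"\x17\x03\x03" + struct.pack("!H", 4096) + bytes(4096)
    print(tls_record_lengths(small + large))
    print(oversized_records(small + large))
```

By default TLS allows up to 16384 bytes of plaintext per record, so any record reported by `oversized_records` is a candidate for the "Too long SSL fragment" error described above.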
