Diagnosing AWS IoT Communication failure

Hello,

I'm tinkering (mostly with motion detection) with a Thingy:91 unit (I don't have a DK board), and in general it works fine reporting sensor data to AWS IoT Core.

Last night, at some point, the communication suddenly stopped. Today I opened nRF Connect's Serial Terminal and saw this:


A bit more context:

  1. The faulty behaviour was first noticed last Thursday (a week ago). While searching this forum for hints, I found that some of the error messages from this ticket (mostly -119 on that earlier occasion) match my case.
  2. I was just about to start gathering the traces mentioned there when the communication was suddenly restored. The peculiar part was that in the AWS MQTT Test Client I could only see the Shadow Updates (none of the other service topics).
  3. The situation remained like this until last night, when the next outage occurred, hence the output in the screenshot.

I spoke to my colleague who closely monitors the AWS activity, and he said that everything on that side looks fine and unchanged (the lambdas ingesting the MQTT messages were behaving normally). It also can't be Amazon, because we have other types of devices that successfully report MQTT messages over Wi-Fi.

This leaves me wondering whether the mobile operator is to blame for this outage. In my country there is currently only one operator supporting IoT data transfers, and it supports only NB-IoT (LTE-M is not an option).

The Question:

Should I try to get some modem traces? And if the problem is with the mobile operator, can these traces narrow down the exact issue?

Thanks and regards!

Parents
  • UPDATE

    I followed the Cellular Monitor section in the documentation. In addition, I updated both the SiP and SoC firmware to the latest versions (with no change in the behaviour).

    I could test with a set of three applications:

    1. The main application (App A) I'm using - it is the aws_iot sample, properly configured, with additions that only append sensor readings (ext_sensor.h & ext_sensor.c) to the shadow update - the MQTT (or overall connectivity) functionality was not touched.
    2. Basic MQTT client (App B) - only the aws_iot sample as it is, configured with my credentials (it was previously tested and working, and has not been touched since then).
    3. The full Nordic Asset Tracker v2 (App C) - the sample from the repository, only configured with my credentials - the MQTT (or overall connectivity) functionality was not touched.

    The Wireshark captures for App A and App B showed sets of these:

    App A

    App B

    App C

    I can see that the connection is being reset at the TCP level, but I'm not familiar with this type of logs and traces, and I'm not sure how to relate the messages to the functionality in the code.

    Here I provide the relevant trace in full, in two file formats:
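For anyone else reading such a capture: a connection reset shows up as a TCP segment with the RST flag set. The following is only a minimal sketch of how that flag can be read from the raw TCP header bytes, in plain Python. The helper names (`tcp_flag_names`, `is_reset`, `make_header`) and the sample port numbers are made up for illustration and are not taken from the trace in this thread; port 8883 is used only because it is the usual port for MQTT over TLS.

```python
import struct

# Bit masks for the flags byte of a TCP header (byte offset 13).
TCP_FLAGS = {
    0x01: "FIN",
    0x02: "SYN",
    0x04: "RST",
    0x08: "PSH",
    0x10: "ACK",
    0x20: "URG",
}


def tcp_flag_names(tcp_header: bytes) -> list:
    """Return the names of the flags set in a raw TCP header."""
    if len(tcp_header) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    flags = tcp_header[13]
    return [name for mask, name in TCP_FLAGS.items() if flags & mask]


def is_reset(tcp_header: bytes) -> bool:
    """True if the segment carries RST, i.e. one side aborted the connection."""
    return "RST" in tcp_flag_names(tcp_header)


def make_header(flags: int) -> bytes:
    """Hypothetical helper: build a 20-byte TCP header for demonstration only."""
    return struct.pack(
        "!HHIIBBHHH",
        50000,   # source port (arbitrary)
        8883,    # destination port (MQTT over TLS)
        0,       # sequence number
        0,       # acknowledgement number
        5 << 4,  # data offset: 5 words, no options
        flags,   # flags byte
        0,       # window size
        0,       # checksum (not computed here)
        0,       # urgent pointer
    )


if __name__ == "__main__":
    print(tcp_flag_names(make_header(0x14)))  # RST and ACK set
    print(is_reset(make_header(0x18)))        # PSH and ACK only
```

In Wireshark itself, the display filter `tcp.flags.reset == 1` narrows the view to these segments, and the source address of each one shows which side sent the reset.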

    aws-iot-mqtt-connection-problem.zip

    I hope there are enough hints in there for me to get proper advice and help figuring out the problem. I'll provide any additional information if required.

    I almost forgot to mention that on the serial terminal the behaviour looks like this:

    Regards!
