Receiving MQTT Shadow Data before NRF_CLOUD_EVT_READY

Hello,

I'm currently debugging an issue we are having with MQTT device shadow interaction on the nrf9160. The issue presents when we push a large amount of config JSON to the device shadow. This typically happens when we are setting up a new device for the first time and need to set all the parameters in the "config" object via the REST API. The issue is when this large delta occurs the device will try and sync it via MQTT and it gets stuck in a connect/disconnect loop. 

I'm currently looking at once specific case that I can get to repeat multiple times. Where the following events occur in sequence:

1. NRF_CLOUD_EVT_TRANSPORT_CONNECTED occurs 

2. NRF_CLOUD_EVT_RX_DATA_SHADOW occurs with the large delta (in my case about 1.3Kb)

3.  NRF_CLOUD_EVT_USER_ASSOCIATED occurs

4.  NRF_CLOUD_EVT_TRANSPORT_DISCONNECTED occurs for reason  -> NRF_CLOUD_DISCONNECT_MISC

5. The LTE connection is reset (not a reboot) and it attempts to connect again and goes through the same sequence.

Now the device never gets a chance to send the shadow delta back up to the cloud because our firmware waits until NRF_CLOUD_EVT_READY before we send any data back to nrfCloud. And because we don't acknowledge the shadow delta it gets sent again the next time we connect, resulting in an endless loop.  

The question that comes to mind is why we are getting shadow data before receiving, NRF_CLOUD_EVT_READY? The documentation here seems to imply this shouldn't happen?

Any insight on how to debug this further would be greatly appreciated.

Thanks!

Parents
  • Hi Eric,

    This issue seems to happen before you push a large amount of config JSON to the device shadow.

    Are you able to repeat the issue with other official nRF Cloud samples like Cellular: nRF Cloud multi-service?

    Best regards,

    Charlie

  • Charlie,

    Our code is based on the nRF Cloud MQTT multi-service sample (from SDK v2.4.2) and has basically the same core connection management. I also attempted loading the unmodified sample onto the same hardware in question above and got basically the same results see below logs:

     - - - - - - - - - - - - - - - - TARGET RESET - - - - - - - - - - - - - - - - -
    *** Booting Zephyr OS build v3.3.99-ncs1-1 ***
    [00:00:00.250,946] \033[0m<dbg> led_control: start_led_pattern: LED Pattern Requested\033[0m
    [00:00:00.250,976] \033[0m<inf> connection: Setting up modem...\033[0m
    [00:00:10.185,699] \033[0m<inf> main: nRF Cloud MQTT multi-service sample has started.\033[0m
    [00:00:10.185,729] \033[0m<dbg> connection: await_lte_connection: Awaiting LTE Connection\033[0m
    [00:00:10.185,791] \033[0m<dbg> led_control: led_animation_thread_fn: LED Management Started\033[0m
    [00:00:10.185,791] \033[0m<dbg> led_control: led_animation_thread_fn: LED Pattern Started\033[0m
    [00:00:10.185,882] \033[0m<dbg> connection: consume_device_message: Consuming an enqueued device message\033[0m
    [00:00:10.199,737] \033[0m<dbg> connection: setup_modem: Modem library initialized.\033[0m
    [00:00:10.199,737] \033[0m<inf> connection: Setting up nRF Cloud library...\033[0m
    [00:00:10.207,855] \033[0m<inf> connection: Setting up LTE...\033[0m
    [00:00:10.208,557] \033[0m<inf> connection: Starting connection to LTE network...\033[0m
    [00:00:10.257,537] \033[0m<inf> connection: Connecting to LTE network. This may take several minutes...\033[0m
    [00:00:10.257,537] \033[0m<inf> connection: Waiting for connection to LTE network...\033[0m
    [00:00:10.257,568] \033[0m<dbg> led_control: start_led_pattern: LED Pattern Requested\033[0m
    [00:00:10.257,598] \033[0m<dbg> connection: await_lte_connection: Awaiting LTE Connection\033[0m
    [00:00:11.645,507] \033[0m<dbg> connection: lte_event_handler: LTE_EVENT: Network registration status 2, Searching\033[0m
    [00:00:11.645,568] \033[0m<dbg> connection: lte_event_handler: LTE_EVENT: LTE cell changed: Cell ID: 118021391, Tracking area: 24838\033[0m
    [00:00:11.645,629] \033[0m<dbg> connection: lte_event_handler: LTE_EVENT: Active LTE mode changed: LTE-M\033[0m
    [00:00:16.202,575] \033[0m<dbg> connection: lte_event_handler: LTE_EVENT: RRC mode: Connected\033[0m
    [00:00:18.461,151] \033[0m<dbg> connection: lte_event_handler: LTE_EVENT: Modem domain event, type: Search done\033[0m
    [00:00:18.466,369] \033[0m<dbg> connection: lte_event_handler: LTE_EVENT: Network registration status 5, Registered Roaming\033[0m
    [00:00:18.466,705] \033[0m<inf> connection: Connected to LTE network\033[0m
    [00:00:18.466,735] \033[0m<inf> connection: Connecting to nRF Cloud...\033[0m
    [00:00:18.466,735] \033[0m<inf> connection: Next connection retry in 30 seconds\033[0m
    [00:00:18.466,766] \033[0m<dbg> connection: await_cloud_connected: Awaiting Cloud Connection\033[0m
    [00:00:18.466,827] \033[0m<dbg> connection: await_cloud_ready: Awaiting Cloud Ready\033[0m
    [00:00:18.466,857] \033[0m<dbg> connection: cloud_event_handler: NRF_CLOUD_EVT_TRANSPORT_CONNECTING\033[0m
    [00:00:18.567,382] \033[0m<dbg> connection: lte_event_handler: LTE_EVENT: PSM parameter update: TAU: 3240, Active time: -1\033[0m
    [00:00:18.570,678] \033[0m<dbg> connection: await_cloud_ready: Awaiting Cloud Ready\033[0m
    [00:00:23.246,063] \033[1;31m<err> nrf_cloud_transport: MQTT input error: -128\033[0m
    [00:00:23.246,154] \033[1;31m<err> nrf_cloud_transport: Error disconnecting from cloud: -128\033[0m
    [00:00:23.246,185] \033[0m<dbg> connection: cloud_event_handler: NRF_CLOUD_EVT_TRANSPORT_DISCONNECTED\033[0m
    [00:00:23.246,215] \033[0m<inf> connection: nRF Cloud failed to become ready. Resetting connection.\033[0m
    [00:00:23.246,246] \033[0m<inf> connection: Disconnecting from nRF Cloud\033[0m
    [

  • But this log shows your device is not connected with the nRF Cloud at all.

    Have you correctly provisioned your device to nRF Cloud?  There is guidance about nRF Cloud Access Provisioning - Software and Protocols - Cellular IoT guides - Nordic DevZone (nordicsemi.com). You need to follow the flowchart "Own CA" branch for your product.

    Best regards,

    Charlie

Reply Children
  • Charlie,

    This device is provisioned. It was connecting to nrfCloud and functioning fine prior to us pushing the update to the shadow via MQTT. As I mentioned this i a very repeatable issue and I have done it on multiple devices that have been provisioned and working correctly prior to pushing the config update to REST. 

    Additionally, on our custom firmware as mentioned in the first post we are getting the shadow JSON that we pushed to the device we just never get the NRF_CLOUD_EVT_READY event and therefor don't process the shadow JSON. I imagine it wouldn't be possible to get the shadow for the device if it wasn't provisioned correctly on the cloud.  

    Eric

  • Charlie,

    Here is a log from the exact same device running our custom firmware (I loaded it again after running the MQTT sample mentioned above), note I have not changed the state of nrfCloud (shadow) or the device certs between firmware loads.

    You can can see that we get shadow data before receiving the NRF_CLOUD_EVT_READY event. We simply get NRF_CLOUD_EVT_TRANSPORT_DISCONNECTED occurs for reason - NRF_CLOUD_DISCONNECT_MISC (3)

  • Any update on this? This is preventing us from releasing our MQTT firmware.  

  • Hi Eric,

    Sorry for the late reply. I was occupied by other tasks last week. I will try to help you solve the issue this week if possible.

    I have tried to reproduce this issue with  Cellular: nRF Cloud multi-service today. NRF_CLOUD_EVT_RX_DATA_SHADOW indeed happens before NRF_CLOUD_EVT_USER_ASSOCIATED, but I did not see anything that prevents NRF_CLOUD_EVT_READY happening.
    Is it possible that some configuration from your JSON file disturbed the cloud connecting process? Could you share more detailed information to help me reproduce your failing test with nRF Cloud multi-service before?

    [00:00:02.223,602] <inf> connection: Connecting to nRF Cloud...
    [00:00:02.223,602] <inf> connection: Next connection retry in 30 seconds
    [00:00:02.223,632] <dbg> connection: await_cloud_connected: Awaiting Cloud Connection
    [00:00:02.223,724] <dbg> connection: await_cloud_ready: Awaiting Cloud Ready
    [00:00:02.223,754] <dbg> connection: cloud_event_handler: NRF_CLOUD_EVT_TRANSPORT_CONNECTING
    [00:00:02.224,639] <dbg> connection: lte_event_handler: LTE_EVENT: PSM parameter update: TAU: 9000, Active time: -1
    [00:00:02.227,905] <dbg> connection: await_cloud_ready: Awaiting Cloud Ready
    [00:00:04.647,521] <dbg> connection: cloud_event_handler: NRF_CLOUD_EVT_TRANSPORT_CONNECTED
    [00:00:05.166,015] <inf> nrf_cloud_codec_internal: AlertsEn changed to 0
    [00:00:05.166,015] <inf> nrf_cloud_codec_internal: LogLvl changed to 1
    [00:00:05.167,724] <dbg> connection: cloud_event_handler: NRF_CLOUD_EVT_RX_DATA_SHADOW
    [00:00:05.212,036] <dbg> connection: cloud_event_handler: NRF_CLOUD_EVT_USER_ASSOCIATED
    [00:00:05.806,671] <dbg> connection: cloud_event_handler: NRF_CLOUD_EVT_READY
    [00:00:05.806,732] <inf> connection: Connected to nRF Cloud
    [00:00:05.807,525] <inf> application: Waiting for modem to determine current date and time
    [00:00:05.807,556] <inf> application: Current date and time determined

    Best regards,

    Charlie

  • Charlie,

    The issue only happens when a large chunk of JSON is posted to the shadow via the REST API. As mentioned in the first post it's about 1.3KB of data that gets sent. 

    I can send you the JSON from our config object in a private message or you can reach out to me via email, if you want to test it on your device to reproduce my exact issue. However, I suspect it may have to do with the size of the update we are trying to make. 

    Another thing I will note is that this issue is made worse when you attempt to clean out the shadow by sending "config":null via the REST API because the device still has not be able to acknowledge the large config object write, so now two shadow updates are queued. The only way I have found to "reset" the shadow and clear the issue is to delete the device from nrfCloud and then re-add with new keys. 

    Eric

Related