Receiving MQTT Shadow Data before NRF_CLOUD_EVT_READY

Hello,

I'm currently debugging an issue we are having with MQTT device shadow interaction on the nRF9160. The issue appears when we push a large amount of config JSON to the device shadow. This typically happens when we set up a new device for the first time and need to set all the parameters in the "config" object via the REST API. When this large delta occurs, the device tries to sync it via MQTT and gets stuck in a connect/disconnect loop.

I'm currently looking at one specific case that I can reproduce repeatedly, where the following events occur in sequence:

1. NRF_CLOUD_EVT_TRANSPORT_CONNECTED occurs.

2. NRF_CLOUD_EVT_RX_DATA_SHADOW occurs with the large delta (in my case about 1.3 KB).

3. NRF_CLOUD_EVT_USER_ASSOCIATED occurs.

4. NRF_CLOUD_EVT_TRANSPORT_DISCONNECTED occurs with reason NRF_CLOUD_DISCONNECT_MISC.

5. The LTE connection is reset (not a reboot) and it attempts to connect again and goes through the same sequence.

The device never gets a chance to send the shadow delta back up to the cloud, because our firmware waits for NRF_CLOUD_EVT_READY before sending any data back to nRF Cloud. And because we don't acknowledge the shadow delta, it gets sent again the next time we connect, resulting in an endless loop.

The question that comes to mind is why we are getting shadow data before receiving NRF_CLOUD_EVT_READY. The documentation here seems to imply this shouldn't happen.

Any insight on how to debug this further would be greatly appreciated.

Thanks!

  • Hi Eric,

    Thanks for the detailed explanation. 

    This symptom makes me think of a modem limitation: on secure sockets it can only handle packets up to a 2 KB buffer size because of limited RAM. This is stated in the limitations section of the modem release notes.

    This can be verified by sending smaller JSON files. I am not sure whether you have already tried this. If this is the cause, I believe splitting the configuration into smaller JSON files could be a solution.

    If the problem still exists, you can try to collect a modem trace file. This will help to see what happened at the socket level. Tracing uses UART by default, but modem tracing over RTT is also supported.

    Best regards,

    Charlie

  • Charlie,

    A few things:

    1. If the buffer size were limited to 2K with no mechanism to break up larger packets, how am I receiving the chunks of JSON before the disconnect? The socket doesn't disconnect until after I have received the JSON. If it were a buffer limit, I would expect the failure while receiving the JSON chunk, not some time after. Additionally, I have other units that download the same shadow, and in fact slightly larger shadows, without issue.

    2. How would I go about splitting the config into smaller files? My understanding is that the full shadow is sent on boot to sync the cloud and firmware, and after that only delta changes are sent. If that is the case, all of this data will be sent regardless of whether I break it into separate objects in the same shadow. Does the firmware have any control or setting for how much is sent, or is this all driven by the nRF Cloud side?

    3. I will attempt to capture a modem trace on my end, but it would be great if you could do the same on yours. I have given you everything you need to reproduce the issue. That way you could confirm the issue and suggest a course of action.

    Thanks,

    Eric

  • Charlie,

    Here is a trace from our custom firmware. I let it cycle through two attempts to connect to nRF Cloud. I don't know if I'm reading the trace correctly, but in Wireshark the largest TCP frame is 760 and the largest TLS frame is 758. I think the issue is present in this trace, but I don't know how to interpret it. It seems like the modem requests a connection reset five separate times just before RRC release. Some help interpreting it would be great.

      

    modemtrace.bin

  • Charlie,

    Additional update for you. I spent some time playing around with the shadow and a few things are apparent:

    1. If I clear out the "config" object with "config":null using the REST API and power cycle the device, it can connect to nRF Cloud again.

    2. The "config" JSON being sent does seem to break things once it exceeds 2048 bytes, and it seems to work if it's kept below this limit.

    This does seem to point to the 2KB size limit you mentioned, although I would still like to see if this can be confirmed with the above trace. It then raises some additional questions:  

    1. The shadow can be a max of 8KB (4K reported / 4K desired) per the AWS spec. What prevents nRF Cloud from sending more than 2KB to the device? If nothing prevents more than 2K from being sent, does that mean the shadow must be limited externally to 4K (2K reported / 2K desired) in total size?

    2. You mention splitting the "config" JSON up into smaller documents. Can you go into more detail on how we would go about this?

    Thanks,

    Eric

  • Hi Eric,

    If you send what you pasted before, the JSON chunk is over 2300 bytes including whitespace. The problem is that the device subscribes to one of the shadow topics and RECEIVES an update that is too big. In the modem trace, the connection is closed because a TLS fragment that is too big is received. As I said, the nRF9160 has limited resources reserved for TLS socket handling. This limitation has nothing to do with the nRF Cloud/AWS 8KB shadow size limit. As long as each MQTT/TCP (TLS) packet from nRF Cloud to the nRF9160 is smaller than 2KB, it should be OK.

    Splitting the "config" JSON into several update steps is similar to what you did already. First update the shadow with a smaller JSON that holds only part of your configuration. When that part has been accepted by the device and shows up as "reported" in nRF Cloud, repeat with the next part until you have uploaded all of them.
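
The staged update described above could be prepared on the host side roughly as follows. This is only a sketch in Python under stated assumptions: the helper names `encoded_size` and `split_config`, the `{"config": {...}}` envelope used for sizing, and the sample data are all hypothetical, and the byte count covers the JSON payload only, not MQTT or TLS overhead. It only limits the size of each individual update.

```python
import json

MAX_BYTES = 2048  # 2 KB limit discussed in this thread


def encoded_size(doc):
    """UTF-8 byte length of doc in compact JSON form."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def split_config(config, max_bytes=MAX_BYTES):
    """Group top-level config keys into parts whose compact
    {"config": {...}} form stays below max_bytes.

    Raises ValueError if a single key is too large on its own.
    """
    parts = []
    current = {}
    for key, value in config.items():
        if encoded_size({"config": {key: value}}) >= max_bytes:
            raise ValueError(f"key {key!r} alone exceeds the size budget")
        candidate = dict(current)
        candidate[key] = value
        if current and encoded_size({"config": candidate}) >= max_bytes:
            # Adding this key would overflow the part: close it, start a new one.
            parts.append(current)
            current = {key: value}
        else:
            current = candidate
    if current:
        parts.append(current)
    return parts


if __name__ == "__main__":
    # Hypothetical config, only to exercise the helper.
    sample = {f"p{i}": "x" * 100 for i in range(40)}
    for index, part in enumerate(split_config(sample)):
        print(index, len(part), "keys,", encoded_size({"config": part}), "bytes")
```

Each part would then be pushed in turn, waiting for the device to report it before sending the next one. The call that pushes a part depends on the REST API in use and is deliberately left out here.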

    Best regards,

    Charlie

  • Charlie,

    First, I think that nRF Cloud or AWS IoT Core is removing the whitespace and extra characters, because the actual shadow that is received has no extra whitespace and is only around 1.3 KB.
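
The difference between the pasted size and the received size can be checked on the host with a sketch like the one below. It is an illustration in Python, not a description of what nRF Cloud actually does: the helper names `compact_size` and `pretty_size`, the sample config, and the assumption that the cloud re-serializes the shadow without whitespace are all hypothetical, and the byte counts exclude MQTT and TLS overhead.

```python
import json

TLS_BUFFER_LIMIT = 2048  # 2 KB secure-socket limit mentioned in this thread


def compact_size(doc):
    """Return the UTF-8 byte length of doc serialized without extra whitespace."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def pretty_size(doc, indent=4):
    """Return the UTF-8 byte length of doc serialized with indentation."""
    return len(json.dumps(doc, indent=indent).encode("utf-8"))


# Hypothetical config, only to exercise the helpers.
sample_config = {"config": {f"param{i}": i for i in range(50)}}

if __name__ == "__main__":
    print("pretty :", pretty_size(sample_config), "bytes")
    print("compact:", compact_size(sample_config), "bytes")
    print("fits   :", compact_size(sample_config) < TLS_BUFFER_LIMIT)
```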

    The only issue I see with breaking up the shadow as you describe is that when the device reboots and the session restarts, a full copy of the shadow is sent, even if I had sent it in smaller chunks during the prior session. So although it may not break when I first place the data into the shadow, it would break the next time the device boots.

    On boot I get the shadow in the following format:

    {
        "reported": {
            All reported object data, except the config object
        },
        "desired": {
            All desired object data, except the config object
        },
        "config": {
            Config object data
        }
    }

    nRF Cloud seems to break the config object out of desired and reported and send it at the same level as them in the JSON of the first message sent at boot. This suggests that:

    (desired_obj size - config_obj size) + (reported_obj size - config_obj size) + config_obj size < 2KB

    This is incredibly limiting. Consider that even the default shadow with the base Nordic data is already around 600 bytes, leaving only around 1.4 KB in total for the user-defined config object and other control data.
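
The budget above can be estimated before a shadow is written with a sketch like this one. It is a rough Python illustration under stated assumptions: the function names are hypothetical, the message layout simply mirrors the boot-time format described in this post, and it assumes the "desired" copy of "config" is the one that gets sent. The real message may carry extra fields, so treat the result as an optimistic estimate.

```python
import json

LIMIT_BYTES = 2048  # 2 KB limit discussed in this thread


def boot_message(desired, reported):
    """Build a document shaped like the boot-time message described above:
    "config" is lifted out of desired/reported to the top level.
    Assumption: the desired copy of "config" is the one that is sent."""
    msg = {
        "reported": {k: v for k, v in reported.items() if k != "config"},
        "desired": {k: v for k, v in desired.items() if k != "config"},
    }
    config = desired.get("config", reported.get("config"))
    if config is not None:
        msg["config"] = config
    return msg


def boot_message_size(desired, reported):
    """UTF-8 byte length of the boot message in compact JSON form."""
    text = json.dumps(boot_message(desired, reported), separators=(",", ":"))
    return len(text.encode("utf-8"))


def remaining_budget(desired, reported, limit=LIMIT_BYTES):
    """Bytes left under the limit; a negative value means the message is too big."""
    return limit - boot_message_size(desired, reported)


if __name__ == "__main__":
    # Hypothetical shadow contents, only to exercise the helpers.
    desired = {"config": {"interval": 60}, "mode": "active"}
    reported = {"config": {"interval": 60}, "battery": 87}
    print(boot_message_size(desired, reported), "bytes used,",
          remaining_budget(desired, reported), "bytes left")
```

With the roughly 600 bytes of base data quoted above, the same arithmetic gives 2048 - 600 = 1448 bytes, which is the "around 1.4 KB" figure.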

    Is there any way around this? nRF Cloud already seems to modify the format of what is sent back, so it doesn't seem far-fetched that nRF Cloud could limit the packet size to match the nRF9160's buffer size limitation. I have found a few other posts on DevZone from others with the same problem and no good answers from Nordic.

    nrf9160 - MQTT over TLS - Socket buffer limit - AWS Shadow - Nordic Team Proposal - Nordic Q&A - Nordic DevZone (nordicsemi.com)

    Asset_tracker_v2 disconnect from AWS when shadow data is more than 2k - Nordic Q&A - Nordic DevZone (nordicsemi.com)

    Are there any plans from Nordic at this point on a solution to this? 

  • Hi Eric,

    I fully understand your concern now. I have sent a request to our nRF Cloud team to check whether they can make changes on the nRF Cloud side along the lines of your proposal.
    I will let you know if I get any updates.

    Best regards,

    Charlie
