This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

LWM2M lifetime registration fails during modem delta erase

I have a product with an nRF9160 and using its modem to connect to a LWM2M server. I have recently enabled modem delta firmware updates, and have seen an annoying issue.

The first thing that is being done by the dfu_modem_delta code is to create a DFU socket and then erase the memory used for the delta update on the modem flash. This erase takes around 10-15 seconds in my case. After the erase is done, The data from the first block is used/written, then next block etc.

The lwm2m_rd_client/lwm2m_engine has a lifetime, used to re-register to the lwm2m server at an interval, to indicate that the client is still alive. What I noticed when doing modem delta update was that if I was unlucky and happened to want to re-register during the erase, the re-register fails, causing the client to lose connection to the server, and thus kind of aborting the modem update, since no more firmware blocks can be received until we have done a complete new registration, and then I have to actually re-initiate the download on the server side. And following downloads don't have to do the erase, but then I get errors from the modem, with type -5, basically indicating that the data is incorrect/corrupted (Maybe because the erase was interrupted? Or because we transfer wrong block of data?)

Is this a known problem? And what is the suggested way to avoid this?

I generally use a lifetime of 30 seconds, which gives around 50 % risk to get the error. I could force a re-register before doing the erase (to make sure I have ~30 seconds to do the erase), but that seems like only a temporary fix, and wouldn't work for shorter lifetimes.

Regards,

Andreas Pettersson

Parents
  • Hi again, follow up question: How come the erase is done on the whole block at once and not one section at a time? That's how it's done for DFU of the nrf9160 application. That would make each erase take shorter time, allowing for better interleaving.

    Regards,

    Andreas Pettersson

  • Hello Andreas, 

    What modem delta FW files are you using to download and update your device? How did you enable modem delta firmware in your application?

    Do you have any log output that can provide more details on this matter? What version of NCS are you running? 

    Kind regards,
    Øyvind

  • Hi,

    I have tried multiple different delta files, but mostly tested with 1.3.1_to_1.3.1-FOTA-TEST and 1.3.1-FOTA-TEST_to_1.3.1. And of course, when using those I have previously flashed the 1.3.1 modem version.

    My application is based on NCS v1.8.0.

    I have enabled modem delta with the CONFIG_DFU_TARGET_MODEM_DELTA, and I am calling the appropriate functions. And as I said, modem delta firmware does work, just not all the time because of the lifetime issue.

    Yes I could of course provide some logs for this, but I wanted to pose a more general question. For example, my follow up question is quite important I feel, why is the erase done for the whole section at once and not block by block when needed?

    Excerpt from log:

    [00:01:05.049,011] <dbg> net_lwm2m_engine: lwm2m_udp_receive: checking for reply from [MY_IP_ADDR]
    [00:01:05.049,102] <dbg> net_lwm2m_engine: handle_request: No accept option given. Assume OMA TLV.
    [00:01:05.049,255] <dbg> net_lwm2m_engine: lwm2m_write_handler: BLOCK1: total:13772 current:0 last:0
    [00:01:05.049,346] <dbg> net_lwm2m_engine: lwm2m_notify_observer: NOTIFY EVENT 5/0/3
    [00:01:05.049,346] <dbg> net_lwm2m_obj_firmware: lwm2m_firmware_set_update_state: Update state = 1
    [00:01:05.049,377] <inf> app_lwm2m_firmware: DFU image type: Modem Delta
    [00:01:05.049,438] <inf> dfu_target_modem_delta: Modem DFU Socket created
    [00:01:05.049,743] <inf> dfu_target_modem_delta: Modem firmware version: 492e44a8-1e75-48dc-843e-ffdc314af1aa
    [00:01:05.050,262] <inf> dfu_target_modem_delta: Deleting firmware image, this can take several minutes
    [00:01:08.493,774] <inf> lte_helpers: RRC mode: Idle
    [00:01:12.996,154] <inf> battery_monitor: Charge complete
    [00:01:13.397,003] <dbg> net_lwm2m_engine: lwm2m_engine_set: path:4/0/2, value:0x2002e2d7, len:1
    [00:01:13.885,528] <inf> dfu_target_modem_delta: Modem FW delete complete
    [00:01:13.885,559] <inf> app_lwm2m_firmware: Modem Delta firmware download started.
    [00:01:13.997,772] <dbg> net_lwm2m_engine: lwm2m_engine_get: path:1/0/1, buf:0x2003194c, buflen:4
    [00:01:13.997,863] <dbg> net_lwm2m_rd_client: sm_send_registration: registration sent [MY_IP_ADDR]
    [00:01:13.999,633] <dbg> net_lwm2m_engine: generate_notify_message: [MANUAL] NOTIFY MSG START: 5/0/0(2) token:'MY_TOKEN' [MY_IP_ADDR] 73999
    [00:01:13.999,725] <dbg> net_lwm2m_engine: generate_notify_message: NOTIFY MSG: SENT
    [00:01:14.063,781] <inf> lte_helpers: RRC mode: Connected
    [00:01:14.255,889] <dbg> net_lwm2m_engine: lwm2m_udp_receive: checking for reply from [MY_IP_ADDR]
    [00:01:14.255,981] <dbg> net_lwm2m_engine: handle_request: No accept option given. Assume OMA TLV.
    [00:01:14.256,042] <wrn> net_lwm2m_engine: Block already handled 0, expected 1
    [00:01:14.437,255] <dbg> net_lwm2m_engine: lwm2m_udp_receive: checking for reply from [MY_IP_ADDR]
    [00:01:14.437,286] <inf> net_lwm2m_rd_client: Update callback (code:4.4)
    [00:01:14.437,347] <err> net_lwm2m_rd_client: Failed with code 4.4 (Not Found). Retrying registration.
    [00:01:14.437,408] <dbg> net_lwm2m_engine: lwm2m_udp_receive: reply 0x2001dac4 handled and removed
    [00:01:14.453,979] <dbg> net_lwm2m_engine: lwm2m_udp_receive: checking for reply from [MY_IP_ADDR]
    [00:01:14.454,040] <dbg> net_lwm2m_engine: notify_message_reply_cb: NOTIFY ACK type:3 code:0.0 reply_token:'MY_TOKEN'
    [00:01:14.454,101] <dbg> net_lwm2m_engine: engine_remove_observer: observer 'MY_TOKEN' removed
    [00:01:14.454,193] <dbg> net_lwm2m_engine: lwm2m_udp_receive: reply 0x2001dadc handled and removed
    [00:01:14.498,687] <dbg> net_lwm2m_engine: lwm2m_engine_get: path:0/0/1, buf:0x20031927, buflen:1
    [00:01:14.498,779] <dbg> net_lwm2m_engine: lwm2m_engine_get: path:0/0/10, buf:0x2003199e, buflen:2
    [00:01:14.498,840] <inf> net_lwm2m_rd_client: RD Client started with endpoint '352656109477326' with client lifetime 10
    [00:01:14.498,931] <dbg> net_lwm2m_engine: lwm2m_parse_peerinfo: Parse url: coaps://MY_IP_ADDR:5688
    [00:01:14.861,389] <dbg> net_lwm2m_engine: lwm2m_engine_get: path:1/0/1, buf:0x2003194c, buflen:4
    [00:01:14.863,281] <dbg> net_lwm2m_rd_client: sm_send_registration: registration sent [MY_IP_ADDR]
    [00:01:14.962,707] <dbg> net_lwm2m_engine: lwm2m_udp_receive: checking for reply from [MY_IP_ADDR]
    [00:01:14.962,707] <dbg> net_lwm2m_rd_client: do_registration_reply_cb: Registration callback (code:2.1)
    [00:01:14.962,738] <inf> asset_tracker: Registration complete
    [00:01:14.962,799] <inf> net_lwm2m_rd_client: Registration Done (EP='7uNZdLxFue')
    

    Please note that in this log, it does look like the re-registration is supposed to happen after the delete is done, but that is not true, this is only the first time the the other thread has yielded and we can do the re-registration.

    Regards,

    Andreas Pettersson

Reply
  • Hi,

    I have tried multiple different delta files, but mostly tested with 1.3.1_to_1.3.1-FOTA-TEST and 1.3.1-FOTA-TEST_to_1.3.1. And of course, when using those I have previously flashed the 1.3.1 modem version.

    My application is based on NCS v1.8.0.

    I have enabled modem delta with the CONFIG_DFU_TARGET_MODEM_DELTA, and I am calling the appropriate functions. And as I said, modem delta firmware does work, just not all the time because of the lifetime issue.

    Yes I could of course provide some logs for this, but I wanted to pose a more general question. For example, my follow up question is quite important I feel, why is the erase done for the whole section at once and not block by block when needed?

    Excerpt from log:

    [00:01:05.049,011] <dbg> net_lwm2m_engine: lwm2m_udp_receive: checking for reply from [MY_IP_ADDR]
    [00:01:05.049,102] <dbg> net_lwm2m_engine: handle_request: No accept option given. Assume OMA TLV.
    [00:01:05.049,255] <dbg> net_lwm2m_engine: lwm2m_write_handler: BLOCK1: total:13772 current:0 last:0
    [00:01:05.049,346] <dbg> net_lwm2m_engine: lwm2m_notify_observer: NOTIFY EVENT 5/0/3
    [00:01:05.049,346] <dbg> net_lwm2m_obj_firmware: lwm2m_firmware_set_update_state: Update state = 1
    [00:01:05.049,377] <inf> app_lwm2m_firmware: DFU image type: Modem Delta
    [00:01:05.049,438] <inf> dfu_target_modem_delta: Modem DFU Socket created
    [00:01:05.049,743] <inf> dfu_target_modem_delta: Modem firmware version: 492e44a8-1e75-48dc-843e-ffdc314af1aa
    [00:01:05.050,262] <inf> dfu_target_modem_delta: Deleting firmware image, this can take several minutes
    [00:01:08.493,774] <inf> lte_helpers: RRC mode: Idle
    [00:01:12.996,154] <inf> battery_monitor: Charge complete
    [00:01:13.397,003] <dbg> net_lwm2m_engine: lwm2m_engine_set: path:4/0/2, value:0x2002e2d7, len:1
    [00:01:13.885,528] <inf> dfu_target_modem_delta: Modem FW delete complete
    [00:01:13.885,559] <inf> app_lwm2m_firmware: Modem Delta firmware download started.
    [00:01:13.997,772] <dbg> net_lwm2m_engine: lwm2m_engine_get: path:1/0/1, buf:0x2003194c, buflen:4
    [00:01:13.997,863] <dbg> net_lwm2m_rd_client: sm_send_registration: registration sent [MY_IP_ADDR]
    [00:01:13.999,633] <dbg> net_lwm2m_engine: generate_notify_message: [MANUAL] NOTIFY MSG START: 5/0/0(2) token:'MY_TOKEN' [MY_IP_ADDR] 73999
    [00:01:13.999,725] <dbg> net_lwm2m_engine: generate_notify_message: NOTIFY MSG: SENT
    [00:01:14.063,781] <inf> lte_helpers: RRC mode: Connected
    [00:01:14.255,889] <dbg> net_lwm2m_engine: lwm2m_udp_receive: checking for reply from [MY_IP_ADDR]
    [00:01:14.255,981] <dbg> net_lwm2m_engine: handle_request: No accept option given. Assume OMA TLV.
    [00:01:14.256,042] <wrn> net_lwm2m_engine: Block already handled 0, expected 1
    [00:01:14.437,255] <dbg> net_lwm2m_engine: lwm2m_udp_receive: checking for reply from [MY_IP_ADDR]
    [00:01:14.437,286] <inf> net_lwm2m_rd_client: Update callback (code:4.4)
    [00:01:14.437,347] <err> net_lwm2m_rd_client: Failed with code 4.4 (Not Found). Retrying registration.
    [00:01:14.437,408] <dbg> net_lwm2m_engine: lwm2m_udp_receive: reply 0x2001dac4 handled and removed
    [00:01:14.453,979] <dbg> net_lwm2m_engine: lwm2m_udp_receive: checking for reply from [MY_IP_ADDR]
    [00:01:14.454,040] <dbg> net_lwm2m_engine: notify_message_reply_cb: NOTIFY ACK type:3 code:0.0 reply_token:'MY_TOKEN'
    [00:01:14.454,101] <dbg> net_lwm2m_engine: engine_remove_observer: observer 'MY_TOKEN' removed
    [00:01:14.454,193] <dbg> net_lwm2m_engine: lwm2m_udp_receive: reply 0x2001dadc handled and removed
    [00:01:14.498,687] <dbg> net_lwm2m_engine: lwm2m_engine_get: path:0/0/1, buf:0x20031927, buflen:1
    [00:01:14.498,779] <dbg> net_lwm2m_engine: lwm2m_engine_get: path:0/0/10, buf:0x2003199e, buflen:2
    [00:01:14.498,840] <inf> net_lwm2m_rd_client: RD Client started with endpoint '352656109477326' with client lifetime 10
    [00:01:14.498,931] <dbg> net_lwm2m_engine: lwm2m_parse_peerinfo: Parse url: coaps://MY_IP_ADDR:5688
    [00:01:14.861,389] <dbg> net_lwm2m_engine: lwm2m_engine_get: path:1/0/1, buf:0x2003194c, buflen:4
    [00:01:14.863,281] <dbg> net_lwm2m_rd_client: sm_send_registration: registration sent [MY_IP_ADDR]
    [00:01:14.962,707] <dbg> net_lwm2m_engine: lwm2m_udp_receive: checking for reply from [MY_IP_ADDR]
    [00:01:14.962,707] <dbg> net_lwm2m_rd_client: do_registration_reply_cb: Registration callback (code:2.1)
    [00:01:14.962,738] <inf> asset_tracker: Registration complete
    [00:01:14.962,799] <inf> net_lwm2m_rd_client: Registration Done (EP='7uNZdLxFue')
    

    Please note that in this log, it does look like the re-registration is supposed to happen after the delete is done, but that is not true, this is only the first time the the other thread has yielded and we can do the re-registration.

    Regards,

    Andreas Pettersson

Children
  • Andreas, thanks for providing this information and for clarifying. 

    In regards to the lifetime, configured with CONFIG_LWM2M_ENGINE_DEFAULT_LIFETIME, would it help to shorten or extend the time? 

    I will forward this and your follow-up question to our developers. 

    Kind regards,
    Øyvind

  • Hi,

    Increasing the CONFIG_LWM2M_ENGINE_DEFAULT_LIFETIME would only be effective until the server sets it to some other value. And also, increasing the default lifetime or setting the lifetime from the server will both just decrease the risk of this happening, not remove it.

    Of course, a block by block erase wouldn't remove the risk entirely either, but it would be so unlikely that it shouldn't be a problem in most cases, at least if the CONFIG_LWM2M_SECONDS_TO_UPDATE_EARLY is set to a not too small value. The default of 6 seconds should be fine (if using block by block erase).

    Thanks for forwarding this to the developers.

    Regards,

    Andreas

  • Hello Andreas, 

    The developers inform that they are aware of the erase issue, and that they are currently working on a patch. There isn't much that we can do to reduce the modem erase time. Have you tried calling the keep-alive in its own thread?

     Still waiting for more feedback from our LwM2M expert.

    Kind regards,
    Øyvind

  • Hello,

    OK, thanks, good that they are working on a fix. I am assuming that won't be made available until 1.3.2 at best? And I am guessing it needs updates to the DFU lib as well then?

    Well, the problem is that the modem erase is done when I get the first block of firmware data, because that block is used to decode what type of firmware it is. This thread seems to be the lwm2m_recv thread, not sure exactly of the priorities of that, but it seems to have higher priorities than my user threads. And I cannot delegate the firmware block/modem erase to another user thread/app work task, because when I reply to the lwm2m cb, I could immediately get the next block of data, overwriting the first block before I have had time to handle it (this is when pushing firmware).

    I have firmware HTTP download also set up, and since that is done in a lower priority thread I have been able to do a  workaround, where I temporarily increase the lifetime, and manually force an update, then sleep/yield to make sure the lwm2m thread that handles updates is allowed to update, before I continue with the modem erase and further download. But this is not a pretty fix in my opinion.

    The keep-alive/update is already done in its own thread, but the issue lies in the thread priorities, and the write callback of the firmware taking too long (due to the modem erase).

    Regards,

    Andreas

  • Hi Öyvind,

    Any updates from the LwM2M expert?

    Regards,

    Andreas

Related