lwm2m_engine notifications causing modem send to lock up

We're developing a product using NB-IoT, LwM2M, and the Nordic nRF9160 MCU.

What we're observing is that, with the latest nRF Connect SDK, multiple LwM2M resource notifications cause the modem socket library to block indefinitely on send() calls.

Steps to reproduce:

  1. download and extract attached tarball
  2. west init -l app
  3. west update
  4. west build -t menuconfig -b nrf9160dk_nrf9160ns app
  5. set CONFIG_SERVER_URL to a valid lwm2m server (the server must observe the reported resources)
  6. west build && west flash

The firmware will create an LwM2M connection, report an object with 26 resources, and notify them every 5 s.

The notifications stop after 1-2 rounds. The issue is the send() call in lwm2m_engine.c:1012, which blocks indefinitely after a few notifications.
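As a possible way to keep a stuck send() from hanging the whole engine thread while the root cause is being tracked down, here is a minimal sketch in plain POSIX C. The helper name send_with_timeout is hypothetical and not part of lwm2m_engine.c, and whether the nRF9160 modem socket offload honors poll() with POLLOUT and MSG_DONTWAIT in the same way is an assumption that would need checking on the target.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from lwm2m_engine.c): wait up to timeout_ms for
 * the socket to become writable, then send without blocking.
 * Returns the number of bytes sent, or -1 with errno set
 * (ETIMEDOUT if the socket never became writable in time).
 */
ssize_t send_with_timeout(int fd, const void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc == 0) {
        errno = ETIMEDOUT;      /* nothing became writable in time */
        return -1;
    }
    if (rc < 0) {
        return -1;              /* errno already set by poll() */
    }
    if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL)) {
        errno = EIO;            /* socket is in an error state */
        return -1;
    }

    /* MSG_DONTWAIT keeps this call from blocking even if the send buffer
     * filled up between poll() and send(). */
    return send(fd, buf, len, MSG_DONTWAIT);
}
```

With something like this, a caller could log the timeout and reconnect instead of blocking forever. It would only mask the lock-up, not fix it.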

In the tarball you will also find the uart log output (screenlog.0) and a modem trace (trace-2021-04-12T10-44-32.273Z.bin) of the issue.
6443.lwm2m_modem_lockup.tar.gz

  • Reverting ecebb6fdb34a213af294061529ef352c4ee73927 from ncs v1.5.0 also fixes the issue in our actual firmware.

  • Okay, apparently it only significantly reduces the chance of this happening. When upping the number of notifications to 75 per 5 s, it still happens. It still seems to be roughly fine with 1.4.2 (I don't see the lock-up, but the firmware crashes after 1-2 minutes). 1.5.0 locks up much earlier. Bisecting again.

  • ncs commit 89427822a05a5fae1ed578fffc45145075dae8b0 "works", though it regularly produces this error:

    [00:00:50.957,580] <err> net_lwm2m_engine: Poll reported a socket error, 09.
    [00:00:50.957,580] <err> net_lwm2m_rd_client: RD Client socket error: 5

    But at least it doesn't lock up and recovers.

    ncs commit 7971211ea2d05a1a1bf5ffc5a4f3c4a728520f56 locks up, same as v1.5.0.
    There's commit 0ac4f23e4f501a310fd9c2d1f46b1bb419a028d5 in between the two that updates the zephyr sdk from 3420cde0e37be536cda67f293784dcc1c6a92001 to 21046b8cdb4eac989c6b17bb21ffc8196be3d5e4, which I suspect is the actual culprit, but it doesn't build.

    Unfortunately I've not yet figured out how to bisect properly through two levels of west manifests... What I can say though is that cherry-picking all lwm2m-related changes onto 3420cde0e37be536cda67f293784dcc1c6a92001 does not cause the lock up to happen, so the issue seems to be somewhere else.
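For reading the "Poll reported a socket error, 09." line above, a small decoding sketch may help. It assumes the logged number is the poll revents mask printed in hex, and that the flag bits use the values in the table below (these are the Linux values; that Zephyr's ZSOCK_POLL* constants match them is an assumption). The function describe_revents is a hypothetical helper, not an existing API.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct flag_name {
    unsigned int bit;
    const char *name;
};

/* Assumed bit values (Linux poll.h; assumed to match Zephyr's ZSOCK_POLL*). */
static const struct flag_name poll_flags[] = {
    { 0x01, "POLLIN" },  { 0x02, "POLLPRI" }, { 0x04, "POLLOUT" },
    { 0x08, "POLLERR" }, { 0x10, "POLLHUP" }, { 0x20, "POLLNVAL" },
};

/* Hypothetical helper: write the names of all set flags into out,
 * separated by '|'. Returns out; stops early if out is too small. */
const char *describe_revents(unsigned int revents, char *out, size_t out_len)
{
    size_t used = 0;

    if (out_len == 0) {
        return out;
    }
    out[0] = '\0';

    for (size_t i = 0; i < sizeof(poll_flags) / sizeof(poll_flags[0]); i++) {
        if (!(revents & poll_flags[i].bit)) {
            continue;
        }
        int n = snprintf(out + used, out_len - used, "%s%s",
                         used ? "|" : "", poll_flags[i].name);
        if (n < 0 || (size_t)n >= out_len - used) {
            break;              /* buffer too small, result is truncated */
        }
        used += (size_t)n;
    }
    return out;
}
```

Under these assumptions, describe_revents(0x09, ...) yields "POLLIN|POLLERR", meaning the socket reported an error condition alongside readable data. In errno tables where 5 is EIO (Linux and newlib, for example), the "RD Client socket error: 5" line would then be a generic I/O error.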
