NRF_ERROR_NO_MEM

After updating from THREAD SDK 2.0 to SDK 3.1 I am getting NRF_ERROR_NO_MEM in nrf_sdh.c.

The project is based on the multiprotocol BLE Thread dynamic example.

The code runs for hours. Then, after a reboot, initialization fails when the otThreadSetEnabled function is called: line 391 in nrf_sdh.c reports the NRF_ERROR_NO_MEM error.

Looking in the app_sched_event_put function, it seems that the event_index value is never changed from its default value.

I have increased the SCHED_QUEUE_SIZE define from 32 to 48, but the issue still happens.
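
To make the symptom concrete, here is a minimal, self-contained model of a fixed-size event queue that rejects a put when no slot is free. It is a simplified sketch, not the SDK source: every `model_*` and `MODEL_*` name is hypothetical, and the only behavior taken from the description above is that `event_index` keeps its default value when the queue is full. Under these assumptions, a larger queue only postpones the error if nothing drains the queue between puts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SDK return codes; values are illustrative. */
#define MODEL_SUCCESS       0u
#define MODEL_ERROR_NO_MEM  1u

#define MODEL_QUEUE_MAX     64u      /* largest capacity this model supports */
#define MODEL_NO_SLOT       0xFFFFu  /* "default" index meaning no slot was reserved */

typedef void (*model_handler_t)(void);

typedef struct {
    model_handler_t slots[MODEL_QUEUE_MAX + 1u]; /* one spare slot marks "full" */
    uint16_t capacity;                           /* usable entries */
    uint16_t start;                              /* next event to execute */
    uint16_t end;                                /* next free slot */
} model_queue_t;

static void model_queue_init(model_queue_t *q, uint16_t capacity)
{
    assert(capacity <= MODEL_QUEUE_MAX);
    q->capacity = capacity;
    q->start = 0;
    q->end = 0;
}

static uint16_t model_next(const model_queue_t *q, uint16_t i)
{
    return (uint16_t)((i < q->capacity) ? (i + 1u) : 0u);
}

/* Reserve a slot if one is free; otherwise leave event_index at its default. */
static uint32_t model_event_put(model_queue_t *q, model_handler_t handler)
{
    uint16_t event_index = MODEL_NO_SLOT;

    if (model_next(q, q->end) != q->start) {   /* queue not full */
        event_index = q->end;
        q->end = model_next(q, q->end);
    }
    if (event_index == MODEL_NO_SLOT) {
        return MODEL_ERROR_NO_MEM;             /* index never left its default */
    }
    q->slots[event_index] = handler;
    return MODEL_SUCCESS;
}

/* Drain the queue, as a main loop would; this is what frees slots again. */
static void model_execute(model_queue_t *q)
{
    while (q->start != q->end) {
        model_handler_t handler = q->slots[q->start];
        q->start = model_next(q, q->start);
        if (handler != NULL) {
            handler();
        }
    }
}

/* Count how many puts succeed when nothing ever drains the queue. */
static uint16_t model_puts_until_full(uint16_t capacity)
{
    model_queue_t q;
    uint16_t accepted = 0;

    model_queue_init(&q, capacity);
    while (model_event_put(&q, NULL) == MODEL_SUCCESS) {
        accepted++;
    }
    return accepted;
}
```

In this model, `model_puts_until_full(32)` and `model_puts_until_full(48)` differ only in how many events are accepted before the first failure. If the real queue behaves similarly, the more useful question is which code keeps queuing events before the queue is first drained.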

The BLE DFU is enabled in this code.

Once this happens the radio is bricked.

What would cause the app queue to be full so early in startup?

  • Hi Jay,

    I am not completely sure what the issue could be here so I consulted with our Thread team. They would like to know the following:

    1. Is the reboot triggered by the user? What exactly do you mean by reboot? Why does a reboot happen after some hours?

    2. What do you do to recover the board after reboot? Which steps do you take to make it run for some hours again?

    3. Maybe the scheduler queue is full? Could you try to find out what is in it?

    Best regards,

    Marjeris

  • The reboot is not caused by the user.

    We have a network of 500 Thread nodes as test units here in our office. After a DFU upgrade from the older code to the new code from SDK 3.1, the units usually run for a few hours, sometimes even overnight, before we lose connectivity to them. We have I2C connectivity to all units for out-of-band monitoring. Not all units are affected. Once we lose connectivity, we can check several status LEDs on the modules, and from these we can tell that the unit is constantly rebooting itself.

    I have connected the debugger to the unit and watched the boot. That is how I know the unit is failing during boot when calling the OpenThread enable function. The logger reports the NRF_ERROR_NO_MEM error when trying to access the scheduler.

    Once a unit is in this state the unit is bricked and has to be erased and reprogrammed.

    We are testing this with networks of different sizes to see if it depends on the node count. I will let you know the results.

  • Hi Jay,

    I passed this information to the Thread team but I am still waiting for their feedback. Let me know if you find any additional information after your testing is done.

    BR,

    Marjeris

  • If we have 71 or fewer nodes in the network, this does not happen. We tried two different networks of 64 nodes that worked for a weekend with no problems. We then increased the networks to 80 nodes and ran them overnight. In each case, 9 nodes crashed. The networks have since run for several days without any more failures.

  • Hi Jay,

    Where are you calling the app_sched_event_put function in your code?

    What was the scheduler queue size when you tested with 64 nodes? Did you try increasing SCHED_QUEUE_SIZE with 90 nodes?

    Best regards,

    Marjeris

  • In my code I never call app_sched_event_put directly. It is called from functions that are part of either the Nordic SDK or OpenThread. The code has TWIS, Bluetooth and OpenThread running. Before updating to SDK 3.1, we had a network of 256 nodes running with this same code for weeks. After nodes with the SDK 3.1 code crashed, I connected a debugger to one of them. It calls the app_sched_event_put function when the otThreadSetEnabled function is called, and it never returns from trying to start OpenThread during startup. I have increased SCHED_QUEUE_SIZE from its pre-SDK 3.1 value, and it still happens.

    The real question is what is happening to cause the crashing that bricks the radio in the first place. 
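
As a follow-up to the suggestion above to find out what is in the scheduler queue, one option is to tally the pending events by handler address and see whether a single handler dominates. The sketch below assumes the handler addresses of the queued events have already been captured somehow (for example, read out with the debugger attached to a crashed node); how to capture them is not shown, and all `tally_*` names are hypothetical helpers rather than SDK functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the tally: a handler address and how often it was pending. */
typedef struct {
    uintptr_t handler;
    uint16_t  count;
} tally_entry_t;

/* Count occurrences of each distinct handler address in `pending`.
 * Returns the number of rows used; handlers beyond `max_entries`
 * distinct values are ignored. */
static size_t tally_handlers(const uintptr_t *pending, size_t n,
                             tally_entry_t *out, size_t max_entries)
{
    size_t used = 0;

    assert(pending != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < used && out[j].handler != pending[i]) {
            j++;
        }
        if (j < used) {
            out[j].count++;                 /* handler seen before */
        } else if (used < max_entries) {
            out[used].handler = pending[i]; /* first sighting */
            out[used].count = 1;
            used++;
        }
    }
    return used;
}

/* Return the row with the highest count, or NULL if the tally is empty. */
static const tally_entry_t *tally_dominant(const tally_entry_t *rows, size_t used)
{
    const tally_entry_t *best = NULL;

    for (size_t i = 0; i < used; i++) {
        if (best == NULL || rows[i].count > best->count) {
            best = &rows[i];
        }
    }
    return best;
}
```

Looking up the dominant address in the linker map file would then show whether the events come from the SDK, from OpenThread, or from application code.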
