Random ZBOSS fatal error occurs after operating for a while

Question

I am running an nRF5340 as a Zigbee coordinator. After a while I noticed that the batteries of my end devices don't last as long as expected. After debugging, I found that a random ZBOSS fatal error causes the coordinator to restart after a while, with no other logs: just that single message and then a restart. This forces all end devices to repeat their on-connect messages when they rediscover the coordinator, and that drains the battery. Unfortunately, I did not find any way to enable more debugging logs. All there is are the trace logs, which I cannot read and have to send to you so that you can decode them and maybe get something from them. Is there a straightforward way to look at the potential reason for this fatal error, instead of testing assumptions until it stops happening? I have no idea how to begin debugging this, because I don't see an obvious way to review what is happening inside the ZBOSS stack.

Belalshinnawey · Accepted Answer

That was already enabled, and there was still no output. I found the cause and fixed it by disabling all the ZBOSS-related functions and re-enabling them one by one, which is not the fastest way to debug an issue. I found a buffer allocation function that was returning an invalid Zigbee buffer without raising an error, and that buffer passed all the if-statement checks on it. The problem was that it did not fail the first time it happened; it just kept going until Zigbee crashed somewhere else because of the invalid memory address. I fixed it by making this specific buffer allocation deferred instead of immediate. I still don't know why this specific buffer causes an issue. If I could go back in time, I would just pick a different Zigbee backend instead of nRF.
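
For readers unfamiliar with the difference between immediate and deferred buffer allocation, here is a minimal, self-contained sketch of the pattern in plain C. It is not ZBOSS code: every name in it (pool_get, pool_get_delayed, pool_free, BUF_INVALID, on_buffer) is hypothetical and only models the idea. In ZBOSS the corresponding calls are probably zb_buf_get_out() and zb_buf_get_out_delayed(), but the answer does not say which function was involved, so treat that mapping as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; none of these are ZBOSS names. */
#define POOL_SIZE   2U
#define BUF_INVALID 0U   /* id 0 means "no buffer available" */
#define MAX_WAITERS 4U

typedef uint8_t buf_id_t;
typedef void (*buf_cb_t)(buf_id_t id);

static uint8_t  in_use[POOL_SIZE + 1U];   /* index 0 is never handed out */
static buf_cb_t waiters[MAX_WAITERS];     /* FIFO of pending requests */
static size_t   waiter_count;

/* Immediate allocation: returns BUF_INVALID when the pool is exhausted.
 * The caller must check the result before touching the buffer. */
buf_id_t pool_get(void)
{
    for (buf_id_t id = 1U; id <= POOL_SIZE; id++) {
        if (!in_use[id]) {
            in_use[id] = 1U;
            return id;
        }
    }
    return BUF_INVALID;
}

/* Deferred allocation: cb runs right away if a buffer is free, otherwise
 * it is queued and runs once a buffer is released. The callback therefore
 * only ever sees a valid id. Returns 0 on success, -1 if the queue is full. */
int pool_get_delayed(buf_cb_t cb)
{
    buf_id_t id = pool_get();

    if (id != BUF_INVALID) {
        cb(id);
        return 0;
    }
    if (waiter_count == MAX_WAITERS) {
        return -1;
    }
    waiters[waiter_count++] = cb;
    return 0;
}

/* Release a buffer. If a request is waiting, the buffer is handed straight
 * to the oldest waiter and stays marked as in use. */
void pool_free(buf_id_t id)
{
    assert(id != BUF_INVALID && id <= POOL_SIZE && in_use[id]);

    if (waiter_count > 0U) {
        buf_cb_t cb = waiters[0];

        for (size_t i = 1U; i < waiter_count; i++) {
            waiters[i - 1U] = waiters[i];
        }
        waiter_count--;
        cb(id);
        return;
    }
    in_use[id] = 0U;
}

/* Example callback: records the id it was granted. */
buf_id_t last_granted = BUF_INVALID;

void on_buffer(buf_id_t id)
{
    last_granted = id;
}
```

With a pool of two buffers, a third pool_get() call returns BUF_INVALID, and using that value as if it were a real buffer is the kind of mistake that can surface much later as a crash somewhere else. A pool_get_delayed(on_buffer) call made at the same point is simply queued, and on_buffer runs with a valid id as soon as pool_free() releases one.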
