SOFTDEVICE: ASSERTION FAILED

Dear Nordic Engineer,
Hello!
We are currently in mass production of our products. The chip we are using is the nRF52805, and the SDK version is nRF5_SDK_15.3.0_59ac345.
Among the products in production, one unit keeps resetting continuously. To verify and troubleshoot this issue, we took the unmodified ble_app_uart example from the SDK and made the following modifications for testing:

1. Added error log output to the app_error_fault_handler function. The result is shown in the attached figure.

2. Modified the protocol stack clock configuration, and the relevant content is shown in the attached figure.

3. Compiled the modified project and flashed the hex file to the chip. The result is shown in the attached figure.

Since our products are in mass production, this problem has a significant impact on our production schedule. We hope your engineers can help us troubleshoot and resolve it. Thank you very much!
If you need further information, please feel free to contact us.
Best regards!
  • Hello,

    The error code isn't populated when the fault ID is a SoftDevice assert (NRF_FAULT_ID_SD_ASSERT). The same goes for the file name and line number.
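    To illustrate why those fields are empty, here is a minimal host-side sketch of how a fault handler could tell the cases apart. It assumes the fault ID values from the SDK 15.3 headers (NRF_FAULT_ID_SD_ASSERT in nrf_sdm.h, NRF_FAULT_ID_SDK_ERROR and NRF_FAULT_ID_SDK_ASSERT in app_error.h); they are redefined locally so it builds without the SDK. The helper names fault_id_name, fault_has_file_and_line and log_fault are hypothetical.

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Local copies of the fault ID values (assumed to match nrf_sdm.h and
     * app_error.h in nRF5 SDK 15.3). On target, include those headers instead. */
    #define NRF_FAULT_ID_SD_RANGE_START  0x00000000u
    #define NRF_FAULT_ID_APP_RANGE_START 0x00001000u
    #define NRF_FAULT_ID_SDK_RANGE_START 0x00004000u
    #define NRF_FAULT_ID_SD_ASSERT   (NRF_FAULT_ID_SD_RANGE_START + 1)
    #define NRF_FAULT_ID_APP_MEMACC  (NRF_FAULT_ID_APP_RANGE_START + 1)
    #define NRF_FAULT_ID_SDK_ERROR   (NRF_FAULT_ID_SDK_RANGE_START + 1)
    #define NRF_FAULT_ID_SDK_ASSERT  (NRF_FAULT_ID_SDK_RANGE_START + 2)

    /* Hypothetical helper: human-readable name for a fault ID. */
    const char *fault_id_name(uint32_t id)
    {
        switch (id)
        {
            case NRF_FAULT_ID_SD_ASSERT:  return "SoftDevice assert";
            case NRF_FAULT_ID_APP_MEMACC: return "Application memory access violation";
            case NRF_FAULT_ID_SDK_ERROR:  return "SDK error (APP_ERROR_CHECK)";
            case NRF_FAULT_ID_SDK_ASSERT: return "SDK assert";
            default:                      return "Unknown fault";
        }
    }

    /* Hypothetical helper: only SDK errors/asserts pass a pointer to an
     * info struct (file name, line number, error code) in 'info'. For a
     * SoftDevice assert only 'pc' is meaningful. */
    bool fault_has_file_and_line(uint32_t id)
    {
        return id == NRF_FAULT_ID_SDK_ERROR || id == NRF_FAULT_ID_SDK_ASSERT;
    }

    /* Host-side stand-in for the logging part of a fault handler. */
    void log_fault(uint32_t id, uint32_t pc)
    {
        printf("Fault: %s (id=0x%08lX, pc=0x%08lX)\n", fault_id_name(id),
               (unsigned long)id, (unsigned long)pc);
        if (!fault_has_file_and_line(id))
        {
            printf("No file/line/error code available for this fault ID.\n");
        }
    }
    ```

    On target, the same checks would sit inside app_error_fault_handler(uint32_t id, uint32_t pc, uint32_t info), with NRF_LOG_ERROR in place of printf. For a SoftDevice assert, the pc value is the one piece of information worth reporting.
    
    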

    What SoftDevice are you using? S112? S132?

    Are you using any other protocols in addition to BLE? If so, which ones? Or do you use the Timeslot API?

    You wrote: "2. Modified the protocol stack clock configuration, and the relevant content is shown in the attached figure."

    Was this the change that led to the SoftDevice assert? Can you tell which API call triggers the assert?

    Best regards,

    Edvin

    Firstly, we are using the S112 SoftDevice. To verify the issue in isolation, we used the ble_app_uart example from the SDK directly; the modifications are described above.

    Secondly, we verified that the clock configuration change works correctly. We have produced many units, and only one of them shows the issue described above.

    Finally, we are not sure which API call causes the assert, because it occurs inside the SoftDevice.

    If you have anything you want to verify, please let me know. Thank you!

  • Will said:
    Since there is no high-speed crystal oscillator on my custom board, I configured the internal RC as the clock source during development.

    SYNTH uses the high-speed crystal (32 MHz) to synthesize the LFCLK (32.768 kHz). You almost certainly do have a high-speed crystal, since it is not optional: it is required for anything BLE related, because it is used to generate the radio signal.

    You should not use SYNTH in production, since it keeps the HFCLK running at all times, which costs a lot of current. SYNTH is mostly for debugging, when we suspect that the LFCLK has issues.
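    In SDK 15.3, the LF clock settings in sdk_config.h (NRF_SDH_CLOCK_LF_SRC, NRF_SDH_CLOCK_LF_RC_CTIV, NRF_SDH_CLOCK_LF_RC_TEMP_CTIV, NRF_SDH_CLOCK_LF_ACCURACY) end up in an nrf_clock_lf_cfg_t that is passed to the SoftDevice. The sketch below is a host-side sanity check of such a configuration, based on the constraints documented for nrf_clock_lf_cfg_t on nRF52. The struct is a local mirror, the function name is hypothetical, and requiring exactly 500 ppm accuracy for RC is an assumption following the common recommendation.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Local mirror of the nrf_sdm.h values (assumed to match SDK 15.3). */
    #define NRF_CLOCK_LF_SRC_RC    0
    #define NRF_CLOCK_LF_SRC_XTAL  1
    #define NRF_CLOCK_LF_SRC_SYNTH 2
    #define NRF_CLOCK_LF_ACCURACY_500_PPM 1

    typedef struct
    {
        uint8_t source;       /* NRF_CLOCK_LF_SRC_* */
        uint8_t rc_ctiv;      /* RC only: calibration interval, 1/4 s units (1..32) */
        uint8_t rc_temp_ctiv; /* RC only: forced calibration every N intervals */
        uint8_t accuracy;     /* NRF_CLOCK_LF_ACCURACY_* */
    } lf_clock_cfg_t;

    /* Hypothetical check: returns true if the configuration looks sane. */
    bool lf_clock_cfg_is_sane(const lf_clock_cfg_t *cfg)
    {
        if (cfg->source != NRF_CLOCK_LF_SRC_RC)
        {
            /* Calibration fields must be 0 for XTAL and SYNTH. */
            return cfg->rc_ctiv == 0 && cfg->rc_temp_ctiv == 0;
        }
        if (cfg->rc_ctiv < 1 || cfg->rc_ctiv > 32)
        {
            return false;
        }
        /* 1 is an nRF51-only legacy value; nRF52 accepts 0 or 2..33. */
        if (cfg->rc_temp_ctiv == 1 || cfg->rc_temp_ctiv > 33)
        {
            return false;
        }
        /* Worst-case time between calibrations, in 1/4 s units.
         * rc_temp_ctiv == 0 means: calibrate every interval. */
        uint32_t worst_quarter_s = (uint32_t)cfg->rc_ctiv *
            (cfg->rc_temp_ctiv == 0 ? 1u : cfg->rc_temp_ctiv);
        /* Calibrate at least every 8 s (32 quarter-seconds) for +/-500 ppm;
         * assumed: report 500 ppm accuracy to the SoftDevice for RC. */
        return worst_quarter_s <= 32u &&
               cfg->accuracy == NRF_CLOCK_LF_ACCURACY_500_PPM;
    }
    ```

    The recommended nRF52 RC setting (rc_ctiv = 16, rc_temp_ctiv = 2, that is, calibration at least every 8 s) passes this check, while SYNTH or XTAL with non-zero calibration fields does not.
    
    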

    Will said:
    In a single-step debugging session, the program immediately hits an assertion error when it reaches sd_ble_gap_adv_start.

    Yes, it is not possible to step through the code while the SoftDevice is doing BLE activity, such as advertising. You can set breakpoints, but after hitting one you need to restart the application. The reason is that once execution halts, the SoftDevice misses many time-critical events, so when you resume or take a single step, it asserts.

    Best regards,

    Edvin

  • Hi Edvin,

    I have verified that NRF_CLOCK_LF_SRC_SYNTH also results in the resets. I also checked the datasheet and found that the RTC is clocked from the LFCLK. Can we verify whether the chip's internal RC oscillator is working correctly using app_timer or the RTC driver?

    Best regards,

    Will


    I will forward your ticket to our QA department. This sounds like an anomaly in the chip: as you said, you have several hundred thousand chips, and only one is failing.

    Please note that we are very short-staffed due to public holidays here in Norway, so please expect some delay in our responses until Tuesday.

    If you want to investigate in the meantime, you could start all your RTC instances and set up an interrupt on one of them. When it triggers, you can use PPI to store each RTC's counter value in one of its CC channels, and compare them. You can also start the HFCLK, run one of the TIMER instances from it, and check whether the RTCs are within the expected tolerance.
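    Once RTC and TIMER tick counts over the same time window have been obtained, the tolerance check is simple arithmetic. The host-side sketch below assumes the RTC runs with PRESCALER = 0 (32768 Hz) and the TIMER with PRESCALER = 0 (16 MHz) from the crystal; the function names are hypothetical.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Nominal LFCLK frequency and assumed TIMER frequency (PRESCALER = 0). */
    #define LFCLK_HZ 32768.0
    #define TIMER_HZ 16000000.0

    /* Hypothetical helper: LFCLK deviation from nominal in ppm, given RTC
     * and TIMER ticks counted over the same interval. Assumes the TIMER's
     * own error (HFXO) is small next to the RC tolerance. Keep the window
     * short: the RTC COUNTER is 24 bits and a 32-bit TIMER at 16 MHz wraps
     * after roughly 268 s. */
    double lfclk_error_ppm(uint32_t rtc_ticks, uint32_t timer_ticks)
    {
        double expected_rtc = (double)timer_ticks * LFCLK_HZ / TIMER_HZ;
        return ((double)rtc_ticks - expected_rtc) / expected_rtc * 1e6;
    }

    /* True if the measured deviation is within +/- tol_ppm. */
    bool lfclk_within_tolerance(uint32_t rtc_ticks, uint32_t timer_ticks,
                                double tol_ppm)
    {
        double err = lfclk_error_ppm(rtc_ticks, timer_ticks);
        return err <= tol_ppm && err >= -tol_ppm;
    }
    ```

    For example, 32784 RTC ticks against 16000000 TIMER ticks (1 s) is about +488 ppm, which is still inside the +/-500 ppm a calibrated RC oscillator is expected to meet; 32800 ticks (about +977 ppm) would not be.
    
    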

    But I am not sure what you can expect to see, if anything. The error may also be masked if you start the HFCLK and keep it running continuously, because the chip will then not enter its low-power mode. Logging may also affect the behavior, and again, we are not really sure what we are looking for.

    Best regards,

    Edvin

    Does Nordic have documentation describing the sd_ble_gap_adv_start interface? Can we get any assertion details from it? Or is it possible to provide a debug version of the SoftDevice?

    If you have a faster way to verify the issue, please let me know, thank you!

    Best regards,

    Will

    I read in an internal ticket that this may be triggered if you have an interrupt with a priority higher than PRIO_LOW and spend a significant amount of time in it. However, since I believe all your devices run the same application and the issue is only present on one device, I don't think that is the case here. I see that the ticket popped back into the queue, probably because of the public holiday here in Norway, since unhandled tickets are returned to the queue. I will assign it to our QA department once more.
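    To audit an application for interrupts of that kind, a small classification of NVIC priorities can help. This sketch assumes the usual nRF52 layout with S112/S132 (priorities 0, 1, 4 and 5 reserved for the SoftDevice; 2, 3, 6 and 7 available to the application) and takes "PRIO_LOW" to mean the SoftDevice's low level, 4. If it instead means the application's low level (6), the flagged application priorities are the same, 2 and 3. The enum and function names are hypothetical.

    ```c
    #include <stdint.h>

    /* Assumed SoftDevice low interrupt priority on nRF52 (lower number =
     * higher priority; nRF52 implements 3 priority bits, 0..7). */
    #define SD_PRIO_LOW 4u

    typedef enum
    {
        PRIO_RESERVED_FOR_SD, /* must not be used by the application */
        PRIO_PREEMPTS_SD_LOW, /* long handlers here can delay SoftDevice work */
        PRIO_BELOW_SD_LOW,    /* the SoftDevice can always preempt these */
        PRIO_INVALID          /* outside the 0..7 range */
    } prio_class_t;

    prio_class_t classify_app_irq_priority(uint8_t prio)
    {
        if (prio > 7u)
        {
            return PRIO_INVALID;
        }
        if (prio == 0u || prio == 1u || prio == 4u || prio == 5u)
        {
            return PRIO_RESERVED_FOR_SD;
        }
        return (prio < SD_PRIO_LOW) ? PRIO_PREEMPTS_SD_LOW : PRIO_BELOW_SD_LOW;
    }
    ```

    Any application handler that classifies as PRIO_PREEMPTS_SD_LOW is a candidate for the scenario above if it runs for a long time.
    
    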

    Sorry for the delay. 

    Best regards,

    Edvin
