nRF53 802.15.4 init timeout when attached to SWD debugger

Dear nRF support,

nRF5340 custom board

nRF NCS 2.5.x

Running into this very weird issue:

The program runs fine if I don't attach the debugger to it.

When I use "Attach to running program" in the Ozone debugger after the program is past init, the program runs and the debugger works fine.

However, if I attach and then reset the device from the debugger, or reset it some other way while the debugger is attached (e.g. kernel reboot cold via the shell), it fails to boot. I can catch it with a breakpoint in assert_post_action in sys_utils.c. The call stack points to a failure in communication with the 802.15.4 core: a serialization fault with error -5. I believe it is timing out.

Any idea how SWD debugger could affect this stage of init?

Sigurd Hellesvik 5 months ago

Hi,

Generally, debugging applications that use the radio can lead to faults. This is because radio applications have hard timing requirements, so halting them and then resuming is not a good idea.

So, while this does not explain exactly what happens in your case, maybe it is enough to convince you to debug the application with logging instead of a debugger.

Regards,
Sigurd Hellesvik

Farhang 5 months ago in reply to Sigurd Hellesvik

Thanks for the quick response.

#1: Well, as you know, logging gives only a small fraction of the information. There is almost nothing that gives the same insight as breakpoints (instruction or data breakpoints), especially when a crash/reset/assert is involved.

#2: I was under the impression that when I'm debugging the app core on the nRF53, the netcore keeps doing its thing and I'm not interrupting radio activity.

Also, I am not pausing and resuming; I am simply resetting the app while the debugger is attached. It hits a breakpoint in assert/sys_utils saying radio init failed.

#3: What is the current recommended debugging software for nRF/NCS? Is it still Ozone?

Sigurd Hellesvik 5 months ago in reply to Farhang

Good that you push back a bit!

Farhang said:
#1: Well, as you know, logging gives only a small fraction of the information. There is almost nothing that gives the same insight as breakpoints (instruction or data breakpoints), especially when a crash/reset/assert is involved.

I often add more logs where I want to see more detail. It is more of a hassle than using a debugger, but it works. However,

Farhang said:
#2: I was under the impression that when I'm debugging the app core on the nRF53, the netcore keeps doing its thing and I'm not interrupting radio activity.

Yes, you are right. I missed that this was the nRF5340, even though you were very clear about it, and also that it is the netcore that fails, not the appcore.

Farhang said:
#3: What is the current recommended debugging software for nRF/NCS? Is it still Ozone?

Either Ozone or debugging with the nRF Connect extension for VS Code.

All this being said, a colleague spotted a possible reason for this: Errata 161: RESET: Network core is not fully reset after Force-OFF.
We are not 100% sure this is the reason for the error you see, so I suggest that you test it in the following way.

Errata 161 should not trigger if UART is enabled in all images running on the device. Try enabling CONFIG_SERIAL in the netcore, the appcore, and any bootloaders, then try again and see if the error still happens.

Farhang 5 months ago in reply to Sigurd Hellesvik

Thank you, Sigurd. We will try that and get back to you.

1. Is it possible to debug the nRF5340 netcore at all?

After all, it is a core too, and subject to obscure bugs that need breakpoint-style debugging. What is Nordic's recommended way of debugging the netcore?

2. Update regarding the ticket above: the issue is not limited to debugging. We also get a timeout error in nrf5_init(), and increasing the timeout does not help. We have narrowed it down to some changes we made, borrowed from the entropy sample, which we needed in order to read the firmware version out of the nRF53 netcore.

Based on that sample, we implemented our own "custom RPC" message to get a custom word (a #define) out of the netcore so we know which version it is running.

Has this been addressed? In other words, do you have a better suggestion on how to read the nRF53 netcore image version?

Ideally, MCUmgr image management would be able to provide the netcore firmware version and SHA-256 hash in the same way it does for the app core.

Farhang 4 months ago in reply to Farhang

Sigurd Hellesvik, updates:
1- We figured out that we can debug cpunet just like cpuapp, using its own zephyr.elf. I don't know why we thought we couldn't. My bad.

2- The issue we were facing appears to have been caused by using two IPC channels, one for BT/OpenThread and the other for custom nrf_rpc messaging. Although we couldn't find any documentation saying we shouldn't use two IPC channels, we combined them into one and it seems to be stable now.

Suggestions for nRF:
1- If you could streamline netcore firmware updates to match the appcore (right now, reading the firmware image version through "image mgmt" doesn't work for the netcore), that would be great.

2- Our understanding of the IPC/shared RAM/Spinel implementation is very vague. The documentation could be improved so that we developers can actually understand what is going on.

IPC, OpenAMP, RPMsg, nrf_rpc, shared RAM, BLE, OpenThread: there are several options, and the samples use them in different ways. There is no unifying guide on what to choose in each scenario.
There is also the notion of zephyr,ipc_shm, which seems to be deprecated?

Sigurd Hellesvik 4 months ago in reply to Farhang

Farhang said:
1- We figured out that we can debug cpunet just like cpuapp, using its own zephyr.elf. I don't know why we thought we couldn't. My bad.

2- The issue we were facing appears to have been caused by using two IPC channels, one for BT/OpenThread and the other for custom nrf_rpc messaging. Although we couldn't find any documentation saying we shouldn't use two IPC channels, we combined them into one and it seems to be stable now.

Looks like you figured it out before I got back to you!
Sorry for the late answer and good job!

Farhang said:
Has this been addressed? In other words, do you have a better suggestion on how to read the nRF53 netcore image version?

Ideally, MCUmgr image management would be able to provide the netcore firmware version and SHA-256 hash in the same way it does for the app core.

No, I do not have any better suggestions here. Ideally we should have image management for reading the network core version, but we do not.

Farhang said:
Suggestions for nRF:
1- If you could streamline netcore firmware updates to match the appcore (right now, reading the firmware image version through "image mgmt" doesn't work for the netcore), that would be great.

2- Our understanding of the IPC/shared RAM/Spinel implementation is very vague. The documentation could be improved so that we developers can actually understand what is going on.

Thanks for the feedback!