SoftDevice Controller ASSERT: 53, 296

Hi!

Can you please help me identify what is causing this issue and/or provide more info about this:

[00:04:23.038,330] <err> bt_sdc_hci_driver: SoftDevice Controller ASSERT: 53, 296
[00:04:57.431,335] <err> os: ***** HARD FAULT *****
[00:04:57.431,335] <err> os: Fault escalation (see below)
[00:04:57.431,365] <err> os: ARCH_EXCEPT with reason 3
[00:04:57.431,365] <err> os: r0/a1: 0x00000003 r1/a2: 0x00000000 r2/a3: 0x0001981b
[00:04:57.431,396] <err> os: r3/a4: 0x00000000 r12/ip: 0x20000ab0 r14/lr: 0xffffffff
[00:04:57.431,396] <err> os: xpsr: 0x41000011
[00:04:57.431,396] <err> os: r4/v1: 0x20015c70 r5/v2: 0x0003370d r6/v3: 0x0000000a
[00:04:57.431,427] <err> os: r7/v4: 0x20015c70 r8/v5: 0x20001944 r9/v6: 0x2000e204
[00:04:57.431,427] <err> os: r10/v7: 0x200008bc r11/v8: 0x00000000 psp: 0x20015f40
[00:04:57.431,457] <err> os: EXC_RETURN: 0xfffffff1
[00:04:57.431,457] <err> os: Faulting instruction address (r15/pc): 0x00033744
[00:04:57.431,488] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:04:57.431,518] <err> os: Fault during interrupt handling
[00:04:57.431,549] <err> os: Current thread: 0x2000ae30 (unknown)
[00:05:09.972,747] <err> fatal_error: Resetting system
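
For anyone reading the dump above, here is a minimal sketch of how the xpsr and EXC_RETURN values can be decoded. It assumes the standard Arm Cortex-M bit layout (active exception number in xPSR bits [8:0], EXC_RETURN bit 3 for the return mode and bit 2 for the stack pointer). The helper names are made up for illustration and are not part of Zephyr or the nRF Connect SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not an SDK API. */

/* xPSR bits [8:0] (IPSR) hold the active exception number; 0 means Thread mode. */
static unsigned int xpsr_exception_number(uint32_t xpsr)
{
    return xpsr & 0x1FFu;
}

/* External interrupts start at exception number 16.
 * Returns -1 for Thread mode and for system exceptions (numbers 0..15). */
static int xpsr_irq_number(uint32_t xpsr)
{
    unsigned int n = xpsr_exception_number(xpsr);

    return (n >= 16u) ? (int)(n - 16u) : -1;
}

/* EXC_RETURN bit 3: 0 = return to Handler mode, 1 = return to Thread mode. */
static bool exc_return_is_handler_mode(uint32_t exc_return)
{
    return (exc_return & 0x8u) == 0u;
}

/* EXC_RETURN bit 2: 0 = main stack (MSP), 1 = process stack (PSP). */
static bool exc_return_uses_psp(uint32_t exc_return)
{
    return (exc_return & 0x4u) != 0u;
}
```

With the values from this log, xpsr 0x41000011 gives exception number 17, i.e. external IRQ 1, and EXC_RETURN 0xfffffff1 indicates a return to Handler mode on the main stack. Both agree with the "Fault during interrupt handling" line. Which peripheral IRQ 1 belongs to depends on the interrupt table of the SoC in use.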

When this happens, the device is in the process of uploading new firmware through the mcumgr image group's upload command.

I haven't been able to fully verify this, but I think this is caused by updating to SDK 2.6.0 (from 2.4.0).
That part of the code is unchanged before and after the SDK update, except that "zcbor_new_decode_state" has two new parameters, which we have set to NULL and 0 (after comparing how some examples changed between SDK versions).

BR,
Mårten

0 Einar Thorsrud over 1 year ago

Hi Mårten,

The assert is triggered by an "overstay" event in the SoftDevice Controller scheduling related to BLE central operation. So something has prevented that code from running on time. That something can be higher-priority threads or interrupts. I cannot say anything more specific based on the assert, but perhaps you could disconnect any central links when initiating DFU?

0 Maos over 1 year ago in reply to Einar Thorsrud

Hi!

Thanks for the reply.

Maybe I wasn't clear, but the assert is happening on the central that is sending the image to a peripheral.
There is only one connection doing an update, and possibly one more simultaneous connection (to a different peripheral device) sending a smaller amount of data.

I have however managed to reproduce the issue with just a single peripheral.

Is there some "bluetooth" thread where the priority can be raised?

I guess this is deeper down and therefore not connected to what is done in the callback passed to bt_dfu_smp_command?

BR,
Mårten

0 Maos over 1 year ago in reply to Edvin

Hi!

Any updates on this issue?
Have you been able to reproduce it?

BR,

Mårten

0 Edvin 11 months ago in reply to Edvin

Hello Mårten. I have had the device running for a while now, but I have not seen any crashes yet. I then remembered that you mentioned the part about the RSSI. It is currently running in the office (and I am at home), but I can try to crank down the peripheral's TX power, to see if that does the trick. I will be out of office from next week, but I will write some notes for the person taking over. (I am sorry for the inconvenience, but I hope you understand it has to be that way in the summers). Hopefully the next person can have a flying start at this issue.

BR,

Edvin

0 Edvin 11 months ago in reply to Edvin

Yupp! There you go:

I will report it to our SoftDevice Controller team, along with instructions on how to use it. This means that the next person handling this ticket will only have to relay the information from the SoftDevice Controller team.

Best regards,

Edvin

0 AHaug 11 months ago in reply to Edvin

Hi,

Just a status update from us: this has been verified as a bug, and we're working on getting a fix into NCS main soon. We'll update you when the PR is available for monitoring.

Could you try to remove the data length updates from the project? This should work around the issue.

Kind regards,
Andreas

0 Maos 11 months ago in reply to AHaug

Hi!

Sorry for not responding sooner.

Thank you for confirming, so we know that there is nothing we need to change.

Unfortunately, we had to go back to 2.5.3, where we don't see this issue, so I can't test the workaround.

BR,
Mårten

0 AHaug 11 months ago in reply to Maos

Hi Mårten,

No worries, glad you found a middle ground that works for you. The patch will nonetheless be added to main shortly.

Kind regards,
Andreas

0 mertzt 6 months ago in reply to AHaug

Has this been addressed?

I am encountering the same assert on my project using an nRF5340 and NCS v2.6.1.

0 AHaug 6 months ago in reply to mertzt

Hi,

The fix is in NCS v2.8.0, and it was previously decided that it would not be backported to v2.6.1.

If you need it backported, please reach out to your regional sales manager and request it through them.

Kind regards,
Andreas

0 mertzt 6 months ago in reply to AHaug

I've reached out to our local sales manager to request the backport. In the meantime, we are trying to gather as much information as possible.

Do you have any additional information about what commits address the issue?

Are there any known workarounds?

Thanks,
Tim

0 AHaug 6 months ago in reply to mertzt

Hi Tim,

I can verify that I've seen the discussion this has raised. I will monitor it.

Since the fix is in closed-source code, I can't share anything about the commit here. The best approach will thus be to keep in touch with the sales manager and/or raise your own private case (with relevant company info and request info).

Kind regards,
Andreas