SoftDevice Controller ASSERT: 53, 296

Hi!

Can you please help me identify what is causing this issue and/or provide more info about this:

[00:04:23.038,330] \033[1;31m<err> bt_sdc_hci_driver: SoftDevice Controller ASSERT: 53, 296\033[0m
[00:04:57.431,335] \033[1;31m<err> os: ***** HARD FAULT *****\033[0m
[00:04:57.431,335] \033[1;31m<err> os:   Fault escalation (see below)\033[0m
[00:04:57.431,365] \033[1;31m<err> os: ARCH_EXCEPT with reason 3
\033[0m
[00:04:57.431,365] \033[1;31m<err> os: r0/a1:  0x00000003  r1/a2:  0x00000000  r2/a3:  0x0001981b\033[0m
[00:04:57.431,396] \033[1;31m<err> os: r3/a4:  0x00000000 r12/ip:  0x20000ab0 r14/lr:  0xffffffff\033[0m
[00:04:57.431,396] \033[1;31m<err> os:  xpsr:  0x41000011\033[0m
[00:04:57.431,396] \033[1;31m<err> os: r4/v1:  0x20015c70  r5/v2:  0x0003370d  r6/v3:  0x0000000a\033[0m
[00:04:57.431,427] \033[1;31m<err> os: r7/v4:  0x20015c70  r8/v5:  0x20001944  r9/v6:  0x2000e204\033[0m
[00:04:57.431,427] \033[1;31m<err> os: r10/v7: 0x200008bc  r11/v8: 0x00000000    psp:  0x20015f40\033[0m
[00:04:57.431,457] \033[1;31m<err> os: EXC_RETURN: 0xfffffff1\033[0m
[00:04:57.431,457] \033[1;31m<err> os: Faulting instruction address (r15/pc): 0x00033744\033[0m
[00:04:57.431,488] \033[1;31m<err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0\033[0m
[00:04:57.431,518] \033[1;31m<err> os: Fault during interrupt handling
\033[0m
[00:04:57.431,549] \033[1;31m<err> os: Current thread: 0x2000ae30 (unknown)\033[0m
[00:05:09.972,747] \033[1;31m<err> fatal_error: Resetting system\033[0m

When this happens it is in the process of uploading a new fw thru mcumgr image group, upload command.

I haven't been able to fully verify this but I think this affected by updating to SDK 2.6.0 (from 2.4.0).
That code part is unchanged before and after the SDK update except that "zcbor_new_decode_state" had two new parameter which we have set to NULL and 0 (after comparing how some examples had changed between sdk versions).

BR,
Mårten

Parents
  • Hi Mårten,

    The assert is triggered by an "overstay" event in the SoftDevice controller scheduling related to BLE central operation. So something has prevetned that code from running on time. That something can be higher priority threads or interrupts. I cannot say anything more specific based on the assert, but perhaps disconnect any central links when initiating DFU?

  • Hi!

    Thanks for the reply.

    Maybe wasn't that clear but the assert is happening on the central that is sending the image to a peripheral.
    There is only one connection doing an update and possibly one more connection simultaneously (to different peripheral device) sending a smaller amount of data.

    I have however managed to reproduce the issue with just a single peripheral.

    Is there some "bluetooth" thread where the priority can be raised?

    I guess this is deeper down and therefore not connected to what is done in the callback passed to bt_dfu_smp_command?

    BR,
    Mårten

  • Hi Mårten,

    I am checking internally to see what we can understand with this information, but I do not have any updates yet unforunately.

    Is it possible for you to share a proejct that can reproduce this on a DK so that we can test on our side?

    Br,

    Einar

  • Hi!

    Great any suggestion or advice is appreciated.

    I was actually thinking about asking if that was a possibility....
    And am working on such a project (also as a way to try to reduce factors affecting it or eliminating mistakes on our side) but not quite finished, I will update here when it is ready.

    BR,

    Mårten

  • Hi!

    I have created a reproducer fw, so this should be run as a central.
    As I have removed all functionality that gets the fw to be sent during update I have made this compile the fw top transfer as ro-data.

    Two things needs local adaption the mac address of the peripherial to connect and update.
    And the path to the app_uppdate.bin file to send to the peripherial.
    These are located in main.c line 102 and bin_handler.c line 19.

    For the peripheral any client with support for dfu-smp and advertising in a compatible manner (coded phy) should be possible to use (and to fit in the central fw it should not be bigger than around 250k),

    I have also attached some logs showing the output I get at two different locations running this fw, as I have had some problems recreate the issue in of the locations but as you can see from the logs I now can reproduce it in both locations.

    BR,
    Mårten


    nrf-reproducer.zip

    rttlog-office.log

    rttlog-home.log

  • Hello Mårten,

    Einar is out of office for a few weeks to come, so I will have a look at your ticket as soon as possible, hopefully on Monday. Please understand that we are a bit short staffed due to summer holidays. 

    Best regards,

    Edvin

  • Hi!

    Great!

    Yes, I understand, I will myself be on vacation from the 18th.


    BR,
    Mårten

Reply Children
  • Hello Mårten,

    Let me get this straight. This is the central application that is supposed to upload new FW to a peripheral, right? So I would need a peripheral device with an MCUBOOT bootloader running with this? Does the standard smp_svr sample work? Or does it need to be something special?

    BR,
    Edvin

  • Hi!

    yes that is true.

    We set the advertise parameters as follows and call bt_le_ext_adv_create:
    static const uint32_t ADV_OPTS = (BT_LE_ADV_OPT_CONNECTABLE | BT_LE_ADV_OPT_EXT_ADV |
                                      BT_LE_ADV_OPT_CODED | BT_LE_ADV_OPT_USE_IDENTITY |
                                      BT_LE_ADV_OPT_ONE_TIME);

    static struct bt_le_adv_param adv_param = BT_LE_ADV_PARAM_INIT(ADV_OPTS,
                                                                   BT_GAP_ADV_FAST_INT_MIN_2,
                                                                   BT_GAP_ADV_FAST_INT_MAX_2,
                                                                   NULL);


    And when a connection is made they are updated using bt_le_ext_adv_update_param to the following:
    static const uint32_t ADV_DURING_CON_OPTS = (BT_LE_ADV_OPT_EXT_ADV | BT_LE_ADV_OPT_CODED |
                                                 BT_LE_ADV_OPT_USE_IDENTITY);

    static struct bt_le_adv_param adv_conn_param = BT_LE_ADV_PARAM_INIT(ADV_DURING_CON_OPTS,
                                                                        BT_GAP_ADV_FAST_INT_MIN_2,
                                                                        BT_GAP_ADV_FAST_INT_MAX_2,
                                                                        NULL);

    But other than that I think the smp_svr example should work.
    I will look into providing you with the peripherial side as well.

    BR,

    Mårten

  • Hi again!

    I have now created a fw for the peripheral side as well.
    Also suitable to link inas the fw binary to upload in the central build

    I have also tested that I can reproduce the issue using these firmwares

    At the moment the central is configured with RTT loggin so the crash dump can be viewed.

    I did not have a dk available to test these on but by disabling various external signals as can be seen in the overlay file I was able to run it on our custom hardware building for nrf52840dk_nrd52840.


    BR,
    Mårten

    nrf-reproducer-peripheral.zip

  • Thank you! That is very helpful, and speeds things up a bit! I am sorry for getting back so late. I will try to run it tomorrow, to replicate the issue. Once that is done, it is a lot easier to either pinpoint, or forward the question to the developers, if they can reproduce it themselves.

    You will hear from me tomorrow.

    Best regards,

    Edvin

  • Hi!

    Any updates on this issue?
    Have you been able to reproduce it?

    BR,

    Mårten

Related