Usage fault in NCS v2.7.0

Hi,

I have updated the device application firmware from nRF Connect SDK v2.6.1 to v2.7.0. 

The firmware worked without issues on v2.6.1, but the same firmware runs into a hard fault on v2.7.0.

In the log I can see a USAGE FAULT. It reports "Illegal use of the EPSR" and sometimes "Unaligned memory access". The error details are below:

*** Booting My Application v1.3.0-beta-98d26a300f1a ***
*** Using nRF Connect SDK v2.7.0-5cb85570ca43 ***
*** Using Zephyr OS v3.6.99-100befc70c74 ***
[00:00:00.008,941] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.009,307] <inf> fs_nvs: alloc wra: 0, fd0
[00:00:00.009,674] <inf> fs_nvs: data wra: 0, 1c
[00:00:00.010,131] <inf> bt_sdc_hci_driver: SoftDevice Controller build revision: 
                                            d6 da c7 ae 08 db 72 6f  2a a3 26 49 2a 4d a8 b3 |......ro *.&I*M..
                                            98 0e 07 7f                                      |....             
[00:00:00.014,221] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.014,648] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.015,045] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 214.51162 Build 1926957230
[00:00:00.016,143] <inf> bt_hci_core: No ID address. App must call se
[00:00:21.809,936] <wrn> bt_hci_core: opcode 0x200a status 0x0d
[00:00:33.065,460] <err> os: ***** USAGE FAULT *****
[00:00:33.065,795] <err> os:   Illegal use of the EPSR
[00:00:33.066,162] <err> os: r0/a1:  0x20004040  r1/a2:  0x000013d5  r2/a3:  0x0000523f
[00:00:33.066,680] <err> os: r3/a4:  0x000013c1 r12/ip:  0x000013c1 r14/lr:  0x000013c1
[00:00:33.067,169] <err> os:  xpsr:  0x00000000
[00:00:33.067,504] <err> os: Faulting instruction address (r15/pc): 0x000013c1
[00:00:33.067,962] <err> os: >>> ZEPHYR FATAL ERROR 35: Unknown error on CPU 0
[00:00:33.068,420] <err> os: Current thread: 0x2000bcb8 (unknown)
[00:00:33.068,817] <err> os: Halting system
*** Booting My Application v1.3.0-beta-98d26a300f1a ***
*** Using nRF Connect SDK v2.7.0-5cb85570ca43 ***
*** Using Zephyr OS v3.6.99-100befc70c74 ***
[00:00:00.008,941] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.009,307] <inf> fs_nvs: alloc wra: 0, fd0
[00:00:00.009,674] <inf> fs_nvs: data wra: 0, 1c
[00:00:00.010,131] <inf> bt_sdc_hci_driver: SoftDevice Controller build revision: 
                                            d6 da c7 ae 08 db 72 6f  2a a3 26 49 2a 4d a8 b3 |......ro *.&I*M..
                                            98 0e 07 7f                                      |....             
[00:00:00.014,221] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.014,648] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.015,075] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 214.51162 Build 1926957230
[00:00:00.016,143] <inf> bt_hci_core: No ID address. App must call se
[00:00:11.894,958] <wrn> bt_hci_core: opcode 0x200a status 0x0d
[00:00:34.955,596] <err> os: ***** USAGE FAULT *****
[00:00:34.955,932] <err> os:   Unaligned memory access
[00:00:34.956,298] <err> os: r0/a1:  0x2000bbad  r1/a2:  0x0000002c  r2/a3:  0x200308ec
[00:00:34.956,817] <err> os: r3/a4:  0x00000040 r12/ip:  0x00000000 r14/lr:  0x0006e0e5
[00:00:34.957,305] <err> os:  xpsr:  0x21000000
[00:00:34.957,672] <err> os: Faulting instruction address (r15/pc): 0x0006e0ce
[00:00:34.958,099] <err> os: >>> ZEPHYR FATAL ERROR 31: Unknown error on CPU 0
[00:00:34.958,557] <err> os: Current thread: 0x20012a30 (unknown)
[00:00:34.958,984] <err> os: Halting system
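To relate the two fault messages to the register dumps above, here is a minimal host-side C sketch. The helper names are hypothetical and not part of Zephyr or NCS. On ARMv7-M cores such as the Cortex-M4 in the nRF52840, bit 24 of xPSR is the Thumb (T) bit, and trying to execute with T cleared raises the INVSTATE usage fault, which Zephyr prints as "Illegal use of the EPSR". The first dump (`xpsr: 0x00000000`) is consistent with that. For the "Unaligned memory access" dump, the sketch only checks whether a register value is word aligned; r0 (0x2000bbad) is not, but whether r0 is really the address used by the faulting instruction would still have to be confirmed against the disassembly at the faulting PC.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical host-side helpers for reading a Zephyr Cortex-M fault dump.
 * They are not part of Zephyr or NCS. */

/* ARMv7-M: the Thumb state bit (EPSR.T) is bit 24 of the combined xPSR. */
#define XPSR_T_BIT (1u << 24)

/* Returns true if the stacked xPSR has the Thumb bit set.
 * A cleared T bit is what raises the INVSTATE usage fault
 * ("Illegal use of the EPSR" in the Zephyr log). */
bool xpsr_thumb_bit_set(uint32_t xpsr)
{
    return (xpsr & XPSR_T_BIT) != 0u;
}

/* Returns true if addr lies on a 4-byte (word) boundary. */
bool is_word_aligned(uint32_t addr)
{
    return (addr & 0x3u) == 0u;
}

/* Prints a one-line summary of the T bit and r0 alignment for one dump. */
void summarize_fault(const char *label, uint32_t xpsr, uint32_t r0)
{
    printf("%s: T bit %s, r0 0x%08x is %sword aligned\n",
           label,
           xpsr_thumb_bit_set(xpsr) ? "set" : "CLEARED",
           (unsigned int)r0,
           is_word_aligned(r0) ? "" : "NOT ");
}
```

For example, `summarize_fault("fault 1", 0x00000000, 0x20004040)` reports the T bit as cleared, while `summarize_fault("fault 2", 0x21000000, 0x2000bbad)` reports it as set and flags r0 as not word aligned. Keep in mind that on ARMv7-M, plain LDR/STR can tolerate unaligned addresses (unless unaligned trapping is enabled), whereas LDRD/STRD and LDM/STM always fault on them.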

The device works as a BLE extender and uses both the central and the peripheral role (one peripheral connection and multiple central connections).

I observed that the error occurs when the device is already connected in the peripheral role and then initiates a connection to another peripheral in the central role.

The extender device has an nRF52840 chip and uses nRF Connect SDK v2.7.0.

Could you please help me debug and fix this issue?

Let me know if any other information is required.

Thanks,

Narendra.

  • Hi

    The first thing you could try is increasing CONFIG_MAIN_STACK_SIZE, for example to twice its current size, as that has been the cause in similar cases before. If that doesn't help, make sure you have followed the migration guide from NCS 2.6.1 to 2.7.0 linked below, as some changes are required when moving from the NCS 2.6.x base to 2.7.x.

    https://docs.nordicsemi.com/bundle/ncs-latest/page/nrf/releases_and_maturity/migration/migration_guide_2.7.html 

    Best regards,

    Simon

  • Hi 

    I don't think the main stack size is the issue. I ran the thread analyzer, and the main thread uses only 42% of its stack. Here is the thread analyzer output:

    [00:03:16.586,669] <inf> thread_analyzer: Thread analyze:
    [00:03:16.587,188] <inf> thread_analyzer:  BT CTLR ECDH        : STACK: unused 792 usage 168 / 960 (17 %); CPU: 0 %
    [00:03:16.587,829] <inf> thread_analyzer:       : Total CPU cycles used: 0
    [00:03:16.589,233] <inf> thread_analyzer:  BT RX WQ            : STACK: unused 6088 usage 2104 / 8192 (25 %); CPU: 0 %
    [00:03:16.589,874] <inf> thread_analyzer:       : Total CPU cycles used: 4859
    [00:03:16.590,393] <inf> thread_analyzer:  BT TX               : STACK: unused 224 usage 1824 / 2048 (89 %); CPU: 0 %
    [00:03:16.591,033] <inf> thread_analyzer:       : Total CPU cycles used: 2516
    [00:03:16.591,522] <inf> thread_analyzer:  thread_analyzer     : STACK: unused 136 usage 888 / 1024 (86 %); CPU: 0 %
    [00:03:16.592,163] <inf> thread_analyzer:       : Total CPU cycles used: 49158
    [00:03:16.592,926] <inf> thread_analyzer:  mcumgr smp          : STACK: unused 1888 usage 160 / 2048 (7 %); CPU: 0 %
    [00:03:16.593,566] <inf> thread_analyzer:       : Total CPU cycles used: 0
    [00:03:16.599,060] <inf> thread_analyzer:  usbd_workq          : STACK: unused 200 usage 824 / 1024 (80 %); CPU: 0 %
    [00:03:16.599,700] <inf> thread_analyzer:       : Total CPU cycles used: 433
    [00:03:16.600,341] <inf> thread_analyzer:  BT LW WQ            : STACK: unused 1184 usage 160 / 1344 (11 %); CPU: 0 %
    [00:03:16.605,987] <inf> thread_analyzer:       : Total CPU cycles used: 1
    [00:03:16.607,452] <inf> thread_analyzer:  sysworkq            : STACK: unused 6312 usage 1880 / 8192 (22 %); CPU: 0 %
    [00:03:16.608,062] <inf> thread_analyzer:       : Total CPU cycles used: 1590
    [00:03:16.608,581] <inf> thread_analyzer:  MPSL Work           : STACK: unused 184 usage 840 / 1024 (82 %); CPU: 0 %
    [00:03:16.609,222] <inf> thread_analyzer:       : Total CPU cycles used: 6916
    [00:03:16.610,107] <inf> thread_analyzer:  usbworkq            : STACK: unused 2620 usage 5572 / 8192 (68 %); CPU: 0 %
    [00:03:16.610,748] <inf> thread_analyzer:       : Total CPU cycles used: 7486
    [00:03:16.616,271] <inf> thread_analyzer:  idle                : STACK: unused 272 usage 48 / 320 (15 %); CPU: 98 %
    [00:03:16.616,882] <inf> thread_analyzer:       : Total CPU cycles used: 6319381
    [00:03:16.618,103] <inf> thread_analyzer:  main                : STACK: unused 4684 usage 3508 / 8192 (42 %); CPU: 0 %
    [00:03:16.618,743] <inf> thread_analyzer:       : Total CPU cycles used: 44629
    [00:03:16.624,450] <inf> thread_analyzer:  ISR0                : STACK: unused 1120 usage 928 / 2048 (45 %)

    I have looked at the migration guide. I have not changed the board version (hwv1 to hwv2), and I have not moved from the multi-image parent/child build to sysbuild. Could either of these be the issue?

    Thanks,

    Narendra.
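The thread analyzer output above can also be screened for threads other than main that are close to their stack limit. The C sketch below reproduces the percentage the way it appears in that output (integer division of used bytes by stack size, which matches every line, e.g. 1824 / 2048 gives 89 %) and flags entries at or above a threshold. The 80 % threshold, the `stack_row` struct and the function names are assumptions for illustration, not an NCS recommendation or API. Also note that the figures reflect the highest usage seen up to the moment of the snapshot, so a path that only runs during the failing connection scenario may not be captured.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical: one row of thread_analyzer output (name, bytes used, size). */
struct stack_row {
    const char *name;
    unsigned int used;
    unsigned int size;
};

/* Percentage as printed in the log above: integer division,
 * so 1824 / 2048 yields 89 rather than 89.06. */
unsigned int stack_usage_pct(unsigned int used, unsigned int size)
{
    return size == 0u ? 0u : (used * 100u) / size;
}

/* True if usage is at or above the assumed warning threshold. */
bool stack_near_limit(unsigned int used, unsigned int size,
                      unsigned int threshold_pct)
{
    return stack_usage_pct(used, size) >= threshold_pct;
}

/* Prints every row at or above threshold_pct and returns how many matched. */
size_t report_tight_stacks(const struct stack_row *rows, size_t n,
                           unsigned int threshold_pct)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (stack_near_limit(rows[i].used, rows[i].size, threshold_pct)) {
            printf("%-16s %u / %u (%u %%)\n", rows[i].name, rows[i].used,
                   rows[i].size, stack_usage_pct(rows[i].used, rows[i].size));
            count++;
        }
    }
    return count;
}
```

Fed with the rows above and a threshold of 80, this would list BT TX (89 %), thread_analyzer (86 %), MPSL Work (82 %) and usbd_workq (80 %), while main stays at 42 %. The later crash log reports MPSL Work as the current thread, which may make that thread's stack worth a closer look.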

  • It would still help to know the context of the thread that triggers this fault exception. Right now the log shows:

    Current thread: 0x2000bcb8 (unknown)

    Can you add the following to your prj.conf, do a pristine build, run your application, and post the more detailed fault information here? That way we can see the context of the fault, which will determine our debugging direction.

    CONFIG_ASSERT_VERBOSE=y
    CONFIG_ASSERT_NO_COND_INFO=n
    CONFIG_ASSERT_NO_MSG_INFO=n
    CONFIG_RESET_ON_FATAL_ERROR=n
    CONFIG_THREAD_NAME=y
    CONFIG_STACK_SENTINEL=y

  • Please find below the updated debug log. 

    *** Booting My Application v1.3.0-beta-41ee9b2d1389 ***
    *** Using nRF Connect SDK v2.7.0-5cb85570ca43 ***
    *** Using Zephyr OS v3.6.99-100befc70c74 ***
    [00:00:00.008,239] <inf> fs_nvs: 2 Sectors of 4096 bytes
    [00:00:00.008,239] <inf> fs_nvs: alloc wra: 0, fd0
    [00:00:00.008,270] <inf> fs_nvs: data wra: 0, 1c
    [00:00:00.008,422] <inf> bt_sdc_hci_driver: SoftDevice Controller build revision: 
                                                d6 da c7 ae 08 db 72 6f  2a a3 26 49 2a 4d a8 b3 |......ro *.&I*M..
                                                98 0e 07 7f                                      |....             
    [00:00:00.010,711] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
    [00:00:00.010,742] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
    [00:00:00.010,772] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 214.51162 Build 1926957230
    [00:00:00.011,230] <inf> bt_hci_core: No ID address. App mus
    [00:00:06.044,769] <wrn> bt_hci_core: opcode 0x200a status 0x0d
    [00:00:07.264,739] <wrn> bt_hci_core: opcode 0x200a status 0x0d
    [00:01:02.571,838] <wrn> bt_conn: conn 0x20004688 failed to establish. RF noise?
    [00:01:02.572,296] <wrn> bt_hci_core: opcode 0x2022 status 0x02
    [00:01:18.508,789] <err> os: ***** USAGE FAULT *****
    [00:01:18.508,819] <err> os:   Unaligned memory access
    [00:01:18.508,819] <err> os: r0/a1:  0x2000beb5  r1/a2:  0x0000002c  r2/a3:  0x2002d36c
    [00:01:18.508,850] <err> os: r3/a4:  0x00000040 r12/ip:  0x00000000 r14/lr:  0x0006a4a5
    [00:01:18.508,850] <err> os:  xpsr:  0x21000000
    [00:01:18.508,850] <err> os: Faulting instruction address (r15/pc): 0x0006a48e
    [00:01:18.508,880] <err> os: >>> ZEPHYR FATAL ERROR 31: Unknown error on CPU 0
    [00:01:18.508,911] <err> os: Current thread: 0x20012d58 (MPSL Work)
    [00:01:18.738,616] <err> os: Halting system

  • It's unfortunate that it has been almost two months and you still have this issue. I still cannot see the name of the thread in which the faulting instruction executes.

    Are you sure you added the configs I suggested to your prj.conf? Without the thread name, it is hard for anyone to understand the context of this hard fault, so no one will be able to give you a proper debugging direction.
