Usage fault in NCS v2.7.0

Hi,

I have updated the device application firmware from nRF Connect SDK v2.6.1 to v2.7.0.

The firmware worked without issue on v2.6.1, but the same firmware runs into a hard fault on SDK v2.7.0.

In the log I can see a USAGE FAULT error. It reports "Illegal use of the EPSR" and sometimes "Unaligned memory access". The error details follow:

*** Booting My Application v1.3.0-beta-98d26a300f1a ***
*** Using nRF Connect SDK v2.7.0-5cb85570ca43 ***
*** Using Zephyr OS v3.6.99-100befc70c74 ***
[00:00:00.008,941] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.009,307] <inf> fs_nvs: alloc wra: 0, fd0
[00:00:00.009,674] <inf> fs_nvs: data wra: 0, 1c
[00:00:00.010,131] <inf> bt_sdc_hci_driver: SoftDevice Controller build revision: 
                                            d6 da c7 ae 08 db 72 6f  2a a3 26 49 2a 4d a8 b3 |......ro *.&I*M..
                                            98 0e 07 7f                                      |....             
[00:00:00.014,221] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.014,648] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.015,045] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 214.51162 Build 1926957230
[00:00:00.016,143] <inf> bt_hci_core: No ID address. App must call se
[00:00:21.809,936] <wrn> bt_hci_core: opcode 0x200a status 0x0d
[00:00:33.065,460] <err> os: ***** USAGE FAULT *****
[00:00:33.065,795] <err> os:   Illegal use of the EPSR
[00:00:33.066,162] <err> os: r0/a1:  0x20004040  r1/a2:  0x000013d5  r2/a3:  0x0000523f
[00:00:33.066,680] <err> os: r3/a4:  0x000013c1 r12/ip:  0x000013c1 r14/lr:  0x000013c1
[00:00:33.067,169] <err> os:  xpsr:  0x00000000
[00:00:33.067,504] <err> os: Faulting instruction address (r15/pc): 0x000013c1
[00:00:33.067,962] <err> os: >>> ZEPHYR FATAL ERROR 35: Unknown error on CPU 0
[00:00:33.068,420] <err> os: Current thread: 0x2000bcb8 (unknown)
[00:00:33.068,817] <err> os: Halting system
*** Booting My Application v1.3.0-beta-98d26a300f1a ***
*** Using nRF Connect SDK v2.7.0-5cb85570ca43 ***
*** Using Zephyr OS v3.6.99-100befc70c74 ***
[00:00:00.008,941] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.009,307] <inf> fs_nvs: alloc wra: 0, fd0
[00:00:00.009,674] <inf> fs_nvs: data wra: 0, 1c
[00:00:00.010,131] <inf> bt_sdc_hci_driver: SoftDevice Controller build revision: 
                                            d6 da c7 ae 08 db 72 6f  2a a3 26 49 2a 4d a8 b3 |......ro *.&I*M..
                                            98 0e 07 7f                                      |....             
[00:00:00.014,221] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.014,648] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.015,075] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 214.51162 Build 1926957230
[00:00:00.016,143] <inf> bt_hci_core: No ID address. App must call se
[00:00:11.894,958] <wrn> bt_hci_core: opcode 0x200a status 0x0d
[00:00:34.955,596] <err> os: ***** USAGE FAULT *****
[00:00:34.955,932] <err> os:   Unaligned memory access
[00:00:34.956,298] <err> os: r0/a1:  0x2000bbad  r1/a2:  0x0000002c  r2/a3:  0x200308ec
[00:00:34.956,817] <err> os: r3/a4:  0x00000040 r12/ip:  0x00000000 r14/lr:  0x0006e0e5
[00:00:34.957,305] <err> os:  xpsr:  0x21000000
[00:00:34.957,672] <err> os: Faulting instruction address (r15/pc): 0x0006e0ce
[00:00:34.958,099] <err> os: >>> ZEPHYR FATAL ERROR 31: Unknown error on CPU 0
[00:00:34.958,557] <err> os: Current thread: 0x20012a30 (unknown)
[00:00:34.958,984] <err> os: Halting system

The device works as a BLE extender and has both central and peripheral roles (one peripheral and multiple centrals).

I observed that the error occurs when the device is already connected (via its peripheral role) and it initiates a connection to another peripheral (using its central role).

The extender device has an nRF52840 chip and uses nRF Connect SDK v2.7.0.

Could you please help me debug and fix this issue?

Let me know if any other information is required.

Thanks,

Narendra.

Parents
  • It is unfortunate that almost 2 months have passed and you still have this issue. I still cannot see the name of the thread in which the faulting instruction runs.

    Are you sure you added the configs I suggested to prj.conf? Without the thread name, it is hard for anyone to get the context of this hard fault, so no one will be able to give you a proper debugging direction.

  • Yes, right. I was busy with some other tasks and have just resumed this.

    And yes, I have added the configs you suggested to the prj.conf file.

    The above logs were generated with the new configuration.

    [00:01:18.508,911] <err> os: Current thread: 0x20012d58 (MPSL Work)

    Is this not a thread name?

  • Some additional logs, captured with the following option set:

    CONFIG_ASSERT=y

    ASSERTION FAIL [err == 0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/hci_core.c:333
      command opcode 0x0406 timeout with err -11
    [00:00:23.780,364] <err> os: r0/a1:  0x00000004  r1/a2:  0x0000014d  r2/a3:  0x00000002
    [00:00:23.780,395] <err> os: r3/a4:  0x00000004 r12/ip:  0x00000010 r14/lr:  0x0003d0b7
    [00:00:23.780,395] <err> os:  xpsr:  0x01000000
    [00:00:23.780,426] <err> os: Faulting instruction address (r15/pc): 0x000694be
    [00:00:23.780,456] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
    [00:00:23.780,487] <err> os: Current thread: 0x20004320 (BT TX)
    [00:00:24.014,434] <err> os: Halting system

  • Now we have the context. There seems to be a stack overflow in the BT transmit thread.

    Please try the settings below in your prj.conf, and increase the stack sizes further if you already use these values.

    CONFIG_BT_RX_STACK_SIZE=2048
    CONFIG_BT_HCI_TX_STACK_SIZE=1024

    You can also experiment with rebalancing the host RX and TX thread priorities as below:

    CONFIG_BT_HCI_TX_PRIO=7
    CONFIG_BT_RX_PRIO=8

    Do a pristine build and try again. I hope this helps.
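
One way to check the stack-overflow hypothesis before resizing stacks is Zephyr's thread analyzer, which periodically reports per-thread stack usage. The fragment below is a minimal sketch of prj.conf additions, assuming the remaining options stay at their defaults; the 5-second interval is an arbitrary choice, and these options are not necessarily the ones suggested earlier in this thread.

```
# Show thread names instead of "(unknown)" in fault dumps and reports
CONFIG_THREAD_NAME=y
# Enable the thread analyzer and let it run automatically
CONFIG_THREAD_ANALYZER=y
CONFIG_THREAD_ANALYZER_AUTO=y
# Report interval in seconds (arbitrary value for illustration)
CONFIG_THREAD_ANALYZER_AUTO_INTERVAL=5
# Send the report through the logging subsystem
CONFIG_THREAD_ANALYZER_USE_LOG=y
```

If the reported usage of the BT TX and BT RX stacks stays well below their configured sizes, a stack overflow becomes a less likely explanation.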

Children
  • I increased the stack size up to 8192 (CONFIG_BT_HCI_TX_STACK_SIZE=8192), but it didn't help.

    CONFIG_BT_RX_STACK_SIZE=8192 was already set.

    Also, setting CONFIG_BT_HCI_TX_PRIO=7 in the prj.conf file causes an error in the build process.

    Please check the application log below, which may help to debug the issue. As mentioned above, our device works as an extender with one peripheral role and multiple central roles. It transmits BLE packets on both the peripheral and central sides after connection.

    [00:00:34.580,718] <inf> SL: Initiating connection
    [00:00:34.580,780] <inf> PSH: NUS server send. Length 20
    [00:00:34.580,810] <inf> PSH: ServerTx
    [00:00:34.581,085] <dbg> BCS: on_sent: Data send, conn 0x20004430
    [00:00:34.581,085] <inf> PSH: Server Tx Complete
    [00:00:34.679,901] <inf> SL: Central connected: 1
    [00:00:34.680,114] <inf> SL: Security changed: level 0
    [00:00:34.680,175] <inf> SL: MTU exchange pending
    [00:00:34.704,040] <inf> SL: MTU exchange successful
    [00:00:34.704,071] <inf> SL: Start scanning: interval 181 ms, window 43 ms
    [00:00:34.705,230] <inf> CSH: NUS Client module initialized
    [00:00:34.705,261] <inf> CSH: Service discovery completed
    [00:00:34.705,413] <inf> CSH: NUS client send: connIndx 1, pkt length 20
    [00:00:34.705,688] <inf> CSH: Client Tx Complete 1
    [00:00:34.705,780] <inf> PSH: NUS server send. Length 20
    [00:00:34.705,810] <inf> PSH: ServerTx
    ASSERTION FAIL [err == 0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/hci_core.c:333
      command opcode 0x0406 timeout with err -11
    [00:00:44.706,146] <err> os: r0/a1:  0x00000004  r1/a2:  0x0000014d  r2/a3:  0x00000002
    [00:00:44.706,176] <err> os: r3/a4:  0x00000004 r12/ip:  0x00000010 r14/lr:  0x0003b5fb
    [00:00:44.706,176] <err> os:  xpsr:  0x01000000
    [00:00:44.706,207] <err> os: Faulting instruction address (r15/pc): 0x000642fe
    [00:00:44.706,237] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
    [00:00:44.706,268] <err> os: Current thread: 0x200042d0 (BT TX)
    [00:00:44.940,032] <err> os: Halting system

  • Updating NCS to v2.8.0 fixed the above issue. The reported error or fault no longer occurs at the same point.

    Thanks,

    Narendra
