BUS FAULT - Instruction bus error

Hello,

I am running a single-threaded application (main() starts the sysworkq thread), built in Segger Embedded Studio v5.34a for the nRF52833 SoC.

Although I am using a proprietary board built around the nRF52833 SoC, I am using the config for the board nrf52833dk_nrf52833 along with an overlay file.

The program runs for a few minutes, then crashes with a BUS FAULT (Instruction bus error). I have tried to find the root cause of this fault, but so far without success, so I am now turning to the experts for help.

I ran the addr2line command to find which function is at address 0x0059c986, but it only returned an unhelpful ??:0.

>C:\Zypher\v1.5.0-rc1\toolchain\segger_embedded_studio\gcc\arm-none-eabi\bin\arm-none-eabi-addr2line.exe -e C:\Main\build_nrf52833dk_nrf52833\zephyr\zephyr.elf 0x0059c986
??:0

>C:\Zypher\v1.5.0-rc1\toolchain\opt\bin\arm-none-eabi-addr2line.exe -e C:\Main\build_nrf52833dk_nrf52833\zephyr\zephyr.elf 0x0059c986
??:0

Below is a capture from the Debug Terminal.

*** Booting Zephyr OS build v2.4.99-ncs1-rc1 ***

...

...

[00:08:57.476,654] <err> os: ***** BUS FAULT *****
[00:08:57.476,654] <err> os: Instruction bus error
[00:08:57.476,684] <err> os: r0/a1: 0x00000000 r1/a2: 0x00000000 r2/a3: 0x20002c38
[00:08:57.476,684] <err> os: r3/a4: 0x00000001 r12/ip: 0x200033ec r14/lr: 0x0001f721
[00:08:57.476,684] <err> os: xpsr: 0x81000000
[00:08:57.476,684] <err> os: Faulting instruction address (r15/pc): 0x0059c986
[00:08:57.476,684] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:08:57.476,715] <err> os: Current thread: 0x20002c38 (sysworkq)
[00:08:57.714,080] <err> os: Halting system
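As a reading aid for the dump above, here is a minimal sketch that splits the xpsr value into its fields. It assumes the standard Cortex-M xPSR layout (flags in bits 31..27, Thumb bit in bit 24, exception number in bits 8..0); the struct and function names are hypothetical and not part of Zephyr.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: the xPSR fields that matter when reading a dump. */
struct xpsr_fields {
    bool n, z, c, v, q;        /* condition and saturation flags */
    bool thumb;                /* T bit: must be 1 on Cortex-M */
    uint16_t exception_number; /* 0 means the fault was taken from thread mode */
};

/* Split a raw xPSR value, as printed in the fault dump, into its fields. */
static struct xpsr_fields decode_xpsr(uint32_t xpsr)
{
    struct xpsr_fields f = {
        .n = (xpsr >> 31) & 1u,
        .z = (xpsr >> 30) & 1u,
        .c = (xpsr >> 29) & 1u,
        .v = (xpsr >> 28) & 1u,
        .q = (xpsr >> 27) & 1u,
        .thumb = (xpsr >> 24) & 1u,
        .exception_number = (uint16_t)(xpsr & 0x1FFu),
    };
    return f;
}
```

For 0x81000000 this gives N set, the Thumb bit set and exception number 0, so the fault was taken from thread code rather than from inside an interrupt handler, which matches the "Current thread: sysworkq" line.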

I have carried out further tests to get more insight into the possible root cause of the crash.

I disabled different parts of the code, and each time I got a similar but different fault. The variants are shown below.

[00:22:34.910,186] <err> os: ***** MPU FAULT *****
[00:22:34.910,186] <err> os: Instruction Access Violation


[00:11:06.176,818] <err> os: ***** BUS FAULT *****
[00:11:06.176,818] <err> os: Imprecise data bus error


[00:02:27.178,894] <err> os: ***** MPU FAULT *****
[00:02:27.178,894] <err> os: Instruction Access Violation


[00:01:45.790,191] <err> os: ***** BUS FAULT *****
[00:01:45.790,191] <err> os: Instruction bus error


[00:07:29.002,807] <err> os: ***** BUS FAULT *****
[00:07:29.002,807] <err> os: Imprecise data bus error


[00:12:01.936,401] <err> os: ***** MPU FAULT *****
[00:12:01.936,401] <err> os: Instruction Access Violation

It has been about a week since I posted this question and I am still waiting for a reply. I am stuck and I desperately need some guidance on how to debug this kind of problem.

I am looking at this page https://developer.arm.com/documentation/100235/0003/the-cortex-m33-processor/fault-handling for inspiration, but so far I am none the wiser.

I increased the work queue stack size from 2048 to 4096 in the overlay file but I am still seeing this error about 7 minutes after boot-up.  

CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096

<err> os: ***** MPU FAULT *****
<err> os: Instruction Access Violation
<err> os: r0/a1: 0x00000000 r1/a2: 0x00000000 r2/a3: 0x20002c30
<err> os: r3/a4: 0x00000001 r12/ip: 0x2000337c r14/lr: 0x0001ee1d
<err> os: xpsr: 0x81000000
<err> os: Faulting instruction address (r15/pc): 0xef3dc986
<err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
<err> os: Current thread: 0x20002c30 (sysworkq)
<err> os: Halting system

I can see that the PC points to an out-of-range location, 0xef3dc986. I tried addr2line, but as expected it returned ??:0.

>C:\Zypher\v1.5.0-rc1\toolchain\opt\bin\arm-none-eabi-addr2line.exe -e C:\...\Main\build_nrf52833dk_nrf52833_nRF52833\zephyr\zephyr.elf 0xef3dc986
??:0
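Both faulting PC values reported so far (0x0059c986 and 0xef3dc986) lie outside any region that can hold code in this image, which would explain why addr2line has nothing to report. A minimal sketch of that range check follows. It assumes the nRF52833 layout of 512 KB flash at 0x00000000 and 128 KB RAM at 0x20000000; the macro and function names are hypothetical, and the sizes should be confirmed against the product specification and the build's .map file.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed nRF52833 memory map; verify against the product specification. */
#define APP_FLASH_BASE 0x00000000u
#define APP_FLASH_SIZE (512u * 1024u)
#define APP_SRAM_BASE  0x20000000u
#define APP_SRAM_SIZE  (128u * 1024u)

enum addr_region { ADDR_FLASH, ADDR_SRAM, ADDR_OTHER };

/* True if addr lies in [base, base + size). */
static bool in_range(uint32_t addr, uint32_t base, uint32_t size)
{
    return addr >= base && (addr - base) < size;
}

/* Classify an address taken from a fault dump. */
static enum addr_region classify_addr(uint32_t addr)
{
    if (in_range(addr, APP_FLASH_BASE, APP_FLASH_SIZE)) {
        return ADDR_FLASH;
    }
    if (in_range(addr, APP_SRAM_BASE, APP_SRAM_SIZE)) {
        return ADDR_SRAM;
    }
    return ADDR_OTHER;
}

/* A PC outside flash cannot belong to a function in a flash-resident image. */
static bool pc_is_plausible(uint32_t pc)
{
    return classify_addr(pc) == ADDR_FLASH;
}
```

A PC that fails this check usually means the CPU branched through a corrupted return address or function pointer, so the registers that still hold plausible values (lr here) are the more useful starting point.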

Finally, after I added these two lines in the project configuration file prj.conf, the crash seems to have been resolved. I have been running the program for over 20.5 hours and I still have not seen the crash. However, I would like to understand how these two lines have impacted the firmware and solved the crash problem.

CONFIG_DEBUG_OPTIMIZATIONS=y
CONFIG_COMPILER_OPT=""

Can someone please help me get to the bottom of this problem?

Thank you.

Kind regards

Mohamed Belaroussi

  • Hello,

     

    <err> os: r3/a4: 0x00000001 r12/ip: 0x2000337c r14/lr: 0x0001ee1d

     Can you check the address on "lr"? It should work with addr2line and you can also find it in the .map file.
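When looking up the lr value by hand in the .map file, it helps to remember that code addresses held in registers on Cortex-M normally have bit 0 set to mark Thumb state, so the actual instruction address is the even value just below. A small sketch of that normalisation follows; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the register value carries the Thumb marker in bit 0. */
static bool has_thumb_bit(uint32_t reg_value)
{
    return (reg_value & 1u) != 0u;
}

/* Clear bit 0 to get the even instruction address to search for
 * in the .map file or a disassembly listing. */
static uint32_t thumb_code_addr(uint32_t reg_value)
{
    return reg_value & ~UINT32_C(1);
}
```

For the dump quoted above, 0x0001ee1d becomes 0x0001ee1c. Because lr holds a return address, that location is the instruction after the call, not the call itself.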

  • Hi Hakon,

    Thank you for your suggestion. I will need to undo some of the changes that made the MPU/BUS FAULT disappear in order to reproduce the problem. I will let you know how I get on.

    I have reproduced the MPU FAULT:

    [00:04:46.795,989] <err> os: ***** MPU FAULT *****
    [00:04:46.795,989] <err> os: Instruction Access Violation
    [00:04:46.796,020] <err> os: r0/a1: 0x00000000 r1/a2: 0x00000000 r2/a3: 0x20002c40
    [00:04:46.796,020] <err> os: r3/a4: 0x00000001 r12/ip: 0x200033f4 r14/lr: 0x0001f7d5
    [00:04:46.796,020] <err> os: xpsr: 0x81000000
    [00:04:46.796,020] <err> os: Faulting instruction address (r15/pc): 0xee19c986
    [00:04:46.796,020] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
    [00:04:46.796,051] <err> os: Current thread: 0x20002c40 (sysworkq)
    [00:04:47.033,508] <err> os: Halting system

    The addr2line command gave this response:

    C:\WINDOWS\System32>C:\Zypher\v1.5.0-rc1\toolchain\opt\bin\arm-none-eabi-addr2line.exe -e C:\Sandbox\Main-RX-TX\build_nrf52833dk_nrf52833_NO_DEBUG_OPTIMIZATIONS_MPU_FAULT\zephyr\zephyr.elf 0x0001f7d5
    C:/Zypher/v1.5.0-rc1/zephyr/drivers/spi/spi_nrfx_spim.c:250

    Line 250 corresponds to the return from the function spi_nrfx_transceive():

    static int spi_nrfx_transceive(const struct device *dev,
                                   const struct spi_config *spi_cfg,
                                   const struct spi_buf_set *tx_bufs,
                                   const struct spi_buf_set *rx_bufs)
    {
        return transceive(dev, spi_cfg, tx_bufs, rx_bufs, false, NULL);
    } /* <-- this is line 250 */

    It looks like the fault occurred in the SPI driver function transceive(). However, this function is called for every SPI transaction. Why does it work for seconds, minutes, and sometimes even over an hour, and then eventually fail with an MPU FAULT?

    I will have to debug a bit deeper than I was expecting to get to the bottom of this problem.

    Meanwhile, can you please comment on why after this change in prj.conf the fault disappeared?

    CONFIG_DEBUG_OPTIMIZATIONS=y
    CONFIG_COMPILER_OPT=""

    Hopefully, the reply will be quicker this time.

    Thank you.

    Kind regards

    Mohamed Belaroussi

  • Learner said:
    Meanwhile, can you please comment on why after this change in prj.conf the fault disappeared?

     Compiler optimizations generally reduce the amount of stack required by the application. It's difficult to tell without having looked at your code. It could be a stack overflow in one of the threads, but I'm not 100% certain.
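One common way to test the stack-overflow theory is stack painting: fill the stack with a known byte before the thread runs, then count how many bytes are still untouched later on. The sketch below shows the idea on a plain buffer so it can be tried on a host machine. It is not Zephyr's implementation; the names and the 0xAA fill value are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill pattern, assumed here; any value unlikely to be stored by code works. */
#define STACK_FILL 0xAAu

/* Paint the whole stack buffer before the thread starts running. */
static void stack_paint(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL, size);
}

/* On a descending stack, usage grows from stack[size - 1] toward stack[0],
 * so the untouched run starting at stack[0] is the remaining headroom. */
static size_t stack_unused_bytes(const uint8_t *stack, size_t size)
{
    size_t unused = 0;

    while (unused < size && stack[unused] == STACK_FILL) {
        unused++;
    }
    return unused;
}
```

If the untouched count for a thread drops to zero or near zero shortly before the fault, the stack is too small; if plenty of headroom remains, the corruption is more likely coming from somewhere else, such as a buffer overrun or a stale pointer.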
