
How to find the cause of a fault occurring inside a SoftDevice?

Hi,

 I'm using SDK 13.1 and SoftDevice 4.0.5 on an nRF52832. My app runs for long periods but occasionally enters the hard fault handler from an address within the SoftDevice. Logging is turned off. The app stashes the registers from the hard fault information in a small area of RAM, and after the reboot it sends them to my server. I have not yet seen this happen with the debugger attached, so I cannot get a stack trace. When I disassemble the SoftDevice at the fault address, the instruction is always "svc 255". I don't see this SVC number anywhere in the SDK, so I presume it is some sort of fatal error code. Any ideas how I might resolve this?
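A minimal sketch of such a crash record follows. It assumes the HardFault handler passes in a pointer to the basic exception stack frame that Cortex-M pushes on entry (R0-R3, R12, LR, PC, xPSR, in that order). The handler picks MSP or PSP from bit 2 of EXC_RETURN. The names `fault_record_t`, `FAULT_MAGIC`, `fault_record_save` and `fault_record_take` are hypothetical. On target, the record would go in a RAM section that startup code does not zero. Here it is an ordinary static so the logic can run on a host.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical marker meaning "a valid record is waiting to be reported". */
#define FAULT_MAGIC 0xFA017EC0u

/* Word order of the basic Cortex-M exception stack frame. */
enum { FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3, FRAME_R12,
       FRAME_LR, FRAME_PC, FRAME_XPSR, FRAME_WORDS };

typedef struct {
    uint32_t magic;              /* FAULT_MAGIC when the record is valid */
    uint32_t frame[FRAME_WORDS]; /* copy of the stacked registers */
    uint32_t checksum;           /* detects a half-written or stale record */
} fault_record_t;

/* On target: place this in a .noinit-style section so a reset keeps it. */
static fault_record_t g_fault_record;

static uint32_t fault_checksum(const fault_record_t *rec)
{
    uint32_t sum = rec->magic;
    for (int i = 0; i < FRAME_WORDS; i++) {
        /* rotate left by one, then mix in the next word */
        sum = (sum << 1 | sum >> 31) ^ rec->frame[i];
    }
    return sum;
}

/* Called from the HardFault handler with the stack pointer that was active. */
void fault_record_save(const uint32_t *stacked_frame)
{
    memcpy(g_fault_record.frame, stacked_frame, sizeof g_fault_record.frame);
    g_fault_record.magic = FAULT_MAGIC;
    g_fault_record.checksum = fault_checksum(&g_fault_record);
}

/* After reboot: copies out a pending record and consumes it. */
bool fault_record_take(fault_record_t *out)
{
    bool valid = g_fault_record.magic == FAULT_MAGIC &&
                 g_fault_record.checksum == fault_checksum(&g_fault_record);
    if (valid) {
        *out = g_fault_record;
    }
    g_fault_record.magic = 0; /* report each fault only once */
    return valid;
}
```

The stacked LR is worth reporting alongside the PC. When the PC lands inside the SoftDevice, LR may point back at the application code that called into it.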

The offending address is most often 0x142e4, but I also see 0x1a968, 0x16d36, and 0x1117c.

Thanks

  • Hi.

    As you may know, the SoftDevice is a precompiled hex file. I cannot show you the code that hard faults, but I can tell you what is likely to be causing the hard faults.

    All of these fault addresses could be related to either:

    1. Incorrect use of timeslots, meaning that a timeslot is not properly managed and closed before the SoftDevice needs to do something.

    2. An interrupt with too high a priority that preempts the SoftDevice while it is doing something.

    Do you use timeslots? Have you given any interrupt too high a priority?

    Best regards,

    Andreas

  • Hi Andreas,

    I don't use timeslots, and I'm not aware of any high-priority interrupts. I'm using the standard drivers and haven't changed their priorities.

    Searching my sdk_config.h for *IRQ_PRIORITY* shows all are defined as 7. I see no calls to nrf_drv_common_irq_enable or NVIC_SetPriority or sd_nvic_SetPriority in my code.

    Using the debugger, I stopped while my code was running and inspected the NVIC IPR registers. The non-zero values there were either 0xE0 or 0x80, except for the MWU (debugger) which was 0x20. I think the 0xE0 ones correspond to drivers I am using at priority 7 and the 0x80 ones (POWER_CLOCK_IRQn, RNG_IRQn, ECB_IRQn, CCM_AAR_IRQn, SWI5_EGU5_IRQn) are from the SoftDevice.

    Is there anything else I should look for?

    Here is an annotated dump of the NVIC IPR values:

    0x80  POWER_CLOCK_IRQn          =   0,              /*!< 0  POWER_CLOCK                                                            */
    0x00  RADIO_IRQn                =   1,              /*!< 1  RADIO                                                                  */
    0x00  UARTE0_UART0_IRQn         =   2,              /*!< 2  UARTE0_UART0                                                           */
    0xE0  SPIM0_SPIS0_TWIM0_TWIS0_SPI0_TWI0_IRQn=   3,  /*!< 3  SPIM0_SPIS0_TWIM0_TWIS0_SPI0_TWI0                                      */
    0xE0  SPIM1_SPIS1_TWIM1_TWIS1_SPI1_TWI1_IRQn=   4,  /*!< 4  SPIM1_SPIS1_TWIM1_TWIS1_SPI1_TWI1                                      */
    0x00  NFCT_IRQn                 =   5,              /*!< 5  NFCT                                                                   */
    0xE0  GPIOTE_IRQn               =   6,              /*!< 6  GPIOTE                                                                 */
    0x00  SAADC_IRQn                =   7,              /*!< 7  SAADC                                                                  */
    0x00  TIMER0_IRQn               =   8,              /*!< 8  TIMER0                                                                 */
    0x00  TIMER1_IRQn               =   9,              /*!< 9  TIMER1                                                                 */
    0x00  TIMER2_IRQn               =  10,              /*!< 10 TIMER2                                                                 */
    0x00  RTC0_IRQn                 =  11,              /*!< 11 RTC0                                                                   */
    0x00  TEMP_IRQn                 =  12,              /*!< 12 TEMP                                                                   */
    0x80  RNG_IRQn                  =  13,              /*!< 13 RNG                                                                    */
    0x80  ECB_IRQn                  =  14,              /*!< 14 ECB                                                                    */
    0x80  CCM_AAR_IRQn              =  15,              /*!< 15 CCM_AAR                                                                */
    0xE0  WDT_IRQn                  =  16,              /*!< 16 WDT                                                                    */
    0xE0  RTC1_IRQn                 =  17,              /*!< 17 RTC1                                                                   */
    0x00  QDEC_IRQn                 =  18,              /*!< 18 QDEC                                                                   */
    0xE0  COMP_LPCOMP_IRQn          =  19,              /*!< 19 COMP_LPCOMP                                                            */
    0xE0  SWI0_EGU0_IRQn            =  20,              /*!< 20 SWI0_EGU0                                                              */
    0xE0  SWI1_EGU1_IRQn            =  21,              /*!< 21 SWI1_EGU1                                                              */
    0xE0  SWI2_EGU2_IRQn            =  22,              /*!< 22 SWI2_EGU2                                                              */
    0x00  SWI3_EGU3_IRQn            =  23,              /*!< 23 SWI3_EGU3                                                              */
    0x00  SWI4_EGU4_IRQn            =  24,              /*!< 24 SWI4_EGU4                                                              */
    0x80  SWI5_EGU5_IRQn            =  25,              /*!< 25 SWI5_EGU5                                                              */
    0x00  TIMER3_IRQn               =  26,              /*!< 26 TIMER3                                                                 */
    0x00  TIMER4_IRQn               =  27,              /*!< 27 TIMER4                                                                 */
    0x00  PWM0_IRQn                 =  28,              /*!< 28 PWM0                                                                   */
    0x00  PDM_IRQn                  =  29,              /*!< 29 PDM                                                                    */
    0x00  MWU_IRQn                  =  32,              /*!< 32 MWU                                                                    */
    0x00  PWM1_IRQn                 =  33,              /*!< 33 PWM1                                                                   */
    0x20  PWM2_IRQn                 =  34,              /*!< 34 PWM2                                                                   */
    0x00  SPIM2_SPIS2_SPI2_IRQn     =  35,              /*!< 35 SPIM2_SPIS2_SPI2                                                       */
    0x00  RTC2_IRQn                 =  36,              /*!< 36 RTC2                                                                   */
    0x00  I2S_IRQn                  =  37,              /*!< 37 I2S                                                                    */
    0xE0  FPU_IRQn                  =  38               /*!< 38 FPU                                                                    */
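    The raw IPR bytes can be decoded mechanically. The nRF52832 implements 3 priority bits (`__NVIC_PRIO_BITS` is 3 in the nRF52 MDK), so the priority level is the byte shifted right by 5. That makes 0xE0 level 7, 0x80 level 4, and 0x20 level 1. The sketch below assumes that levels 0, 1 and 4 are reserved for the SoftDevice, as in S132 v4; check this against the SoftDevice Specification for the exact version in use. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* nRF52832 implements 3 NVIC priority bits (the MDK's __NVIC_PRIO_BITS). */
#define PRIO_BITS 3u

/* Convert a raw NVIC IPR byte to a priority level (0 = highest). */
static inline unsigned ipr_to_level(uint8_t ipr)
{
    return (unsigned)ipr >> (8u - PRIO_BITS);
}

/* Assumption: S132 v4 reserves levels 0, 1 and 4 for the SoftDevice. */
static inline bool level_reserved_by_sd(unsigned level)
{
    return level == 0u || level == 1u || level == 4u;
}

/* Flags an enabled application IRQ sitting on a SoftDevice-reserved level.
 * On target, `enabled` would come from NVIC->ISER; it is a parameter here
 * so the check can be exercised on a host. */
static inline bool app_irq_priority_suspicious(uint8_t ipr, bool enabled,
                                               bool owned_by_sd)
{
    return enabled && !owned_by_sd && level_reserved_by_sd(ipr_to_level(ipr));
}
```

    A 0x00 byte is also the reset value, so only enabled IRQs are worth flagging. RADIO, TIMER0 and RTC0 belong to the SoftDevice itself and are expected at level 0.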

  • Hi.

    Can you provide me with your project? This behavior is not what one would expect from the SoftDevice. If you could send me your project, I could try to reproduce the issue and take a look at it.

    I can of course make this case private if you don't wish to share your project with anyone else.

    Best regards,

    Andreas

  • Hi Andreas,

    I don't think that will be very productive. The underlying problem is that I cannot reproduce the issue here, and there are days between events. I'm only receiving reports from units in the field.

    Can you suggest anything? Or maybe tell me what the instruction "svc 255" is supposed to be doing? I'm speculating the underlying cause could be some sort of memory corruption, and knowing what should happen at that instruction might help me narrow down the possibilities.
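    It may help to check the raw halfword at each reported address rather than relying on the disassembly view. In Thumb, `SVC #imm8` is the 16-bit encoding 0xDF00 | imm8, so "svc 255" is 0xDFFF. One architectural point is worth knowing here: on Cortex-M, an SVC executed when the SVCall exception cannot be taken escalates to HardFault. That happens, for example, from a handler running at the same or a higher priority than SVCall, or with PRIMASK set. What the SoftDevice itself means by SVC 255 is not stated in this thread. A small decoder, with the hypothetical name `thumb_svc_imm`:

```c
#include <stdbool.h>
#include <stdint.h>

/* Thumb 16-bit SVC encoding: 1101 1111 iiii iiii, i.e. 0xDF00 | imm8. */
#define THUMB_SVC_MASK    0xFF00u
#define THUMB_SVC_PATTERN 0xDF00u

/* Returns true if `insn` is a 16-bit Thumb SVC and stores its immediate.
 * On target: insn = *(const uint16_t *)pc, with pc halfword-aligned. */
static bool thumb_svc_imm(uint16_t insn, uint8_t *imm_out)
{
    if ((insn & THUMB_SVC_MASK) != THUMB_SVC_PATTERN) {
        return false;
    }
    *imm_out = (uint8_t)(insn & 0xFFu);
    return true;
}
```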

    Thanks

  • Hi.

    How many DKs do you have? Can you try implementing the workaround for Errata 108? The workaround is included in MDK 8.9.0 and newer, so if you have a newer MDK it is already in place.

    Apply the following code after any reset:

    *(volatile uint32_t *)0x40000EE4 = (*(volatile uint32_t *)0x10000258 & 0x0000004F);
    

    This workaround increases the I_RAM current per 4 KB section from 20 nA to 30 nA.
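    To confirm at runtime that the workaround actually ran on a unit, the register can be read back after startup and compared with the masked FICR value. The sketch below takes both locations as pointers so the logic can be exercised on a host; on target they would be `(volatile uint32_t *)0x40000EE4` and `(volatile uint32_t *)0x10000258`. The function names are hypothetical. The readback check also assumes that this undocumented register returns the value written to it, which is not guaranteed.

```c
#include <stdbool.h>
#include <stdint.h>

#define ERRATA108_MASK 0x0000004Fu

/* Errata 108 workaround, as in the line above: copy the masked FICR value
 * into the RAM control register. */
static void errata108_apply(volatile uint32_t *reg, const volatile uint32_t *ficr)
{
    *reg = *ficr & ERRATA108_MASK;
}

/* Assumption: the register reads back what was written to it. */
static bool errata108_applied(const volatile uint32_t *reg,
                              const volatile uint32_t *ficr)
{
    return *reg == (*ficr & ERRATA108_MASK);
}
```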

    If you have several DKs, do you see this issue on more than one of them?

    Best regards,

    Andreas

  • Hi Andreas,

     Thanks for the ideas. The units have had the errata 108 workaround applied and are using MDK 8.17.0; I have verified that the line shown above is implemented.

    I do have a couple of DKs here and I'll try to reproduce the issue with them.

    Some users have reported that the issue happened in a very dense RF environment (e.g., a concert). Will the SoftDevice use a lot more power in that situation? I'm wondering whether it could be a power-supply-related problem.

    Does the SoftDevice share the application's stack? If so, I may be chasing a stack overflow.
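    A stack overflow can be tested for directly with stack painting. The idea is to fill the stack region with a known pattern early at startup, then periodically count how much of it has never been touched. Because the Cortex-M stack grows downwards, the untouched words are at the low end. This applies regardless of whether the SoftDevice shares the stack; the SoftDevice Specification is the authority on that. It is a generic technique, not a Nordic API. `stack_paint`, `stack_unused_words` and `STACK_FILL` are hypothetical names. On target, `base` and `words` would come from the linker symbols for the stack region.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xDEADBEEFu /* hypothetical fill pattern */

/* Fill the stack region with STACK_FILL. On target, only paint below the
 * current stack pointer (e.g. early in startup) so live frames survive. */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_FILL;
    }
}

/* Count the words at the low end that still hold the pattern. The stack
 * grows downwards, so this is headroom that has never been used. */
size_t stack_unused_words(const uint32_t *base, size_t words)
{
    size_t n = 0;
    while (n < words && base[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

    Sending this low-water mark to the server along with the fault registers would show whether the stack came close to its limit before a fault.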

    Thanks

  • Hi.

    It would be great if you can try to reproduce the issue on them.

    Could you list all of them, with version number and date, and whether or not they show the issue?

    After you have tested them, could you repeat the same test with all sd_app_evt_wait(), WFE() and System OFF calls turned off?

    Best regards,

    Andreas
