Adding SoftDevice BLE (S140) with a GPIO-attached coprocessor to a project without SoftDevice: HardFault when calling existing code after the SoftDevice is active

I am extending one of Qorvo's UWB platforms by adding SoftDevice BLE functions, which work OK. This is on SDK 17_1_0.


The coprocessor uses a GPIO-based signaling mechanism driven by interrupts.

Calling the existing code after BLE is active causes a HardFault.

This is running FreeRTOS, and the call is made from a FreeRTOS task.

Using Ozone, I traced the fault to the attempt to set up the interrupt handler for the GPIO pins.

The existing code calls the nrfx libraries:

       return qgpio_pin_irq_configure(&qm33_irq, QGPIO_IRQ_DISABLED);  // according to the docs, disabling a GPIO interrupt shouldn't affect the SoftDevice

```

enum qerr qgpio_pin_irq_configure(const struct qgpio *qgpio_pin, uint32_t flags)
{
    nrfx_err_t r;
    enum qerr err;
    struct qgpio_cb_data *cb_data;
    nrfx_gpiote_in_config_t trigger_config;
    uint32_t abs_pin = NRF_GPIO_PIN_MAP(qgpio_pin->port, qgpio_pin->pin_number);  // this sets abs_pin = 25

    /* Init GPIOTE at least once. */
    if (!nrfx_gpiote_is_init()) {   // this causes the hardfault.
        r = nrfx_gpiote_init();
        if (r)
            return QERR_EBUSY;
    }

    if (flags & QGPIO_IRQ_DISABLED) {
        nrfx_gpiote_in_event_disable(abs_pin);
        nrfx_gpiote_in_uninit(abs_pin);
        return QERR_SUCCESS;
    }
```
hardfault window
```
The target stopped in HardFault exception state.

Reason: A fault with configurable priority has been escalated to a HardFault exception at 0x00000000.
```

I don't see any mechanism by which configuring a GPIO pin could cause a HardFault with the SoftDevice (SD) active.
If the SD is not active, this code works as written.

What am I missing?

If I run this code BEFORE setting up the SoftDevice, it works, but the SoftDevice init then fails.

  • Hi Sam, 

    Could you tell me more about "but I can't tell which interrupt invoked the handler."?

    Have you checked how the interrupt for SPIM is enabled?

    The priority is set in this struct, which is passed in when nrfx_spim_init() is called:

    typedef struct
    {
        uint8_t sck_pin;      ///< SCK pin number.
        uint8_t mosi_pin;     ///< MOSI pin number (optional).
                              /**< Set to @ref NRFX_SPIM_PIN_NOT_USED
                               *   if this signal is not needed. */
        uint8_t miso_pin;     ///< MISO pin number (optional).
                              /**< Set to @ref NRFX_SPIM_PIN_NOT_USED
                               *   if this signal is not needed. */
        uint8_t ss_pin;       ///< Slave Select pin number (optional).
                              /**< Set to @ref NRFX_SPIM_PIN_NOT_USED
                               *   if this signal is not needed. */
        bool ss_active_high;  ///< Polarity of the Slave Select pin during transmission.
        uint8_t irq_priority; ///< Interrupt priority.
        uint8_t orc;          ///< Overrun character.
                              /**< This character is used when all bytes from the TX buffer are sent,
                                   but the transfer continues due to RX. */
        nrf_spim_frequency_t frequency; ///< SPIM frequency.
        nrf_spim_mode_t      mode;      ///< SPIM mode.
        nrf_spim_bit_order_t bit_order; ///< SPIM bit order.
    #if NRFX_CHECK(NRFX_SPIM_EXTENDED_ENABLED) || defined(__NRFX_DOXYGEN__)
        uint8_t              dcx_pin;     ///< D/CX pin number (optional).
        uint8_t              rx_delay;    ///< Sample delay for input serial data on MISO.
                                          /**< The value specifies the delay, in number of 64 MHz clock cycles
                                           *   (15.625 ns), from the sampling edge of SCK (leading edge for
                                           *   CONFIG.CPHA = 0, trailing edge for CONFIG.CPHA = 1) until
                                           *   the input serial data is sampled. */
        bool                 use_hw_ss;   ///< Indication to use software or hardware controlled Slave Select pin.
        uint8_t              ss_duration; ///< Slave Select duration before and after transmission.
                                          /**< Minimum duration between the edge of CSN and the edge of SCK and minimum
                                           *   duration of CSN must stay inactive between transactions.
                                           *   The value is specified in number of 64 MHz clock cycles (15.625 ns).
                                           *   Supported only for hardware-controlled Slave Select. */
    #endif
    } nrfx_spim_config_t;

    Have you narrowed it down to SPIM interrupt handling as the exact cause of the crash? You mentioned "I get an access fault trying to write to the flash in code space". What is that about? Is it the internal flash or external flash?

    Regarding the interrupt priority check, please note that the SoftDevice crash is most likely caused not by the newly added interrupt handler but by handlers that were already configured (and that violate the SoftDevice requirements), so you may want to post the whole list of interrupt priority configurations here.

  • Thanks.

    The config data, at 20035CC0, looks like this:

    10 11 17 14 00 03 FF 00  00 00 00 40 00 00 00 00  00 00 00 00 00 00 00 01

    sck = 10

    mosi = 11

    miso = 17

    ss_pin = 14

    ss_active_high = 00

    irq_priority = 03, which matches the define used,  CONFIG_SPI_UWB_IRQ_PRIORITY=3

    fill char  = FF
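
The field-by-field reading above can be reproduced on a host with a few lines of C. This is only a sketch: `spim_cfg_head_t` and `decode_spim_cfg` are hypothetical names, and it assumes the dump is shown in hexadecimal (as the `FF` fill byte suggests), so `10` would be pin 16 in decimal.

```c
#include <stdint.h>

/* Hypothetical mirror of the first seven one-byte fields of
 * nrfx_spim_config_t, in declaration order. */
typedef struct {
    uint8_t sck_pin;
    uint8_t mosi_pin;
    uint8_t miso_pin;
    uint8_t ss_pin;
    uint8_t ss_active_high;
    uint8_t irq_priority;
    uint8_t orc;
} spim_cfg_head_t;

/* Copy the leading bytes of a raw memory dump into the mirror struct.
 * The caller must supply at least seven bytes. */
static spim_cfg_head_t decode_spim_cfg(const uint8_t *raw)
{
    spim_cfg_head_t c;
    c.sck_pin        = raw[0];
    c.mosi_pin       = raw[1];
    c.miso_pin       = raw[2];
    c.ss_pin         = raw[3];
    c.ss_active_high = raw[4];
    c.irq_priority   = raw[5];
    c.orc            = raw[6];
    return c;
}
```

Passing the first seven bytes of the dump (`0x10, 0x11, 0x17, 0x14, 0x00, 0x03, 0xFF`) gives `irq_priority == 3` and `orc == 0xFF`, matching the values listed above.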

    The hard fault is a write access fault at 75B0C: app_error_fault_handler id=1001, pc=75B0C, info=1.

    From the link map, that address is in:

    irq_handler
    0x0000000000075aec 0x5c Nordic/libSDK.a(nrfx_spim.c.obj)  <----- here 
    .text.SPIM0_SPIS0_TWIM0_TWIS0_SPI0_TWI0_IRQHandler
    0x0000000000075b48 0x10 Nordic/libSDK.a(nrfx_spim.c.obj)

    disassembly window

    00075AFE MOV.W R3, #0
    anomaly_198_disable();
    static void anomaly_198_disable(void)
    *((volatile uint32_t *)0x40000E00) = m_anomaly_198_preserved_value;
    00075B02 IT EQ
    00075B04 STREQ.W R3, [R2, #0x0E00]
    nrf_spim_event_clear(p_spim, NRF_SPIM_EVENT_END);
    __STATIC_INLINE void nrf_spim_event_clear(NRF_SPIM_Type * p_reg,
    *((volatile uint32_t *)((uint8_t *)p_reg + (uint32_t)event)) = 0x0UL;
    00075B08 STR.W R3, [R0, #0x0118]
    volatile uint32_t dummy = *((volatile uint32_t *)((uint8_t *)p_reg + (uint32_t)event));
    00075B0C LDR.W R3, [R0, #0x0118]  <---------------------  here 
    00075B10 LDRB R0, [R1, #31]
    00075B12 STR R3, [SP, #4]
    (void)dummy;

    R3 at the time is 0x00047335, which is in the flash section of the image:

    ```

    RAM 0x0000000020013000 0x000000000002d000 xrw
    FLASH 0x0000000000027000 0x000000000007b000 xrw   <----- here 
    CALIB_SHA 0x00000000000fc000 0x0000000000001000 rw
    CALIB 0x00000000000fd000 0x0000000000001000 rw
    *default* 0x0000000000000000 0xffffffffffffffff

    ```

    I added a log call at the entry to this function:
    static enum qerr spi_config_master(struct qspi *const spi, const struct qspi_config *config)
    {
    nrfx_err_t r = NRFX_ERROR_INVALID_PARAM;
    QLOGD("spi_config_master.");
    but I don't see that message in the Ozone terminal window.
    In the disassembly window that function is not present in memory, but the one following it is.
    Here are the IRQ priorities as defined in one of the make files:
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: NRFX_CLOCK_CONFIG_IRQ_PRIORITY=6
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: CLOCK_CONFIG_IRQ_PRIORITY=6
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: TIMER_DEFAULT_CONFIG_IRQ_PRIORITY=7
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: NRFX_TIMER_DEFAULT_CONFIG_IRQ_PRIORITY=7
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: NRFX_RTC_DEFAULT_CONFIG_IRQ_PRIORITY=7
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: RTC_DEFAULT_CONFIG_IRQ_PRIORITY=7
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: WDT_CONFIG_IRQ_PRIORITY=7
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: NRFX_WDT_CONFIG_IRQ_PRIORITY=7
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: NRFX_USBD_CONFIG_IRQ_PRIORITY=5
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: USBD_CONFIG_IRQ_PRIORITY=5

    The sdk_config.h only has IRQ priorities 6 or 7 selected.
    None of those appear to conflict with the SD.
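
A list like the one above can be sanity-checked with a small host-side helper. This is a sketch under the assumption that S140 reserves interrupt priority levels 0, 1 and 4 for itself, leaving 2, 3, 5, 6 and 7 to the application (check the SoftDevice Specification for the version in use). `app_irq_priority_ok` and `count_bad_priorities` are hypothetical names, not SDK functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: on the nRF52840 (priority levels 0-7) with S140, levels
 * 0, 1 and 4 belong to the SoftDevice; the application may use the rest. */
static bool app_irq_priority_ok(uint8_t prio)
{
    if (prio > 7u) {
        return false;   /* not a valid priority level on this core */
    }
    return !(prio == 0u || prio == 1u || prio == 4u);
}

/* Returns how many entries in a priority table would clash with the SD. */
static unsigned count_bad_priorities(const uint8_t *prios, unsigned n)
{
    unsigned bad = 0;
    for (unsigned i = 0; i < n; i++) {
        if (!app_irq_priority_ok(prios[i])) {
            bad++;
        }
    }
    return bad;
}
```

Feeding it the values quoted in this thread (6, 7, 5 and the SPI priority 3) reports no offenders, which agrees with the observation that none of them conflict with the SD.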

    Before the SD was included, the code used TIMER0, but that was changed to TIMER1 because TIMER0 is used by the SD.
    Here are the symbols at the end of the linker map for overflow checking, etc.:
    .heap 0x0000000020038480 0x0
    0x0000000020038480 __HeapBase = .
    0x0000000020038480 __end__ = .
    0x0000000020038480 PROVIDE (end = .)
    *(.heap*)
    .heap 0x0000000020038480 0x0 Nordic/libSDK.a(gcc_startup_nrf52840.S.obj)
    0x0000000020038480 __HeapLimit = .

    .stack_dummy 0x0000000020038480 0x4000
    *(.stack*)
    .stack 0x0000000020038480 0x4000 Nordic/libSDK.a(gcc_startup_nrf52840.S.obj)
    0x0000000020040000 __StackTop = (ORIGIN (RAM) + LENGTH (RAM))
    0x000000002003c000 __StackLimit = (__StackTop - SIZEOF (.stack_dummy))
    0x0000000020040000 PROVIDE (__stack = __StackTop)
    0x0000000000000001 ASSERT ((__StackLimit >= __HeapLimit), region RAM overflowed with stack)
    0x0000000000000fb0 DataInitFlashUsed = (__bss_start__ - __data_start__)
    0x0000000000067ba4 CodeFlashUsed = (__etext - ORIGIN (FLASH))
    0x0000000000068b54 TotalFlashUsed = (CodeFlashUsed + DataInitFlashUsed)
    0x0000000000000001 ASSERT ((TotalFlashUsed <= LENGTH (FLASH)), region FLASH overflowed with .data and user data)
    0x0000000000000020 CONFIG_SECURE_PARTITIONS_UWB_L1_CONFIG_SHA256_SIZE = 0x20
    0x0000000000001000 CONFIG_SECURE_PARTITIONS_UWB_L1_CONFIG_SIZE = 0x1000
  • This IS one of the functions I called out before as having code optimization problems, however.

    This line of code generated a pointer value of 2 for p_cb:

        spim_control_block_t * p_cb = &m_cb[p_instance->drv_inst_idx];

    I had to change it like this:

     spim_control_block_t * p_cb = m_cb+p_instance->drv_inst_idx;

    I reported this in the Qorvo forums.

    p_instance->drv_inst_idx = 2

    nrfx_err_t nrfx_spim_xfer(nrfx_spim_t const * const p_instance,
    nrfx_spim_xfer_desc_t const * p_xfer_desc,
    uint32_t flags)
    {
    spim_control_block_t * p_cb = &m_cb[p_instance->drv_inst_idx]; <---- this sets the p_cb pointer to 2!!

    changed to:
    
    spim_control_block_t * p_cb = m_cb+p_instance->drv_inst_idx;

    https://forum.qorvo.com/t/dw3-qm33-sdk-bug-and-some-code-optimization-questions/24569

    I have fixed all of those m_cb pointers in nrfx_spim.c, but R3 holds the same value at fault time,
    and it's repeatable across builds, so it's not a random value.
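
For reference, the two spellings of the `p_cb` initializer can be compared in a minimal host-side sketch. `cb_t`, the array size, and the helper names are hypothetical stand-ins for `spim_control_block_t` and `m_cb`. In standard C, `&a[i]` and `a + i` denote the same address, so a differing value seen in the debugger may point at something other than the expression itself (for example, how an optimized build exposes locals); that is an assumption, not a diagnosis of this build.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for spim_control_block_t; only its size matters. */
typedef struct {
    void    *handler;
    uint32_t flags;
} cb_t;

static cb_t m_cb[4];   /* hypothetical instance count */

/* Address of element idx, written with array indexing. */
static cb_t *cb_by_index(size_t idx)
{
    return &m_cb[idx];
}

/* Address of element idx, written with pointer arithmetic. */
static cb_t *cb_by_pointer(size_t idx)
{
    return m_cb + idx;
}
```

With `idx` set to 2, as in the post, both helpers return the same pointer, `2 * sizeof(cb_t)` bytes past the start of `m_cb`.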

  • Hi Sam, 
    I'm looking at this:

     app_error_fault_handler id=1001, pc=75B0C, info =1

    Error ID 1001 means NRF_FAULT_ID_APP_MEMACC. Here is the description of this error:

    Application invalid memory access. The info parameter will contain 0x00000000,
    in case of SoftDevice RAM access violation. In case of SoftDevice peripheral
    register violation the info parameter will contain the sub-region number of
    PREGION[0], on whose address range the disallowed write access caused the
    memory access fault.

    Since info = 1, it matches bit number 0 in the table: https://docs.nordicsemi.com/bundle/ps_nrf52840/page/memory.html#topic

    I suspect it is either CLOCK control or POWER control. You can try disabling the protection of the subregion to see whether it's actually the cause of the fault:


    NRF_MWU->PREGION[0].SUBS &= ~(MWU_PREGION_SUBS_SR0_Include << MWU_PREGION_SUBS_SR0_Pos);
    __DSB(); // barrier to ensure register is set before accessing NVMC or ACL.

    Access to both CLOCK and POWER is restricted when the SoftDevice is active (see section 7.1 in the SoftDevice SDS .pdf file). You may want to check whether SOFTDEVICE_PRESENT is defined in your preprocessor definitions.
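
The decoding above can be sketched as a small helper for use inside a fault handler or on a host. The constants mirror my understanding of nrf_sdm.h (an assumption to verify against the headers shipped with your SoftDevice), the `id=1001` in the log is taken to be hexadecimal, and `fault_id_name` and `memacc_is_peripheral_violation` are hypothetical names.

```c
#include <stdint.h>

/* Assumed values, mirrored from nrf_sdm.h; verify against your SoftDevice. */
#define FAULT_ID_SD_ASSERT   0x00000001u  /* NRF_FAULT_ID_SD_ASSERT  */
#define FAULT_ID_APP_MEMACC  0x00001001u  /* NRF_FAULT_ID_APP_MEMACC */

/* Name a fault id as passed to app_error_fault_handler(id, pc, info). */
static const char *fault_id_name(uint32_t id)
{
    switch (id) {
    case FAULT_ID_SD_ASSERT:  return "NRF_FAULT_ID_SD_ASSERT";
    case FAULT_ID_APP_MEMACC: return "NRF_FAULT_ID_APP_MEMACC";
    default:                  return "unknown";
    }
}

/* Per the description quoted above: for APP_MEMACC, info == 0 means a
 * SoftDevice RAM violation; any other value indicates a write to a
 * protected peripheral register. */
static int memacc_is_peripheral_violation(uint32_t id, uint32_t info)
{
    return id == FAULT_ID_APP_MEMACC && info != 0u;
}
```

For the values in this thread (id 0x1001, info 1) the helper classifies the fault as a peripheral register violation rather than a RAM violation.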

  • I went through all the code and found one place that sets the power state. It is only used for SPI, not SPIM, and is not loaded in the binary.

    I put a breakpoint on the code that fails.

    Using the table linked above,

    R0 = 0x4002F000

    which corresponds to SPIM3.
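
The table lookup above can also be done arithmetically. This sketch assumes the nRF52 convention that each peripheral ID owns a 4 KiB block starting at 0x40000000; `periph_id_from_addr` is a hypothetical helper, not an SDK function.

```c
#include <stdint.h>

#define PERIPH_BASE        0x40000000u  /* start of the peripheral space  */
#define PERIPH_BLOCK_SIZE  0x1000u      /* one 4 KiB block per peripheral ID */
#define PERIPH_BLOCK_COUNT 0x100u       /* assumed size of the ID window     */

/* Returns the peripheral ID that owns addr, or -1 if addr lies outside
 * the assumed peripheral window. */
static int periph_id_from_addr(uint32_t addr)
{
    if (addr < PERIPH_BASE ||
        addr >= PERIPH_BASE + PERIPH_BLOCK_COUNT * PERIPH_BLOCK_SIZE) {
        return -1;
    }
    return (int)((addr - PERIPH_BASE) / PERIPH_BLOCK_SIZE);
}
```

With this, 0x4002F000 maps to ID 47, the entry the table lists for SPIM3. The address 0x40000E00 written by `anomaly_198_disable()` in the earlier disassembly maps to ID 0, the block the previous reply associates with CLOCK and POWER.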

    If I continue from there, execution stops again at the same place, as if the interrupt is not cleared.

    If I disable the breakpoint, I get a hard fault.

    So I added more debug info. Because the interrupts are redirected to a common interrupt handler, I save the SPIM object address and the SPIM number, both before the code that causes the fault.

    The number is 3 (SPIM3),

    and the saved SPIM pointer is the same as above, in little-endian byte order: 00 F0 02 40.

    Anyhow, I'm still confused: in the latest fault, it was the store of the SPIM pointer address that faulted.

    Interesting: directly before the fault location is this in nrfx_spim.c:

    ```

    #if NRFX_CHECK(NRFX_SPIM3_NRF52840_ANOMALY_198_WORKAROUND_ENABLED)
     if (p_spim == NRF_SPIM3)
     {
     anomaly_198_disable();
     }
    #endif
    ```
    I commented it out, and now there is no fault. I can run my BLE operations.

    devzone.nordicsemi.com/.../enabling-workaround-for-anomaly-198-crashes-softdevice

    I have changed the define in sdk_config.h to 0 instead of 1.
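
For readers applying the same change, the setting would look roughly like this in sdk_config.h. The macro name is the one tested by the `#if` in nrfx_spim.c quoted above; the surrounding `#ifndef` guard is the usual sdk_config.h pattern and is an assumption about this project's file.

```c
/* sdk_config.h (fragment) */

/* 0 = workaround compiled out, so nrfx_spim.c no longer writes to
 * 0x40000E00 while the SoftDevice is protecting that register block.
 * 1 = workaround enabled (the setting that faulted in this thread). */
#ifndef NRFX_SPIM3_NRF52840_ANOMALY_198_WORKAROUND_ENABLED
#define NRFX_SPIM3_NRF52840_ANOMALY_198_WORKAROUND_ENABLED 0
#endif
```

Turning the workaround off also removes whatever protection it provided against anomaly 198, so it is worth reading the nRF52840 errata entry to judge whether SPIM3 transfers in this design are affected.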
