Adding SoftDevice BLE (S140) with a GPIO-attached coprocessor to a project that had no SoftDevice: HardFault when calling existing code after the SoftDevice is active

I am extending one of Qorvo's UWB platforms by adding SoftDevice BLE functions, which work OK. This is on SDK 17_1_0.

The coprocessor uses a GPIO-connected signaling mechanism driven by interrupts.

Calling the existing code after BLE is active causes a HardFault.

This is running FreeRTOS, and we are in a FreeRTOS task.

Using Ozone, I've traced it to the attempt to set up the interrupt handler for the GPIO pins.

The existing code calls the nrfx libs:

       return qgpio_pin_irq_configure(&qm33_irq, QGPIO_IRQ_DISABLED);  // according to the docs, disabling a GPIO interrupt shouldn't impact the SoftDevice

```

enum qerr qgpio_pin_irq_configure(const struct qgpio *qgpio_pin, uint32_t flags)
{
    nrfx_err_t r;
    enum qerr err;
    struct qgpio_cb_data *cb_data;
    nrfx_gpiote_in_config_t trigger_config;
    uint32_t abs_pin = NRF_GPIO_PIN_MAP(qgpio_pin->port, qgpio_pin->pin_number); // this sets abs_pin = 25

    /* Init GPIOTE at least once. */
    if (!nrfx_gpiote_is_init()) { // this causes the hardfault
        r = nrfx_gpiote_init();
        if (r)
            return QERR_EBUSY;
    }

    if (flags & QGPIO_IRQ_DISABLED) {
        nrfx_gpiote_in_event_disable(abs_pin);
        nrfx_gpiote_in_uninit(abs_pin);
        return QERR_SUCCESS;
    }
```
HardFault window in Ozone:
```
The target stopped in HardFault exception state.

Reason: A fault with configurable priority has been escalated to a HardFault exception at 0x00000000.
```

I don't see any mechanism that would cause a HardFault on a GPIO pin with the SoftDevice active.
If the SoftDevice is not active, this code works as written.

What am I missing?

If I run this code BEFORE setting up the SoftDevice, it works, but SoftDevice init fails.

  • So, more investigation.

    I'm pretty sure this is an interrupt priority problem, but I can't find it.

    If I run the routine that fails after SD init BEFORE SD init, it starts with no problem.

    When I get to the next step, using the backend chip,

    I still get the same nrf_spim_event_clear write protection crash,

    in nrfx_spim.c, in the irq_handler.

    The call stack in Ozone doesn't tell me where the code was before. I can see it through the PC address,

    but I can't tell which interrupt invoked the handler.
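
    One way to find out which interrupt was active is to read the exception number from the xPSR word stacked in the exception frame. A minimal sketch, assuming the standard 8-word Cortex-M frame (R0-R3, R12, LR, PC, xPSR); the helper names are made up for illustration, and the frame pointer would have to come from a fault handler or from the stack view in the debugger:

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Index of xPSR within the 8-word Cortex-M basic exception frame. */
    #define FRAME_XPSR_INDEX 7u

    /* Hypothetical helper: exception number that was executing when the
     * frame was stacked (low 9 bits of xPSR). 0 means thread mode. */
    static uint32_t stacked_exception_number(const uint32_t *frame)
    {
        assert(frame != NULL);
        return frame[FRAME_XPSR_INDEX] & 0x1FFu;
    }

    /* Hypothetical helper: external IRQ number, or -1 if the stacked
     * context was thread mode or a core exception (numbers 1..15). */
    static int stacked_irq_number(const uint32_t *frame)
    {
        uint32_t exc = stacked_exception_number(frame);
        return (exc >= 16u) ? (int)(exc - 16u) : -1;
    }
    ```

    On target, the CMSIS intrinsic `__get_IPSR()` returns the same field for the handler that is currently running, so logging it at the top of a suspect handler is another option.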

    init_log: Backends initialized
    create_log_processing_task: init_log completed
    create_log_processing_task: Stack allocated
    create_log_processing_task: Task created
    TEST RTT OUTPUT
    board_interface_init: START
    board_interface_init: About to init BLE
    Role: RoleController
    WARNING: UWB MAC initialization skipped - incompatible with SoftDevice
    Calling ble_stack_manager_init...


    [BLE] === SCANNING ALL INTERRUPTS FOR CONFLICTS ===
    [BLE] IRQ 0: priority=6, enabled=1
    [BLE] IRQ 9: priority=7, enabled=1

    [Fira Init] qplatform_init ok  <--- this is the code that failed AFTER sd was started
    [Fira Init] l1_config_init ok
    [Fira Init] llhw_init ok
    [Fira Init] done rc=0


    [BLE] ble_stack_manager_init: 1-nrf_sdh_enable_request
    [BLE] ble_stack_manager_init: 2-Configuring SoftDevice for concurrent connections
    [BLE] nrf_sdh_ble_default_cfg_set returned: 0, ram_start=0x20013000
    [BLE] sd_ble_cfg_set(CONN_CFG_GAP) returned: 0, ram_start=0x20013000 (conn_count=8)
    [BLE] ble_stack_manager_init: 3-nrf_sdh_ble_enable
    [BLE] Enabling central role: 1 peripheral + 7 central = 8 total connections
    [BLE] sd_ble_cfg_set(ROLE_COUNT) returned: 0, ram_start=0x20013000
    [BLE] nrf_sdh_ble_enable returned: 0 (0x0)
    [BLE] gap_params_init: COMPLETE
    [BLE] BLE stack initialized successfully - observer priority=3
    ble_stack_manager_init returned: TRUE
    BLE init SUCCESS - starting advertising
    [BLE] Using manual advertising data encoding...
    [BLE] *** BUILDING SCAN RESPONSE: role=1, name='CN_Controller', len=13 ***
    [BLE] Advertising data: 3 bytes, Scan response: 15 bytes (type=0x09, name='CN_Controller')
    [BLE] sd_ble_gap_adv_start returned: 0 (handle=0x00)
    [BLE] BLE advertising started successfully!

    advertising_start called  <--- I can see the BLE device in nrf_connect app 

    now start up the UWB device 


    Initializing UCI BLE commands...
    [UCI_BLE] uci_ble_commands_init() CALLED
    [UCI_BLE] CONTROLLER: Creating UCI manager task...
    [UCI_BLE] CONTROLLER: UCI manager task created successfully
    UCI BLE commands initialized successfully
    === BLE Init Complete ===
    [CTRL_TASK] *** TASK STARTED ***
    [CTRL_TASK] Mutex initialized
    [CTRL_TASK] Entering main loop, signal=2002C510
    [TASK_UCI] calling uci_open_backends
    in open_backends
    in open_backends uwbmac init ok
    in open_backends uci_init ok
    in open_backends init coordinator

    [00:00:00.021,847] <debug> nrf_sdh: State request: 0x00000000
    [00:00:00.021,854] <debug> nrf_sdh: State change: 0x00000000
    [00:00:00.279,301] <debug> nrf_sdh: State change: 0x00000001


    If the app_error_handler goes all the way through:

    [00:01:01.659,109] <error> app: Fatal error

    If I don't execute that open_backends,

    I see this from SD (and can connect through nRF Connect):

    [CTRL_TASK] Entering main loop, signal=2002C4C0
    [TASK UCI] on entry
    [UCI_BLE] *** UCI MANAGER TASK STARTED (CONTROLLER MODE) ***
    [UCI_BLE] Performing late initialization in UCI manager task...
    [UCI_BLE] Late initialization: Creating UCI BLE queues and tasks...
    [UCI_BLE] Notification task started
    [UCI_BLE] Ranging continuation task created successfully
    [UCI_BLE] Ranging queue and task initialization complete
    [UCI_BLE] Ranging continuation task started - will process queued ranging requests
    log_processing_task: Started
    [UCI_BLE] Beacon notification semaphore created
    [UCI_BLE] UCI execution queue created successfully at 20030E58
    [UCI_BLE] CONTROLLER: UCI execution task already running (created in init)
    [UCI_BLE] Task handle: 2002BE30
    [UCI_BLE] Free heap: 14704 bytes
    [UCI_BLE] ***** UCI execution queue initialization COMPLETE *****
    [UCI_BLE] Late initialization complete - queue=20030E58
    [UCI_BLE] *** UCI_EXEC task ready to process messages from queue ***
    [UCI_BLE] Delaying 50ms before entering main loop...
    [UCI_BLE] *** ENTERING MAIN LOOP - will check queue and flags every 100ms ***
    [BLE] scan_evt_handler called! evt_id=6
    [BLE] QWR module assigned to connection handle 0x0007
    [BLE] BEACON: Initializing system attributes (for QWR)
    [BLE] BEACON: System attributes initialized successfully
    [BLE] CONTROLLER: Connection accepted from phone app
    [BLE] Conn handle: 0x0007, Peer address: 7B:D3:06:09:B6:4E (type 2)
    [BLE] Connection allowed - waiting for commands from phone
    [BLE] WRITE EVENT: handle=0x000D, op=1, len=2
    [BLE] Command char value=0x0010, CCCD=0x0011

    So SD is up and running, and the IRQ setup done BEFORE SD init worked,
    but turning it on causes the fault in SPIM.

    I added an interrupt-enabled checker before and after that one call:

    ```

    for (int irq = 0; irq < 48; irq++) { // nRF52840 has 48 interrupts
        uint32_t priority = NVIC_GetPriority((IRQn_Type)irq);
        uint32_t enabled1 = NVIC_GetEnableIRQ((IRQn_Type)irq);
        // Only report enabled interrupts; flag priorities 0, 1 and 4 as conflicts
        if (enabled1) {
            SEGGER_RTT_printf(0, "IRQ %d: priority=%d, enabled=%d %s\r\n",
                              irq, priority, enabled1,
                              (priority == 0 || priority == 1 || priority == 4) ? "*** CONFLICT ***" : "");
        }
    }
    ```


    [BLE] before 1 === SCANNING ALL INTERRUPTS FOR CONFLICTS ===
    IRQ 0: priority=6, enabled=1
    IRQ 9: priority=7, enabled=1


    [Fira Init] qplatform_init ok
    [Fira Init] l1_config_init ok
    [Fira Init] llhw_init ok
    [Fira Init] done rc=0


    [BLE] after === SCANNING ALL INTERRUPTS FOR CONFLICTS ===
    IRQ 0: priority=6, enabled=1


    IRQ 9: priority=7, enabled=1

    All of these are newly enabled, but at very low priority:


    IRQ 6: priority=7, enabled=1

    IRQ 10: priority=7, enabled=1

    IRQ 27: priority=7, enabled=1
    IRQ 36: priority=7, enabled=1
    IRQ 39: priority=5, enabled=1
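
    To read the scan output it helps to map IRQ numbers to peripherals. A small lookup sketch, assuming the peripheral ID assignments from the nRF52840 product specification (the IRQ number equals the peripheral ID). Only the IDs seen in the output above are listed, and `irq_name` is an illustrative helper, not an SDK function:

    ```c
    #include <assert.h>

    /* Illustrative helper (not part of the SDK): peripheral behind an
     * nRF52840 IRQ number, limited to the IRQs reported by the scan.
     * Assumption: IDs follow the nRF52840 instantiation table. */
    static const char *irq_name(int irq)
    {
        assert(irq >= 0 && irq < 48);
        switch (irq) {
        case 0:  return "POWER_CLOCK";
        case 6:  return "GPIOTE";
        case 9:  return "TIMER1";
        case 10: return "TIMER2";
        case 27: return "TIMER4";
        case 36: return "RTC2";
        case 39: return "USBD";
        default: return "(not listed)";
        }
    }
    ```

    Going by the same table, the SPIM instances would be IRQ 3, 4, 35 and 47, none of which show up as enabled in this scan.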

  • Hi Sam,

    Could you tell me more about "but I can't tell which interrupt invoked the handler"?

    Have you checked how the interrupt for SPIM is enabled?

    It's in this struct, which is passed when nrfx_spim_init() is called:

    typedef struct
    {
        uint8_t sck_pin;      ///< SCK pin number.
        uint8_t mosi_pin;     ///< MOSI pin number (optional).
                              /**< Set to @ref NRFX_SPIM_PIN_NOT_USED
                               *   if this signal is not needed. */
        uint8_t miso_pin;     ///< MISO pin number (optional).
                              /**< Set to @ref NRFX_SPIM_PIN_NOT_USED
                               *   if this signal is not needed. */
        uint8_t ss_pin;       ///< Slave Select pin number (optional).
                              /**< Set to @ref NRFX_SPIM_PIN_NOT_USED
                               *   if this signal is not needed. */
        bool ss_active_high;  ///< Polarity of the Slave Select pin during transmission.
        uint8_t irq_priority; ///< Interrupt priority.
        uint8_t orc;          ///< Overrun character.
                              /**< This character is used when all bytes from the TX buffer are sent,
                                   but the transfer continues due to RX. */
        nrf_spim_frequency_t frequency; ///< SPIM frequency.
        nrf_spim_mode_t      mode;      ///< SPIM mode.
        nrf_spim_bit_order_t bit_order; ///< SPIM bit order.
    #if NRFX_CHECK(NRFX_SPIM_EXTENDED_ENABLED) || defined(__NRFX_DOXYGEN__)
        uint8_t              dcx_pin;     ///< D/CX pin number (optional).
        uint8_t              rx_delay;    ///< Sample delay for input serial data on MISO.
                                          /**< The value specifies the delay, in number of 64 MHz clock cycles
                                           *   (15.625 ns), from the sampling edge of SCK (leading edge for
                                           *   CONFIG.CPHA = 0, trailing edge for CONFIG.CPHA = 1) until
                                           *   the input serial data is sampled. */
        bool                 use_hw_ss;   ///< Indication to use software or hardware controlled Slave Select pin.
        uint8_t              ss_duration; ///< Slave Select duration before and after transmission.
                                          /**< Minimum duration between the edge of CSN and the edge of SCK and minimum
                                           *   duration of CSN must stay inactive between transactions.
                                           *   The value is specified in number of 64 MHz clock cycles (15.625 ns).
                                           *   Supported only for hardware-controlled Slave Select. */
    #endif
    } nrfx_spim_config_t;

    Have you narrowed it down to exactly the SPIM interrupt handling causing the crash? You mentioned "I get an access fault trying to write to the flash in code space". What is that about? Is it the internal flash or external flash?

    Regarding the interrupt priority check, please note that the SoftDevice crash is most likely caused not by the newly added interrupt handler but by ones that were already configured and violate the SoftDevice requirements. You may want to post the whole list of interrupt priority configurations here.

  • Thanks.

    The config data looks like this (20035CC0):

    10 11 17 14 00 03 FF 00  00 00 00 40 00 00 00 00  00 00 00 00 00 00 00 01

    sck = 10

    mosi = 11

    miso = 17

    ss_pin = 14

    ss_active_high = 00

    irq_priority = 03, which matches the define used,  CONFIG_SPI_UWB_IRQ_PRIORITY=3

    fill char  = FF
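
    The dump can also be decoded mechanically. A sketch under assumptions: the bytes are hexadecimal, the layout follows the `nrfx_spim_config_t` fields quoted earlier with one padding byte before the 32-bit `frequency` field, and the target is little-endian. The struct and function names are illustrative:

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Illustrative view of the leading fields of the dumped config. */
    struct spim_cfg_view {
        uint8_t  sck_pin, mosi_pin, miso_pin, ss_pin;
        uint8_t  ss_active_high, irq_priority, orc;
        uint32_t frequency; /* raw frequency word */
    };

    /* Decode the first 12 bytes of a raw dump. Assumed offsets:
     * 0..6 single-byte fields, 7 padding, 8..11 frequency (little-endian). */
    static struct spim_cfg_view decode_spim_cfg(const uint8_t *raw, size_t len)
    {
        struct spim_cfg_view v;
        assert(raw != NULL && len >= 12);
        v.sck_pin        = raw[0];
        v.mosi_pin       = raw[1];
        v.miso_pin       = raw[2];
        v.ss_pin         = raw[3];
        v.ss_active_high = raw[4];
        v.irq_priority   = raw[5];
        v.orc            = raw[6];
        v.frequency      = (uint32_t)raw[8]
                         | ((uint32_t)raw[9] << 8)
                         | ((uint32_t)raw[10] << 16)
                         | ((uint32_t)raw[11] << 24);
        return v;
    }
    ```

    Fed the first 12 bytes of the dump, this gives pins 16, 17, 23 and 20 in decimal, priority 3, overrun character 0xFF, and a frequency word of 0x40000000, which would correspond to the 4 Mbps setting if the layout assumption holds.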

    The hard fault is a write access fault at 75B0C: app_error_fault_handler id=1001, pc=75B0C, info=1

    From the link map, that is:

    irq_handler
    0x0000000000075aec 0x5c Nordic/libSDK.a(nrfx_spim.c.obj)  <----- here 
    .text.SPIM0_SPIS0_TWIM0_TWIS0_SPI0_TWI0_IRQHandler
    0x0000000000075b48 0x10 Nordic/libSDK.a(nrfx_spim.c.obj)

    Disassembly window:

    00075AFE MOV.W R3, #0
    anomaly_198_disable();
    static void anomaly_198_disable(void)
    *((volatile uint32_t *)0x40000E00) = m_anomaly_198_preserved_value;
    00075B02 IT EQ
    00075B04 STREQ.W R3, [R2, #0x0E00]
    nrf_spim_event_clear(p_spim, NRF_SPIM_EVENT_END);
    __STATIC_INLINE void nrf_spim_event_clear(NRF_SPIM_Type * p_reg,
    *((volatile uint32_t *)((uint8_t *)p_reg + (uint32_t)event)) = 0x0UL;
    00075B08 STR.W R3, [R0, #0x0118]
    volatile uint32_t dummy = *((volatile uint32_t *)((uint8_t *)p_reg + (uint32_t)event));
    00075B0C LDR.W R3, [R0, #0x0118]  <---------------------  here 
    00075B10 LDRB R0, [R1, #31]
    00075B12 STR R3, [SP, #4]
    (void)dummy;

    R3 at the time is 0x00047335, which is in the flash section of the image:

    ```

    RAM 0x0000000020013000 0x000000000002d000 xrw
    FLASH 0x0000000000027000 0x000000000007b000 xrw   <----- here 
    CALIB_SHA 0x00000000000fc000 0x0000000000001000 rw
    CALIB 0x00000000000fd000 0x0000000000001000 rw
    *default* 0x0000000000000000 0xffffffffffffffff

    ```
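
    A quick check of that against the memory map, as a sketch. The origin and length are copied from the FLASH line above; `in_region` and `looks_like_thumb_pointer` are illustrative helpers. An odd value is what a Thumb function pointer looks like, which may or may not be what R3 holds here:

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative helper: true if addr lies in [origin, origin + length). */
    static bool in_region(uint32_t addr, uint32_t origin, uint32_t length)
    {
        assert(length > 0u);
        return addr >= origin && (addr - origin) < length;
    }

    /* Illustrative helper: Thumb code addresses carry bit 0 set when
     * used as function pointers. */
    static bool looks_like_thumb_pointer(uint32_t value)
    {
        return (value & 1u) != 0u;
    }
    ```

    With FLASH at origin 0x00027000 and length 0x0007b000, `in_region(0x00047335, 0x00027000, 0x0007b000)` is true and the value is odd.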

    I added a log call on entry to this function:

    static enum qerr spi_config_master(struct qspi *const spi, const struct qspi_config *config)
    {
    nrfx_err_t r = NRFX_ERROR_INVALID_PARAM;
    QLOGD("spi_config_master.");

    but I don't see that output in the Ozone terminal window.
    In the disassembly window that function is not present in memory, but the one following it is.

    Here are the IRQ priorities as defined in one of the make files:
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: NRFX_CLOCK_CONFIG_IRQ_PRIORITY=6
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: CLOCK_CONFIG_IRQ_PRIORITY=6
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: TIMER_DEFAULT_CONFIG_IRQ_PRIORITY=7
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: NRFX_TIMER_DEFAULT_CONFIG_IRQ_PRIORITY=7
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: NRFX_RTC_DEFAULT_CONFIG_IRQ_PRIORITY=7
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: RTC_DEFAULT_CONFIG_IRQ_PRIORITY=7
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: WDT_CONFIG_IRQ_PRIORITY=7
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: NRFX_WDT_CONFIG_IRQ_PRIORITY=7
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: NRFX_USBD_CONFIG_IRQ_PRIORITY=5
    ./Projects/DW3_QM33_SDK/FreeRTOS/Type2AB_EVB-new/ProjectDefinition/uwb_stack_llhw.cmake: USBD_CONFIG_IRQ_PRIORITY=5

    The sdk_config.h only has IRQ priorities 6 or 7 selected;
    none of those appear to conflict with SD.

    Before SD was included the code used TIMER0, but that was changed to TIMER1 because TIMER0 is used by SD.
    Here are the symbols at the end of the linker output for overflow checking, etc.:
    .heap 0x0000000020038480 0x0
    0x0000000020038480 __HeapBase = .
    0x0000000020038480 __end__ = .
    0x0000000020038480 PROVIDE (end = .)
    *(.heap*)
    .heap 0x0000000020038480 0x0 Nordic/libSDK.a(gcc_startup_nrf52840.S.obj)
    0x0000000020038480 __HeapLimit = .

    .stack_dummy 0x0000000020038480 0x4000
    *(.stack*)
    .stack 0x0000000020038480 0x4000 Nordic/libSDK.a(gcc_startup_nrf52840.S.obj)
    0x0000000020040000 __StackTop = (ORIGIN (RAM) + LENGTH (RAM))
    0x000000002003c000 __StackLimit = (__StackTop - SIZEOF (.stack_dummy))
    0x0000000020040000 PROVIDE (__stack = __StackTop)
    0x0000000000000001 ASSERT ((__StackLimit >= __HeapLimit), region RAM overflowed with stack)
    0x0000000000000fb0 DataInitFlashUsed = (__bss_start__ - __data_start__)
    0x0000000000067ba4 CodeFlashUsed = (__etext - ORIGIN (FLASH))
    0x0000000000068b54 TotalFlashUsed = (CodeFlashUsed + DataInitFlashUsed)
    0x0000000000000001 ASSERT ((TotalFlashUsed <= LENGTH (FLASH)), region FLASH overflowed with .data and user data)
    0x0000000000000020 CONFIG_SECURE_PARTITIONS_UWB_L1_CONFIG_SHA256_SIZE = 0x20
    0x0000000000001000 CONFIG_SECURE_PARTITIONS_UWB_L1_CONFIG_SIZE = 0x1000
  • However, this IS one of the functions I called out before as having code optimization problems.

    This line of code generated a pointer value of 2 for p_cb:


        spim_control_block_t * p_cb = &m_cb[p_instance->drv_inst_idx];

    I had to change it like this:

     spim_control_block_t * p_cb = m_cb+p_instance->drv_inst_idx;

    I reported this in the Qorvo forums.

    p_instance->drv_inst_idx = 2

    nrfx_err_t nrfx_spim_xfer(nrfx_spim_t const * const p_instance,
    nrfx_spim_xfer_desc_t const * p_xfer_desc,
    uint32_t flags)
    {
    spim_control_block_t * p_cb = &m_cb[p_instance->drv_inst_idx]; <---- this sets the p_cb pointer to 2!!

    change to 
    
    spim_control_block_t * p_cb = m_cb+p_instance->drv_inst_idx;

    https://forum.qorvo.com/t/dw3-qm33-sdk-bug-and-some-code-optimization-questions/24569

    I have fixed all of those m_cb pointers in nrfx_spim.c, but I get the same value in R3 at fault time,
    and it's repeatable across builds, so it's not a random value.

  • Hi Sam, 
    I'm looking at this 

     app_error_fault_handler id=1001, pc=75B0C, info =1

    Error ID=1001 means NRF_FAULT_ID_APP_MEMACC. Here is the description of this error:

    Application invalid memory access. The info parameter will contain 0x00000000,
    in case of SoftDevice RAM access violation. In case of SoftDevice peripheral
    register violation the info parameter will contain the sub-region number of
    PREGION[0], on whose address range the disallowed write access caused the
    memory access fault.

    info = 1, so it matches bit number 0 in the table: https://docs.nordicsemi.com/bundle/ps_nrf52840/page/memory.html#topic

    I suspect it is either CLOCK control or POWER control. You can try disabling the protection of the subregion to see whether it is actually the cause of the fault:


    NRF_MWU->PREGION[0].SUBS &= ~(MWU_PREGION_SUBS_SR0_Include << MWU_PREGION_SUBS_SR0_Pos);
    __DSB(); // barrier to ensure register is set before accessing NVMC or ACL.

    Both CLOCK and POWER have restricted access when the SoftDevice is active (see section 7.1 in the SoftDevice specification PDF). You may also want to check whether SOFTDEVICE_PRESENT is defined in your preprocessor definitions.
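
    To relate a register address to a subregion, here is a small sketch. It assumes PREGION[0] covers 0x40000000 to 0x4001FFFF in 32 subregions of 4 KB, one per peripheral ID; the macro and helper names are illustrative and not taken from the SDK:

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Assumed geometry of MWU PREGION[0]: 128 KB split into 32 x 4 KB. */
    #define PREGION0_BASE   0x40000000u
    #define PREGION0_SIZE   0x00020000u
    #define SUBREGION_SIZE  0x00001000u

    /* Illustrative helper: subregion index (0..31) of a peripheral address. */
    static uint32_t pregion0_subregion(uint32_t addr)
    {
        assert(addr >= PREGION0_BASE && (addr - PREGION0_BASE) < PREGION0_SIZE);
        return (addr - PREGION0_BASE) / SUBREGION_SIZE;
    }

    /* Illustrative helper: bit mask of that subregion, as a SUBS-style bitfield. */
    static uint32_t pregion0_subregion_mask(uint32_t addr)
    {
        return 1u << pregion0_subregion(addr);
    }
    ```

    Under that assumption, the write to 0x40000E00 in `anomaly_198_disable()` from the disassembly above falls in subregion 0 (mask 0x1), the same block as CLOCK and POWER, so that store may be worth checking alongside the SPIM register accesses.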
