Problem upgrading from SDK v15.0.0 to v15.3.0

I’m working on a product that has been in the field and working for many years with nRF5 SDK v15.0.0 / softdevice 6.0.0. Now we want to add extended BLE advertisements and need to upgrade the SDK and softdevice. I have attempted to upgrade to v15.3.0 / SD 6.1.1, but when I do, the firmware no longer runs.

If I step through the code with a debugger, it hangs after enabling the high-frequency clock (i.e. after calling hfclk_start()). If I break the code in the debugger, the PC is always at the clock IRQ handler (0x75A, from vector at 0x40). If I comment out the call to hfclk_start(), then I can continue to step through the code, though of course things don’t run properly because I don’t have a clock. All of this same code works with SDK v15.0.0/SD 6.0.0.
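For readers following along, here is a minimal sketch of how a vector offset such as 0x40 maps to an interrupt number on a Cortex-M core. It assumes the standard layout of 16 system exception slots followed by the external IRQs, one 32-bit word per entry. The helper name `vector_offset_to_irq` is made up for illustration and is not an SDK function.

```c
#include <assert.h>
#include <stdint.h>

/* Cortex-M exception numbers 0..15 are system exceptions; external IRQs start at 16. */
#define FIRST_EXTERNAL_EXCEPTION 16u

/* Hypothetical helper: convert a vector-table byte offset to an external IRQ number.
 * Returns -1 for system exceptions (Reset, HardFault, SysTick, ...). */
int vector_offset_to_irq(uint32_t offset)
{
    assert(offset % 4u == 0u); /* each vector entry is one 32-bit word */

    uint32_t exception_number = offset / 4u;
    if (exception_number < FIRST_EXTERNAL_EXCEPTION) {
        return -1;
    }
    return (int)(exception_number - FIRST_EXTERNAL_EXCEPTION);
}
```

Offset 0x40 is exception number 16, i.e. external IRQ 0, which on the nRF52 is the POWER_CLOCK interrupt. That is consistent with the hang appearing right after the HF clock is started.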

I spent a good deal of time comparing the example sdk_config.h files from v15.0.0 and v15.3.0. I made two changes: I had to change CLOCK_CONFIG_ENABLE to NRF_CLOCK_CONFIG_ENABLE, and I changed the IRQ interrupt priorities from 7 to 6 to match the sample sdk_config.h changes. I also spent a good deal of time stepping through and comparing the nrfx_clock.c code that gets compiled for v15.0.0 and v15.3.0, and I am convinced there are no differences once I added the NRF_CLOCK_CONFIG_ENABLE. I’ve kind of run out of ideas and am looking for any help.

Thanks.

  • Hi Elinar,

    Yes, it is hfclk_start() in nrf_drv_clock.c.

    The program runs in several different modes. In one mode it uses an older proprietary protocol similar to shockburst and leaves the softdevice disabled. In other modes it enables the softdevice and uses BLE. The mode can be switched via commands over a SPI interface. When the program starts up out of reset, it starts with the softdevice disabled. With SDK v15.3.0 the program hangs before the softdevice has been enabled.

    The code did not change while porting from v15.0 to v15.3 other than the necessary sdk_config.h changes (i.e. changing CLOCK_ENABLE to NRF_CLOCK_ENABLE and changing IRQ priorities from 7 to 6).

    As for changes to the SDK, that is a very good question. I didn't write the code and it had not occurred to me to check if the SDK had been modified, but turns out the answer is yes! There are a couple of minor changes. I went ahead and ported those changes to SDK v15.3 but it did not fix the problem. More details on that at the end of this post.

    Here is our main() function:

    int main()
    {
        ret_code_t err_code;
    
        // keep the hf/lf clocks running even if the softdevice is disabled
        nrf_drv_clock_init();
        nrf_drv_clock_lfclk_request(NULL);
        nrf_drv_clock_hfclk_request(NULL);
    
        err_code = app_timer_init();
        APP_ERROR_CHECK(err_code);
    
        s_init_task_handle = xTaskCreateStatic(init_task,
                                               "INIT",
                                               INIT_TASK_STACK_SIZE_WORDS,
                                               NULL,
                                               1,
                                               s_init_task_stack,
                                               &s_init_task_internal);
        if (s_init_task_handle == NULL) {
            APP_ERROR_HANDLER(NRF_ERROR_NO_MEM);
        }
    
        vTaskStartScheduler();
    
        while (true) {
            // something went horribly wrong
            APP_ERROR_HANDLER(NRF_ERROR_FORBIDDEN);
        }
    
        return -1;
    }
    

    The program hangs after calling hfclk_start() inside of nrf_drv_clock_hfclk_request(NULL). So you can see that the program does not get very far at all.

    As for the modifications that had been made to v15.0, there were two that I duplicated in v15.3. They were:

    1. NRF_BLE_FREERTOS_SDH_TASK_STACK was increased from 256 to 512 in file components/softdevice/common/nrf_sdh_freertos.c

    2. Line 238 of components/softdevice/common/nrf_sdh.c was changed from:

        m_nrf_sdh_enabled = (ret_code == NRF_SUCCESS);

    to

        m_nrf_sdh_enabled = (ret_code == NRF_SUCCESS || ret_code == NRF_ERROR_INVALID_STATE);

    As best I can tell, none of this code is used prior to the program hanging, so I am not surprised that adding those modifications did not fix the problem.

  • Hi,

    I agree that the change in nrf_sdh.c does not look relevant. However, I wonder if you have checked where in the ISR the program hangs? Also, how do you determine that this is where the issue is (for instance, by reading out the program counter and checking it with addr2line or similar)? Is it in a fault handler, so that you can backtrack from there? Or somewhere else? Could it be a FreeRTOS API call made from an ISR that is not handled properly? I am unable to think of much else, so I suspect more debugging and more details of the failure are needed.
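One way to backtrack from a handler, sketched here under assumptions: on exception entry a Cortex-M core pushes a basic frame of eight words (r0-r3, r12, lr, pc, xPSR) onto the active stack, assuming no floating-point context is stacked. Given a pointer to that frame (how it is obtained in a real handler is not shown), the interrupted PC can be read out and passed to addr2line. The enum and function names below are hypothetical, not part of the SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word indices of the basic exception frame (no FPU context assumed). */
enum {
    FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3,
    FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR,
    FRAME_WORDS
};

/* Hypothetical helper: address of the instruction that was interrupted. */
uint32_t stacked_pc(const uint32_t *frame)
{
    assert(frame != NULL);
    return frame[FRAME_PC];
}

/* Hypothetical helper: exception number that was active when the frame was
 * pushed (low 9 bits of the stacked xPSR); 0 means thread mode. */
uint32_t stacked_exception_number(const uint32_t *frame)
{
    assert(frame != NULL);
    return frame[FRAME_XPSR] & 0x1FFu;
}
```

If the stacked exception number is non-zero, the interrupted code was itself running inside a handler, which helps separate a fault from a handler that simply never returns.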

  • Ok, I spent some time more carefully tracing through what is happening with a debugger. First, I will explain what happens with SDK v15.0/SD6.0.0 where everything works...

    When the POWER_CLOCK_IRQ interrupt fires, the vector table at 0 points it to an IRQ handler in the soft device. This IRQ handler grabs the value at 0x20000000 (start of RAM) which appears to be a second vector table and uses this table to branch to a new address.

    When I run the code with SDK v15.0/SD6.0.0, the value at 0x20000000 is 0x1000 which is a second vector table inside of the soft device and everything works properly.

    When I run the code with SDK v15.3/SD6.1.1, the value at 0x20000000 is 0. Thus, when the IRQ handler grabs this value and uses it as a second vector table, it ends up jumping back to itself and the code is stuck in a small endless loop.

    I am not 100% certain of this next part, but as best I can tell, soft device 6.1.1 initializes 0x20000000 to 0 rather than leaving it uninitialized. I determined this by running experiments where I filled the RAM with FFs and also by stepping through the soft device assembly from reset.

    So, the question is why is 0x20000000 improperly set to 0 with SD6.1.1, but 0x1000 with SD6.0.0? Is there some new config file value I need to set, or has the initialization code been changed in some way that makes our initialization order not work with the new soft device?  Or some other cause?
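The behaviour described in this post can be modelled on a host machine. The sketch below is not SoftDevice code. It is a small simulation that assumes the forwarder reads a table base (the word at 0x20000000) and branches through the entry at base + 0x40. The addresses 0x75A, 0x40 and 0x1000 come from the thread; `SD_CLOCK_HANDLER`, the array size and the function names are hypothetical, and the Thumb bit is ignored for simplicity.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_WORDS        0x800u      /* models the first 8 KiB of flash */
#define POWER_CLOCK_OFFSET 0x40u       /* vector offset reported in the thread */
#define FORWARDER_ADDR     0x0000075Au /* PC observed in the debugger */
#define SD_TABLE_BASE      0x00001000u /* second vector table in the soft device */
#define SD_CLOCK_HANDLER   0x00001800u /* hypothetical address, illustration only */

static uint32_t flash[FLASH_WORDS];

/* Populate the two vector entries the model needs. */
void model_init(void)
{
    flash[POWER_CLOCK_OFFSET / 4u] = FORWARDER_ADDR;
    flash[(SD_TABLE_BASE + POWER_CLOCK_OFFSET) / 4u] = SD_CLOCK_HANDLER;
}

/* Follow the POWER_CLOCK interrupt through the forwarder.
 * forward_base stands in for the word stored at 0x20000000.
 * Returns the address finally reached, or FORWARDER_ADDR if the
 * dispatch is still inside the forwarder after max_hops branches. */
uint32_t dispatch_power_clock(uint32_t forward_base, unsigned max_hops)
{
    uint32_t pc = flash[POWER_CLOCK_OFFSET / 4u]; /* hardware vector fetch */

    for (unsigned hop = 0; hop < max_hops && pc == FORWARDER_ADDR; ++hop) {
        uint32_t entry = forward_base + POWER_CLOCK_OFFSET;
        assert(entry / 4u < FLASH_WORDS);
        pc = flash[entry / 4u]; /* forwarder branches through the second table */
    }
    return pc;
}
```

After `model_init()`, a base of 0x1000 leaves the forwarder on the first hop and reaches the (hypothetical) soft device handler, while a base of 0 keeps fetching the forwarder's own address from offset 0x40, which matches the small endless loop seen in the debugger.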
