
Urgent!! About flash access

Hi, I use s110_6.0.0 and sdk_5.2.1 in an older product. I found that flash writes sometimes fail. Timer1 and Timer2 are used, and their interrupt priority level is 1 (high priority). If I change the priority to 3 (low priority), it runs well. I think the timer interrupts disturb the flash operation. But I don't want to change the interrupt priority of the timers, because that may cause other problems. Are there any solutions to this problem? It is urgent, so please give me some suggestions. Thank you!

  • Hi mansfield

    The timer frequency is rather high, and I suspect this is a CPU utilization problem. The flash operation most likely cannot be scheduled with such high-frequency timer interrupts. An ongoing flash operation blocks the CPU.

    The SoftDevice calls sd_flash_page_erase, sd_flash_protect and sd_flash_write are handled at ARM priority 2. When the timer interrupt priority is high (ARM priority 1), the timer has higher priority than the flash operation and can preempt it. When the timer interrupt priority is low (ARM priority 3), it has lower priority than the flash operation, which means a timer interrupt cannot preempt a flash operation. Execution of the timer interrupt handler is therefore delayed until the flash operation completes. You can verify whether this is the case by connecting a logic analyzer to a GPIO pin and toggling the pin every time the timer interrupt handler starts executing. If you see the toggling pause for some time, that is the pstorage operation at work.

    Update 18.5.2016: The conclusion is that there is a bug in the pstorage library in SDK 8.1.0 and earlier SDKs that causes a race condition between the thread that calls the pstorage operation and the pstorage callback. As a result, the pstorage module occasionally does not call the pstorage callback handler. The bug shows up when the thread calling pstorage runs in the main context and CPU load and/or interrupt activity is high. Heavy BLE activity creates both high CPU load and high interrupt activity and can trigger the bug.

    A workaround is to use the pstorage module from nRF51 SDK 9.0.0 or later. You don't necessarily need to update your whole project to SDK 9.0.0+; you can just replace the pstorage module (the pstorage.h, pstorage.c and pstorage_platform.h files) with the one from SDK 9.0.0+. The pstorage module was refactored in SDK 9.0.0, which apparently removed the bug described above. Pstorage is a fairly isolated module that does not depend on other modules, so migrating only the pstorage module to SDK 9.0.0+ should not be risky. As far as I can see, it only interacts with the two SoftDevice flash API functions, sd_flash_page_erase and sd_flash_write.

    Note: pstorage_platform.h is the pstorage configuration file, normally located in your project. If you have customized pstorage_platform.h in your project, you must configure the new pstorage_platform.h from SDK 9.0.0+ accordingly.

    Bug details:

    Occasionally, the callback was not received from the pstorage module, making the pstorage module hang. I found out that the callback was actually received by pstorage_sys_event_handler in pstorage.c, but the cb_handler was still never called in flash.c. That is because occasionally no flash access was registered when the callback arrived, i.e.

    m_cmd_queue.flash_access == false
    

    making the callback handler think there is no ongoing flash operation, so the handler did nothing. The fault is that the cmd_process function in pstorage.c, which processes the pstorage tasks in the pstorage queue, first calls the relevant flash operation (e.g. sd_flash_write) and only sets the flash access flag at the bottom of the function. I suspect this causes a race condition: cmd_process has to set

    m_cmd_queue.flash_access = true;
    

    before the pstorage_sys_event_handler checks for the same flash access flag, i.e. with

    if (m_cmd_queue.flash_access == true)
    {
        // Process the callback
    }
    

    because if the flash access flag is not yet true, pstorage_sys_event_handler does nothing and the pstorage module hangs.

    I suspect this hang can only happen when the caller of the pstorage function is in the main context, i.e. at the lowest priority. When the caller has the lowest priority, the flash operation callback can preempt the caller and check the flash access flag before the caller has set it. This scenario is more likely when interrupt activity is high, e.g. from timers, BLE or other sources, because all interrupts can preempt a caller running in the main context.

    No, it is not fixed in 8.x.x. However, there is one more factor, and that is the SoftDevice. With SoftDevices prior to S110 8.0.0, the CPU is blocked during the whole BLE radio event; with S110 8.0.0 and later, the application can use the CPU during the BLE radio event. This means that S110 8.0.0 provides better CPU utilization than its predecessors and could make a difference in your case, but that may depend on your BLE connection/advertising interval. It also depends on whether you are doing only flash writes or also flash erases or flash updates. Flash update is a pstorage operation that includes a flash erase, among other things.

    For details on CPU blocking, see my answer on this thread below the dotted line.
