SPI activity causes ble_advertising_start() to silently fail (SDK 15, SoftDevice S132 v6.0.0)

I am using a SPI peripheral on the nRF52832.  If the SPI bus is busy when I call

err_code = ble_advertising_start(&m_advertising, BLE_ADV_MODE_FAST);

The routine returns success, but advertising does NOT start.

The peripheral is a SPI flash memory where I log data. I have one process (started from a timer and continued from driver callbacks) that periodically reads sensors and writes logging data to the flash, and another that lets a BLE central read the data back from the flash.

I'm developing on SDK 15.0 with SoftDevice S132 v6.0.0.

I'm using SPI2 and easyDMA via the nrf_drv_spi driver.

During startup I do a page erase on the first page of the flash.  This is done with several writes and reads and a state machine.

The first bus write of an operation is done by nrf_drv_spi_transfer(). From then on, each additional read or write is launched from the completion callback handler until the operation is complete. For erase and write this involves some one-state polling loops, repeatedly reading a status byte in the flash chip until a done bit flips. So the SPI interface runs continuously until the operation is complete. For page erase, which takes a while, this involves a spin of about five thousand reads.
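To make the "bus never goes idle" point concrete, here is a minimal simulation of that callback-chained erase. It is a sketch, not the poster's driver: spi_start() is a hypothetical stand-in for nrf_drv_spi_transfer(), sim_pump() plays the role of the SPI hardware plus flash chip, and the opcodes and WIP bit are typical SPI NOR values to be checked against the actual part's datasheet (the erase opcode is a placeholder).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Typical SPI NOR values; treat as placeholders and check the datasheet. */
#define CMD_WRITE_ENABLE 0x06u
#define CMD_READ_STATUS  0x05u
#define CMD_ERASE        0x20u  /* placeholder erase opcode (varies by part) */
#define STATUS_WIP       0x01u  /* "write in progress" status bit */

typedef enum { ST_IDLE, ST_WREN, ST_ERASE, ST_POLL, ST_DONE } erase_state_t;

static erase_state_t m_state = ST_IDLE;
static uint8_t  m_tx[4];
static uint8_t  m_rx[2];
static bool     m_xfer_pending;           /* simulated transfer in flight */
static uint32_t m_busy_polls_after_erase; /* simulated erase time, in polls */
static uint32_t m_busy_left;
static uint32_t m_status_reads;           /* status-register reads issued */

static void spi_done_handler(void);

/* Hypothetical stand-in for nrf_drv_spi_transfer(): marks a transfer started. */
static void spi_start(size_t tx_len)
{
    (void)tx_len;
    m_xfer_pending = true;
}

/* Simulated SPI bus + flash chip: completes each transfer and calls the
   handler, which (like the driver's event handler) may start the next one. */
static void sim_pump(void)
{
    while (m_xfer_pending) {
        m_xfer_pending = false;
        if (m_tx[0] == CMD_ERASE) {
            m_busy_left = m_busy_polls_after_erase;
        } else if (m_tx[0] == CMD_READ_STATUS) {
            /* rx[0] is clocked in during the opcode; rx[1] holds the status. */
            m_rx[1] = (m_busy_left > 0) ? STATUS_WIP : 0;
            if (m_busy_left > 0) m_busy_left--;
        }
        spi_done_handler();
    }
}

/* Completion handler: every step launches the next transfer until done. */
static void spi_done_handler(void)
{
    switch (m_state) {
    case ST_WREN:                       /* write enable sent: send erase */
        m_tx[0] = CMD_ERASE;
        m_tx[1] = m_tx[2] = m_tx[3] = 0; /* address 0: first page */
        m_state = ST_ERASE;
        spi_start(4);
        break;
    case ST_ERASE:                      /* erase accepted: start polling */
    case ST_POLL:
        if (m_state == ST_POLL && (m_rx[1] & STATUS_WIP) == 0) {
            m_state = ST_DONE;          /* busy bit cleared: finished */
            break;
        }
        m_tx[0] = CMD_READ_STATUS;
        m_state = ST_POLL;
        m_status_reads++;
        spi_start(2);
        break;
    default:
        break;
    }
}

/* Only the first transfer is started from the caller's context. */
static void flash_erase_first_page(uint32_t simulated_busy_polls)
{
    m_busy_polls_after_erase = simulated_busy_polls;
    m_status_reads = 0;
    m_tx[0] = CMD_WRITE_ENABLE;
    m_state = ST_WREN;
    spi_start(1);
}
```

With a simulated erase time of 5000 polls, the chain issues 5001 status reads back to back, which is the window during which the bus (and any "is busy" flag tracking it) stays set.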

The erase is LAUNCHED by the logging application, but the operation is still running after the service's init() returns. When main() eventually calls ble_advertising_start(), the call silently fails as described above.

I can get the advertising to start by calling the start routine again after the erase has completed.  I can also get it to start by starting it before I init the service that does the erase.

But this is not a solution. The SPI bus will be running (reading flash, writing flash, and erasing pages of flash) when the device is logging data and also when it is reading data back for transfer over BLE. After a BLE connection disconnects, advertising has to be restarted. (Also: when a connection is made, non-connectable advertising has to be started, so the tag remains audible to centrals other than the one connected.) If the SPI bus happens to be busy at that time, the tag will stop advertising and not restart. This effectively bricks it, because without advertising it can no longer be heard or connected to.

Can you tell me what is going wrong and/or how to work around it?

  • It sounds like there is a priority conflict between the context in which the SPI driver is working and the context in which ble_advertising_start() is being called. Can you try tuning the priorities of these two contexts to see if it helps?

    If not, could you help me reproduce this on my desk, so that I can try to understand the execution path of the driver when this error occurs?

  • I found the problem:

    It was another piece of the "fstorage was only debugged with backends for the onboard flash" issue.

    • If writes or erases are going on ON THE INTERNAL FLASH, it might delay program execution and blow the timing of the BLE stack.
    • So ble_advertising_start(), (in components/ble/ble_advertising/ble_advertising.c), via flash_access_in_progress(), calls nrf_fstorage_is_busy(NULL) to check for such activity.
    • If the return says there is flash activity, ble_advertising_start() defers setting up advertising.  Instead it sets p_advertising->advertising_start_pending (to remember to start it later) and returns NRF_SUCCESS.  (This is a claim that it WILL be set up later.)
    • ble_advertising.c is a system event observer.  When an internal flash operation ends, its ble_advertising_on_sys_evt() gets a NRF_EVT_FLASH_OPERATION_SUCCESS or NRF_EVT_FLASH_OPERATION_ERROR event.  If advertising_start_pending is set, it clears it and retries the ble_advertising_start() call.
    • But nrf_fstorage_is_busy() is in the fstorage FRONTend.  A NULL argument asks "Is there ANY flash activity in progress?"  So ...is_busy() polls all the fstorage instances, regardless of backend, and returns true if ANY of them are initialized and busy.
    • So if the EXTERNAL flash (supported by my new nrf_storage_spi backend) happened to be busy, ble_advertising_start() would defer starting the advertising until the INTERNAL flash reported completion. But the internal flash didn't have anything going on, so it waited forever.
    • (And if I'd hacked it by starting advertising before doing the erase, it would eventually hang later, when a connection ended, calling ble_advertising_start() to resume advertising, when the external flash again happened to be busy.)
    • This effectively "bricks" the tag, because without advertisements it can't be seen or connected to.  Physical contact is needed to straighten it out.
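The failure mode described in the bullets above can be reproduced with a few lines. This is a simplified model, not the SDK source: adv_start() mirrors only the deferral path of ble_advertising_start(), any_flash_busy() mirrors the "any initialized instance, any backend" semantics of nrf_fstorage_is_busy(NULL), and all names here are hypothetical. The key assumption (from the analysis above) is that only the internal flash produces a SoftDevice completion event.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for an fstorage instance: only what this model needs. */
typedef struct {
    bool initialized;
    bool busy;
} fs_instance_t;

static fs_instance_t m_internal = { true, false }; /* on-chip flash */
static fs_instance_t m_external = { true, false }; /* SPI flash backend */
static fs_instance_t *const m_instances[] = { &m_internal, &m_external };

/* Same semantics as nrf_fstorage_is_busy(NULL): ANY instance, ANY backend. */
static bool any_flash_busy(void)
{
    for (size_t i = 0; i < sizeof m_instances / sizeof m_instances[0]; i++) {
        if (m_instances[i]->initialized && m_instances[i]->busy) {
            return true;
        }
    }
    return false;
}

static bool m_start_pending;
static bool m_advertising;

/* Model of the deferral path: remember to retry, but report success (0). */
static int adv_start(void)
{
    if (any_flash_busy()) {
        m_start_pending = true;
        return 0; /* NRF_SUCCESS is 0 in the nRF5 SDK */
    }
    m_advertising = true;
    return 0;
}

/* Model of ble_advertising_on_sys_evt(): only INTERNAL flash completion
   delivers NRF_EVT_FLASH_OPERATION_SUCCESS/ERROR, which triggers the retry. */
static void on_internal_flash_op_done(void)
{
    m_internal.busy = false;
    if (m_start_pending) {
        m_start_pending = false;
        (void)adv_start();
    }
}

/* The external backend finishing raises no SoftDevice event, so no retry. */
static void on_external_flash_op_done(void)
{
    m_external.busy = false;
}
```

When only the external instance was busy, adv_start() returns success and sets the pending flag, but nothing ever clears it: the tag stays silent even after the SPI flash goes idle.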

    The problem is that nrf_fstorage_is_busy(NULL) asks "Is ANY flash activity going on?". For this function we need to ask "Is any activity going on on the INTERNAL flash?".  The external flash doesn't interfere with program or radio operation, so advertising needn't be held off if it's active.

    The clients of nrf_fstorage_is_busy() in the SDK are:

    • flash_fstorage/main.c: wait_for_flash_ready()
    • ble_advertising.c: ble_advertising_start()
    • scan_start() in eight /ble_central{,_and_peripheral/} applications
    • bootloader/dfu/nrf_dfu_req_handler.c: on_data_obj_execute_request_sched()

    Flash_fstorage is waiting for a particular flash instance to complete, while scan_start(), like ble_advertising_start(), is avoiding BLE stack response time failures due to running directly from a busy INTERNAL flash.  But the bootloader is waiting for all flash writes to be completed, internal or external, before killing the current application (or itself, writing a fresh download to flash) and launching a new one.  So we need BOTH the "all internal flash instances are idle" and "all flash instances of any type are idle" semantics.
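One way to express the two required semantics is a busy check that takes an explicit scope. This is a hypothetical sketch, not the SDK API: busy_scope_t, flash_is_busy(), and the is_onboard flag are illustrative names, and the real fstorage front end iterates its own instance section rather than an array passed in.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    BUSY_SCOPE_ALL,     /* any instance, any backend (bootloader/DFU case) */
    BUSY_SCOPE_ONBOARD, /* on-chip flash only (advertising, scanning case) */
} busy_scope_t;

typedef struct {
    bool initialized;
    bool busy;
    bool is_onboard; /* would be set by each backend at init time */
} fs_instance_t;

/* Returns true if an initialized instance within the requested scope is busy. */
static bool flash_is_busy(fs_instance_t *const *instances, size_t count,
                          busy_scope_t scope)
{
    for (size_t i = 0; i < count; i++) {
        const fs_instance_t *p = instances[i];
        if (!p->initialized || !p->busy) {
            continue;
        }
        if (scope == BUSY_SCOPE_ALL || p->is_onboard) {
            return true;
        }
    }
    return false;
}
```

With the internal flash idle and the SPI flash erasing, the bootloader-style query still sees activity, while the advertising-style query correctly reports that nothing is blocking the radio.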

    =====

    If you want to be compatible with my code if/when you add a non-internal-flash backend, here's what I did:

    1) Added a new magic-number argument to nrf_fstorage_is_busy():

    #define NRF_FSTORAGE_ALL_ONBOARD ((nrf_fstorage_t *) sizeof(uint32_t))

    (Note that this is the magic address 4, where an fstorage instance can't exist.)

    The new argument has the "any INTERNAL flash instance is busy" semantics, so the bootloader is unchanged but ble_advertising_start() and scan_start() need to be modified.

    2) Added to the end of nrf_fstorage_info_t the additional member:

    bool is_onboard; //!< The device is the onboard flash memory.

    (This needs to be set during initialization by each backend and tested by nrf_fstorage_is_busy().)

    3) To make the code safe for effective multithreading from different interrupt routines: in ble_advertising_start() I wrapped a CRITICAL_REGION around the call to flash_access_in_progress() and the setting of p_advertising->advertising_start_pending. (You can't return from inside the CRITICAL_REGION, so I also passed out a logical variable, DIFFERENT from advertising_start_pending, which I called "defer", to trigger the return of NRF_SUCCESS once the CRITICAL_REGION has been exited.) Doing this avoids a race with the system event callback if a flash operation completes at exactly the wrong moment, when ble_advertising_start() had been called from a lower priority.
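The check-then-set pattern from step 3 looks roughly like this. It is a sketch under stated assumptions: CRITICAL_REGION_ENTER()/CRITICAL_REGION_EXIT() here are hypothetical stand-ins that only mimic the brace structure of the SDK macros in app_util_platform.h (which is why returning between them is not allowed), and onboard_flash_busy(), adv_start(), and on_flash_op_done() are illustrative names, not SDK functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the SDK macros: like the real ones, ENTER opens a brace
   scope and EXIT closes it. Here they only track nesting for testing. */
static int m_cr_depth;
#define CRITICAL_REGION_ENTER() { m_cr_depth++;
#define CRITICAL_REGION_EXIT()    m_cr_depth--; }

static volatile bool m_start_pending;
static bool m_flash_busy;  /* stand-in for the on-chip-only busy check */
static int  m_adv_started; /* counts actual advertising starts */

static bool onboard_flash_busy(void) { return m_flash_busy; }

static uint32_t adv_start(void)
{
    bool defer; /* decided inside the critical region, acted on after it */

    CRITICAL_REGION_ENTER();
    /* The flash-done event cannot slip in between this check and the set. */
    defer = onboard_flash_busy();
    if (defer) {
        m_start_pending = true;
    }
    CRITICAL_REGION_EXIT();

    if (defer) {
        return 0; /* NRF_SUCCESS: the system event handler will retry */
    }
    m_adv_started++;
    return 0;
}

/* Flash-done system event, assumed to run at a higher priority than the
   context that calls adv_start(). */
static void on_flash_op_done(void)
{
    m_flash_busy = false;
    if (m_start_pending) {
        m_start_pending = false;
        (void)adv_start();
    }
}
```

Without the critical region, an event arriving after the busy check but before the pending flag is set would find nothing to retry, and the flag set afterwards would never be serviced.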

  • "bool is_onboard; //!< The device is the onboard flash memory."

    "onboard" may be a misnomer. If a QSPI device can map the flash in for reads, so that a cache miss would delay execution while more data is brought in over the device bus (long enough to foul things like ble_advertising_start(), especially if a write or erase was already in progress), then a QSPI backend that supported this feature would also have to set it to "true".

  • Michael, sorry for the delayed response; I was away sick.

    I think your point about making the fstorage module thread safe is something we should officially support. As of now, fstorage does not claim to be thread safe. We should at least add a note on this, if it is not already there.

    Thanks for your insights. The last three points are very valid; I will pass this on to the team.

  • Thanks.

    Please be sure to let them know that it's not just thread safety I'm after, but also a supported backend for external flash via SPI.

    That includes (especially) a SPIM backend (for the nRF52832). That can't map reads into the internal address space, so the read() interface needs to be augmented to include a callback.

    (Of course a mapping backend for the nRF52840 and other chips with the necessary QSPI peripheral would also be good to support. But you really need SPIM for chips that don't have QSPI.)
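A callback-based read for a non-mapped (SPIM) backend could take roughly this shape. Everything here is hypothetical: ext_flash_read_async(), ext_flash_spim_done(), and the callback type are illustrative names, not part of fstorage, and the "transfer done" path is simulated with memcpy from a RAM array instead of a real EasyDMA transfer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical completion callback for an asynchronous read. */
typedef void (*ext_read_done_t)(uint32_t addr, void *p_dest, size_t len,
                                void *p_ctx);

typedef struct {
    uint32_t        addr;
    uint8_t        *p_dest;
    size_t          len;
    ext_read_done_t cb;
    void           *p_ctx;
    bool            in_flight;
} ext_read_req_t;

static ext_read_req_t m_req;
static uint8_t m_ext_mem[256]; /* simulated external flash contents */

/* Starts a read; returns false if one is already in flight or args are bad.
   A real backend might queue instead of rejecting. */
static bool ext_flash_read_async(uint32_t addr, void *p_dest, size_t len,
                                 ext_read_done_t cb, void *p_ctx)
{
    if (m_req.in_flight || cb == NULL || addr + len > sizeof m_ext_mem) {
        return false;
    }
    m_req = (ext_read_req_t){ addr, p_dest, len, cb, p_ctx, true };
    /* A real SPIM backend would start the read command + DMA here. */
    return true;
}

/* Stand-in for the SPIM "transfer done" event. The request is marked idle
   before the callback runs, so the callback may chain the next read. */
static void ext_flash_spim_done(void)
{
    if (!m_req.in_flight) {
        return;
    }
    memcpy(m_req.p_dest, &m_ext_mem[m_req.addr], m_req.len);
    m_req.in_flight = false;
    m_req.cb(m_req.addr, m_req.p_dest, m_req.len, m_req.p_ctx);
}

/* Example consumer: counts completed reads through the context pointer. */
static void count_done(uint32_t addr, void *p_dest, size_t len, void *p_ctx)
{
    (void)addr;
    (void)p_dest;
    (void)len;
    (*(int *)p_ctx)++;
}
```

The caller's buffer is only valid once the callback fires, which is the key difference from the internal-flash read(), where the data is available as soon as the call returns.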
