This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Bonding with FDS and DFU sometimes fail

Hi

I am using nRF52DK, SDK 15.0, SD 6.0.0, multiperipheral based example with secure bootloader, FDS (7 FDS_VIRTUAL_PAGES). Uploading new software works (not tested thorougly).

Project building with SES, then flashing with a script:

SET family= NRF52
SET application= C:\repo\pikflometr\multiperipheral\pca10040\s132\ses\Output\Release\Exe\ble_app_multiperipheral_pca10040_s132.hex
SET application-version= 1
SET soft_device= C:\repo\pikflometr\lib\wszystkoCzegoPotrzebujesz\s132_nrf52_6.0.0_softdevice.hex
SET bootloader= C:\repo\pikflometr\secure_bootloader\pca10040_ble\ses\Output\Release\Exe\secure_bootloader_ble_s132_pca10040.hex
SET bootloader-version= 1
SET bl-settings-version= 1
SET settings= settings_1.hex
SET dfu_client= dfu_client_1.hex
SET nrfutil_location= C:\repo\pikflometr\lib
cd %nrfutil_location%
nrfjprog.exe --family %family% --program %soft_device% --chiperase --verify
nrfutil settings generate --family %family% --application %application% --application-version %application-version% --bootloader-version %bootloader-version% --bl-settings-version %bl-settings-version% %settings%
mergehex -m %application% %settings% -o %dfu_client%
nrfjprog.exe --family %family% --program %dfu_client% --verify
nrfjprog.exe --reset --family %family% --program %bootloader% --verify

then F5 to begin debugging with SES. 

Sometimes (maybe 15% of times) when I try to pair (using master Androd device) I get:

<error> app: ERROR 2 [NRF_ERROR_SOFTDEVICE_NOT_ENABLED] at C:\repo\pikflometr\multiperipheral\main.c:1808 PC at: 0x00033AA7

which is from PM_EVT_ERROR_UNEXPECTED case

static void pm_evt_handler(pm_evt_t const * p_evt)
{
    ret_code_t err_code;
    switch(p_evt->evt_id)
    {

        case PM_EVT_ERROR_UNEXPECTED:
        NRF_LOG_RAW_INFO("PM_EVT_ERROR_UNEXPECTED \n");
            // Assert.
            APP_ERROR_CHECK(p_evt->params.error_unexpected.error);
            
...}

It tells me that NRF_ERROR_SOFTDEVICE_NOT_ENABLED  which is silly because I have already connected with an advertising device, so BLE stack is up and running.

After some digging I have found out that peer_data_delete_process() calls send_unexpected_error()

  

So it looks like sometimes the FDS module does not initialize.

I added some logs and here is what I got after booting:

FAILING CASE:

<info> app: __________________start
<info> app: ~~ ZMIANA NAZWY (na v1) W: gap_params_init
<info> app: Setting vector table to bootloader: 0x00078000
<info> app: Setting vector table to main app: 0x00026000
<info> app: zostaje domyslny adv_name
<error> app: bzdurne wartosci ts:0, cnt:0
<info> app: nie ma w ogole pomiarow offline, bo secret_word to: 26DB1 wobec 0xAAAAAAAA
<info> app: epoch 0
<info> app: battery_meas handler rusza
<info> app: reset_reason: 0x00000004.

WORKING CASE:

<info> app: __________________start
<info> app: ~~ ZMIANA NAZWY (na v1) W: gap_params_init
<info> app: Setting vector table to bootloader: 0x00078000
<info> app: Setting vector table to main app: 0x00026000
.......fds..FDS_EVT_INIT
.......fds..FDS_EVT_WRITE
.......fds..FDS_EVT_UPDATE
.......fds..FDS_EVT_DEL_RECORD
<info> app: zostaje domyslny adv_name
<error> app: bzdurne wartosci ts:0, cnt:0
<info> app: nie ma w ogole pomiarow offline, bo secret_word to: 26DB1 wobec 0xAAAAAAAA
<info> app: epoch 0
<info> app: battery_meas handler rusza
<info> app: reset_reason: 0x00000001.

So all the difference is the FDS event logs don't appear occassionaly.

The logs that makes difference between failing and working case are in peer_data_storage.c:

static void fds_evt_handler(fds_evt_t const * const p_fds_evt)
{
    pm_evt_t pds_evt =
    {
        .peer_id = file_id_to_peer_id(p_fds_evt->write.file_id)
    };

    switch (p_fds_evt->id)
    {   
        case FDS_EVT_INIT:
            NRF_LOG_RAW_INFO(".......fds..FDS_EVT_INIT\n");
        case FDS_EVT_WRITE:
            NRF_LOG_RAW_INFO(".......fds..FDS_EVT_WRITE\n");
        case FDS_EVT_UPDATE:
            NRF_LOG_RAW_INFO(".......fds..FDS_EVT_UPDATE\n");
        case FDS_EVT_DEL_RECORD:
            NRF_LOG_RAW_INFO(".......fds..FDS_EVT_DEL_RECORD\n");
            
.....}

---

Furthermore I have added those logs to observe the state of flash memory:

     NRF_LOG_INFO("pages_available %d open_records %d valid_records %d",
                   stat.pages_available,stat.open_records,stat.valid_records);            
     NRF_LOG_INFO("dirty_records %d words_reserved %d words_used %d",
                  stat.dirty_records,stat.words_reserved,stat.words_used);            
     NRF_LOG_INFO("largest_contig %d freeable_words %d corruption %d",
                  stat.largest_contig,stat.freeable_words,stat.corruption); 

And here is the output (called after booting before trying to pair):

FAILING CASE:

<info> app: pages_available 0 open_records 0 valid_records 0
<info> app: dirty_records 0 words_reserved 0 words_used 0
<info> app: largest_contig 0 freeable_words 0 corruption 0

WORKING CASE:

<info> app: pages_available 7 open_records 0 valid_records 0
<info> app: dirty_records 0 words_reserved 0 words_used 12
<info> app: largest_contig 1022 freeable_words 0 corruption 0

What more info do you need to guide me to solve this?

Thanks

Parents
  • Hi,

    I agree that it looks to be a problem with the FDS initialization. Have you tried to reserve the 7 flash pages in the bootloader so the can't get overwritten during DFU(Memory layout)? FDS may fail to free the pages aftewards if not. 

  • Hi

    Thank you for response.

    If you are talking about DFU_APP_DATA_RESERVED then yes, I have reserved it.

    I shouldn't have any negative effect but I changed if from 7 to 8 -both on app side and bootloader side. Now there is 

        #define FDS_VIRTUAL_PAGES 8

    in application project

    and

        #define DFU_APP_DATA_RESERVED               (CODE_PAGE_SIZE * 8)

    in bootloader project.

    The situation with not-initializing the FDS still happens occassionally, but resetting the board with RESET button does the job and with the next boot-up the FDS works.

    I thought there may be some kind of misconfiguration with the FDS or DFU. Right after (successfull) flashing and booting the fds_stat() shows:

     0> <info> app: pages_available 8 open_records 0 valid_records 0
     0> <info> app: dirty_records 0 words_reserved 0 words_used 14
     0> <info> app: largest_contig 1022 freeable_words 0 corruption 0

    after bonding with one device it shows:

     0> <info> app: pages_available 8 open_records 0 valid_records 3
     0> <info> app: dirty_records 0 words_reserved 0 words_used 52
     0> <info> app: largest_contig 1022 freeable_words 0 corruption 0

    then I prepared .zip package. New app differs with TWO bytes, "v2" instead of "v1" in log content and "board-v2" instead of "board-v1" in #define DEVICE_NAME - just so I know the new software is up and running.

    SET family= NRF52
    SET application= C:\repo\pikflometr\multiperipheral\pca10040\s132\ses\Output\Release\Exe\ble_app_multiperipheral_pca10040_s132.hex
    SET application-version=2
    SET hardware-version= 52
    SET sd-req= 0xA8
    SET private_key= C:\repo\pikflometr\lib\\private_key.pem
    SET nrfutil_location= C:\repo\pikflometr\lib\
    SET zipfile_name= p6-2.zip
    cd %nrfutil_location%
    nrfutil pkg generate --application %application% --application-version %application-version% --hw-version %hardware-version% --sd-req %sd-req% --key-file %private_key% %zipfile_name%

    after bluetooth DFU OTA sometimes I see:

    <info> app: pages_available 8 open_records 0 valid_records 4
    <info> app: dirty_records 3 words_reserved 0 words_used 82
    <info> app: largest_contig 1022 freeable_words 26 corruption 0

    or sometimes

    <info> app: pages_available 8 open_records 0 valid_records 4
    <info> app: dirty_records 2 words_reserved 0 words_used 78
    <info> app: largest_contig 1022 freeable_words 22 corruption 0

    Is it desired behaviour that number of words_used is bigger after the update? So do valid_records number?

    I made no writes to the flash memory. Just the:

    programming -> fds_stat() -> bonding -> fds_stat() ->  update -> fds_stat()

  • Hi,

    Ok, thanks for confirming. I think it's likely that the Softdevice fails to schedule the flash operations due to BLE advertising and connections (Flash memory API). Maybe you have a re-connect immediately on startup that triggers this scenario? 

    Do you start advertising directly or do you use ble_advertising_start() which should delay it until there are no pending flash operations? 

    uint32_t ble_advertising_start(ble_advertising_t * const p_advertising,
                                   ble_adv_mode_t            advertising_mode)
    {
        uint32_t ret;
    
        if (p_advertising->initialized == false)
        {
            return NRF_ERROR_INVALID_STATE;
        }
    
        p_advertising->adv_mode_current = advertising_mode;
    
        // Delay starting advertising until the flash operations are complete.
        if (flash_access_in_progress())
        {
            p_advertising->advertising_start_pending = true;
            return NRF_SUCCESS;
        }

     

    Pawel Niedbala said:
    I haven't tried using stack guard, is it really that helpful? Also for SDK15 it is nrf_mpu_init() not nrf_mpu_lib_init().

     It can help you detect stack overflows. Probably not necessary if you've already profiled the stack usage or know that you have plenty of space left.  

  • Indeed, I start advertising directly, advertising_start() which calls sd_ble_gap_adv_start(...). I do not even use advertising module. Also sd_ble_gap_connect(...) is not being called at all in my application.

    If your guess is correct the return code does not explain the reason properly and the error caught by APP_ERROR_CHECK leads to a misunderstanding.

  • Also sd_ble_gap_connect(...) is not being called at all in my application

    Yes, but assume you're starting connectable advertising so a nearby central could chose to send a  connect request? I'd recommend to postpone advertising until pending flash operations are complete like the adv. module. 

    If your guess is correct the return code does not explain the reason properly and the error caught by

    Return code from which function? A NRF_SUCCESS from the  sd_flash APIs only means the task got accepted into the flash queue. The result of the operation will be reported to the fds callback. 

  • Oh yes of course, it is connectable advertising. And it advertises until given number of masters are connected (multiperipheral). 

    You suggest postponing starting advertising so flash initialization can be performed completely because it is essential. But what about potential collision between normal periodical advertising packet and occassional flash write? High probability of flash operation success means that there is a tiny risk of FDS operation failure we should accept?

    By return code I meant NRF_ERROR_SOFTDEVICE_NOT_ENABLED when trying to pair.

  • Ok, so that may explain why it's not failing consistently. See "SoftDevice timing-activities and priorities" for details on how the scheduler priorities the different events/tasks. 

    Pawel Niedbala said:
    High probability of flash operation success means that there is a tiny risk of FDS operation failure we should accept?

     The app may retry the flash operation later. Writing small chunks of data should not be a problem, but flash erase (garbage collection) is harder to schedule.

    Pawel Niedbala said:
    By return code I meant NRF_ERROR_SOFTDEVICE_NOT_ENABLED when trying to pair.

     It's definitely misleading. That is why the FDS errors now use a different number range. 

Reply
  • Ok, so that may explain why it's not failing consistently. See "SoftDevice timing-activities and priorities" for details on how the scheduler priorities the different events/tasks. 

    Pawel Niedbala said:
    High probability of flash operation success means that there is a tiny risk of FDS operation failure we should accept?

     The app may retry the flash operation later. Writing small chunks of data should not be a problem, but flash erase (garbage collection) is harder to schedule.

    Pawel Niedbala said:
    By return code I meant NRF_ERROR_SOFTDEVICE_NOT_ENABLED when trying to pair.

     It's definitely misleading. That is why the FDS errors now use a different number range. 

Children
No Data
Related