Beware that this post is related to an SDK in maintenance mode
More Info: Consider nRF Connect SDK for new designs
This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Inconsistency in peer_manager

Hello,

I have a small test BLE peripheral application with a few functions I can invoke over a UART dialog. With one function, I can print the first peer in the peer manager, by calling pm_next_peer_id_get() and then retrieve the details of the bond using pm_peer_data_bonding_load(). With an other function, I can invoke pm_peers_delete(). With a third function, I can set previously stored bonding data, using pm_peer_new().

My test procedure goes like this:

- Bond the device to a GATT client

- store that bonding information to somewhere

- Delete the bond information

- Reset the device

- add previously stored bonding information, using pm_peer_new()

- Print existing bonding information (pm_next_peer_id_get() / pm_peer_data_bonding_load())

In the last step, pm_next_peer_id_get() returns a peer ID, with is not PM_PEER_ID_INVALID. However, pm_peer_data_bonding_load() returns NRF_ERROR_NOT_FOUND. I know, that the bonding data is not available shortly after it was inserted into the database. But I can wait for minutes between the call to pm_peer_new() and pm_peer_data_bonding_load() and this delay does not solve the inconsistency.

What could be the reason for the inconsistency?

best regards,

Torsten

I haven't found any examples of the usage of pm_peer_new() in the SDK examples. SDK version is 16.0.0, SD is S113. All return values to SDK function calls are systematically checked.

Parents
  • Hello,

    If you look at the implementation of pm_next_peer_id_get() it will return the peer ID that you put into that call if it can't find any existing peers. What do you feed into pm_next_peer_id_get()?

    If the last thing you did was to delete all peers (bonding information), then you wouldn't have any peers, right?

    Or do you expect to find the peer that you just stored?

    If so, then there are a couple of things that may have happened:

    1: You do the waiting in an interrupt, not allowing the FDS to actually store the data. You must release the interrupt and then look for it later.

    2: If you are not waiting within an interrupt, then perhaps you are not able to store the peer information because you are in a connected state with a too short connection interval. If so, then the softdevice will not be able to schedule the FDS writes. To check if this is the issue, you can try to store the peer data, then disconnect and wait a few ms before you try to read it back out.

    BR,

    Edvin

  • Hi Edvin,

    thank you very much for the fast reply! 

    - I feed `PM_PEER_ID_INVALID` to pm_next_peer_id_get():

        const pm_peer_id_t peer_id = pm_next_peer_id_get(PM_PEER_ID_INVALID);
    
        if (peer_id != PM_PEER_ID_INVALID) {
            pm_peer_data_bonding_t data;
            CHECK_ERROR(pm_peer_data_bonding_load(peer_id, &data));
    
            output.add("\r\nOK %d ", int(peer_id));
            print_bond_information_impl(data, output);
        } else {
            output.add("\r\nERR No bond found\r\n");
        }
    

    pm_next_peer_id_get() returns something that is not equal to `PM_PEER_ID_INVALID` (I think it was 0) but the following call to `pm_peer_data_bonding_load()` fails then (even after waiting for hour after adding the bonding data, using `pm_peer_new()`).

    - The third step in my test procedure (out of 6) is to delete the existing bonds. Yes, I've expect to see the bonding information that I've stored somewhere else (put aside) before deleting all bonds and then put back in the 5th step of the procedure above, using `pm_peer_new()`.

    The test application is totally asynchronous, no busy waiting anywhere and it doesn't matter how long I wait after calling `pm_peer_new()`, there is this inconsistency between the call to `pm_next_peer_id_get()` and the `PM_PEER_ID_INVALID` result of the following call to `pm_peer_data_bonding_load()`.

    best regards,

    Torsten

  • Edvin said:
    2: If you are not waiting within an interrupt, then perhaps you are not able to store the peer information because you are in a connected state with a too short connection interval. If so, then the softdevice will not be able to schedule the FDS writes. To check if this is the issue, you can try to store the peer data, then disconnect and wait a few ms before you try to read it back out.

     Did you check this?

     

    Torsten Robitzki said:
    The third step in my test procedure (out of 6) is to delete the existing bonds. Yes, I've expect to see the bonding information that I've stored somewhere else (put aside) before deleting all bonds and then put back in the 5th step of the procedure above, using `pm_peer_new()`.

     Perhaps you are not storing it properly, so that something goes wrong during pm_peer_new()?

     

    Torsten Robitzki said:
    before deleting all bonds and then put back in the 5th step of the procedure above, using `pm_peer_new()`.

     How exactly do you do this? are you sure the peer data is stored properly? if pm_peer_data_bonding_load() returns 5 it usually means that there is no peer stored with that peer_id. Have you tried to debug into pm_peer_data_bonding_load() and see why it returns 5?

    BR,

    Edvin

Reply
  • Edvin said:
    2: If you are not waiting within an interrupt, then perhaps you are not able to store the peer information because you are in a connected state with a too short connection interval. If so, then the softdevice will not be able to schedule the FDS writes. To check if this is the issue, you can try to store the peer data, then disconnect and wait a few ms before you try to read it back out.

     Did you check this?

     

    Torsten Robitzki said:
    The third step in my test procedure (out of 6) is to delete the existing bonds. Yes, I've expect to see the bonding information that I've stored somewhere else (put aside) before deleting all bonds and then put back in the 5th step of the procedure above, using `pm_peer_new()`.

     Perhaps you are not storing it properly, so that something goes wrong during pm_peer_new()?

     

    Torsten Robitzki said:
    before deleting all bonds and then put back in the 5th step of the procedure above, using `pm_peer_new()`.

     How exactly do you do this? are you sure the peer data is stored properly? if pm_peer_data_bonding_load() returns 5 it usually means that there is no peer stored with that peer_id. Have you tried to debug into pm_peer_data_bonding_load() and see why it returns 5?

    BR,

    Edvin

Children
  • Hello Edvin,

    yes, I’ve checked that I’m not waiting busy anywhere. The test application receives commands asynchronously though an UART. So „waiting" mean, I sit in front of the terminal and wait until I send the command to execute a next test step.

    I’ve debuged this a little bit more in detail. `pm_peer_new` allocates a new ID by calling `pds_peer_id_allocate()` and then queues the write operations (fds_record_write() -> write_enqueue() …). I’ve added log statements to the fds.c module, the write operations goes successfully through the states FDS_OP_WRITE_HEADER_BEGIN and FDS_OP_WRITE_RECORD_ID, without causing any error. The Next step FDS_OP_WRITE_DATA is never executed. Instead, I see an PM_EVT_PEER_DATA_UPDATE_FAILED event being logged. The error code is: 0x8600, which seems to be FDS_ERR_OPERATION_TIMEOUT.

    Searching through the code base for occurrences of FDS_ERR_OPERATION_TIMEOUT, shows that several functions return that values. I’ve added an `assert(false)` to all of them to see which code path caused that error. Turns out, this internal error is caused by a soft device event NRF_EVT_FLASH_OPERATION_ERROR. Is there anything I can do to inspect what causes this event?

    I also checked, that the p_bonding_data passed to `pm_peer_new()` does not point to a stack allocated object (which might cause problems, when I got the internal workings of the peer manager right).

    Directly after the attempt to store bonding information (using pm_peer_new()), the test application starts advertising. A comment in peer_manager_handler.c regarding PM_EVT_PEER_DATA_UPDATE_FAILED says: "This can happen if the SoftDevice is too busy with BLE operations.“. Could this be a problem?

    I did a quick test and it seems that this internal error does not occur, when I wait after adding the bonding data before starting advertising???!!! This looks very weird. Is this expected behavior?

    best regards,

    Torsten

  • Torsten Robitzki said:
    yes, I’ve checked that I’m not waiting busy anywhere. The test application receives commands asynchronously though an UART. So „waiting" mean, I sit in front of the terminal and wait until I send the command to execute a next test step.

     I am not sure you read the text I quoted now.

    "If you are not waiting within an interrupt, then perhaps you are not able to store the peer information because you are in a connected state with a too short connection interval. If so, then the softdevice will not be able to schedule the FDS writes. To check if this is the issue, you can try to store the peer data, then disconnect and wait a few ms before you try to read it back out."

    What is the connection interval you are using? If you are connected to a phone, then you can't really control this, since it is always the central that decides the connection interval. If you are connected with a very short connection interval, you can wait as long as you like, but the softdevice doesn't have time to perform any flash operations. Perhaps you can try to disconnect the phones BLE.

     

    Torsten Robitzki said:
    Directly after the attempt to store bonding information (using pm_peer_new()), the test application starts advertising. A comment in peer_manager_handler.c regarding PM_EVT_PEER_DATA_UPDATE_FAILED says: "This can happen if the SoftDevice is too busy with BLE operations.“. Could this be a problem?

     Yes, this is exactly what I meant by the quoted test. In fact, a quite common thing to do is to store the bonding information in the disconnect event, because at this point, you will not be busy using the softdevice. Perhaps you can try to call pm_peer_new() from the disconnected event before you start advertising again?

     

    Torsten Robitzki said:
    I did a quick test and it seems that this internal error does not occur, when I wait after adding the bonding data before starting advertising???!!! This looks very weird. Is this expected behavior?

     Yes.

    Best regards,

    Edvin

  • Hi Edvin,

    yes I read everything you wrote very carefully. I'm well aware that you and your colleagues have plenty of requests to answer and thus I try to investigate as much as possible before asking here.

    The internal error occurs without any connection being established. The test application calls `pm_peer_new()` and then starts advertising and _no_ central connects to the peripheral in this scenario.

    We started our application by using the peer_manager (who stores the bond information in flash) and are now moving to store the bonding informations somewhere else (of the chip). So, the startup sequence is to retrieve the bonding information from somewhere else and update the bonding information (if necessary) and then start advertising.

    We've decided to kick out the bond manager and to implement the bonding / pairing directly on top of the softdevice.

    Thank you very much for your patient!

    Best regards,

    Torsten

Related