nRF5 SDK 14.2.0 - Secure Bootloader DFU always fails after adding a new variable to a characteristic

Hello! 

I am currently working on a temperature sensor with Bluetooth features. It is based on the nRF52832 and uses the nRF5 SDK 14.2.0.

While working on a new feature, I wanted to add a new variable to an existing characteristic. However, since I added a new byte (uint8_t new_var) to the characteristic / service, I have been unable to do secure bootloader DFUs. Could this be related to a memory management issue?

One interesting thing in the code: when earlier developers defined which characteristics/services would be placed on the microcontroller, they added some documentation to advise future developers (like me) about any changes made in that section:

/**** Service ****/
static const ble_char_config_t k_ble_cw_char_config[] = {

    ////////////////////////////////////////////////////////////////////
    //
    // VERY IMPORTANT NOTE
    //
    // Before you make any changes here, make sure you read and
    // understand this.
    //
    // The order in which we add characteristics affects the order in
    // which the soft device allocates handles. Controllers (definitely
    // iOS) may cache these handles. So if you add a characteristic to
    // a service, all characteristics added after it, whether in this
    // service or another, will see their handles go up. This means
    // they will be out of sync with the controller's cache. In development
    // it's a pain, but not fatal because we know how to clear the
    // iOS cache by turning BT off and then on again in settings. But for
    // regular users it will be confusing and might make them think
    // firmware updates broke or bricked their burners.
    //
    // It's not entirely clear how we get around this, but we might
    // have to change the id of the service as a whole when we upgrade,
    // and make sure that the upgrade can only be done from a device with
    // a new enough iOS app to know to use the new service. But that
    // will break other phones that use the same burner if they are not
    // upgraded.
    //
    // Within the service, let's at least try to keep these in
    // monotonically increasing order of UUID, as they are now.
    //
    // Need to think long and hard about this and have a bulletproof
    // solution before we put burners out in the wild.
    ////////////////////////////////////////////////////////////////////

    /*                  characteristic UUID, READ,    WRITE,      NOTIFY,        INDICATE, REQUEST_ID, SEND_AS_EVENT,                               SIZE,                   MAX_SIZE, WRITE_TYPE,              HANDLE_STATE VARIABLE */
    {        COOKWARE_UUID_CHAR_HANDLE_DATA, READ, NO_WRITE, AUTO_NOTIFY,     NO_INDICATE,         NO,            NO,     sizeof(ble_cw_cookware_data_t),                          0,      SHORT,    COOKWARE_VARIABLE_COOKWARE_DATA },
    {        COOKWARE_UUID_CHAR_HANDLE_HWID, READ, NO_WRITE,   NO_NOTIFY,     NO_INDICATE,         NO,            NO,     sizeof(ble_cw_cookware_hwid_t),                          0,      SHORT,          COOKWARE_VARIABLE_INVALID },
    {             COOKWARE_UUID_CHAR_EVENTS, READ, NO_WRITE,   NO_NOTIFY, DEMAND_INDICATE,        YES,            NO,             sizeof(ble_cw_event_t), BLE_GATT_MAX_PACKET_LENGTH,      SHORT,          COOKWARE_VARIABLE_INVALID },
    {     COOKWARE_UUID_CHAR_GE_HANDLE_DATA, READ, NO_WRITE, AUTO_NOTIFY,     NO_INDICATE,         NO,            NO,     sizeof(ble_ge_cookware_data_t),                          0,      SHORT, COOKWARE_VARIABLE_GE_COOKWARE_DATA },
    {           COOKWARE_UUID_CHAR_COMMANDS, READ,    WRITE,   NO_NOTIFY,     NO_INDICATE,        YES,            NO, sizeof(ble_cookware_instruction_t), BLE_GATT_MAX_PACKET_LENGTH,      SHORT,          COOKWARE_VARIABLE_INVALID },
};



However, since I added that new variable, every DFU attempt fails. Even after deleting the bond information and restarting the Bluetooth module on the phone, the nRF Connect app (Android and iOS) always shows me these logs.



Things I've tried so far without any luck:
- Removing the variable I had added, to return the device to its previous state. Even with the variable removed from the code, I am still unable to do DFUs.
- Flashing the microcontroller directly: erasing all memory on the microcontroller, flashing it again with new code (which does not have the variable I originally wanted to add), and trying DFU again. DFU still fails.


UPDATE:
- By manually flashing a version without the variable I wanted to add in the first place, I was able to get DFU working. So far this is the only way to get back to a stable version that works with DFU.

So, my question really is:
- With this SDK version, how can I safely add new characteristic data without breaking DFU in the process? Is it possible to DFU back to a working version without a manual flash over USB? These samples will be sealed, so we will not have physical access to the hardware.

Parents
  • Hi fabocode,

    Could you please clarify in detail what you mean by "adding a variable"?

    Are you changing the maximum size of a Characteristic Value? Or are you adding a new Characteristic?

    I also feel there are some conflicts in the information you are providing.

    - Removing the variable I had added, to return the device to its previous state. Even with the variable removed from the code, I am still unable to do DFUs.
    - Flashing the microcontroller directly: erasing all memory on the microcontroller, flashing it again with new code (which does not have the variable I originally wanted to add), and trying DFU again. DFU still fails.

    Do you mean you removed it from the image you are downloading?
    In that case, does the image on the device still have your modification?

    The issue is most likely caused by the firmware that runs the download, that is, what is already on the device, not by the image being downloaded. Changing the image being downloaded doesn't help.

    - By manually flashing a version without the variable I wanted to add in the first place, I was able to get DFU working. So far this is the only way to get back to a stable version that works with DFU.

    And thus this observation makes sense.

    Hieu

  • Hi  

    I would like to add more details about the problem I have. 

    I have a struct that is part of one of the characteristics currently running on the microcontroller.

    typedef struct __attribute__ ((__packed__)) {
        software_revision_t  version;   // Device firmware version
        uint16_t  product_id;           // Device Product ID
        uint8_t   battery_v;            // Battery voltage in 0.01 V increments
        uint16_t  temp_degF;            // Sensor reading in 0.1 F increments
        int8_t    cjt_degC;             // Cold-Junction Temperature (thermocouple ADC temp offset)
        uint8_t   status;               // Indicates error state with thermocouple or connection
        accel_event_data_t motion;      // Indicates motion & gesture detection
        uint8_t   wakeup_status;        // Indicates if device has woken up from deep sleep <------ This is the variable I added to the struct
    } ge_cookware_data_t;
    
    
    typedef struct __attribute__ ((__packed__)) {
        uint32_t    timestamp;      // time in nrf ticks (generally called milliseconds, but actually 1/1024 second)
        uint8_t     counter;        // increments by 1 every time data contents change (rolls over when reaches 0xff)
        uint8_t     packet_id;      // Advertising packet version (adv_packet_id_t)
        ge_cookware_data_t ge_cookware;
    } ge_adv_packet_t;
    
    typedef ge_adv_packet_t   ble_ge_cookware_data_t;
    

    And here's the array created for the service/characteristic definition:

    /**** Service ****/
    static const ble_char_config_t k_ble_cw_char_config[] = {
        /*                  characteristic UUID, READ,    WRITE,      NOTIFY,        INDICATE, REQUEST_ID, SEND_AS_EVENT,                               SIZE,                   MAX_SIZE, WRITE_TYPE,              HANDLE_STATE VARIABLE */
      /* 1st */  {        COOKWARE_UUID_CHAR_HANDLE_DATA, READ, NO_WRITE, AUTO_NOTIFY,     NO_INDICATE,         NO,            NO,     sizeof(ble_cw_cookware_data_t),                          0,      SHORT,    COOKWARE_VARIABLE_COOKWARE_DATA },
      /* 2nd */  {        COOKWARE_UUID_CHAR_HANDLE_HWID, READ, NO_WRITE,   NO_NOTIFY,     NO_INDICATE,         NO,            NO,     sizeof(ble_cw_cookware_hwid_t),                          0,      SHORT,          COOKWARE_VARIABLE_INVALID },
      /* 3rd */  {             COOKWARE_UUID_CHAR_EVENTS, READ, NO_WRITE,   NO_NOTIFY, DEMAND_INDICATE,        YES,            NO,             sizeof(ble_cw_event_t), BLE_GATT_MAX_PACKET_LENGTH,      SHORT,          COOKWARE_VARIABLE_INVALID },
      /* 4th */  {     COOKWARE_UUID_CHAR_GE_HANDLE_DATA, READ, NO_WRITE, AUTO_NOTIFY,     NO_INDICATE,         NO,            NO,     sizeof(ble_ge_cookware_data_t) /* here's the struct used by the characteristic */,                          0,      SHORT, COOKWARE_VARIABLE_GE_COOKWARE_DATA }, 
      /* 5th */  {           COOKWARE_UUID_CHAR_COMMANDS, READ,    WRITE,   NO_NOTIFY,     NO_INDICATE,        YES,            NO, sizeof(ble_cookware_instruction_t), BLE_GATT_MAX_PACKET_LENGTH,      SHORT,          COOKWARE_VARIABLE_INVALID },
    };

    That's what I mean when I say "I added a variable". I am currently testing as many scenarios as possible to recover the FW via DFU to a downgraded version. I can see that the problem lies in the firmware running on the device itself, not in the build being downloaded, but I don't understand what the real issue is that makes every DFU attempt fail, even to versions that share the same property (the new variable in the struct).

    As an update: if I forget the device on the iPhone running nRF Connect and restart Bluetooth on the phone, the error I keep seeing in the nRF Connect logs is the one I attached to this post, "Error 14: Peer removed pairing information".


     

Children
  • Hi fabocode,

    The code you shared is your company's custom implementation for adding BLE characteristics. ble_char_config_t is not an nRF5 SDK v14.2.0 type. My guess is that there is a helper function or library that reads that array and adds the characteristics.

    My next guess is that the size of ble_char_config_t changed and somehow messes up the DFU service. Did you check whether the size of ble_ge_cookware_data_t increased after your change?
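
    A compile-time check is one way to answer the size question. The following is only a sketch: software_revision_t and accel_event_data_t are not shown in this thread, so the definitions below are hypothetical stand-ins, and the 20-byte limit is an assumption based on the notification payload available at the default ATT_MTU of 23 (your project may negotiate a larger MTU or enforce a different limit).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: the real definitions are not shown in the thread. */
typedef struct __attribute__((__packed__)) {
    uint8_t ver_major, ver_minor, ver_patch;
} software_revision_t;

typedef struct __attribute__((__packed__)) {
    uint8_t flags;
} accel_event_data_t;

/* The struct as it was before the change (no wakeup_status). */
typedef struct __attribute__((__packed__)) {
    software_revision_t version;
    uint16_t product_id;
    uint8_t  battery_v;
    uint16_t temp_degF;
    int8_t   cjt_degC;
    uint8_t  status;
    accel_event_data_t motion;
} ge_cookware_data_old_t;

/* The struct after the change. */
typedef struct __attribute__((__packed__)) {
    software_revision_t version;
    uint16_t product_id;
    uint8_t  battery_v;
    uint16_t temp_degF;
    int8_t   cjt_degC;
    uint8_t  status;
    accel_event_data_t motion;
    uint8_t  wakeup_status;   /* the newly added byte */
} ge_cookware_data_t;

typedef struct __attribute__((__packed__)) {
    uint32_t timestamp;
    uint8_t  counter;
    uint8_t  packet_id;
    ge_cookware_data_t ge_cookware;
} ge_adv_packet_t;

/* Assumed limit: ATT_MTU 23 minus the 3-byte notification header. */
#define EXAMPLE_MAX_NOTIFY_LEN 20

/* In a packed struct, one extra uint8_t grows the size by exactly one. */
static_assert(sizeof(ge_cookware_data_t) == sizeof(ge_cookware_data_old_t) + 1,
              "adding wakeup_status should grow the struct by one byte");

/* Fails the build if the characteristic value outgrows the assumed limit. */
static_assert(sizeof(ge_adv_packet_t) <= EXAMPLE_MAX_NOTIFY_LEN,
              "characteristic value exceeds the assumed notification payload");
```

    With the real type definitions in place of the stand-ins, a build failure on the second assertion would show that the extra byte pushed the value past the limit you chose.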

    One possible cause of DFU issues after a change to the service table is exactly what is written in the "VERY IMPORTANT NOTE" comment in your code. However, your change should not cause this...
    To rule out this possibility, please repeat the test with a phone that has forgotten the device and a device that has been fully erased and then programmed with the new application.
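
    To illustrate why a size change alone should not move handles, here is a small model of sequential handle allocation. It is a simplified sketch, not the SoftDevice's implementation: char_model_t and assign_handles are hypothetical names, the starting handle 10 is arbitrary, and the model only counts the service declaration, each characteristic's declaration and value, and a CCCD when notify or indicate is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of one characteristic. */
typedef struct {
    uint16_t value_len; /* size of the characteristic value in bytes */
    bool     has_cccd;  /* true when notify or indicate is enabled   */
} char_model_t;

/*
 * Allocate handles sequentially, starting with the service declaration at
 * first_handle. Each characteristic takes one handle for its declaration,
 * one for its value, and one more for a CCCD if it has one.
 * Stores each value handle in out_value_handles; returns the next free handle.
 */
static uint16_t assign_handles(const char_model_t *chars, size_t count,
                               uint16_t first_handle,
                               uint16_t *out_value_handles)
{
    uint16_t next = (uint16_t)(first_handle + 1); /* skip service declaration */

    for (size_t i = 0; i < count; i++) {
        next++;                        /* characteristic declaration */
        out_value_handles[i] = next++; /* characteristic value       */
        if (chars[i].has_cccd) {
            next++;                    /* CCCD descriptor            */
        }
    }
    return next;
}

static void example_usage(void)
{
    const char_model_t before[]   = { { 25, true }, { 8, false } };
    const char_model_t resized[]  = { { 26, true }, { 8, false } };
    const char_model_t inserted[] = { { 4, false }, { 25, true }, { 8, false } };
    uint16_t h_before[2], h_resized[2], h_inserted[3];

    assign_handles(before, 2, 10, h_before);
    assign_handles(resized, 2, 10, h_resized);
    assign_handles(inserted, 3, 10, h_inserted);

    /* Growing a value by one byte leaves every handle where it was. */
    assert(h_resized[0] == h_before[0] && h_resized[1] == h_before[1]);

    /* Inserting a characteristic in front shifts the ones after it. */
    assert(h_inserted[1] != h_before[0] && h_inserted[2] != h_before[1]);
}
```

    In this model value_len never influences a handle, which is why the cached-handle problem from the comment is expected only when characteristics are added, removed, or reordered.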

    I have another idea, but for now please help me with my above question and suggestion first.

    fabocode said:
    as an update, I see now that if I forget the device from the iphone I have running nrf connect, and restarting the bluetooth in the phone, the error I keep seeing in the logs from nrf connect is what I attached to this post "Error 14: Peer removed pairing information"

    If you forget the device on the phone, you also need to clear the bond information on the device. The central (phone) will not be able to pair to the peripheral if the peripheral still remembers the old bond.


    On a tangential note, a way to address the problem in "VERY IMPORTANT NOTE" is to use the Service Changed characteristic. Does your application configure NRF_SDH_BLE_SERVICE_CHANGED=1?
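
    For reference, this is roughly what that setting looks like in the application's sdk_config.h. Treat it as a sketch of the option only: where the entry sits in your file and what value it currently has are things you need to check in your own project.

```c
/* sdk_config.h: include the Service Changed characteristic
 * in the attribute table set up by the SoftDevice handler. */
#ifndef NRF_SDH_BLE_SERVICE_CHANGED
#define NRF_SDH_BLE_SERVICE_CHANGED 1
#endif
```

    With the characteristic present, a bonded central can be told that the attribute table changed instead of relying on its cached handles.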
