What can cause a four byte shift in the data read by bt_mesh_model_cb.settings_set?

Dear NCS experts,

I'm working with nRF Connect SDK (1.7.1) on a proprietary Bluetooth Mesh model. When the model receives its configuration, it stores it in flash by calling bt_mesh_model_data_store(). The next time the model is powered on, it loads its configuration through the bt_mesh_model_cb.settings_set callback.
 
Now the problem is that the data read in bt_mesh_model_cb.settings_set starts with four leading bytes (could be an int) that do not belong to the model's configuration. The model's actual configuration starts at the fifth byte of the read data, so its last four bytes are cut off.
 
It kinda looks as if bt_mesh_model_cb.settings_set is reading from an address that's sizeof(int) before the address where the actual data is stored. It could also be that bt_mesh_model_data_store() stores data starting four bytes before the intended address; I haven't been able to debug that deep yet. What's a little remarkable is the value of the first four bytes: interpreted as an integer they read 0x20000710, which matches the address where the bt_mesh_model structure is stored in memory. Coincidence? Maybe...
The question now is, what could cause this behavior and how to fix it?
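
For reference, here is a minimal sketch of how the four leading bytes can be compared against an address. It assumes a little-endian target (as the Cortex-M cores in Nordic's SoCs are), and the helper names le32_at and starts_with_address are made up for illustration; they are not part of the SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the first four bytes of a buffer as a little-endian 32-bit
 * value, which is how a 32-bit pointer is laid out on a little-endian core. */
uint32_t le32_at(const uint8_t *buf)
{
    return (uint32_t)buf[0] |
           ((uint32_t)buf[1] << 8) |
           ((uint32_t)buf[2] << 16) |
           ((uint32_t)buf[3] << 24);
}

/* Returns 1 if the buffer starts with the given 32-bit address, else 0. */
int starts_with_address(const uint8_t *buf, uint32_t address)
{
    return le32_at(buf) == address;
}
```

Feeding the first bytes of the loaded blob and the address of the model (or of any other suspect object) into starts_with_address() makes it easy to check whether the unexpected prefix is really a pointer.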
 
Now, honestly, I doubt that a bug like this would go unnoticed in Zephyr, so let's have a look at my code:
 
This is what the model's user data looks like:
 

    #define MAX_VALUES 16
    #define MAX_NAME_LENGTH 30
 
    struct model_value
    {
        uint16_t time;
        uint8_t value;
    };
 
    /*
      * Size in memory:              104 bytes: (10 + 64 + 30)
      * Required buffer (packed):     88 bytes: (10 + 48 + 30)
      */
    struct my_configuration
    {
        uint32_t id;
        int32_t uptime_correction;
        uint8_t max_in_percent;  
        uint8_t value_cnt;
        struct model_value values[MAX_VALUES];
        char name[MAX_NAME_LENGTH];
    };
 
    struct my_model_user_data
    {
        struct bt_mesh_model *model;
        struct my_configuration model_cfg;
        struct k_mutex cfg_mutex;
    };
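
The size figures in the comment above can be cross-checked with a small host-side sketch. It assumes a typical ABI in which uint16_t is 2-byte aligned and uint32_t is 4-byte aligned (true for 32-bit Arm and common desktop targets); the two helper functions are hypothetical and exist only for this check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VALUES 16
#define MAX_NAME_LENGTH 30

struct model_value
{
    uint16_t time;
    uint8_t value;
};

struct my_configuration
{
    uint32_t id;
    int32_t uptime_correction;
    uint8_t max_in_percent;
    uint8_t value_cnt;
    struct model_value values[MAX_VALUES];
    char name[MAX_NAME_LENGTH];
};

/* Size as laid out by the compiler, including one padding byte
 * per model_value element. */
size_t config_size_in_memory(void)
{
    return sizeof(struct my_configuration);
}

/* Size if every field were serialized back to back without padding:
 * 4 + 4 + 1 + 1 header bytes, 3 bytes per value, then the name. */
size_t config_size_packed(void)
{
    return 10 + MAX_VALUES * (sizeof(uint16_t) + sizeof(uint8_t)) + MAX_NAME_LENGTH;
}
```

Since the whole structure is passed to bt_mesh_model_data_store() with sizeof(struct my_configuration), the stored blob has the in-memory size, not the packed one.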
 
 
 
The "set configuration" opcode handler (below) just parses the incoming data, validates it, and saves it to flash.
Note: At the time the mutex is released, both new_cfg and user_data->model_cfg contain a valid configuration.
 
 
    static int handle_message_set_config(struct bt_mesh_model *model, struct bt_mesh_msg_ctx *ctx, struct net_buf_simple *buf) {
        struct model_user_data *user_data = model->user_data;
        struct my_configuration new_cfg;
 
        parse_config(&new_cfg, buf);
 
        if (!is_configuration_valid(&new_cfg)) {
            return -1;
        }
 
        if (k_mutex_lock(&user_data->cfg_mutex, K_MSEC(WAIT_FOR_MUTEX)) == 0) {
            memcpy(&user_data->model_cfg, &new_cfg, sizeof(struct my_configuration));
            bt_mesh_model_data_store(model, true, DATA_STORE_KEY, &new_cfg, sizeof(struct my_configuration));
            k_mutex_unlock(&user_data->cfg_mutex);
        }
 
        return reply_status(model, ctx, MSG_SET_CFG_STATUS);
    }

 
Edit: Looking at the above code, it kinda looks like bt_mesh_model_data_store() ignores the fourth parameter and uses the address of the model's my_model_user_data structure instead. Just a gut feeling... Can somebody with in-depth knowledge please confirm? That would perfectly explain the behavior.
 

The method to read the data during boot looks like this:
 

    static int my_model_settings_set(struct bt_mesh_model *model, const char *name, size_t len_rd, settings_read_cb read_cb, void *cb_arg)
    {
        struct my_model_user_data *user_data = model->user_data;
 
        if (NULL == name) {
            return -ENOENT;
        }
 
        struct my_configuration configuration;
        ssize_t bytes_read = read_cb(cb_arg, &configuration, sizeof(struct my_configuration));
 
        /*
         * Here's where the problem appears:
         * 1. bytes_read matches sizeof(my_configuration)
         * 2. first 4 bytes read are invalid, making the whole content of my_configuration invalid.
         * 3. Actual configuration starts at the fifth byte read.
         * 4. Because of the four invalid bytes at the beginning, all values are stored four bytes off and the last four bytes are missing.
         */
        if (bytes_read == sizeof(struct my_configuration) && is_configuration_valid(&configuration)) {
            memcpy(&user_data->model_cfg, &configuration, sizeof(struct my_configuration));
        } else {
            // We end up here because is_configuration_valid() recognizes the data corruption.
        }
 
        return 0;
    }

 
Simple code... Still, I would appreciate it if you could lend a hand. Do you see anything that's obviously wrong? Your help is very much appreciated.

Thank you,
Michael.

P.S.: To whomever is responsible for this forum: Please fix it! It took 10+ reloads until the "edit" button became visible...

 

  • Hello Mike!

    Sorry about the delay and thank you for your patience.

    I am afraid that I don't immediately see the issue. I do agree with your thinking, though: the issue is likely in the application code, and it is probably not a coincidence that the four leading bytes match the address at which the structure is stored.

    Did you base this model on any example in particular?

    Best regards,

    Elfving

  • Hi Elfving,

    Thanks for your help, and all the best 2022!

    My model is/was based on the chat example, but other than the structure, there's not much left of the original code.

    I've worked around the issue by placing the configuration data at the beginning of the model's user data, which seems to work, however, I can't hand a hack like that over to my client. That would be... dangerous. Very!

    Do you have any other ideas?

  • Hey Mike!

    I think that hacky solution hints at what the issue is. You get an address in front of the configuration because a pointer is the first member of the my_model_user_data struct, and that struct is probably what is being stored instead of the configuration struct itself. In other words, the wrong struct is being used.

    Here you are using struct model_user_data,

    static int handle_message_set_config(struct bt_mesh_model *model, struct bt_mesh_msg_ctx *ctx, struct net_buf_simple *buf) {
        struct model_user_data *user_data = model->user_data;

    though the model's user data may actually be of the type "struct my_model_user_data". I don't have the implementation of struct model_user_data, but if that is the case it might explain a part of the problem.

    Best regards,

    Elfving
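
The effect described in the last reply can be reproduced on a host machine with a minimal sketch. It is not the code from the thread: struct bt_mesh_model and struct k_mutex below are empty stand-ins for the Zephyr types, and snapshot() and cfg_offset() are hypothetical helpers. snapshot() only mimics what a store call does with the address it is given, namely copying sizeof(struct my_configuration) bytes starting there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_VALUES 16
#define MAX_NAME_LENGTH 30

/* Stand-ins for the Zephyr types (assumption: only their presence matters). */
struct bt_mesh_model { int placeholder; };
struct k_mutex { int placeholder; };

struct model_value { uint16_t time; uint8_t value; };

struct my_configuration
{
    uint32_t id;
    int32_t uptime_correction;
    uint8_t max_in_percent;
    uint8_t value_cnt;
    struct model_value values[MAX_VALUES];
    char name[MAX_NAME_LENGTH];
};

struct my_model_user_data
{
    struct bt_mesh_model *model;       /* first member: a pointer */
    struct my_configuration model_cfg; /* starts right after the pointer */
    struct k_mutex cfg_mutex;
};

/* Offset of the configuration inside the user data: one pointer width. */
size_t cfg_offset(void)
{
    return offsetof(struct my_model_user_data, model_cfg);
}

/* Copy one configuration's worth of bytes starting at src. */
void snapshot(const void *src, struct my_configuration *out)
{
    memcpy(out, src, sizeof(*out));
}
```

Calling snapshot() with the address of the whole my_model_user_data object instead of the address of its model_cfg member yields a blob that begins with the model pointer, followed by the configuration minus its last pointer-width bytes. On a 32-bit target that is exactly the four-byte shift from the question. It would also be consistent with the workaround: once model_cfg is the first member, its offset is zero and both addresses coincide, which hides the mistake rather than fixing it.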
