This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Too many method calls in serial interface (or any other) callback causes strange behavior in nRF Mesh SDK

Hi,

I'm using SDK v3.2 and I'm extending the lighting example. I'm developing an extension to the provisioner, to inform a back-end about the configuration of the subnet it is part of. For that, I use the existing serial interface handler. However, I also want to know the Network ID derived from the NetKey. For that, I use the Application serial command. Within that command, I define a more specific opcode, which leads to the underlying code:

static void serial_interface_cb(const uint8_t* p_data, uint32_t length)
{
    NRF_MESH_ASSERT(length > 0);

    switch(p_data[0])
    {
        case BLUESS_OPCODE_NETWORK_ID_DISCOVERY_SERIAL_COMMAND:
            ; // Empty statement
            mesh_key_index_t p_key_list[DSM_SUBNET_MAX_LIMIT];
            uint32_t p_count = DSM_SUBNET_MAX_LIMIT;

            dsm_subnet_get_all(p_key_list, &p_count);

            uint32_t length = sizeof(bluess_serial_opcode_t) + p_count * NRF_MESH_NETID_SIZE;
            uint32_t index = 0;

            uint8_t p_serial[length];

            p_serial[index] = BLUESS_OPCODE_NETWORK_ID_DISCOVERY_SERIAL_EVENT;
            index += sizeof(bluess_serial_opcode_t);


            for(uint32_t i = 0 ; i < p_count ; i++){
                nrf_mesh_beacon_secmat_t * p_secmat;
               
                uint8_t key = 0;
                uint32_t status = dsm_subnet_key_get(dsm_net_key_index_to_subnet_handle(p_key_list[i]), &key);
                if(status == NRF_SUCCESS){
                    status = nrf_mesh_keygen_beacon_secmat(&key, p_secmat);
                }
                
                memcpy(&p_serial[index], &p_secmat->net_id[0], NRF_MESH_NETID_SIZE);
                index += NRF_MESH_NETID_SIZE;
            }
            nrf_mesh_serial_tx(p_serial, length);
            break;
    }
}

As the provisioner is only part of one subnet, p_count will become equal to 1. This is the case from the start of the for-loop. However, I notice that during the for-loop, suddenly p_count becomes much larger, causing the loop to run endlessly. When commenting out the whole code within the for-loop, the callback is completed as it should, without p_count changing in between. At first glance, it seems that to many method calls within a callback, causes issues... Is there any explanation for this?

Kind regards,

Mathias

Parents
  • Hi,

    There is absolutely no reason why p_count should change, as it is a variable local to the function and it is never assigned to anything else than it's initial value of (the constant) DSM_SUBNET_MAX_LIMIT. If you see it changing, there must be something with the debugging session (e.g. you are debugging a Release build, or otherwise optimized build that lacks debug info.)

    In general you should not do too much in a callback, as callbacks tend to be called from code running in higher interrupt priority (and so the callback function itself is also run in that higher interrupt priority.) Depending on the calling context, other parts of the stack may misbehave as a result. Also, there may be restrictions to what API calls can be called from interrupt context. In the nRF5 SDK for Mesh documentation, Interrupt priority levels are thoroughly documented.

    Regards,
    Terje

    PS: For readability of the code, I recommend that you rename the variable p_count, as there is a convention that the "p_" prefix indicates a pointer. E.g. the variable name "p_data" means "pointer to data".

  • There is absolutely no reason why p_count should change, as it is a variable local to the function and it is never assigned to anything else than it's initial value of (the constant) DSM_SUBNET_MAX_LIMIT..

    Why not? I call dsm_subnet_get_all(p_key_list, &p_count);, which will put the correct keys in the p_key_list and updates p_count to the correct amount of keys in that list, no?

    In general you should not do too much in a callback, as callbacks tend to be called from code running in higher interrupt priority (and so the callback function itself is also run in that higher interrupt priority.)

    Okay, but where should I then do the necessary parsing, handling, etc. of received serial data in the callback? Should I save it in some kind of buffer, set a timer and then put a an if-statement in the global main while loop to handle the contents of that buffer? Is that the correct way? Or is there another, better way?

    PS: For readability of the code, I recommend that you rename the variable p_count, as there is a convention that the "p_" prefix indicates a pointer. E.g. the variable name "p_data" means "pointer to data".

    Thanks for this tip! :)

Reply
  • There is absolutely no reason why p_count should change, as it is a variable local to the function and it is never assigned to anything else than it's initial value of (the constant) DSM_SUBNET_MAX_LIMIT..

    Why not? I call dsm_subnet_get_all(p_key_list, &p_count);, which will put the correct keys in the p_key_list and updates p_count to the correct amount of keys in that list, no?

    In general you should not do too much in a callback, as callbacks tend to be called from code running in higher interrupt priority (and so the callback function itself is also run in that higher interrupt priority.)

    Okay, but where should I then do the necessary parsing, handling, etc. of received serial data in the callback? Should I save it in some kind of buffer, set a timer and then put a an if-statement in the global main while loop to handle the contents of that buffer? Is that the correct way? Or is there another, better way?

    PS: For readability of the code, I recommend that you rename the variable p_count, as there is a convention that the "p_" prefix indicates a pointer. E.g. the variable name "p_data" means "pointer to data".

    Thanks for this tip! :)

Children
  • Hi,

    Sorry, yes, p_count gets set by dsm_subnet_get_all(). I do not know why I overlooked that. Still, after being set there, it is not set to any new value anywhere. It is a variable local to the particular instance of serial_interface_cb and so it should not magically change within the for loop.

    Note that most (if not all) API functions return an error code. You should save the error code returned from dsm_subnet_get_all() as well as any other API call, as you do for e.g. dsm_subnet_key_get():

    uint32_t status = dsm_subnet_key_get(dsm_net_key_index_to_subnet_handle(p_key_list[i]), &key);

    Further, you should take appropriate action if the error code is not NRF_SUCCESS.

    If an API call fails, you risk something going horribly wrong further down the line if your code flow continues as if everything is prefectly fine, and those bugs can be quite nasty to track down. You will find examples of error code handling throughout the nRF5 SDK for Mesh, in particular in the example projects.

    Regards,
    Terje

  • Hi,

    So even when doing 'too much' in the callback, it would be not be because of other interrupts or something that this p_count's memory would be freed magically and thus suddenly contain a different value? Because that's what I'm seeing... But I guess this falls under 'Depending on the calling context, other parts of the stack may misbehave as a result.' ?

    Yes, thanks for the tip about error handling. I mostly try to print it or put an if, but I should be more consistent with that :)

  • Hi,

    By "stack" in that context I was referring to the BT mesh stack, and not the call stack. (I should have specified.) But yes, each invocation of the function will get its own stack frame on the call stack, and so there should be no mangling of values there.

    An exception is if you have the local variable declared static within a function, in which case all invocations of that same function will use the same memory for the variable. Then the variable is not stored in the stack frame for the particular invocation of the function, but rather in the static part of memory (which is kept at a predetermined location throughout program execution.) But that is clearly not the case here.

    Another exception is if you take a pointer to the variable, but that should not be the case here either. (While you do pass the address of (e.g. pointer to) the variable to dsm_subnet_get_all(), that function only uses the pointer for returning the count. It does not store it for other parts of the mesh stack to mess with it.)

    I would be surprised if the key to solving this issue is not either in an error code returned from an API call, or the callback function lasting for too long.

    Regards,
    Terje

  • Hi Terje,

    Probably it'll be the callback function lasting for too long then.

    Could it be that when it lasts to long, other parts of the stack behave strangely and for some reason (part of) the heap gets cleared? That could be an explanation.

    Kind regards,

    Mathias

  • Hi,

    Trouble is we don't use a heap, so the stack growing into the (non-existing) heap shouldn't happen. Also, the stack is shared between SoftDevice and application, so no colliding stacks... Not quite sure what happens on a stack overflow, though.

    I am more curious as to how you determined p_count changed. How did you read that / output it?

    Regards,
    Terje

Related