Possible memory leak in remote provisioning implementation on SDK for Mesh v3.2

Hi,

I'm trying to implement remote provisioning on a modified version of the light switch example in the SDK for Mesh v3.2, on nRF52832 development kits. My code extends the stock light switch client and server with 2 additional models on their primary elements, for application-specific monitoring and neighbor discovery. Due to the lockdown situation, we need remote provisioning in order to provision our testbed network while working from home. So I've added a remote provisioning client to the primary element of the (static) Provisioner node, and a remote provisioning server to the primary element of both the light switch client and the light switch servers.

Native provisioning from the Provisioner node itself still works as it should. I can also set the publish address of the remote provisioning client model on the Provisioner node to the element of a provisioned and configured light switch client or server that contains a remote provisioning server model, and I can instruct that remote provisioning server model to scan for provisionee beacons and forward them to the Provisioner node (i.e. to the remote provisioning client model).

However, when I try to use an already provisioned and configured light switch server as intermediary to provision another light switch server (which is still a provisionee), the provisioning process fails with a Memfault on the intermediary, in its remote provisioning server model. The fault status registers show that it is an IACCVIOL Memfault, and after debugging the code further with breakpoints and logging, I find that this function pointer call

uint32_t status = p_ctx->p_prov_bearer->p_interface->link_open(p_ctx->p_prov_bearer, p_uuid,  NRF_MESH_PROV_LINK_TIMEOUT_MIN_US);

causes the Memfault assert. The call is made inside the function

static pb_remote_server_state_t local_link_open(pb_remote_server_t * p_ctx, pb_remote_server_event_t * p_evt, const uint8_t * p_uuid)

which opens a native provisioning link from this remote provisioning server model to the provisionee, after the remote provisioning client model instructs it to do so. Since the corrupted function pointer is what ultimately triggers the Memfault, I started looking for the point where its value changes. I found that when the remote provisioning server model receives the UUID from a provisionee's beacon and tries to report it back to the remote provisioning client model, the value suddenly changes during the call to

send_reliable_msg(p_ctx, PB_REMOTE_OP_SCAN_UUID_REPORT, PB_REMOTE_OP_SCAN_REPORT_STATUS, sizeof(pb_remote_msg_scan_uuid_report_t));

which sends the UUID back to the remote provisioning client model. More specifically, the change happens across this line of code

p_ctx->reliable.message.p_buffer = m_packet.buffer;

Before this line, the function pointer p_ctx->p_prov_bearer->p_interface->link_open holds 3C02B; after it, the pointer holds 570E1EF4:

send_reliable_msg BEFORE p_ctx->p_prov_bearer->p_interface->link_open addr: 3C02B
send_reliable_msg AFTER p_ctx->p_prov_bearer->p_interface->link_open addr: 570E1EF4
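The two logged values can be sanity-checked against the device memory map. The sketch below assumes the nRF52832 layout of 512 kB flash starting at 0x00000000 and assumes all bearer callbacks live in flash (code placed in RAM would need an extra range). fn_ptr_plausible is a hypothetical helper, not an SDK function. It also relies on the fact that on Cortex-M a valid function pointer has bit 0 set (Thumb state).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed nRF52832 flash range: 512 kB starting at address 0. */
#define FLASH_END 0x00080000u /* exclusive upper bound */

/* Hypothetical helper: returns true if addr looks like a Thumb
 * function pointer into flash. This is only a plausibility check;
 * it cannot prove that addr is the start of a real function. */
static bool fn_ptr_plausible(uint32_t addr)
{
    bool thumb_bit = (addr & 1u) != 0u; /* Cortex-M code pointers are odd */
    bool in_flash  = addr < FLASH_END;  /* flash starts at 0x00000000 */
    return thumb_bit && in_flash;
}
```

Under these assumptions 0x3C02B passes and 0x570E1EF4 does not. The latter falls in the Cortex-M peripheral range (0x40000000 to 0x5FFFFFFF), which is execute-never by default, so an instruction fetch from there would be consistent with the IACCVIOL fault.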

I'm at a complete loss on how to solve this issue. I've already tried changing the stack and heap sizes, as well as the number of flash pages for the access layer and device state manager (which should actually cause an assert if it is too low, if I'm correct?), to no effect.
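Whether the stack size matters at all could be checked by measuring how much of the stack is ever used, rather than by resizing it blindly. The following is a minimal sketch of the common "stack painting" technique, written against a plain array so it can run anywhere. STACK_PAINT, stack_paint and stack_unused_words are hypothetical names, and on the target only the region below the current stack pointer may be painted, since painting live stack would destroy it.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xDEADBEEFu /* arbitrary, easily recognised pattern */

/* Fill a stack region with the pattern (run once, early at boot). */
static void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_PAINT;
    }
}

/* Count the words at the low end of the region that still hold the
 * pattern. The Cortex-M stack grows downwards, so untouched words sit
 * at the lowest addresses. A result of 0 means the stack reached its
 * limit at least once and may have overflowed. */
static size_t stack_unused_words(const uint32_t *base, size_t words)
{
    size_t n = 0;
    while (n < words && base[n] == STACK_PAINT) {
        n++;
    }
    return n;
}
```

If the unused count stays comfortably above zero across a full remote provisioning attempt, a stack overflow becomes an unlikely explanation for the changed pointer.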

If I replace the function pointer call with a hardcoded call to the function that the pointer should point to (i.e. static uint32_t prov_bearer_adv_link_open), the code gets a bit further, but asserts again at the next function pointer call, which has the same issue (a function pointer in p_callbacks: p_pb_adv->prov_bearer.p_callbacks->opened(&p_pb_adv->prov_bearer)). So it seems that the function pointers of both p_callbacks and p_interface are altered during execution. Is there something I missed regarding memory usage, the declaration of required RAM for certain layers or features, etc.?
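Since both p_interface and p_callbacks appear to change, it may help to find out which link of the pointer chain is actually overwritten. The sketch below uses simplified, hypothetical stand-ins for the SDK structures (ctx_t, bearer_t, iface_t, callbacks_t are not the real definitions). It records known-good values once and later reports whether the bearer pointer inside the context changed, or whether the fields inside the bearer structure changed.

```c
#include <stdint.h>

/* Hypothetical stand-ins; the real SDK structures have more fields. */
typedef struct { uint32_t (*link_open)(void); } iface_t;
typedef struct { void (*opened)(void); } callbacks_t;
typedef struct {
    const iface_t     *p_interface;
    const callbacks_t *p_callbacks;
} bearer_t;
typedef struct { bearer_t *p_prov_bearer; } ctx_t;

/* Known-good values, captured right after initialization. */
typedef struct {
    const bearer_t    *p_bearer;
    const iface_t     *p_interface;
    const callbacks_t *p_callbacks;
} chain_ref_t;

typedef enum {
    CHAIN_OK,
    CHAIN_BEARER_PTR_CHANGED,   /* the context structure was overwritten */
    CHAIN_BEARER_FIELDS_CHANGED /* the bearer structure was overwritten */
} chain_status_t;

static chain_ref_t chain_capture(const ctx_t *p_ctx)
{
    chain_ref_t ref = {
        p_ctx->p_prov_bearer,
        p_ctx->p_prov_bearer->p_interface,
        p_ctx->p_prov_bearer->p_callbacks
    };
    return ref;
}

static chain_status_t chain_check(const chain_ref_t *p_ref, const ctx_t *p_ctx)
{
    /* Compare the outer pointer first and never dereference it if it
     * changed: it may point to unmapped memory. */
    if (p_ctx->p_prov_bearer != p_ref->p_bearer) {
        return CHAIN_BEARER_PTR_CHANGED;
    }
    if (p_ctx->p_prov_bearer->p_interface != p_ref->p_interface ||
        p_ctx->p_prov_bearer->p_callbacks != p_ref->p_callbacks) {
        return CHAIN_BEARER_FIELDS_CHANGED;
    }
    return CHAIN_OK;
}
```

Calling chain_check before and after each suspicious statement narrows down both where the write happens and which structure it lands in, which in turn hints at which buffer is written out of bounds.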

Thanks in advance for any pointers or advice on how to further debug this problem.

Kind regards,

Mathias

  • Hi.

    Before spending a lot of time trying to solve this in an older version of the nRF5 SDK for Mesh, I need to ask whether you are able to test this with the latest version of the nRF5 SDK for Mesh (currently v5.0.0).

    Significant work has been done on the remote provisioning feature in the later releases, so this might not be an issue when using a later version.

    Br,
    Joakim

  • Hi Joakim,

    I've tried your suggestion, migrating my code from v3.2 to v4.0 and from v4.0 to v5.0:

    • v3.2 to v4.0 was successful but the same memory leakage problem persists.
    • v4.0 to v5.0 hasn't been successful yet. I got the provisioner and client working (among others, by increasing FLASH_MANAGER_POOL_SIZE in nrf_mesh_config_core.h, not sure if this is a good idea to do since this is an application-agnostic value?), but the server asserts during mesh initialization. I've been able to pinpoint the assert to NRF_MESH_ASSERT(size_guard <= UINT16_MAX); in the for loop (after a few iterations) in mesh_config_backend_init(). As of now, I haven't been able to solve this / understand why the code crashes there.

    I've also tried the opposite: copying the remote provisioning models from v5.0 (now a proprietary feature instead of an experimental one in that release) into my v3.2 stack and using those implementations of the models in my code instead. This compiles and flashes without problems, but still gives the exact same memory issue. So I'm not sure that my problem is caused by using an older version. Having found constants like FLASH_MANAGER_POOL_SIZE, I wonder whether some of these memory-related defines could be connected to the memory leakage problem I'm having. Could some model-related memory be too small in a way that doesn't raise an error immediately but still causes memory leakage later? Or are there other known things in the stack that could cause memory leakage?

    Thanks!

    Mathias
