BLE Mesh: Replay cache limitations

Howdy all,

I have noticed that there are some limitations regarding the replay cache in BLE mesh devices.

Environment:

1. nrf52840

2. MESH_SDK V5.00

I have developed a PC application with a GUI interface which enables us to interact with Mesh devices. We have a USB device which is based on the NRF52840 and contains roughly 17 client models. The PC application together with this device can perform gang provisioning and configuration as well. What I have noticed is that the device stops receiving responses once I get to device number 40. After doing some digging, I found that the replay cache is full and therefore messages never make it to the access layer.

Also, if I provision a device with, say, unicast address 0x0020, then unprovision that device and reprovision it with the exact same unicast address, the previous data for that unicast address is still in the replay cache of the provisioner. I then have to send a certain message a few times and wait for a response; eventually the sequence number of the message becomes higher than the one stored in the replay cache, and the message is processed.
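To make both symptoms concrete, here is a minimal model of a replay cache check. It is a simplified sketch, not the Mesh SDK's implementation: the names (`replay_model_t`, `replay_model_check`, `CACHE_ENTRIES`) are hypothetical, and IV index handling is left out on purpose.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, deliberately tiny cache size for illustration. */
#define CACHE_ENTRIES 4

typedef struct {
    uint16_t src;    /* unicast source address */
    uint32_t seq;    /* highest sequence number accepted from src */
    bool     in_use;
} cache_entry_t;

typedef struct {
    cache_entry_t entries[CACHE_ENTRIES];
} replay_model_t;

typedef enum { MSG_ACCEPTED, MSG_REPLAY, MSG_CACHE_FULL } msg_result_t;

/* Accept a message only if its sequence number is newer than the one
 * remembered for the same source. A source that is not yet known needs
 * a free slot; without one the message is dropped. */
static msg_result_t replay_model_check(replay_model_t *c, uint16_t src, uint32_t seq)
{
    cache_entry_t *free_slot = NULL;

    for (size_t i = 0; i < CACHE_ENTRIES; i++) {
        cache_entry_t *e = &c->entries[i];
        if (e->in_use && e->src == src) {
            if (seq <= e->seq) {
                return MSG_REPLAY;   /* old or repeated sequence number */
            }
            e->seq = seq;
            return MSG_ACCEPTED;
        }
        if (!e->in_use && free_slot == NULL) {
            free_slot = e;
        }
    }

    if (free_slot == NULL) {
        return MSG_CACHE_FULL;       /* no room to track a new source */
    }
    free_slot->src = src;
    free_slot->seq = seq;
    free_slot->in_use = true;
    return MSG_ACCEPTED;
}
```

In this model, a reprovisioned node that restarts its sequence numbers at a low value keeps getting `MSG_REPLAY` until it passes the stored value, and a new source arriving when every slot is taken gets `MSG_CACHE_FULL`.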

I have been through the devzone, and some suggestions were to simply clear the replay cache, but that poses a security risk. Also, we are currently at about 99% RAM usage, so simply increasing the maximum number of cache entries (REPLAY_CACHE_ENTRIES) is not that simple. I could do a memory optimization on the firmware and increase that number, but it won't resolve the issue mentioned regarding the provisioner...

Perhaps the solution for the provisioner is never to reuse a previous unicast address? (I understand you cannot have duplicate addresses, that is not the case here... I am talking about one device on a network being provisioned, reset, and reprovisioned with the same unicast address).

Please can you advise on:

1. Is the proper solution to simply do a memory optimization and ramp the size of the replay cache up to the maximum number the memory allows for (REPLAY_CACHE_ENTRIES)? This will not solve the provisioner issue I am experiencing though... But we will be able to communicate with more devices...

2. In terms of good practice, should I maybe prevent the provisioner from using previously used unicast addresses, even if there is no node on the network using that unicast? This does not make sense to me at all and feels a bit "hacky"...

I am not sure what the answer is here, if you were to simply clear that replay cache then your device will not be mesh compliant and will pose a security risk.

Regards

Chris

  • Hi, 

    Mesh Profile Specification, section 3.10.7 Node Removal procedure, paragraph 3:
    "After a node is removed from a network, its unicast addresses may be reused by a Provisioner. A Provisioner shall only reuse these addresses after the current IV Index (at the time of removal) has been updated (see Section 3.10.5) in order to enable the SEQ numbers to be reused."

    The spec requires the provisioner to wait with reusing unicast addresses until there has been a full IV Index Update cycle. In other words, you should not reuse that 0x20 address before the IV Index has been updated.
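On the provisioner side, the rule above could be enforced by remembering the IV Index at which each address was released. The following is only a sketch under stated assumptions: `released_addr_t` and `addr_can_assign` are hypothetical names, not SDK APIs, and requiring that no IV Update is in progress is a conservative reading of "a full IV Index Update cycle".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record kept by the provisioner application for every
 * unicast address freed by a node removal. */
typedef struct {
    uint16_t addr;                 /* released unicast address */
    uint32_t iv_index_at_removal;  /* network IV Index when the node was removed */
} released_addr_t;

/* Returns true if addr may be handed out again.
 * Addresses that were never released are not restricted here; whether
 * they are currently in use must be checked elsewhere. */
static bool addr_can_assign(const released_addr_t *released, size_t count,
                            uint16_t addr, uint32_t current_iv_index,
                            bool iv_update_in_progress)
{
    for (size_t i = 0; i < count; i++) {
        if (released[i].addr == addr) {
            /* Conservative assumption: the IV Index must have moved past
             * the value at removal and the update must have completed. */
            return current_iv_index > released[i].iv_index_at_removal
                   && !iv_update_in_progress;
        }
    }
    return true;
}
```

With such a table, the provisioner would pick the next unused address for a reprovisioned device instead of 0x0020, and return 0x0020 to the pool only once the check passes.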

    Regards,
    Amanda
  • Hi Amanda,

    Perfect, thanks so much.

    Then just back to enabling a device to communicate with more unicast addresses:

    I have done a memory optimization and I have increased REPLAY_CACHE_ENTRIES to say 1030. That seems to work fine. I am experiencing 2 issues though:

    1. When increasing REPLAY_CACHE_ENTRIES to a much larger number, say 4095, I get a mesh assert in mesh_config_backend.c:

    line 110:  NRF_MESH_ASSERT(size_guard <= UINT16_MAX);

    Probably because the config entry for the replay cache is too large?

    I tried working around this by setting:

    #define REPLAY_CACHE_STORAGE_STRATEGY MESH_CONFIG_STRATEGY_NON_PERSISTENT


    Now I am getting an assert in packet_buffer.c:

    line 249: NRF_MESH_ASSERT(m_get_packet(p_buffer, p_buffer->head)->packet_state !=
    PACKET_BUFFER_MEM_STATE_RESERVED);

    What is the right way of expanding the replay cache so that the node can communicate with more devices?

  • Hi,

    You can increase this number. The trade-off is a larger m_replay_cache table (8 bytes/entry) and a longer time for the node to check the list to avoid a replay attack. The size is limited by the free RAM your application has.

    -Amanda

  • I understand that it is limited by RAM, that is why I have mentioned that I have done a memory optimization. What I am trying to show to you are two problems:

    "1. When increasing REPLAY_CACHE_ENTRIES to a much larger number, say 4095, I get a mesh assert in mesh_config_backend.c:

    line 110:  NRF_MESH_ASSERT(size_guard <= UINT16_MAX);

    Probably because the mesh config entry for the replay cache is too large?"

    "2. I tried working around this by setting:

    #define REPLAY_CACHE_STORAGE_STRATEGY MESH_CONFIG_STRATEGY_NON_PERSISTENT


    Now I am getting an assert in packet_buffer.c:

    line 249: NRF_MESH_ASSERT(m_get_packet(p_buffer, p_buffer->head)->packet_state !=
    PACKET_BUFFER_MEM_STATE_RESERVED);
    "

    Somehow the replay cache in RAM and the mesh configuration (FLASH) are not scaling correctly? Maybe somewhere I should increase the number of flash pages the mesh config uses? Also, when I am explicitly stating:

    #define REPLAY_CACHE_STORAGE_STRATEGY MESH_CONFIG_STRATEGY_NON_PERSISTENT

    I would expect that the replay cache will never be stored in flash (mesh config), so the size of REPLAY_CACHE_ENTRIES should only be dependent on RAM, but now I am getting mesh asserts? Do you see what I am trying to point out?

  • Hi, 

    1. Yes, the max size of the backend part for the file cannot be more than `UINT16_MAX`. That means `(sizeof(replay_cache_entry_t) + sizeof(uint16_t)) * REPLAY_CACHE_ENTRIES <= UINT16_MAX`.

    2. Using NON_PERSISTENT for replay cache is not according to the mesh spec. 
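The bound in point 1 can be checked with a small helper before picking a value for REPLAY_CACHE_ENTRIES. This is a sketch only: the function names are hypothetical, and the entry size is passed in as a parameter because the real value is whatever `sizeof(replay_cache_entry_t)` evaluates to in your build.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Largest entry count that satisfies
 *   (entry_size + sizeof(uint16_t)) * entries <= UINT16_MAX
 * for a given per-entry size in bytes. */
static size_t replay_cache_max_entries(size_t entry_size)
{
    return UINT16_MAX / (entry_size + sizeof(uint16_t));
}

/* True if the requested entry count stays within the bound.
 * Division is used instead of multiplication to avoid overflow. */
static bool replay_cache_fits_backend(size_t entry_size, size_t entries)
{
    return entries <= replay_cache_max_entries(entry_size);
}
```

If the entry size were the 8 bytes mentioned earlier for the RAM table, this formula alone would allow up to 6553 entries, so 4095 would pass. Since the assert fires at 4095, the size used by the backend is presumably larger than that assumption; it is worth checking `sizeof(replay_cache_entry_t)` and the value of `size_guard` in a debugger.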

    -Amanda
