This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Mesh LPN RX race condition

Suppose node L is in a friendship with node F. Consider the following: 

1. Some other node C sends a message to L. 

2. F hears the message, and saves the message into L's friend queue.

3. L also happens to be scanning (e.g. L is currently in a Friend Poll receive window), and hears the message itself.

4. F sends the recorded message to L as the Friend Poll reply.

5. However, when L receives the recorded message from F, it will discard it on the Network layer due to message cache hit (it has already seen the message itself in step 2). This happens in the 3rd if statement in `deobfuscated_header_is_valid`.

6. Therefore, the LPN stack on L never receives L's Friend Poll reply. It thus never updates fsn, and will keep requesting the same message from L. 

7. After 5 Friend Polls, the LPN stack on L will terminate the friendship. 

I think in this case the LPN stack should still somehow increase its fsn despite of the message cache hit. 

To reproduce this, I had C continuously sending messages to L (3 messages every second). It usually reproduces within the first 20 seconds or so. 

Thanks! 

  • I added the following lines:

    diff --git a/mesh_sdk/mesh/core/src/lpn.c b/mesh_sdk/mesh/core/src/lpn.c
    index 9d8d0f6a..717d48be 100755
    --- a/mesh_sdk/mesh/core/src/lpn.c
    +++ b/mesh_sdk/mesh/core/src/lpn.c
    @@ -655,6 +655,7 @@ static void receive_delay_handle(timestamp_t timestamp, void * p_context)
     {
         (void)timestamp;
    
    +    __LOG(LOG_SRC_APP, LOG_LEVEL_ERROR, "Scanning...");
         scanner_enable();
         m_lpn.timeout_scheduler.cb = timeout_handle;
         timestamp_t new_timestamp = timer_now() + ((uint32_t) p_context);
    @@ -946,6 +947,7 @@ void mesh_lpn_rx_notify(const network_packet_metadata_t * p_net_metadata)
    
         if (m_lpn.p_subman_data == NULL)
         {
    +        __LOG(LOG_SRC_APP, LOG_LEVEL_ERROR, "FSN incremented.");
             m_lpn.fsn++;
         }
    
    diff --git a/mesh_sdk/mesh/core/src/net_packet.c b/mesh_sdk/mesh/core/src/net_packet.c
    index c195104b..9d451633 100755
    --- a/mesh_sdk/mesh/core/src/net_packet.c
    +++ b/mesh_sdk/mesh/core/src/net_packet.c
    @@ -127,6 +127,7 @@ static inline bool deobfuscated_header_is_valid(const network_packet_metadata_t
          * the packet here, but since the decryption would have failed anyway, it doesn't matter. */
         if (msg_cache_entry_exists(p_net_metadata->src, p_net_metadata->internal.sequence_number))
         {
    +        __LOG(LOG_SRC_APP, LOG_LEVEL_ERROR, "Packet dropped: Message cache.");
             __INTERNAL_EVENT_PUSH(INTERNAL_EVENT_PACKET_DROPPED, PACKET_DROPPED_NETWORK_CACHE, net_packet_len, p_net_packet);
             return false;
         }
    @@ -135,6 +136,7 @@ static inline bool deobfuscated_header_is_valid(const network_packet_metadata_t
         nrf_mesh_address_t dummy;
         if (nrf_mesh_rx_address_get(p_net_metadata->src, &dummy))
         {
    +        __LOG(LOG_SRC_APP, LOG_LEVEL_ERROR, "Packet dropped: Self-sent.");
             return false;
         }

    Here's the log:

    [      8.104] [   INFO    ] Received a Friend Update.
    [      8.104] [   INFO    ] Friend poll completed.
    [     14.029] [   INFO    ] User button pressed.
    [     14.029] [***********] TX: [aop: 0x8203]
    [     14.030] [***********] TX: Msg: 0100
    [     14.030] [***********] TX: [aop: 0x8203]
    [     14.030] [***********] TX: Msg: 0100
    [     14.030] [***********] TX: [aop: 0x8203]
    [     14.030] [***********] TX: Msg: 0100
    [     14.030] [   INFO    ] Received LED set request: ON
    [     16.074] [!! ERROR !!] Scanning...
    [     16.204] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.228] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.430] [!! ERROR !!] Scanning...
    [     16.564] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.584] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.785] [!! ERROR !!] Scanning...
    [     16.905] [***********] TX: [aop: 0x0004]
    [     16.905] [***********] TX: Msg: 00FFFF
    [     16.923] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.944] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.141] [!! ERROR !!] Scanning...
    [     17.284] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.308] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.497] [!! ERROR !!] Scanning...
    [     17.644] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.664] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.852] [!! ERROR !!] Scanning...
    [     18.003] [!! ERROR !!] Packet dropped: Self-sent.
    [     18.108] [!! ERROR !!] Friendship terminated with 0x0021. Reason: 2.

    At 14.029 I pressed the button, which sends a Generic OnOff Set message to 0xCAFE. The LPN node also subscribes to 0xCAFE, and hears the message directly (probably through a local loopback?) at 14.030. In the following Friend Poll process the Friend Poll replies are all rejected as self-sent (4th if-statement), and FSN is never incremented. 

  • BTW the log above is for the similar scenario I described in my comment: "A similar thing can happen when L sends to a group address G which L also subscribes to itself. In this case the recorded Friend Poll reply will get discarded in the 4th if statement in `deobfuscated_header_is_valid`.", which repros deterministically every time.

  • Thanks Isundaylee for the detailed report. 

    We have checked and found an issue with our implementation in SDK v3.1 

    We will try to come up with a fix and let you know if we have one. 

  • Just checking if this was fixed in mesh sdk 3.2.0?

Related