This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Mesh LPN RX race condition

Suppose node L is in a friendship with node F. Consider the following: 

1. Some other node C sends a message to L. 

2. F hears the message, and saves the message into L's friend queue.

3. L also happens to be scanning (e.g. L is currently in a Friend Poll receive window), and hears the message itself.

4. F sends the recorded message to L as the Friend Poll reply.

5. However, when L receives the recorded message from F, it will discard it on the Network layer due to message cache hit (it has already seen the message itself in step 2). This happens in the 3rd if statement in `deobfuscated_header_is_valid`.

6. Therefore, the LPN stack on L never receives L's Friend Poll reply. It thus never updates fsn, and will keep requesting the same message from L. 

7. After 5 Friend Polls, the LPN stack on L will terminate the friendship. 

I think in this case the LPN stack should still somehow increase its fsn despite of the message cache hit. 

To reproduce this, I had C continuously sending messages to L (3 messages every second). It usually reproduces within the first 20 seconds or so. 

Thanks! 

Parents
  • Hi Isundaylee, 

    Thanks for reporting this. However, by design, our LPN should not receive packet to itself when it's in friendship. You can have a look at the LPN life cycle figure here.

    But we suspect it might be due to another bug we found in SDK v3.0 and fixed in SDK v3.1. Could you reproduce the issue on SDK v3.1 ? 

    In addition, could you explain a little bit on how you figured out that the LPN discard packet from friend, do you have some log ? 

  • I added the following lines:

    diff --git a/mesh_sdk/mesh/core/src/lpn.c b/mesh_sdk/mesh/core/src/lpn.c
    index 9d8d0f6a..717d48be 100755
    --- a/mesh_sdk/mesh/core/src/lpn.c
    +++ b/mesh_sdk/mesh/core/src/lpn.c
    @@ -655,6 +655,7 @@ static void receive_delay_handle(timestamp_t timestamp, void * p_context)
     {
         (void)timestamp;
    
    +    __LOG(LOG_SRC_APP, LOG_LEVEL_ERROR, "Scanning...");
         scanner_enable();
         m_lpn.timeout_scheduler.cb = timeout_handle;
         timestamp_t new_timestamp = timer_now() + ((uint32_t) p_context);
    @@ -946,6 +947,7 @@ void mesh_lpn_rx_notify(const network_packet_metadata_t * p_net_metadata)
    
         if (m_lpn.p_subman_data == NULL)
         {
    +        __LOG(LOG_SRC_APP, LOG_LEVEL_ERROR, "FSN incremented.");
             m_lpn.fsn++;
         }
    
    diff --git a/mesh_sdk/mesh/core/src/net_packet.c b/mesh_sdk/mesh/core/src/net_packet.c
    index c195104b..9d451633 100755
    --- a/mesh_sdk/mesh/core/src/net_packet.c
    +++ b/mesh_sdk/mesh/core/src/net_packet.c
    @@ -127,6 +127,7 @@ static inline bool deobfuscated_header_is_valid(const network_packet_metadata_t
          * the packet here, but since the decryption would have failed anyway, it doesn't matter. */
         if (msg_cache_entry_exists(p_net_metadata->src, p_net_metadata->internal.sequence_number))
         {
    +        __LOG(LOG_SRC_APP, LOG_LEVEL_ERROR, "Packet dropped: Message cache.");
             __INTERNAL_EVENT_PUSH(INTERNAL_EVENT_PACKET_DROPPED, PACKET_DROPPED_NETWORK_CACHE, net_packet_len, p_net_packet);
             return false;
         }
    @@ -135,6 +136,7 @@ static inline bool deobfuscated_header_is_valid(const network_packet_metadata_t
         nrf_mesh_address_t dummy;
         if (nrf_mesh_rx_address_get(p_net_metadata->src, &dummy))
         {
    +        __LOG(LOG_SRC_APP, LOG_LEVEL_ERROR, "Packet dropped: Self-sent.");
             return false;
         }

    Here's the log:

    [      8.104] [   INFO    ] Received a Friend Update.
    [      8.104] [   INFO    ] Friend poll completed.
    [     14.029] [   INFO    ] User button pressed.
    [     14.029] [***********] TX: [aop: 0x8203]
    [     14.030] [***********] TX: Msg: 0100
    [     14.030] [***********] TX: [aop: 0x8203]
    [     14.030] [***********] TX: Msg: 0100
    [     14.030] [***********] TX: [aop: 0x8203]
    [     14.030] [***********] TX: Msg: 0100
    [     14.030] [   INFO    ] Received LED set request: ON
    [     16.074] [!! ERROR !!] Scanning...
    [     16.204] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.228] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.430] [!! ERROR !!] Scanning...
    [     16.564] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.584] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.785] [!! ERROR !!] Scanning...
    [     16.905] [***********] TX: [aop: 0x0004]
    [     16.905] [***********] TX: Msg: 00FFFF
    [     16.923] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.944] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.141] [!! ERROR !!] Scanning...
    [     17.284] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.308] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.497] [!! ERROR !!] Scanning...
    [     17.644] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.664] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.852] [!! ERROR !!] Scanning...
    [     18.003] [!! ERROR !!] Packet dropped: Self-sent.
    [     18.108] [!! ERROR !!] Friendship terminated with 0x0021. Reason: 2.

    At 14.029 I pressed the button, which sends a Generic OnOff Set message to 0xCAFE. The LPN node also subscribes to 0xCAFE, and hears the message directly (probably through a local loopback?) at 14.030. In the following Friend Poll process the Friend Poll replies are all rejected as self-sent (4th if-statement), and FSN is never incremented. 

Reply
  • I added the following lines:

    diff --git a/mesh_sdk/mesh/core/src/lpn.c b/mesh_sdk/mesh/core/src/lpn.c
    index 9d8d0f6a..717d48be 100755
    --- a/mesh_sdk/mesh/core/src/lpn.c
    +++ b/mesh_sdk/mesh/core/src/lpn.c
    @@ -655,6 +655,7 @@ static void receive_delay_handle(timestamp_t timestamp, void * p_context)
     {
         (void)timestamp;
    
    +    __LOG(LOG_SRC_APP, LOG_LEVEL_ERROR, "Scanning...");
         scanner_enable();
         m_lpn.timeout_scheduler.cb = timeout_handle;
         timestamp_t new_timestamp = timer_now() + ((uint32_t) p_context);
    @@ -946,6 +947,7 @@ void mesh_lpn_rx_notify(const network_packet_metadata_t * p_net_metadata)
    
         if (m_lpn.p_subman_data == NULL)
         {
    +        __LOG(LOG_SRC_APP, LOG_LEVEL_ERROR, "FSN incremented.");
             m_lpn.fsn++;
         }
    
    diff --git a/mesh_sdk/mesh/core/src/net_packet.c b/mesh_sdk/mesh/core/src/net_packet.c
    index c195104b..9d451633 100755
    --- a/mesh_sdk/mesh/core/src/net_packet.c
    +++ b/mesh_sdk/mesh/core/src/net_packet.c
    @@ -127,6 +127,7 @@ static inline bool deobfuscated_header_is_valid(const network_packet_metadata_t
          * the packet here, but since the decryption would have failed anyway, it doesn't matter. */
         if (msg_cache_entry_exists(p_net_metadata->src, p_net_metadata->internal.sequence_number))
         {
    +        __LOG(LOG_SRC_APP, LOG_LEVEL_ERROR, "Packet dropped: Message cache.");
             __INTERNAL_EVENT_PUSH(INTERNAL_EVENT_PACKET_DROPPED, PACKET_DROPPED_NETWORK_CACHE, net_packet_len, p_net_packet);
             return false;
         }
    @@ -135,6 +136,7 @@ static inline bool deobfuscated_header_is_valid(const network_packet_metadata_t
         nrf_mesh_address_t dummy;
         if (nrf_mesh_rx_address_get(p_net_metadata->src, &dummy))
         {
    +        __LOG(LOG_SRC_APP, LOG_LEVEL_ERROR, "Packet dropped: Self-sent.");
             return false;
         }

    Here's the log:

    [      8.104] [   INFO    ] Received a Friend Update.
    [      8.104] [   INFO    ] Friend poll completed.
    [     14.029] [   INFO    ] User button pressed.
    [     14.029] [***********] TX: [aop: 0x8203]
    [     14.030] [***********] TX: Msg: 0100
    [     14.030] [***********] TX: [aop: 0x8203]
    [     14.030] [***********] TX: Msg: 0100
    [     14.030] [***********] TX: [aop: 0x8203]
    [     14.030] [***********] TX: Msg: 0100
    [     14.030] [   INFO    ] Received LED set request: ON
    [     16.074] [!! ERROR !!] Scanning...
    [     16.204] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.228] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.430] [!! ERROR !!] Scanning...
    [     16.564] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.584] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.785] [!! ERROR !!] Scanning...
    [     16.905] [***********] TX: [aop: 0x0004]
    [     16.905] [***********] TX: Msg: 00FFFF
    [     16.923] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.944] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.141] [!! ERROR !!] Scanning...
    [     17.284] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.308] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.497] [!! ERROR !!] Scanning...
    [     17.644] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.664] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.852] [!! ERROR !!] Scanning...
    [     18.003] [!! ERROR !!] Packet dropped: Self-sent.
    [     18.108] [!! ERROR !!] Friendship terminated with 0x0021. Reason: 2.

    At 14.029 I pressed the button, which sends a Generic OnOff Set message to 0xCAFE. The LPN node also subscribes to 0xCAFE, and hears the message directly (probably through a local loopback?) at 14.030. In the following Friend Poll process the Friend Poll replies are all rejected as self-sent (4th if-statement), and FSN is never incremented. 

Children
No Data
Related