This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

Mesh LPN RX race condition

Suppose node L is in a friendship with node F. Consider the following: 

1. Some other node C sends a message to L. 

2. F hears the message, and saves the message into L's friend queue.

3. L also happens to be scanning (e.g. L is currently in a Friend Poll receive window), and hears the message itself.

4. F sends the recorded message to L as the Friend Poll reply.

5. However, when L receives the recorded message from F, it will discard it on the Network layer due to message cache hit (it has already seen the message itself in step 2). This happens in the 3rd if statement in `deobfuscated_header_is_valid`.

6. Therefore, the LPN stack on L never receives L's Friend Poll reply. It thus never updates fsn, and will keep requesting the same message from L. 

7. After 5 Friend Polls, the LPN stack on L will terminate the friendship. 

I think in this case the LPN stack should still somehow increase its fsn despite of the message cache hit. 

To reproduce this, I had C continuously sending messages to L (3 messages every second). It usually reproduces within the first 20 seconds or so. 

Thanks! 

Parents
  • Hi Isundaylee, 

    Thanks for reporting this. However, by design, our LPN should not receive packet to itself when it's in friendship. You can have a look at the LPN life cycle figure here.

    But we suspect it might be due to another bug we found in SDK v3.0 and fixed in SDK v3.1. Could you reproduce the issue on SDK v3.1 ? 

    In addition, could you explain a little bit on how you figured out that the LPN discard packet from friend, do you have some log ? 

Reply
  • Hi Isundaylee, 

    Thanks for reporting this. However, by design, our LPN should not receive packet to itself when it's in friendship. You can have a look at the LPN life cycle figure here.

    But we suspect it might be due to another bug we found in SDK v3.0 and fixed in SDK v3.1. Could you reproduce the issue on SDK v3.1 ? 

    In addition, could you explain a little bit on how you figured out that the LPN discard packet from friend, do you have some log ? 

Children
  • Thanks for the information! I did upgrade to v3.1.0 due to the other LPN bug, but this seems to be a separate one that still repros on v3.1.0.

    Can you elaborate a bit on "LPN should not receive packet to itself when it's in friendship"? Do you mean that LPN should not process packets that it hears directly (i.e. not from a Friend)? In this case what seems to happen is that: 1) LPN sends a Friend Poll, 2) LPN waits for Receive Delay, and starts scanning for replies to the Friend Poll, 3) while the scanner in on, the LPN node hears a direct message before hearing the Friend Poll response. Is there code in the SDK that instructs LPN to ignore all the packets it hears directly? 

    To confirm:

    1) I added log statements to the 3rd and 4th if-statements in `deobfuscated_header_is_valid`. They are hit. 

    2) I added a log statement in `mesh_lpn_rx_notify` right before `m_lpn.fsn++`. They are not hit, thus confirming that the Friend Poll reply messages never reached the LPN module.

    I can provide some log files if that is helpful. 

    Thank you!

  • I added the following lines:

    diff --git a/mesh_sdk/mesh/core/src/lpn.c b/mesh_sdk/mesh/core/src/lpn.c
    index 9d8d0f6a..717d48be 100755
    --- a/mesh_sdk/mesh/core/src/lpn.c
    +++ b/mesh_sdk/mesh/core/src/lpn.c
    @@ -655,6 +655,7 @@ static void receive_delay_handle(timestamp_t timestamp, void * p_context)
     {
         (void)timestamp;
    
    +    __LOG(LOG_SRC_APP, LOG_LEVEL_ERROR, "Scanning...");
         scanner_enable();
         m_lpn.timeout_scheduler.cb = timeout_handle;
         timestamp_t new_timestamp = timer_now() + ((uint32_t) p_context);
    @@ -946,6 +947,7 @@ void mesh_lpn_rx_notify(const network_packet_metadata_t * p_net_metadata)
    
         if (m_lpn.p_subman_data == NULL)
         {
    +        __LOG(LOG_SRC_APP, LOG_LEVEL_ERROR, "FSN incremented.");
             m_lpn.fsn++;
         }
    
    diff --git a/mesh_sdk/mesh/core/src/net_packet.c b/mesh_sdk/mesh/core/src/net_packet.c
    index c195104b..9d451633 100755
    --- a/mesh_sdk/mesh/core/src/net_packet.c
    +++ b/mesh_sdk/mesh/core/src/net_packet.c
    @@ -127,6 +127,7 @@ static inline bool deobfuscated_header_is_valid(const network_packet_metadata_t
          * the packet here, but since the decryption would have failed anyway, it doesn't matter. */
         if (msg_cache_entry_exists(p_net_metadata->src, p_net_metadata->internal.sequence_number))
         {
    +        __LOG(LOG_SRC_APP, LOG_LEVEL_ERROR, "Packet dropped: Message cache.");
             __INTERNAL_EVENT_PUSH(INTERNAL_EVENT_PACKET_DROPPED, PACKET_DROPPED_NETWORK_CACHE, net_packet_len, p_net_packet);
             return false;
         }
    @@ -135,6 +136,7 @@ static inline bool deobfuscated_header_is_valid(const network_packet_metadata_t
         nrf_mesh_address_t dummy;
         if (nrf_mesh_rx_address_get(p_net_metadata->src, &dummy))
         {
    +        __LOG(LOG_SRC_APP, LOG_LEVEL_ERROR, "Packet dropped: Self-sent.");
             return false;
         }

    Here's the log:

    [      8.104] [   INFO    ] Received a Friend Update.
    [      8.104] [   INFO    ] Friend poll completed.
    [     14.029] [   INFO    ] User button pressed.
    [     14.029] [***********] TX: [aop: 0x8203]
    [     14.030] [***********] TX: Msg: 0100
    [     14.030] [***********] TX: [aop: 0x8203]
    [     14.030] [***********] TX: Msg: 0100
    [     14.030] [***********] TX: [aop: 0x8203]
    [     14.030] [***********] TX: Msg: 0100
    [     14.030] [   INFO    ] Received LED set request: ON
    [     16.074] [!! ERROR !!] Scanning...
    [     16.204] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.228] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.430] [!! ERROR !!] Scanning...
    [     16.564] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.584] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.785] [!! ERROR !!] Scanning...
    [     16.905] [***********] TX: [aop: 0x0004]
    [     16.905] [***********] TX: Msg: 00FFFF
    [     16.923] [!! ERROR !!] Packet dropped: Self-sent.
    [     16.944] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.141] [!! ERROR !!] Scanning...
    [     17.284] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.308] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.497] [!! ERROR !!] Scanning...
    [     17.644] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.664] [!! ERROR !!] Packet dropped: Self-sent.
    [     17.852] [!! ERROR !!] Scanning...
    [     18.003] [!! ERROR !!] Packet dropped: Self-sent.
    [     18.108] [!! ERROR !!] Friendship terminated with 0x0021. Reason: 2.

    At 14.029 I pressed the button, which sends a Generic OnOff Set message to 0xCAFE. The LPN node also subscribes to 0xCAFE, and hears the message directly (probably through a local loopback?) at 14.030. In the following Friend Poll process the Friend Poll replies are all rejected as self-sent (4th if-statement), and FSN is never incremented. 

  • BTW the log above is for the similar scenario I described in my comment: "A similar thing can happen when L sends to a group address G which L also subscribes to itself. In this case the recorded Friend Poll reply will get discarded in the 4th if statement in `deobfuscated_header_is_valid`.", which repros deterministically every time.

Related