Friend node timeout when LPN is re-provisioned with same address

There seems to be a bug in the mesh stack friendship code: when an LPN node is re-provisioned with the same address, its friend requests to the friend node time out indefinitely.

Steps to reproduce:

1. Program an nRF52-DK with the stock SDK light_switch_server:  

ninja flash_light_switch_server_nrf52832_xxAA_s132_7.0.1

2. Provision the light switch server with the nRF Mesh App.

3. Program a second nRF52-DK with the stock SDK LPN example, then connect to the LPN node with JLinkRTTLogger:

ninja flash_lpn_nrf52832_xxAA_s132_7.0.1

4. Provision the LPN node with the nRF Mesh App:

<t:    1342412>, main.c,  141, Successfully provisioned
<t:    1342417>, main.c,  154, Node Address: 0x0004 

5. Press Button3 on the LPN node to establish a friendship with the light switch server:

<t:    8741395>, main.c,  337, Button 2 pressed
<t:    8741398>, main.c,  277, Initiating the friendship establishment procedure.
<t:    8745101>, main.c,  408, Received friend offer from 0x0003
<t:    8748460>, main.c,  455, Friendship established with: 0x0003

6. Reset the LPN node in the app, then re-provision it. Note that the app assigns the same address to the device:

<t:    6325479>, mai<t:          0>, main.c,  552, ----- BLE Mesh LPN Demo -----
<t:       8877>, main.c,  503, Initializing and adding models
<t:      13683>, mesh_app_utils.c,   65, Device UUID (raw): 6D6551FBA6ED1F4687A9A4973BE61F9A
<t:      13687>, mesh_app_utils.c,   70, Device UUID : FB51656D-EDA6-461F-87A9-A4973BE61F9A
<t:     420362>, ble_softdevice_support.c,  104, Successfully updated connection parameters
<t:     966607>, main.c,  141, Successfully provisioned
<t:     966612>, main.c,  154, Node Address: 0x0004 

7. Pressing Button3 on the LPN node now results in friendship timing out:

<t:    1562101>, main.c,  337, Button 2 pressed
<t:    1562104>, main.c,  277, Initiating the friendship establishment procedure.
<t:    1778877>, main.c,  441, Friend Request timed out

This friendship timeout continues:

- even when many attempts are made

- even after >20 minutes

8. When the light_switch_server is reset (turned off and back on), it will again accept friend requests:

<t:    1562101>, main.c,  337, Button 2 pressed
<t:    1562104>, main.c,  277, Initiating the friendship establishment procedure.
<t:    1778877>, main.c,  441, Friend Request timed out
<t:    7380842>, main.c,  337, Button 2 pressed
<t:    7380845>, main.c,  277, Initiating the friendship establishment procedure.
<t:    7597617>, main.c,  441, Friend Request timed out

# light_switch_server is reset here 

<t:   13608118>, main.c,  337, Button 2 pressed
<t:   13608121>, main.c,  277, Initiating the friendship establishment procedure.
<t:   13611518>, main.c,  408, Received friend offer from 0x0003
<t:   13618461>, main.c,  455, Friendship established with: 0x0003

Versions:

0. nRF52-DK PCA10040 == nRF52832

1. Mesh SDK version 4.0.0

2. SDK version 16.0.0

3. SoftDevice s132_7.0.1

  • Hi. 

    Thank you for the report. I'll try to reproduce this from the steps you described and investigate the issue. 

    Could you also tell me which version of the nRF5 SDK for Mesh you are working with?

    I'll get back to you with more information. 

    Best regards, 
    Joakim

    EDIT: 
    Just noticed that you listed the version at the bottom of your question. 

  • Thank you.

    Were you able to reproduce? Any update?

    Help much appreciated.

  • Hi. 

    Sorry about the delay. 

    Yes, I'm seeing the same behavior as you do. Investigating what is causing this. 

    Will get back to you with more information. 

    Regards, 
    Joakim

  • Hi again. 

    This is actually expected behavior. 
    When you are testing the LPN / Light switch server (LSS) examples: 
    If the LPN is reset / unprovisioned from the app, the sequence number on the LPN node resets to zero. Therefore, when the LPN is re-provisioned with the same address, all messages sent by the LPN are filtered out by the replay protection mechanism of the LSS. This continues until the LPN starts sending messages with a higher sequence number than the one stored in the LSS replay list. 

    The reason that you are able to re-establish the friendship with the LSS when it's power cycled / reset is that the replay list will be blank upon power cycle. 

    Best regards, 
    Joakim Jakobsen
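
The replay filtering described above can be illustrated with a minimal sketch. This is not the nRF5 SDK for Mesh implementation: `replay_list_t`, `replay_check_and_update`, and `REPLAY_LIST_SIZE` are hypothetical names chosen for illustration, and the sketch ignores other state a real stack tracks (for example the IV index).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REPLAY_LIST_SIZE 8 /* hypothetical capacity, for illustration only */

typedef struct {
    uint16_t src;  /* unicast source address of the sender */
    uint32_t seq;  /* highest sequence number accepted from that address */
    bool     used; /* true once this slot tracks an address */
} replay_entry_t;

typedef struct {
    replay_entry_t entries[REPLAY_LIST_SIZE];
} replay_list_t;

/* Returns true if the message is accepted, false if it is dropped
 * (sequence number not higher than the stored one, or list full). */
bool replay_check_and_update(replay_list_t *list, uint16_t src, uint32_t seq)
{
    replay_entry_t *free_slot = NULL;

    for (size_t i = 0; i < REPLAY_LIST_SIZE; i++) {
        replay_entry_t *e = &list->entries[i];
        if (e->used && e->src == src) {
            if (seq <= e->seq) {
                return false; /* looks like a replayed message: drop it */
            }
            e->seq = seq;     /* newer message: remember its sequence number */
            return true;
        }
        if (!e->used && free_slot == NULL) {
            free_slot = e;
        }
    }

    if (free_slot == NULL) {
        return false;         /* no room to track a new source address */
    }
    free_slot->used = true;
    free_slot->src  = src;
    free_slot->seq  = seq;
    return true;
}
```

With this model, a node at 0x0004 that has sent up to sequence number 101 and is then re-provisioned at the same address starts again at 0, so its messages are dropped until it passes 101. A zero-initialized list (the power-cycled LSS in this thread) or a different source address is accepted immediately.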

  • I see, thank you for looking into this.

    So, how do we commission a device and have it be able to communicate from the start?

    Should we:

    - query LSS for the sequence number and then set this on the LPN when provisioning?

    - send a message to LSS to reset the sequence number?

    - some other mechanism?

    This is quite a noticeable bug when an LPN in a large network is re-provisioned: suddenly it "doesn't work", and the only solution currently is to turn off breakers and reset devices in the ceiling.

  • This can't actually be classified as a bug, as this is the expected behavior of the replay protection. If a device with the same address starts sending messages with a lower sequence number than expected, the replay protection should filter these messages. This does not only affect the LPN. 

    You can manually clear the replay list on the LSS. Note that this is not recommended as this will cause a security issue since a device will then allow all incoming messages – enabling an attacker to replay old messages.

    If you need to re-provision a device, I recommend that you provision it with a new address. This will allow your device to initiate the friendship.
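
One way to follow the "new address" recommendation on the provisioner side is to hand out addresses from a counter that only moves forward. The sketch below is an assumption about how a custom provisioner could do this, not something the nRF Mesh App or the SDK is known to provide; `addr_allocator_t`, `addr_allocator_init`, and `addr_allocate` are hypothetical names. The only protocol fact relied on is the unicast address range 0x0001 to 0x7FFF.

```c
#include <stdint.h>

#define ADDR_UNASSIGNED  0x0000u /* the unassigned address, used here as "no address" */
#define ADDR_UNICAST_MAX 0x7FFFu /* highest unicast address */

typedef struct {
    /* Lowest address that has never been handed out. A real provisioner
     * would have to persist this value alongside its network database. */
    uint16_t next_addr;
} addr_allocator_t;

void addr_allocator_init(addr_allocator_t *a, uint16_t first_addr)
{
    a->next_addr = first_addr;
}

/* Reserves element_count consecutive unicast addresses and returns the
 * first one, or ADDR_UNASSIGNED if the request cannot be satisfied.
 * Addresses are never returned to the pool, so a re-provisioned node
 * always gets an address with no stale replay-list entry. */
uint16_t addr_allocate(addr_allocator_t *a, uint16_t element_count)
{
    if (element_count == 0 || a->next_addr == ADDR_UNASSIGNED) {
        return ADDR_UNASSIGNED;
    }

    /* Compute in 32 bits so the range check cannot overflow. */
    uint32_t last = (uint32_t)a->next_addr + element_count - 1u;
    if (last > ADDR_UNICAST_MAX) {
        return ADDR_UNASSIGNED;
    }

    uint16_t first = a->next_addr;
    a->next_addr = (uint16_t)(last + 1u);
    return first;
}
```

The trade-off is that the address space is consumed permanently: every removal and re-provisioning uses up fresh addresses, which may or may not be acceptable depending on network size and how often nodes are replaced.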

  • I see what you are saying, but the net effect is:

    1. Device is removed and reprovisioned with nRF Mesh App

    2. Device can no longer talk to network

    I see how what the mesh stack is doing follows the specification, but the objective behavior for users is broken.

    Simply removing ONE device and adding a NEW, DIFFERENT device (which is given the same address by the nRF Mesh App) will trigger this bug - it is very easy to trigger in usual operations and results in "broken" behavior where the newly provisioned device cannot talk to the mesh.

    Perhaps the nRF Mesh App might:
    1. Set the correct sequence number on LPN when provisioning?

    2. Track all past addresses and only provision never-before-used addresses?

    3. Any other ideas?

    Thank you

  • We do appreciate the feedback, and I'll forward this internally so that it can be considered for any future releases. 

    I would like to note that a power cycle of the device shouldn't clear the replay list. For optimal security with regards to the replay protection, this should be saved to flash. I do believe this is going to be changed in a future release of the nRF5 SDK for Mesh. 

    Also, the nRF Mesh app isn't actually supposed to be used in a finalized product; it is meant more as a development tool and a template for developing your own application. As a development tool, it might be good to have the option to provision a device with the same address. That way you can test that the replay protection actually works for your product.

    Best regards, 
    Joakim

  • Thank you, response much appreciated.

    I fear this problem is deeper than just "don't use the nRF Mesh App for production" (we are not, but when reporting issues with another app in the past I have been asked to reproduce them with your nRF Mesh App).

    As you have pointed out, once the mesh stack starts saving the replay list to flash, there will be NO way to re-provision a device with the same address and have it work reliably.

    The problem goes deeper: there seems to be no key in the underlying JSON format to track previously-used addresses.

    My understanding is that this JSON format follows a standard schema published by the Bluetooth SIG, yes?

    So, again, the question arises, we should be able to do something to preempt this replay list issue, should it be:

    1. Provisioner sets the correct sequence number on LPN when provisioning?

    2. Replay list is reset at provisioning-time somehow?

    3. JSON schema is extended to track all past addresses?

    4. Any other ideas?

    We are looking for a hint about the proper approach to tackle this from our end.

  • Thanks. 

    I'll forward this to our Mesh developers, so that they can comment on this. Will update the ticket when I get any feedback from them. 

    Br, 
    Joakim

  • Hello, any new information on this?

    It would be great to be able to resolve this issue; it shows up quite a bit when deploying mesh networks in the field.

  • Hi.

    I currently don't have any news about this.
    If you are having issues with this, I would suggest making sure that you provision new devices with new addresses.
    If you need to use the same address, you could manually clear the replay list, although this is not recommended for security reasons.

    I will update the ticket if there is any new information.
