Mesh SDK: Unable to publish to nodes from interactive_pyaci without doing a reset

Hi,

I have provisioned some nodes using interactive_pyaci successfully. Functionality is as expected. As a test, I then shut interactive_pyaci down, rebooted the nRF module and I can successfully re-add all the nodes from the database. Still able to publish and subscribe to the nodes. This exercise was to emulate a gateway going down and coming back up.

Then I exited, waiting over the weekend before coming back and doing the same test. This time, it doesn't work. On the nodes, I see this on RTT:

00> <t:    7089681>, net_packet.c,  228, Unencrypted data: : 0011EE00C401A86E455CA8B264AACA685665

In interactive_pyaci, no response is received:

In [72]: cc.composition_data_get()

>>>>>>>>>>>>>>>>>> OpCode: 0xab
>>>>>>>>>>>>>>>>>> Data: 0x0e 0xab 0x09 0x00 0x01 0x00 0x01 0x00 0x08 0x00 0x00 0x00 0x80 0x08 0x00

<<<<<<<<<<<<<<<<<<<<< Event name:  CmdRsp
<<<<<<<<<<<<<<<<<<<<< OpCode: 0x84
<<<<<<<<<<<<<<<<<<<<< Payload:  {'opcode': 171, 'status': 0, 'data': bytearray(b'1\x00\x00\x00')}
In [73]: <<<<<<<<<<<<<<<<<<<<< Data field: 0x31 0x00 0x00 0x00

2022-02-15 10:00:29,807 - INFO - ttyAMA0: PacketSend: {'token': 49}
<<<<<<<<<<<<<<<<<<<<< Event name:  MeshTxComplete
<<<<<<<<<<<<<<<<<<<<< OpCode: 0xd2
<<<<<<<<<<<<<<<<<<<<< Payload:  {'token': 49}

2022-02-15 10:00:29,860 - INFO - ttyAMA0: {event: MeshTxComplete, data: {'token': 49}}

This probably means that the messages are received by the mesh but are not accepted by the nodes. I tried a reset on the node I'm publishing to and finally I get the output I expected.

I am thinking the IV index may be outdated but is there any other reason for this issue? IIRC IV indexes shouldn't be out of range until well over 20 days.

P.S. I am aware that interactive_pyaci is not meant to be used in a production setting for a gateway. I am monitoring its protocols and creating my own production gateway software.

Parents
  • Hi Muhammad, 

    I got some feedback from the team. To be able to rule out the issue with IV Index or sequence number, we would suggest to do the following: 

    To rule out the problem (or confirm it is the issue), it will great, if customer can print the IV index being used in PyACI, and also print the IV index being used on the nodes (by adding debug print here).
    If it is an IV problem, customer can use "NetStateGet" to get the old SeqNumber and IV index, and then use the "NetStateSet" to set the new IV index (and subsequent SEQ number block). This should solve the IV issue.
  • Hi,

    I've carried out logging as your engineer has recommended. Here are the logs:

    On the nodes:

    00> <t:    6705511>, network.c,  313, <t:   13118908>, net_packet.c,  228, Unencrypted data: : 001000A998F4BDBB6884

    It seems that it is not reaching the packet_send method in the first place but is receiving the unencrypted messages.

    On the interactive_pyaci module:

    00> <t:          0>, main.c,  211, ----- Bluetooth Mesh Serial Interface Application -----
    00> <t:       9116>, main.c,  199, Initializing serial interface...
    00> <t:       9120>, serial_handler_prov.c,  267, Generating encryption keypair...
    00> <t:      13993>, nrf_mesh_serial.c,   80, Serial Device Init...
    00> <t:      13997>, nrf_mesh_serial.c,   82, Serial Mesh Init...
    00> <t:      14001>, nrf_mesh_serial.c,   84, Serial DFU Init...
    00> <t:      14005>, main.c,  201, Done initializing serial interface...
    00> <t:      14008>, main.c,  226, Initialization complete!
    00> <t:      14012>, main.c,  231, Enabling serial interface...
    00> <t:      14016>, mesh_app_utils.c,   66, Device UUID (raw): 5DB126A8D95F4D7A89D047D531BCBC26
    00> <t:      14021>, mesh_app_utils.c,   67, Device UUID : 5DB126A8-D95F-4D7A-89D0-47D531BCBC26
    00> <t:      14039>, main.c,  238, Bluetooth Mesh Serial Interface Application started!
    00> <t:    1931675>, network.c,  312, IV Index: 0
    00> <t:    1931678>, network.c,  313, Seq num: 0
    00> <t:    2124489>, network.c,  312, IV Index: 0
    00> <t:    2124492>, network.c,  313, Seq num: 1
    00> <t:    2833957>, network.c,  312, IV Index: 0
    00> <t:    2833961>, network.c,  313, Seq num: 2
    00> <t:    3015344>, network.c,  312, IV Index: 0
    00> <t:    3015347>, network.c,  313, Seq num: 3
    00> <t:    3230553>, network.c,  312, IV Index: 0
    00> <t:    3230556>, network.c,  313, Seq num: 4
    00> <t:    3491807>, network.c,  312, IV Index: 0
    00> <t:    3491810>, network.c,  313, Seq num: 5
    00> <t:    3677322>, network.c,  312, IV Index: 0
    00> <t:    3677325>, network.c,  313, Seq num: 6
    00> <t:    3794876>, network.c,  312, IV Index: 0
    00> <t:    3794879>, network.c,  313, Seq num: 7
    00> <t:    4498817>, network.c,  312, IV Index: 0
    00> <t:    4498821>, network.c,  313, Seq num: 8

    I am running cc.composition_get() a few times and the packets are indeed being sent out.

    These are the commands I am using to re-initialize my nodes:

    db = MeshDB("database/example_database.json")
    p = Provisioner(device, db)
    device.send(cmd.DevkeyAdd(db.nodes[0].unicast_address, 0, db.nodes[0].device_key))
    device.send(cmd.AddrPublicationAdd(db.nodes[0].unicast_address))
    cc = ConfigurationClient(db)
    device.model_add(cc)
    cc.publish_set(8, 0)
    cc.composition_data_get() // No response 

    I printed the current IV index of the node and it still says 0. So maybe its a configuration problem I have made? But again doing a reset on the node solves it.

  • Hi Muhammad, 
    I suspect it's the sequence block issue. 
    So for anti replay attack the receiver will ignore packet that comes from a node but have older sequence number than the last packet from the same node. 

    Please do what the developer suggested, increase the sequence SEQ number block using NetStateSet. The sequence block to set is actually the sequence number (not the "block"). As long as it's bigger than the last sequence number used by the device it should be fine. 

    I'm not sure sequence number 8 in your log is before or after the reset. But if you reset you need to set it to higher number. I would suggest to make a jump of 8192 (That's what we do in normal mesh node, where we jump NETWORK_SEQNUM_FLASH_BLOCK_SIZE after each reset) 

    /**
    * Number of sequence numbers between every write to flash. In case of power failure, the device
    * will resume transmissions with the first sequence number in the next block. A larger block size
    * means that the device can do fewer resets between every IV Update, while a smaller will result
    * in a reduced lifetime for the flash hardware.
    */
    #ifndef NETWORK_SEQNUM_FLASH_BLOCK_SIZE
    #define NETWORK_SEQNUM_FLASH_BLOCK_SIZE 8192ul
    #endif

Reply
  • Hi Muhammad, 
    I suspect it's the sequence block issue. 
    So for anti replay attack the receiver will ignore packet that comes from a node but have older sequence number than the last packet from the same node. 

    Please do what the developer suggested, increase the sequence SEQ number block using NetStateSet. The sequence block to set is actually the sequence number (not the "block"). As long as it's bigger than the last sequence number used by the device it should be fine. 

    I'm not sure sequence number 8 in your log is before or after the reset. But if you reset you need to set it to higher number. I would suggest to make a jump of 8192 (That's what we do in normal mesh node, where we jump NETWORK_SEQNUM_FLASH_BLOCK_SIZE after each reset) 

    /**
    * Number of sequence numbers between every write to flash. In case of power failure, the device
    * will resume transmissions with the first sequence number in the next block. A larger block size
    * means that the device can do fewer resets between every IV Update, while a smaller will result
    * in a reduced lifetime for the flash hardware.
    */
    #ifndef NETWORK_SEQNUM_FLASH_BLOCK_SIZE
    #define NETWORK_SEQNUM_FLASH_BLOCK_SIZE 8192ul
    #endif

Children
  • Hi,

    I had to wait for this issue to crop up again before I could try your advice, so thank you for your patience. It works in general. After trying these commands, the network resumes where it left off:

    In [13]: device.send(cmd.NetStateGet())
    
    >>>>>>>>>>>>>>>>>> OpCode: 0xaf
    >>>>>>>>>>>>>>>>>> Data: 0x01 0xaf
    
    <<<<<<<<<<<<<<<<<<<<< Event name:  CmdRsp
    <<<<<<<<<<<<<<<<<<<<< OpCode: 0x84
    <<<<<<<<<<<<<<<<<<<<< Payload:  {'opcode': 175, 'status': 0, 'data': bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00 \x00\x00')}
    <<<<<<<<<<<<<<<<<<<<< Data field: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x20 0x00 0x00
    
    2022-03-14 05:28:04,814 - INFO - ttyAMA0: NetStateGet: {'iv_index': 0, 'iv_update_in_progress': 0, 'iv_update_timout_counter': 0, 'next_seqnum_block': 8192}
    In [14]: device.send(cmd.NetStateSet(0, 0, 0, 8192))
    
    >>>>>>>>>>>>>>>>>> OpCode: 0xae
    >>>>>>>>>>>>>>>>>> Data: 0x0c 0xae 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x20 0x00 0x00
    
    <<<<<<<<<<<<<<<<<<<<< Event name:  MeshIvEntrySetNotification
    <<<<<<<<<<<<<<<<<<<<< OpCode: 0xd9
    <<<<<<<<<<<<<<<<<<<<< Payload:  {'iv_index': 0, 'iv_update_in_progress': 0, 'iv_update_timeout_counter': 0}
    
    2022-03-14 05:29:17,625 - INFO - ttyAMA0: {event: MeshIvEntrySetNotification, data: {'iv_index': 0, 'iv_update_in_progress': 0, 'iv_update_timeout_counter': 0}}
    <<<<<<<<<<<<<<<<<<<<< Event name:  MeshIvUpdateNotification
    <<<<<<<<<<<<<<<<<<<<< OpCode: 0xd3
    <<<<<<<<<<<<<<<<<<<<< Payload:  {'iv_index': 0}
    
    2022-03-14 05:29:17,656 - INFO - ttyAMA0: {event: MeshIvUpdateNotification, data: {'iv_index': 0}}
    In [15]: <<<<<<<<<<<<<<<<<<<<< Event name:  MeshSeqnumEntrySetNotification
    <<<<<<<<<<<<<<<<<<<<< OpCode: 0xda
    <<<<<<<<<<<<<<<<<<<<< Payload:  {'next_block': 16384}
    
    2022-03-14 05:29:17,694 - INFO - ttyAMA0: {event: MeshSeqnumEntrySetNotification, data: {'next_block': 16384}}
    <<<<<<<<<<<<<<<<<<<<< Event name:  CmdRsp
    <<<<<<<<<<<<<<<<<<<<< OpCode: 0x84
    <<<<<<<<<<<<<<<<<<<<< Payload:  {'opcode': 174, 'status': 0, 'data': bytearray(b'')}
    
    2022-03-14 05:29:17,735 - INFO - ttyAMA0: Success

    One issue I find, is that I cannot issue NetStateSet command again without resetting the nrf module. Is this intended?

    In [14]: device.send(cmd.NetStateSet(0, 0, 0, 16384))
    >>>>>>>>>>>> Opcode: 0xae
    >>>>>>>>>>>> Data: 0x0c 0xae 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x40 0x00
    
    <<<<<<<<<<<< Event name: CmdRsp
    <<<<<<<<<<<< OpCode: 0x84
    <<<<<<<<<<<< Payload: {'opcode': 174, 'status': 131, 'data': bytearray(b'')}
    
    2022-03-14 06:56:43,550 - ERROR - ttyAMA0: None: ERROR_INVALID_STATE

  • Hi Muhammad, 

    It's correct. You can only set the net states externally once after the mesh stack is initialized. 
    Here is the description of the net states set handler in the stack: 


    /**
    * Sets the initial IV index and IV update state, and starts sequence number from the supplied block.
    *
    * @param[in] iv_index Initial IV index value.
    * @param[in] iv_update @c true if an IV index update is in progress.
    * @param[in] next_seqnum_block The next seqnum block to allocate.
    *
    * @retval NRF_SUCCESS Successfully set the IV index and IV update state.
    * @retval NRF_ERROR_INVALID_STATE The function was called when the IV index and update state was
    * already set.
    */
    uint32_t net_state_iv_index_and_seqnum_block_set(uint32_t iv_index, bool iv_update, uint32_t next_seqnum_block);

  • Great! One last question about this. I reset my gateway again, added the keys then set the gateway's sequence number to 8192 to talk to my nodes, but the nodes themselves are on sequence number 630+.

    I couldn't communicate to them again till I reset one more time and set my sequence number to 16384. I reset the gateway 5 minutes later and it worked with 16384. I tried it again an hour later, and this time had to set the sequence number to 24576.

    I know you said 

    But if you reset you need to set it to higher number. I would suggest to make a jump of 8192 (That's what we do in normal mesh node, where we jump NETWORK_SEQNUM_FLASH_BLOCK_SIZE after each reset) 

    so that explains the first part. But why does it work sometimes if I don't jump one block size of 8192?

  • Hi Muhammad, 


    I don't have an explanation for that. I would suggest to print out some log on the node's side to check why it sometimes works with older sequence number. It's handled in the replay_cache_has_elem() function. You can find this function is called inside transport_packet_in() in transport.c . If the a packet from the same source is already in the cache and if the sequence number is lower than the packet cached then it will be rejected. 

    My suspicion is that there was no packet from the same source is stored, could be due to that the node was reset earlier (replay cache is not stored into flash as far as I remember). 

Related