Bluetooth mesh configuration failure

board: nrf52832

nRF5 SDK version: v17.0.0

nRF5 SDK for Mesh version: v5.0.0

softdevice: S332

application: light switch client + light switch server + provisioner + ble_ant_app_hrm + ble_app_uart_coexist

My application has two roles to choose from: Provisioner/self-provisioned Client, and Server. The user sends a UART command to select which one is initialized on the device.

I have also removed the normal BLE service (NUS).

The problem: I set one device as the Provisioner/self-provisioned Client and two devices as Servers.

The first Server is provisioned and configured successfully. Provisioning of the second Server also succeeds, but its configuration occasionally fails.

However, if I turn the first Server off before provisioning of the second Server starts, both provisioning and configuration of the second Server succeed.

(Once the Provisioner/self-provisioned Client detects an unprovisioned Server, the ANT+ channel closes automatically.)

So I checked the configuration process and found that it often gets stuck waiting for opcode CONFIG_OPCODE_MODEL_APP_STATUS or CONFIG_OPCODE_APPKEY_STATUS. The Server receives CONFIG_OPCODE_MODEL_APP_BIND or CONFIG_OPCODE_APPKEY_ADD and replies successfully (the reply returns NRF_SUCCESS),

but the Provisioner/self-provisioned Client never receives this reply. It retries sending the AppKey Add message and waits for an ACK from the Server 2-3 times,

but the problem keeps occurring.

Log:

The Provisioner/Client prints from config_step_execute() in node_setup.c to show which opcode it received (the value after ":" is the message status).

The Server prints from send_reply() in config_server.c to show the transmission status and which opcode it sent (the first value is the return value of send_reply(), the second value is the opcode ID it sent).

  • Configuration of the first Server (success)

    [Logs: Provisioner/Client | the first Server]

  • Configuration of the second Server (fail)

    [Logs: Provisioner/Client | the second Server]

Then I checked the function scanner_rx() in scanner.c on the Provisioner/self-provisioned Client, filtering on the MAC address of the second Server to see whether the Provisioner/self-provisioned Client receives messages from it. I found that it keeps receiving messages from the second Server, as shown below.

  • Provisioner/Client:

            each "Get" means one message received from the second Server (recognized by MAC address), printed in scanner_rx() in scanner.c

            "receive opcode" is printed in mesh_msg_handle() in access.c

  • the second Server:

            "receive opcode" is printed in mesh_msg_handle() in access.c

            "access_model_reply()" is printed in send_reply() in config_server.c

  • the first Server: it has already been provisioned and configured, and is used here to print the payload of the second Server (recognized by MAC address) in scanner_rx() in scanner.c. Bytes 1-31 are the payload, the last byte is the header type, and '#' marks the end of a line.

It seems the Provisioner/self-provisioned Client can receive messages from the second Server, except for the configuration ACK, which confuses me. Is there any way to analyze the payload from the second Server? Or is there any chance that the configuration ACK is filtered out by the Provisioner/Client application?
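As a starting point for looking at the raw payload, here is a minimal sketch that walks the AD structures in an advertising payload and counts those with a given AD type. The AD type values 0x29 (PB-ADV), 0x2A (Mesh Message) and 0x2B (Mesh Beacon) are the ones assigned to Bluetooth mesh; the function name `count_ad_type` and the `SAMPLE_ADV` bytes are made up for illustration and are not from the SDK or from the logs in this thread. Note that the contents of a Mesh Message are encrypted and obfuscated with the network key, so without the key this can only tell you that a mesh packet arrived, not which opcode it carries.

```c
#include <stddef.h>
#include <stdint.h>

/* AD types assigned to the Bluetooth mesh advertising bearer. */
#define AD_TYPE_PB_ADV       0x29
#define AD_TYPE_MESH_MESSAGE 0x2A
#define AD_TYPE_MESH_BEACON  0x2B

/* Hypothetical sample payload: a Flags AD structure followed by a
 * Mesh Message AD structure with two dummy data bytes. */
static const uint8_t SAMPLE_ADV[] = {
    0x02, 0x01, 0x06,            /* len=2, type=0x01 (Flags), data=0x06   */
    0x03, 0x2A, 0xAA, 0xBB       /* len=3, type=0x2A (Mesh Message), data */
};

/* Count the AD structures of `wanted_type` in an advertising payload.
 * Each AD structure is: [length][type][data...], where `length`
 * covers the type byte plus the data bytes. */
static unsigned count_ad_type(const uint8_t *payload, size_t len, uint8_t wanted_type)
{
    unsigned count = 0;
    size_t i = 0;

    while (i + 1 < len) {
        uint8_t ad_len = payload[i];

        /* A zero length marks padding; a length that runs past the
         * end of the buffer means the payload is truncated. */
        if (ad_len == 0 || i + 1 + ad_len > len) {
            break;
        }
        if (payload[i + 1] == wanted_type) {
            count++;
        }
        i += 1u + ad_len;
    }
    return count;
}
```

Calling `count_ad_type(payload, length, AD_TYPE_MESH_MESSAGE)` on each payload printed from scanner_rx() would at least separate mesh messages from beacons and provisioning PDUs.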

BTW, once all devices have been provisioned and configured, communication between the two roles works well.

I have also tried increasing the advertising interval to 100 ms and changing SCANNER_BUFFER_SIZE to 1024, but neither helped.

  • Hey Erin!

    Sorry this is taking so long. I have been asking internally about this, and our mesh team has been a bit busy lately.

    An ACK message getting filtered out doesn't seem like the most likely scenario to me. I would rather think that there is an issue with the provisioner+client node, that the extra node leads to packet collisions, or that the message fails to arrive for other reasons.

    Could you try increasing the retransmit count? It is 0 by default. If this doesn't help we should take a closer look at the provisioner+client. 

    Best regards,

    Elfving

  • Hi Elfving,
    Thank you for your reply. I tried increasing CONFIG_RETRANSMIT_COUNT_MAX (/models/foundation/config/include/config_messages.h), but it still fails occasionally.

    To simplify this issue, I started over by downloading the SDK again and migrating the example projects again.

    Version (Migration):
    PC: Ubuntu 16.04
    board: nrf52832
    nRF5 SDK version: nRF5_SDK_17.1.0_ddde560
    nRF5 SDK for Mesh version: v5.0.0
    softdevice: S332
    toolchain: armgcc

    The steps of example projects migration:
    [ ble_ant_app_hrm + sdk_coexist(light switch client) ] ===> clients (migration ver.)
    step 1. mesh SDK merge into nRF SDK
    step 2. /examples/sdk_coexist/ble_app_uart_coexist/pca10040/s132/config/sdk_config.h merge into /examples/multiprotocol/ble_ant_app_hrm/pca10040/s332/config/sdk_config.h
    step 3. add "definedS332" to nrf_mesh.h and hal.c
    step 4. combine main.c of ble_ant_app_hrm + sdk_coexist
    step 5. add the relevant .c files, .h include paths, and CFLAGS to the Makefile
    step 6. create a folder "include", put /examples/sdk_coexist/ble_app_uart_coexist/nrf_mesh_config_app.h into it
    step 7. put /examples/sdk_coexist/ble_app_uart_coexist/mesh_main.h and /examples/sdk_coexist/ble_app_uart_coexist/mesh_main.c into project folder
    step 8. nrf_mesh_config_app.h add #define GENERIC_DTT_SERVER_INSTANCES_MAX (1), #define GENERIC_ONOFF_SERVER_INSTANCES_MAX (1) and #define SCENE_SETUP_SERVER_INSTANCES_MAX (0)
    step 9. put /examples/light_switch/server/include/app_config.h into "include" folder of project
    step 10. #define APP_TIMER_CONFIG_RTC_FREQUENCY 0, #define NRF_SDH_BLE_GATT_MAX_MTU_SIZE 69 (in sdk_config.h)
    step 11. #define NRF_SDH_BLE_GATTS_ATTR_TAB_SIZE 2400, #define NRF_SDH_BLE_VS_UUID_COUNT 1 (in sdk_config.h)
    step 12. RAM (rwx):ORIGIN=0x20002968, LENGTH=0xD698 (in ble_ant_hrm_gcc_nrf52.ld)
    step 13. add .nrf_mesh_ram and .nrf_mesh_flash (in ble_ant_hrm_gcc_nrf52.ld)
    step 14. #define ACCESS_ELEMENT_COUNT (2) (in nrf_mesh_config_app.h)
    step 15. comment out the ANT+ channel open (in main.c)

    [ ble_ant_app_hrm + sdk_coexist(light switch client) + light switch server ] ===> server (migration ver.)
    step 1. based on clients (migration ver.), replace the content of mesh_main.c with the content of /examples/light_switch/server/main.c
    step 2. delete unused functions and rename functions that depend on mesh_main.h (mesh_main.c)

    [ ble_ant_app_hrm + sdk_coexist(light switch client) + provisioner ]
    step 1. put /examples/provisioner/src/node_setup.c and /examples/provisioner/src/provisioner_helper.c into project folder
    step 2. put all files in /examples/provisioner/include into the "include" folder of the project folder, except app_config.h and nrf_mesh_config_app.h
    step 3. #define MODEL_ACKNOWLEDGED_TRANSACTION_TIMEOUT (SEC_TO_US(10)), #define ACCESS_DEFAULT_TTL (MAX_PROVISIONEE_NUMBER > NRF_MESH_TTL_MAX ? NRF_MESH_TTL_MAX : MAX_PROVISIONEE_NUMBER) (nrf_mesh_config_app.h)
    step 4. #define ACCESS_MODEL_COUNT(4) (nrf_mesh_config_app.h)
    step 5. #define CONFIG_SCENARIO_COMMON delete NODE_SETUP_CONFIG_APPKEY_BIND_HEALTH and NODE_SETUP_CONFIG_PUBLICATION_HEALTH (config_scenarios.h)
    step 6. #define CONFIG_SCENARIO_COMMON delete NODE_SETUP_CONFIG_APPKEY_BIND_HEALTH and NODE_SETUP_CONFIG_PUBLICATION_HEALTH (config_scenarios.h)
    step 7. #define CONFIG_SCENARIO_LIGHT_SWITCH_CLIENT_EXAMPLE delete the second CONFIG_ONOFF_CLIENT (config_scenarios.h)
    step 8. #define CONFIG_SCENARIO_LIGHT_SWITCH_SERVER_EXAMPLE delete the second CONFIG_ONOFF_SERVER (config_scenarios.h)
    step 9. #define APP_TIMER_KEEPS_RTC_ACTIVE 1 (in sdk_config.h)
    step 10. add the relevant .c files to the Makefile
    step 11. /examples/provisioner/src/main.c merge into mesh_main.c
    step 12. #define DSM_SUBNET_MAX(20), #define DSM_APP_MAX(35), #define DSM_DEVICE_MAX(35), #define DSM_VIRTUAL_ADDR_MAX(15), #define DSM_NONVIRTUAL_ADDR_MAX(15) (nrf_mesh_config_app.h)

    [ ble_ant_app_hrm + sdk_coexist(light switch client) + provisioner(light switch client self provision) ] ===> Provisioner/Client (migration ver.)
    step 1. in struct network_dsm_handles_data_volatile_t, replace the health instance with generic_onoff_client_t client_instance (network_setup_types.h)
    step 2. add the publish and subscription settings of the light switch client to app_default_models_bind_setup() (mesh_main.c)
    step 3. add the callback function settings of the light switch client to models_init_cb() (mesh_main.c)
    step 4. add the message publish function of the light switch client to mesh_main_button_event_handler() (mesh_main.c)
    step 5. comment out all normal BLE services (main.c)

    When I use the Provisioner/Client (migration ver.) to provision and configure two devices, the following happens:
    (1) provision and configure two servers (migration ver.); configuration data transmission is retried many times ===> might be OK, but the process for the second server takes a long time and fails occasionally
    (2) after the second server (migration ver.) has been provisioned and configured, the provisioning and configuration data of the second server (migration ver.) and the Provisioner/Client (migration ver.) are deleted immediately ===> might be OK; provisioning and configuration run smoothly but fail occasionally
    (3) provision and configure two clients (migration ver.) ===> totally OK; even the second client is provisioned and configured smoothly
    (4) provision and configure two servers (migration ver.), and turn the first server off ===> totally OK

    I also used the Provisioner/Client (migration ver.) to provision and configure the light switch example from the Mesh SDK.
    Version (light switch example from Mesh SDK):
    PC: Ubuntu 20.04
    board: nrf52832
    nRF5 SDK version: nRF5_SDK_17.0.2_d674dde
    nRF5 SDK for Mesh version: v5.0.0
    softdevice: S332
    toolchain: armgcc

    (5) provision and configure two servers (example from Mesh SDK ver.) ===> totally OK; even the second server is provisioned and configured smoothly

    So it seems the Provisioner/Client works normally, but the server (migration ver.) has some unknown issue.
    Are there any configurations I have not set? Or do you have any suggestions for me to try?

    The archive below contains my Client (migration ver.), server (migration ver.), and Provisioner/Client (migration ver.) projects. Hope it helps!

    4643.nordic_devzone_mesh_bluetooth_mesh_configuration_failure.zip

    Regards,
    Erin

  • Hey Erin!

    It seems that your provisioner is incrementing the address with 1, while there are two elements in the servers. This makes the addresses overlap (one node gets one address for its first element, and the second element of that node gets the next address). That might be what is making a mess.

    In either case it seems like we are looking at address collisions here, which typically lead to undefined behavior.

    Best regards,

    Elfving

  • Hi Elfving,

    Thank you for your reply!

    I changed #define ACCESS_ELEMENT_COUNT from 2 to 1 (nrf_mesh_config_app.h) on the server (migration ver.), and it works!

    But my project, ble_ant_app_hrm + sdk_coexist(light switch client) + provisioner(light switch client self provision), has two modes to switch between: in provisioner/client mode, 2 elements should be initialized, and in server mode, 1 element should be initialized.

    So I made some changes:

    set #define ACCESS_ELEMENT_COUNT 1 (nrf_mesh_config_app.h)

    add #define CLIENT_ACCESS_ELEMENT_COUNT 2 (nrf_mesh_config_app.h)

    ACCESS_ELEMENT_COUNT ===> CLIENT_ACCESS_ELEMENT_COUNT (provisioner_helper.c)

    ACCESS_ELEMENT_COUNT ===> CLIENT_ACCESS_ELEMENT_COUNT (access.c)

    ACCESS_ELEMENT_COUNT ===> CLIENT_ACCESS_ELEMENT_COUNT (composition_data.h)

    ACCESS_ELEMENT_COUNT ===> CLIENT_ACCESS_ELEMENT_COUNT (composition_data.c)

    In the functions related to provisioner/client mode, ACCESS_ELEMENT_COUNT (value: 1) ===> CLIENT_ACCESS_ELEMENT_COUNT (value: 2)
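    The idea behind these changes can be sketched roughly as below, assuming the stack's element tables are sized at compile time, so storage has to fit the larger role while the count actually used depends on the selected mode. Only the two macro names and their values come from the list above; `device_role_t`, `MAX_ELEMENT_COUNT` and `element_count_for_role()` are hypothetical names for illustration and do not exist in the SDK.

```c
#include <stdint.h>

#define ACCESS_ELEMENT_COUNT         1  /* Server mode (value from the post) */
#define CLIENT_ACCESS_ELEMENT_COUNT  2  /* Provisioner/Client mode (value from the post) */

/* Hypothetical: static storage must be large enough for either mode. */
#define MAX_ELEMENT_COUNT                                   \
    ((CLIENT_ACCESS_ELEMENT_COUNT > ACCESS_ELEMENT_COUNT)   \
         ? CLIENT_ACCESS_ELEMENT_COUNT                      \
         : ACCESS_ELEMENT_COUNT)

/* Hypothetical: the mode selected over UART at startup. */
typedef enum {
    ROLE_SERVER,
    ROLE_PROVISIONER_CLIENT
} device_role_t;

/* Number of elements the device exposes in the given mode.
 * This is also the number of consecutive unicast addresses it occupies. */
static uint8_t element_count_for_role(device_role_t role)
{
    return (role == ROLE_PROVISIONER_CLIENT) ? CLIENT_ACCESS_ELEMENT_COUNT
                                             : ACCESS_ELEMENT_COUNT;
}
```

    Keeping the per-mode count behind one function is only one possible design; it would avoid renaming the macro file by file, but it has not been tested against the SDK sources.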

    This solved the provisioning issue, but is this modification OK? Or do you have any suggestions for me to try?

    Also, regarding "your provisioner is incrementing the address with 1": which address do you mean? And why did the second element of the first server get the next address without any log showing it?

    Regards,

    Erin 

  • Hey Erin!

    erin_hong said:

    Also, regarding "your provisioner is incrementing the address with 1": which address do you mean? And why did the second element of the first server get the next address without any log showing it?

    The unicast address. The address of a node is, for instance, 0x003, which is also the address of its first element. If the node had 3 elements, their addresses would automatically be 0x003, 0x004, and 0x005. If the provisioner has already given 0x004 to another node, we get overlap, address collisions, and generally undefined behavior.

    That is why it works if you provision the server with 1 element first: the first server will e.g. get address 0x003 (for its single element), and the second server gets address 0x004 (for its first element, with its second element getting 0x005). If you start with the server with two elements, you get addresses 0x003 and 0x004 on the first server and 0x004 on the second server, which leads to address collisions.
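    The arithmetic above can be written out as a small sketch. It assumes a provisioner that tracks each node's first unicast address and element count; the function names `next_free_address()` and `ranges_overlap()` are made up for illustration and are not part of the provisioner example.

```c
#include <stdbool.h>
#include <stdint.h>

/* First unicast address that is free after a node: a node occupies one
 * consecutive address per element, starting at its first address. */
static uint16_t next_free_address(uint16_t first_addr, uint8_t element_count)
{
    return (uint16_t)(first_addr + element_count);
}

/* True if the element address ranges of two nodes share an address. */
static bool ranges_overlap(uint16_t a_first, uint8_t a_count,
                           uint16_t b_first, uint8_t b_count)
{
    uint16_t a_end = next_free_address(a_first, a_count); /* exclusive */
    uint16_t b_end = next_free_address(b_first, b_count); /* exclusive */

    return (a_first < b_end) && (b_first < a_end);
}
```

    With the numbers from the explanation, a two-element server at 0x003 followed by a node at 0x004 overlaps, while placing the second node at `next_free_address(0x003, 2)`, i.e. 0x005, does not.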

    erin_hong said:

    This solved the provisioning issue, but is this modification OK? Or do you have any suggestions for me to try?

    Yeah, that sounds like something that could work, though as you can see it isn't a very scalable solution. This example uses a static provisioner, which isn't meant to be used as an embedded provisioner or in, for instance, a production context.

    As the documentation says: "It works in a fixed, predefined way and can be used as the static provisioner with the following examples(...)". And "The static provisioner has its own limitations and is provided as a tool to evaluate SDK examples without the need to use a mobile application provisioner."

    If you want an embedded provisioner, I would advise you to make your own and not base it completely on the provisioner example, but I would rather recommend not using an embedded device as a provisioner at all. The provisioner is such a powerful device in the network that something like a cellphone or host app would be better. An exception would be if you had some sort of gateway unit as an interface between an IP network and a mesh network.

    For a proof of concept, the nRF Mesh app can also be a great asset. The provisioner example is great as long as you don't change any of its assumptions, though that is what was done here.

    Best regards,

    Elfving

  • Hi Elfving,

    Thank you for the information! I will consider it.

    Regards,

    Erin
