Communication breakdown between the nRF52840 and the host (i.MX 6ULL) during stress testing

During stress testing of Thread-layer communication, after some time (around 30 minutes to 1 hour) the UART link (115200 baud) between the host (an i.MX 6ULL-based SoC) and the nRF52840 is lost, and it does not recover until otbr-agent is restarted.

Commands were sent continuously from the controller to the development kit to reproduce the issue.

1. otbr-agent commit ID: 868b8d791fac9752d154ef0f0614ca15019056a9 (github.com/.../ot-br-posix)
2. IEEE 802.15.4 hardware platform: nRF52840
3. Build: done using a Yocto recipe
4. Network topology: star (direct communication between the controller and the development kit)

Expected behaviour: the communication between the nRF52840 and the host (i.MX 6ULL-based SoC) should not break.

Should we increase the transmit buffer size, and if so, how can it be done?
Alternatively, is there a way to increase the TX timeout of a packet to a higher value, and would that help resolve the issue?

  • The expected behaviour is that the communication between the nRF52840 and the host (i.MX 6ULL-based SoC) should not break.
    Sorry for the confusion.

  • Hello,

    What is programmed on the nRF52840? Is it the RCP or the NCP sample? Did you make any changes to the application before flashing it?

    What HW are you running on (on the nRF)? Is it a DK or a custom board?

    Did you try to analyze the UART lines? Does the OTBR actually send data that the nRF doesn't respond to, or is the link dead? (Use a logic analyzer for this, such as a Saleae logic analyzer.)

    Best regards,

    Edvin

  • Hello Edvin,

    1. The RCP is programmed on the nRF52840, and no changes were made to the application before flashing.
    2. We are using a custom board (the nRF integrated with the i.MX 6ULL SoC).
    3. Can I hook up a JTAG debugger instead of a logic analyzer to check the data transfer? Will it give the same information?

    Thanks,
    Vignesh R
  • This means that you are using custom board files when building, right? Or are you building for the nrf52840dk_nrf52840 board?

    What protocol are you using between your OTBR and the nRF? Is it UART or USB? If it is UART, a JTAG debugger will not let you hook onto the UART. If you are using USB, there is not much use in hooking on.

    Another thing you can try, regardless, is to attach a debugger and see if you can see any logs from the nRF. Does it say anything when/before it stops working? And is it consistent? Does it always happen after 1h? Or only occasionally? Did you try the same with a DK instead of the imx6ull? I am not saying it wouldn't work, but if there is a difference in behavior, it may give some pointers to what the issue may be.

    Best regards,

    Edvin

    And what HW is your OTBR running on? A computer? A Raspberry Pi?

  • Dear Edvin,
    Please find our responses to your queries below:

    >> This means that you are using custom board files when building, right? Or are you building for the nrf52840dk_nrf52840 board?
    We are building the application that is part of "/ncs/v2.3.0/nrf/samples/openthread/coprocessor". In the "Board" drop down text box of "Build Configuration" we select nrf52840dk_nrf52840.

    >> What protocol are you using between your otbr and the nRF? Is it UART or USB? If it is UART, you will not be able to hook onto the UART. If you are using USB, there is not much use of hooking on.
    Between the OTBR and the nRF, we are using UART. We have modified nrf52840dk_nrf52840.overlay to match our UART pin configuration between the host and the RCP. If required, we can share the modified file.

    >> Another thing you can try, regardless, is to attach a debugger and see if you can see any logs from the nRF. Does it say anything when/before it stops working?
    It would be very helpful if you could share any document on retrieving logs from the nRF through a debugger.

    >> And is it consistent? Does it always happen after 1h? Or only occasionally?
    No, it is not consistent; it occurs occasionally. Sometimes it happens after a few minutes, sometimes after 1 or 2 hours.

    >> And what HW is your otbr running on? A computer? Raspberry pi?
    Our OTBR runs on the NXP i.MX 6ULL. The nRF52840 DK is connected to the NXP i.MX 6ULL through UART.

  • Vignesh Ravi said:
    We are building the application that is part of "/ncs/v2.3.0/nrf/samples/openthread/coprocessor". In the "Board" drop down text box of "Build Configuration" we select nrf52840dk_nrf52840.

    Does this mean you are actually using a DK, or are you using a custom board?

    Vignesh Ravi said:
    It would be very helpful if you could share any document on retrieving logs from the nRF through a debugger.

    If you add these two lines in your prj.conf:

    CONFIG_LOG=y
    CONFIG_LOG_BACKEND_RTT=y

    you should see something like "ieee802154_nrf5: nRF5 802154 radio initialized" when you start the application. To monitor this log, attach a J-Link debugger and open J-Link RTT Viewer.

    Best regards,

    Edvin

  • Hi Edvin,

    As suggested, the two configurations above were enabled in prj.conf. The host is able to communicate properly with the NCP module; however, only a few debug messages were seen in the RTT terminal, as captured in the excerpt below:

    SEGGER J-Link V7.86d - Real time terminal output
    SEGGER J-Link V9.7, SN=59701277
    Process: JLinkExe
    [00:00:00.002,349] <inf> ieee802154_nrf5: nRF5 802154 radio initialized
    [00:00:00.002,532] <err> qspi_nor: JEDEC id [c2 28 16] expect [c2 28 17]
    *** Booting Zephyr OS build v3.2.99-ncs2 ***
    [00:00:00.004,211] <inf> coprocessor_sample:

    =========================================================
    OpenThread Coprocessor application is now running on NCS
    =========================================================
     


    Are we missing any other configurations?

    In addition, we would like to point out that PTA configurations were also enabled as part of the RCP firmware. Could this HandleRcpTimeout() issue be caused by enabling PTA? Any input would be very helpful.

  • At least according to the log, there are no crashes then. 

    Vignesh Ravi said:
    Could this HandleRcpTimeout() issue be caused by enabling PTA?

    Not sure. I have not tested it before. Does it crash if you disable it?

    Did you try to analyse the UART pins? Do you use flow control on the UART?

    BR,

    Edvin

  • Hi Edvin,

    >> At least according to the log, there are no crashes then.    
    For this particular test run, we did not wait until the host disconnected from the RCP. However, we expected runtime logs to be captured continuously in the RTT terminal, but only a few lines were captured and then the logs stopped. Are we missing any other configurations needed to see runtime and crash-related logs in the RTT terminal? Your support would be much appreciated.

    >> Did you try to analyse the UART pins? Do you use flow control on the UART?
    We have not yet started analysing the UART pins. However, HW flow control is enabled.

  • Hello,

    I spoke with our Thread team. They wanted to ask you a few questions about this issue:

    1: Is it possible to change the UART baud rate from 115200 to 1000000? At 115200 baud, the radio is capable of delivering more data than the UART can handle, so the issue could be an overflow. In that case, flow control will not help, because the data is lost due to a UART TX buffer overflow on the nRF.

    2: Regardless of whether it is possible to change to 1M baud on the UART, where and how did you set the baud rate on the nRF's UART?

    3: They also asked for more details on what you are building with Yocto, and how. We assume that it is the border router itself (not the nRF) that is built using Yocto. Can you please verify that and share some more details?

    4: Can you also share some details on how the nRF and the i.MX 6ULL are connected? Are they connected directly via UART, or do you use a UART-to-USB bridge, a voltage level shifter, or anything else in between? Or is it just PCB traces carrying the UART signal?

    Best regards,

    Edvin

  • Hi Edvin,

    Thanks for discussing this with the Thread team. We will compile our responses to all your queries and get back to you as soon as possible.
