Communication breakage between nrf52840 chip and host (imx6ull) on stress testing

When stress testing the Thread layer communication, after roughly 30 minutes to 1 hour the link (over UART, 115200 baud) between the host (imx6ull based SoC) and the nrf52840 chip is lost, and it does not recover until otbr-agent is restarted.

Continuous commands were sent from controller to the development kit to reproduce the issue.
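To pin down how long the link survives on each run, the host can probe otbr-agent periodically and record when the first probe fails. The following is a minimal sketch, not the test setup used above. It assumes that `ot-ctl state` prints `Done` on success and either hangs or omits `Done` once the link to the nrf52840 is lost, which should be verified on the target. The helper names `probe` and `time_to_first_failure` are hypothetical.

```python
import subprocess
import time


def probe(cmd=("ot-ctl", "state"), timeout_s=5.0, run=subprocess.run):
    """Return True if the command finishes in time and its output contains 'Done'."""
    try:
        result = run(list(cmd), capture_output=True, text=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # A hang or a missing binary both count as a failed probe.
        return False
    # Assumption: a successful ot-ctl command ends its output with "Done".
    return "Done" in result.stdout


def time_to_first_failure(probe_fn, interval_s=1.0, max_probes=None,
                          clock=time.monotonic, sleep=time.sleep):
    """Call probe_fn repeatedly; return the seconds elapsed at the first
    failed probe, or None if max_probes probes all succeed."""
    start = clock()
    done = 0
    while max_probes is None or done < max_probes:
        if not probe_fn():
            return clock() - start
        done += 1
        sleep(interval_s)
    return None
```

Calling `time_to_first_failure(probe)` on the host during a stress run blocks until the first failed probe and returns the elapsed time in seconds, which can be logged across runs to see whether the failure time is consistent.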

1. 868b8d791fac9752d154ef0f0614ca15019056a9 - otbr-agent commit id (github.com/.../ot-br-posix)
2. IEEE 802.15.4 hardware platform - nrf52840
3. Build was done using Yocto recipe.
4. Network topology - Star topology (direct communication between controller and development kit)

Expected behaviour is no communication breakage between the nrf52840 chip and the host (imx6ull based SoC).

Should we raise the transmit buffer size, and if so, how can it be done?
Or is there a way to increase the tx timeout of a packet to a higher value, and would that help resolve the issue?

  • Dear Edvin,
    Please find our responses to your queries:

    >> This means that you are using custom board files when building, right? Or are you building for the nrf52840dk_nrf52840 board?
    We are building the application that is part of "/ncs/v2.3.0/nrf/samples/openthread/coprocessor". In the "Board" drop down text box of "Build Configuration" we select nrf52840dk_nrf52840.

    >> What protocol are you using between your otbr and the nRF? Is it UART or USB? If it is UART, you will not be able to hook onto the UART. If you are using USB, there is not much use of hooking on.
    Between the OTBR and the nRF, we are using UART. We have modified the nrf52840dk_nrf52840.overlay as per our UART pin configuration between the host & rcp. If required we can share the modified file.

    >> Another thing you can try, regardless, is to attach a debugger and see if you can see any logs from the nRF. Does it say anything when/before it stops working?
    It would be helpful if you could share any documentation on retrieving logs from the nRF through a debugger.

    >> And is it consistent? Does it always happen after 1h? Or only occasionally?
    No, it is not consistent; it occurs occasionally. Sometimes after a few minutes and sometimes after 1 or 2 hours.

    >> And what HW is your otbr running on? A computer? Raspberry pi?
    Our otbr runs on the NXP i.MX 6ULL. The nrf52840dk is connected to the "NXP i.MX 6ULL" through UART.

  • Vignesh Ravi said:
    We are building the application that is part of "/ncs/v2.3.0/nrf/samples/openthread/coprocessor". In the "Board" drop down text box of "Build Configuration" we select nrf52840dk_nrf52840.

    Does this mean you are actually using a DK, or are you using a custom board?

    Vignesh Ravi said:
    It would be helpful if you could share any documentation on retrieving logs from the nRF through a debugger.

    If you add these two lines in your prj.conf:

    CONFIG_LOG=y
    CONFIG_LOG_BACKEND_RTT=y

    you should see something like "ieee802154_nrf5: nRF5 802154 radio initialized" when you start the application. To monitor this log, attach a JLink debugger, and open JLink RTT Viewer.

    Best regards,

    Edvin

  • Hi Edvin,

    As suggested, the above two configurations were enabled in prj.conf. The host is able to communicate properly with the ncp module; however, only a few debug messages were seen in the RTT terminal, as captured in the excerpt below:

    SEGGER J-Link V7.86d - Real time terminal output
    SEGGER J-Link V9.7, SN=59701277
    Process: JLinkExe
    [00:00:00.002,349] <inf> ieee802154_nrf5: nRF5 802154 radio initialized
    [00:00:00.002,532] <err> qspi_nor: JEDEC id [c2 28 16] expect [c2 28 17]
    *** Booting Zephyr OS build v3.2.99-ncs2 ***
    [00:00:00.004,211] <inf> coprocessor_sample:

    =========================================================
    OpenThread Coprocessor application is now running on NCS
    =========================================================
     


    Are we missing any other configurations?

    In addition, we would like to mention that PTA configurations were also enabled as part of the RCP firmware. Is this HandleRcpTimeout() issue due to enabling PTA? Any inputs would be very helpful.

  • At least according to the log, there are no crashes then. 

    Vignesh Ravi said:
    Is this HandleRcpTimeout() issue due to enabling PTA?

    Not sure. I have not tested it before. Does it crash if you disable it?

    Did you try to analyse the UART pins? Do you use flow control on the UART?

    BR,

    Edvin

  • Hi Edvin,

    >> At least according to the log, there are no crashes then.    
    In this particular test run, we did not wait until the host got disconnected from the rcp. However, we were expecting runtime logs to be captured continuously in the RTT terminal, but only a few lines were captured and then the logs stopped. Are we missing any other configurations needed to see the runtime and crash related logs in the RTT terminal? Your support would be much appreciated.

    >> Did you try to analyse the UART pins? Do you use flow control on the UART?
    We are yet to start analysing the UART pins. However, the HW flow control is enabled.
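
Before probing the pins, the host-side flow control setting can be double-checked in software by reading the tty's termios flags while otbr-agent has the port open. This is a minimal sketch under stated assumptions: it relies on the Linux `termios.CRTSCTS` flag, it assumes the port can be opened by a second process, and it only reports what the host driver is configured for, not what the nRF side does. The helper names are hypothetical.

```python
import os
import termios


def hw_flow_control_enabled(cflag):
    """True if the CRTSCTS (RTS/CTS flow control) bit is set in a c_cflag value."""
    return bool(cflag & termios.CRTSCTS)


def check_port(path):
    """Read the current termios settings of the tty at `path` and report
    whether RTS/CTS flow control is enabled on the host side."""
    # O_NOCTTY: do not make this tty our controlling terminal.
    # O_NONBLOCK: do not block waiting for carrier detect.
    fd = os.open(path, os.O_RDONLY | os.O_NOCTTY | os.O_NONBLOCK)
    try:
        cflag = termios.tcgetattr(fd)[2]  # index 2 holds c_cflag
    finally:
        os.close(fd)
    return hw_flow_control_enabled(cflag)
```

For example, `check_port("/dev/ttymxc1")` would report the setting for that port; the device path here is only a placeholder and must be replaced with the UART actually wired to the nRF.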
