Distance measurements (Nordic Distance Measurement Library) can result in a hung state, but only on the DM_ROLE_INITIATOR role device

I am an experienced user of the Nordic Distance Measurement Library, and I am encountering an issue where the device acting as DM_ROLE_INITIATOR hangs at random. When it does, it is always at the completion of a distance measurement: there appears to be a timeout on the RX part (as shown below on the left), and the device never reaches the TX part.



Not all 'timeout' waveforms result in a hung state, but shown below is an example of what occurs just before the hang, after which the current draw remains high (several mA).



Shown below is the image above, but zoomed out.



When I swap the role to DM_ROLE_REFLECTOR (and no other changes to the code are made), then the device never runs into the issue (but of course, the other device that is now the DM_ROLE_INITIATOR will eventually encounter the issue instead).

The issue doesn't relate to synchronization, as both devices are happily synchronized and performing multiple distance measurements up until this issue occurs.

I am using multiple nRF52833 devices (including an nRF52833-DK, on which the issue can also occur) with SDK v2.8.0 and toolchain v2.8.0.

My hunch is that the code in the nrf_dm/dm module that listens for a transmission from the reflector is designed to time out if that transmission fails to arrive, but that this timeout is sometimes not handled correctly. In every instance of this issue, the initial ranging is successful, and it is only the tail end of the distance measurement that seems to fail.
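
To make the hunch concrete, here is a minimal sketch of the behaviour I would expect from a receive window with a timeout: the wait must always end, either because the reflector's transmission arrived or because the deadline passed. This is not the nrf_dm code; every name in it (`wait_for_rx`, `rx_poll_fn`, `now_us_fn`, `fake_radio`) is hypothetical, and the fake clock exists only so the logic can be exercised off-target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of waiting for the reflector's transmission. */
typedef enum { RX_OK, RX_TIMEOUT } rx_result_t;

/* Hypothetical hooks: "has a packet arrived?" and "current time in microseconds". */
typedef bool (*rx_poll_fn)(void *ctx);
typedef uint32_t (*now_us_fn)(void *ctx);

/* Bounded wait: returns RX_OK if poll() reports a packet before timeout_us
 * has elapsed, otherwise RX_TIMEOUT. The unsigned subtraction keeps the
 * elapsed-time check correct when the 32-bit microsecond counter wraps. */
rx_result_t wait_for_rx(rx_poll_fn poll, now_us_fn now, void *ctx,
                        uint32_t timeout_us)
{
    uint32_t start = now(ctx);

    while ((uint32_t)(now(ctx) - start) < timeout_us) {
        if (poll(ctx)) {
            return RX_OK;
        }
    }
    return RX_TIMEOUT; /* the caller must still run its cleanup/TX path */
}

/* Fake radio and clock, for illustration only. */
struct fake_radio {
    uint32_t now_us;   /* simulated time */
    uint32_t step_us;  /* time added on every clock read */
    uint32_t rx_at_us; /* simulated arrival time of the packet */
};

static uint32_t fake_now(void *ctx)
{
    struct fake_radio *r = ctx;
    r->now_us += r->step_us;
    return r->now_us;
}

static bool fake_poll(void *ctx)
{
    struct fake_radio *r = ctx;
    return r->now_us >= r->rx_at_us;
}
```

If the hang comes from a path where the equivalent of `RX_TIMEOUT` is never returned, or is returned but the radio is left in RX afterwards, that would match both the missing TX part and the high current draw.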

  • Hi Simon,

    Lowering req.extra_window_time_us to 500 from 2000, and/or reverting CONFIG_DM_RANGING_OFFSET_US back to the default value of 1,200,000 from 20,000 has noticeably reduced the rate at which the issue occurs. Also, now when the issue occurs, the nRF52833 only momentarily hangs before automatically resetting itself.

    However, the issue still occurs, now roughly once every 15 minutes rather than more often.

    Below you can see that the second current waveform shows the NRF_DM_STATUS_EVENT_FAIL_TIMEOUT occurring and then in this instance, the device hangs for a period of time before automatically resetting.



    "One thought from one of our devs: nrf_dm_proc_execute takes a timeout value as an argument, so it might be that this timeout isn't handled correctly in all cases."

    I'm reluctant to lower req.extra_window_time_us even further (e.g. to 0) for the final implementation, but I'm starting to wonder whether this issue only occurs when any of the time-related variables are set to something other than their default values.
  • Hello,

    Simon is out of office as he needed to attend to a work-related matter. He will be back next week, and you can expect a response early next week.
    Thank you for your patience.

    Kind regards,

    Abhijith
