
GZLL transmit fails periodically - dropping burst of packets in a row

I have a somewhat unique GZLL configuration and am running into a bug that I don't really understand.

"Maximum tx retries" is set to 1 using nrf_gzll_set_max_tx_attempts.  The reason for this is that in the case of a packet drop our application has no need to retry the old packet because by the time the retry arrives it is out of date.  We do need to know that it failed so we can send another packet with newer data right away.

When I have data to send in my application I call:
nrf_gzll_add_packet_to_tx_fifo
And monitor:
nrf_gzll_device_tx_success / nrf_gzll_device_tx_failed
To see what the result of the transmit was.
This appears to work as expected, with the exception that every 600 packets or so it hits a tx window where the next 20 or so packets all fail reliably (see the image from the analyzer; the pattern is continuous).
This doesn't seem like an interference or resource/interrupt issue, because changing the timeslot period shifts when the stall occurs. That links it pretty clearly to something like "after X number of packets / timeslots" rather than "every X time interval".
Maybe someone with insight into the GZLL / gazell library could shine some light on this?  Appreciate any help.
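
One way to quantify the pattern described above is to count runs of consecutive failures from the two result callbacks. The sketch below is a minimal, hypothetical helper (the type `tx_burst_stats_t` and the function `tx_burst_stats_record` are not part of the Gazell API); it assumes it is called once per packet result, with `true` from `nrf_gzll_device_tx_success` and `false` from `nrf_gzll_device_tx_failed`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Gazell library: tracks runs of
 * consecutive transmit failures so that bursts can be logged. */
typedef struct {
    uint32_t total_sent;        /* results reported so far */
    uint32_t total_failed;      /* results reported as failed */
    uint32_t current_fail_run;  /* failures in a row right now */
    uint32_t longest_fail_run;  /* longest run of failures seen so far */
    uint32_t burst_start_index; /* 1-based packet index where the latest run began */
} tx_burst_stats_t;

/* Record one packet result. Assumed to be called with success=true from
 * the tx-success callback and success=false from the tx-failed callback. */
void tx_burst_stats_record(tx_burst_stats_t *s, bool success)
{
    s->total_sent++;
    if (success) {
        s->current_fail_run = 0;
        return;
    }
    s->total_failed++;
    if (s->current_fail_run == 0) {
        s->burst_start_index = s->total_sent;
    }
    s->current_fail_run++;
    if (s->current_fail_run > s->longest_fail_run) {
        s->longest_fail_run = s->current_fail_run;
    }
}
```

Logging `burst_start_index` for each burst would show whether the stalls line up with a packet count rather than with elapsed time.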
  • Hi,

     

    I think this is due to a time drift on host vs. device.

    On the device side, could you try increasing the timeslot period by 10 (i.e., if it is 600, set it to 610) to see if this helps?

     

    Kind regards,

    Håkon

  • Changing the timeslot period does have an impact but negatively affects performance.

    Here I have increased the device timeslot by 10 µs, and packet loss goes up to over 50%.  I see similar results when increasing (or lowering) it by 20 µs, and even by 1 or 2 µs.

    One thing that does not make sense to me is that if I increase the "TX Retries" value, even just from 1 to 2, the recovery time seems to go way down: when it drops a packet, the timing seems to re-sync after only 2 timeslots instead of 40 (20+ packets are dropped in a row when it drifts).

    Unfortunately increasing the retries would not work as a solution as we need all available timeslots to be used for newest data.

    Is there another configuration or change that I could try here?

    Currently channel table size is 5, timeslots per channel is 2, and our packet size is 7 bytes, so it seems like it should not take 20 ms to re-sync once it knows it's dropping packets (especially if it only takes 2 ms with retries turned on).

    Appreciate your help.

  • One other potentially illuminating piece of info is that if I slow down my internal data rate so as to send a new packet only every 3rd timeslot or so (in the graphs above I am attempting to transmit data every timeslot), it slows down the dropout period as well.

    So whereas before the dropouts were occurring every 800 ms or so, if I slow down my tx rate to 1/3 (while leaving the timeslot settings the same), the dropouts now occur every 2400 ms.

    This seems to link the dropout timing directly to the number of packets sent.  I don't know whether this supports the clock drift theory, because I don't have any insight into the GZLL library source. However, the SDK documentation makes it sound like channel hopping continues even while packets are not being received, which suggests that the dropout timing should not depend on the successful packet rate (which it appears to do).
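
The arithmetic behind that observation can be sketched as follows. The function name is hypothetical, and the 600 µs timeslot period is an assumption (it is the value mentioned elsewhere in the thread, but the post does not say which period was used for these measurements). The absolute count therefore depends on that assumption; the point is only that both measurements give the same number of packets between dropouts.

```c
#include <stdint.h>

/* Number of packets attempted between two dropouts.
 * send_every_n_timeslots = 1 means one packet in every timeslot.
 * Returns 0 if the packet interval would be zero. */
uint32_t packets_between_dropouts(uint32_t dropout_interval_us,
                                  uint32_t timeslot_period_us,
                                  uint32_t send_every_n_timeslots)
{
    uint32_t packet_interval_us = timeslot_period_us * send_every_n_timeslots;
    if (packet_interval_us == 0) {
        return 0;
    }
    return dropout_interval_us / packet_interval_us;
}
```

With an assumed 600 µs timeslot, `packets_between_dropouts(800000, 600, 1)` and `packets_between_dropouts(2400000, 600, 3)` return the same value, which is consistent with the stall being tied to packet count rather than wall-clock time.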

    Very curious as to what is going on here.

  • Hi,

     

    If it goes out-of-sync, the behavior changes (based on what is provided to nrf_gzll_set_timeslots_per_channel_when_device_out_of_sync()). Could you share your Gazell configuration (both host and device)? How large is the channel table, etc.?

     

    Kind regards,

    Håkon

  • Configuration is the same for both host and device:

    Channel Table size: 5 (changing table size or values does not seem to affect the issue)
    Channel Selection Policy: USE_CURRENT
    Max TX Attempts: 1 (If this is set to 0 we get no success or fail callbacks)
    Timeslot period: Have tried many values from 500 - 1000, currently at 600.
    Timeslots per channel: 2
    Timeslots per channel when out of sync: 15 (default value)
    Sync Lifetime: 30 (default value)

    Any other settings would be set to the "default" values.
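
To put the settings listed above into time units, here is a small sketch. The struct `gzll_timing_t` and the function `timeslots_to_us` are hypothetical names for illustration only, and the timeslot period is assumed to be in microseconds.

```c
#include <stdint.h>

/* Hypothetical container for the timing-related settings listed above. */
typedef struct {
    uint32_t channel_table_size;
    uint32_t timeslots_per_channel;
    uint32_t timeslots_per_channel_out_of_sync;
    uint32_t sync_lifetime;        /* in timeslots */
    uint32_t timeslot_period_us;   /* assumed to be microseconds */
} gzll_timing_t;

/* Convert a number of timeslots to microseconds for a given configuration. */
uint32_t timeslots_to_us(const gzll_timing_t *t, uint32_t timeslots)
{
    return timeslots * t->timeslot_period_us;
}
```

For the posted values (table size 5, 2 timeslots per channel, out-of-sync value 15, sync lifetime 30, period 600), one full channel rotation is 10 timeslots or 6 ms, the out-of-sync dwell is 9 ms, and the sync lifetime is 18 ms, which is in the same range as the roughly 20 ms stall described earlier.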

    When it is running it hits all our performance targets with the exception of these dropouts.
    Our hardware design is clocked from an external 32.0000 MHz crystal so it should have the same accuracy as the reference design.  The host for this data collection is the nRF52840-DONGLE with identical clock settings.

    Appreciate any further suggestions.

  • Hi,

     

    Which version of the SDK are you using? I checked with my colleagues wrt. this issue, and it was mentioned that this has been observed on older SDKs (like in this thread: https://devzone.nordicsemi.com/f/nordic-q-a/11404/gazell-frame-rate-too-low/42985#42985)

     

    Kind regards,

    Håkon

  • This is a new design using SDK version 16.0.0.  This issue is our top priority, so we appreciate any other suggestions, thank you.

  • Hi,

     

    I see a "delay" (i.e., no packets received) of 18.75 ms after approx. 1 second, similar to what you see.

    There are two modes, in sync and out of sync, and these two have different timing parameters.

    When "in sync" and packets are lost (for instance due to drift), the default "in sync" period is 15 (see NRF_GZLL_DEFAULT_SYNC_LIFETIME in nrf_gzll_constants.h; note that this define is only a reference, so changing it will not do anything, and the value must be set via the corresponding nrf_gzll_ function).

    This means that if you lose packets, the device will spend 15 timeslots (9 ms) still trying to "follow" the RF channel table that the host is using.

    When this "in sync" period has elapsed, the device enters "out of sync", where the default time the device spends on a channel is also 15 timeslots.

     

    These two parameters will cause a delay in the case you're drifting.

    The parameter for out-of-sync is recommended to be greater than one "round trip" for the host, in your case 10 timeslots (2*5 channels):

    /**
     * @brief Set the number of timeslots that a Gazell shall
     * reside on a single channel before switching to another channel when
     * in the "out of sync" state.
     *
     * This value should be set so that the Device transmits on one channel
     * while the Host goes through a full channel rotation, i.e.,
     * channel_table_size*timeslots_per_channel.
     * This ensures that the channels on the Device and Host will coincide
     * at some point.
     * Further increasing the value has been observed to provide better performance
     * in the presence of interferers.
     *
     * @param timeslots The number of timeslots to reside on
     * each channel before channel switch.
     *
     * @retval true  If the parameter was set.
     * @retval false If Gazell was enabled.
     */
    bool nrf_gzll_set_timeslots_per_channel_when_device_out_of_sync(uint32_t timeslots);
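
The recommendation in the comment above amounts to a simple comparison. A minimal sketch, using a hypothetical helper name (this function is not part of the Gazell API):

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the out-of-sync dwell (in timeslots) is at least one full
 * host channel rotation, i.e. channel_table_size * timeslots_per_channel. */
bool out_of_sync_dwell_covers_rotation(uint32_t channel_table_size,
                                       uint32_t timeslots_per_channel,
                                       uint32_t out_of_sync_timeslots)
{
    uint32_t rotation = channel_table_size * timeslots_per_channel;
    return out_of_sync_timeslots >= rotation;
}
```

With a channel table of 5 and 2 timeslots per channel, the rotation is 10 timeslots, so the value of 15 reported in this thread passes the check.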

     

    The parameter for in-sync should be set on an application-specific level, meaning it should be set per how often you send data (+ added time in case a packet is lost):

    /**
     * @brief Set the number of timeslots after a successful
     * reception of a Device or Host packet that the Gazell Link Layer shall assume
     * that the link is synchronized. A value of 0 implies that the
     * link is always out of sync.
     *
     * @param lifetime The sync lifetime in number of timeslots.
     *
     * @retval true  If the sync lifetime was set.
     * @retval false If Gazell was enabled.
     */
    bool nrf_gzll_set_sync_lifetime(uint32_t lifetime);

     

    When adjusting down the value of "in sync" period, using nrf_gzll_set_sync_lifetime(), I see that this idle time period is reduced. Could you try adjusting this down and see if you see similar behavior? Try setting a value that is asynchronous to the round trip of the host, or is equal to the round trip.
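
As a rough aid for picking values to try, the sketch below filters candidate sync lifetimes. It rests on an assumption: "asynchronous to the round trip" is read here as "not an integer multiple of the host rotation", which is only one possible interpretation of the suggestion above. The function name is hypothetical and not part of the Gazell API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if `lifetime` (in timeslots) is either equal to the host
 * rotation or, under the assumed reading of "asynchronous", not an integer
 * multiple of it. A lifetime of 0 is rejected because the API comment says
 * it means the link is always out of sync. */
bool sync_lifetime_is_candidate(uint32_t lifetime, uint32_t rotation)
{
    if (lifetime == 0 || rotation == 0) {
        return false;
    }
    return lifetime == rotation || (lifetime % rotation) != 0;
}
```

For a rotation of 10 timeslots, values such as 10, 7, or 13 pass this filter, while 20 and 30 do not. Whether any of them actually shortens the stall would still need to be measured on hardware.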

     

    Kind regards,

    Håkon

  • Thank you for this info, this is very helpful for understanding the limitations of the gzll protocol.  I will continue to play with these timings.
