Softdevice Assert at PC=0x15810 (S132 7.2.0) / RTC clock drift when using timeslot

I'm trying to narrow down the cause of a Softdevice assertion happening in S132 7.2.0 at PC=0x15810.

We set up a proprietary RF project which utilises parts of the SDK for Mesh (specifically, the timeslot implementation and bearer_handler) because it provides a safe base to run high performance timeslot applications on. Unfortunately I do have one device which runs into a softdevice assertion at instruction 0x15810. I feel that it is a timing issue - maybe the device is operating at the outer limits of the clock accuracy, because while the issue appears sporadically on Development Kits or other devices, this specific device does trigger it quite often.

  • What exact assertion fails at PC=0x15810?
  • Does the Softdevice shut down TIMER0 before doing this test or after an assertion fails?
  • Are timing assertions made by the softdevice based on RTC0?
  • Is there a reason why TIMER0 in the mesh stack is running in 24-bit mode as opposed to 32-bit mode?

Any help is greatly appreciated.

EDIT: In the meantime I think I found the cause of the issue. Assuming that the timing assertions by the Softdevice are done using RTC0, there seems to be a rather large discrepancy between the RTC0 timing and TIMER0 timing. After 9'999'249us have passed on TIMER0, RTC0 has counted 10'000'732us, so they are 1'483us (almost 1.5ms) apart!
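To put that discrepancy into the same unit as the clock accuracy figures, it can be expressed in ppm. The following is a minimal sketch; the helper name `drift_ppm` is hypothetical and not part of the SDK, and it only assumes that both readings cover the same wall-clock interval.

```c
#include <assert.h>
#include <stdint.h>

/* Relative drift between two elapsed-time readings, in parts per million.
 * A positive result means the RTC reading is ahead of the timer reading.
 * Hypothetical helper, not taken from the SoftDevice or the Mesh SDK. */
static int32_t drift_ppm(uint32_t timer_us, uint32_t rtc_us)
{
    assert(timer_us != 0); /* avoid division by zero */

    /* Use 64-bit signed math so the scaled difference cannot overflow. */
    int64_t diff_us = (int64_t)rtc_us - (int64_t)timer_us;
    return (int32_t)((diff_us * 1000000) / (int64_t)timer_us);
}
```

For the readings above, `drift_ppm(9999249, 10000732)` evaluates to 148, so the two clocks differ by roughly 148 ppm over that interval.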

The device in question is running the LFCLK from the RC oscillator and we do usually have BLE deactivated. I did assume that the softdevice takes care of adjusting for clock drift, but could it be that I have to somehow take care of this manually?

EDIT2: Note that - as we're using the nRF SDK for Mesh as a codebase - when calculating the available time on the timeslot, we should already account for clock drift per the following calculation:

(p_timeslot->length_us * (m_lfclk_ppm + HFCLOCK_PPM_WORST_CASE)) / 1000000;
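One thing that may be worth checking in this expression, offered as a sketch and not as a confirmed diagnosis: if `length_us` and both ppm terms are 32-bit unsigned values (an assumption, not verified against timeslot.c), the product can exceed 2^32 for long timeslots and wrap around before the division. The function names below are hypothetical, and the 540 ppm used in the note after the code (500 ppm for the RC oscillator plus 40 ppm for the HFCLK) is an illustrative assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the expression above, done entirely in 32 bits.
 * The product wraps modulo 2^32 once it exceeds UINT32_MAX. */
static uint32_t drift_margin_us_32(uint32_t length_us, uint32_t total_ppm)
{
    return (length_us * total_ppm) / 1000000u;
}

/* Widening to 64 bits before multiplying avoids the wrap-around. */
static uint32_t drift_margin_us_64(uint32_t length_us, uint32_t total_ppm)
{
    /* With total_ppm <= 1'000'000 the result never exceeds length_us,
     * so the cast back to 32 bits is safe. */
    assert(total_ppm <= 1000000u);
    return (uint32_t)(((uint64_t)length_us * total_ppm) / 1000000u);
}
```

Under these assumptions, a 10'000'000us timeslot at 540 ppm gives 5'400'000'000 before the division, which does not fit into 32 bits. The 32-bit version then returns 1105us, whereas the 64-bit version returns the intended 5400us. For shorter timeslots, such as 1'000'000us, both versions agree.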

EDIT3: I previously wrote that we "do usually have BLE deactivated". What I actually mean by this is that most of the time the device is neither connected nor is it currently advertising. So there is no BLE activity to schedule by the softdevice. Timeslots are always active, though.

  • Hi,

    The assert at 0x15810 is because the SoftDevice got an unexpected radio interrupt, which is typically because the application used the radio outside of the timeslot. Regarding LFCLK calibration, that is handled automatically by the SoftDevice. And in any case RTC drift should not be relevant here, as there is just a single low frequency clock source in the nRF, so even if there was a significant drift, the app and SoftDevice would be drifting "together".

    Looking at your timeslot length calculation, it seems like you add the worst case drift value to the duration. Is that intentional? Should it not be subtracted?

  • Hi Einar,

    Can said assert fail if a radio interrupt occurs while the timeslot should have already ended, but has not yet ended? In the scenario I observe, the timeslot is still active and control has not yet been returned to the softdevice while said radio interrupt is triggered.

    Concerning drift: But the discrepancy appears between the HFCLK and LFCLK, no? TIMER0 runs off the HFCLK and is used to end the timeslot in time. The softdevice uses RTC0, which runs off the LFCLK, to assert that the timeslot is ended in time. Am I wrong here? How could you explain the discrepancy I observe?

    Concerning the duration calculation: Sorry, that was not the entirety of the code. The calculation of timeslot duration is done like this (see timeslot.c in nRF SDK for Mesh):

    static inline uint32_t end_timer_drift_margin(const timeslot_t* p_timeslot)
    {
        return (p_timeslot->length_us * (m_lfclk_ppm + HFCLOCK_PPM_WORST_CASE)) / 1000000;
    }
    
    /** Get the timeslot end timer timestamp. */
    static inline ts_timestamp_t get_end_time(const timeslot_t* p_timeslot)
    {
        return (p_timeslot->length_us - TIMESLOT_END_SAFETY_MARGIN_US -
                TIMESLOT_END_TIMER_OVERHEAD_US - end_timer_drift_margin(p_timeslot));
    }
    

    EDIT: Just a quick update: I just verified that this issue does not seem to occur while there is any BLE activity in parallel. Tested with a live connection, the system ran stably for around 40 minutes, whereas it crashes after around 2 minutes if the timeslots are extended to the maximum 10'000'000us as defined in SDK for Mesh's timeslot.c:

    /** The upper limit for the length of a single timeslot. Has to be lower than
     * the 24bit TIMER0 rollover, as enforced by the Softdevice. */
    #define TIMESLOT_MAX_LENGTH_US        (10000000UL)
