nrf5340 Anomaly 159 Workaround : not a workaround? Breaks QPSI Flash use on NCS 2.8.0

Updating from NCS2.6 to 2.8, the access to my QPSI Flash chip got broken (mx25r64, connected and configured using QPSI exactly as per the nrf5340DK board)

DTS entry:

&qspi {					// external flash
	status = "okay";
	pinctrl-0 = <&qspi_default>;
	pinctrl-1 = <&qspi_sleep>;
	pinctrl-names = "default", "sleep";
	mx25r64: mx25r6435f@0 {
		compatible = "nordic,qspi-nor";
		reg = <0>;
		/* MX25R64 supports only pp and pp4io */
		writeoc = "pp4io";
		/* MX25R64 supports all readoc options */
		readoc = "read4io";
		sck-frequency = <8000000>;
		jedec-id = [c2 28 17];
		sfdp-bfp = [
			e5 20 f1 ff  ff ff ff 03  44 eb 08 6b  08 3b 04 bb
			ee ff ff ff  ff ff 00 ff  ff ff 00 ff  0c 20 0f 52
			10 d8 00 ff  23 72 f5 00  82 ed 04 cc  44 83 68 44
			30 b0 30 b0  f7 c4 d5 5c  00 be 29 ff  f0 d0 ff ff
		];
		size = <67108864>;
		has-dpd;
		t-enter-dpd = <10000>;
		t-exit-dpd = <35000>;
	};
};

When accessing it, I now get these logs:

[00:00:15.432,556] <err> qspi_nor: nRF5340 anomaly 159 conditions detected
[00:00:15.440,032] <err> qspi_nor: Set the CPU clock to 64 MHz before starting QSPI operation

and then any flash access (eg to filesystem access using ELM) fails. On NCS2.6 this all worked fine.

This appears to be a 'fix' for anomly 159 (which I was not aware of experiencing!) which notes that if the HCLK_192M divider is not 0 _and_ the CPU clock is >64MHz then QSPI accesses can fail. 

Digging into this, the log comes from zephyr/drivers/flasqh/nrf_qspi_nor.c (which uses modules/hal/nordic/nrfx/drivers/src/nrf_qpsi.c). The log is actually just when the code translates the underlying error of  NRFX_ERROR_FORBIDDEN into a ECANCELED:

#if NRF53_ERRATA_159_ENABLE_WORKAROUND
    case NRFX_ERROR_FORBIDDEN:
        LOG_ERR("nRF5340 anomaly 159 conditions detected");
        LOG_ERR("Set the CPU clock to 64 MHz before starting QSPI operation");
        return -ECANCELED;
#endif
And the nrf error codes from nrf_qspi.c, which has this code:
 
static bool .qspi_errata_159_conditions_check(void)
{
#if NRF_CLOCK_HAS_HFCLK192M && NRF53_ERRATA_159_ENABLE_WORKAROUND
    if ((nrf_clock_hfclk192m_div_get(NRF_CLOCK) != NRF_CLOCK_HFCLK_DIV_1) ||
      (nrf_clock_hfclk_div_get(NRF_CLOCK) != NRF_CLOCK_HFCLK_DIV_2))
    {
      return true;
    }
    else
#endif
    {
      return false;
    }
}
All the qspi access calls call this, and if it returns false, return a NRF_ERROR_FORBIDDEN!
So this is not a workaround, just a detection of the 'bad' case to fail the qspi operations!
However, the code in nrf_qspiçnor.c already seems to deal with forcing the first condition for the workaround of the anomaly 159 (HCLK_192M divider should be 0), but not the second (CPU clock must be 64HMz ie divider set to 1).
Given that the flash qspi access already does the job of changing the hclk192M divider before and after access, why doesn't it also change the CPU clock to avoid the issue? Instead it just logs that it detected it and fails..
So, I updated zephyr/drivers/flash/nrf_qspi_nor.c to add the change of the CPU clock down to 64MHz and back again:
static int _prev_hclk_div=0;
static inline void qspi_clock_div_change(void)
{
#ifdef CONFIG_SOC_SERIES_NRF53X
	/* Make sure the base clock divider is changed accordingly
	 * before a QSPI transfer is performed.
	 */
	nrf_clock_hfclk192m_div_set(NRF_CLOCK, BASE_CLOCK_DIV);
	k_busy_wait(BASE_CLOCK_SWITCH_DELAY_US);
#ifdef NRF53_ERRATA_159_ENABLE_WORKAROUND
	_prev_hclk_div = nrf_clock_hfclk_div_get(NRF_CLOCK);
	nrf_clock_hfclk_div_set(NRF_CLOCK, NRF_CLOCK_HFCLK_DIV_2);
	//nrfx_clock_divider_set(NRF_CLOCK_DOMAIN_HFCLK, NRF_CLOCK_HFCLK_DIV_1);
#endif
#endif
}

static inline void qspi_clock_div_restore(void)
{
#ifdef CONFIG_SOC_SERIES_NRF53X
	/* Restore the default base clock divider to reduce power
	 * consumption when the QSPI peripheral is idle.
	 */
	nrf_clock_hfclk192m_div_set(NRF_CLOCK, NRF_CLOCK_HFCLK_DIV_4);
#ifdef NRF53_ERRATA_159_ENABLE_WORKAROUND
	// set back to previous cpu clk speed
	nrf_clock_hfclk_div_set(NRF_CLOCK, _prev_hclk_div);
#endif
#endif
}
This seems to work : no more 'anomaly case' detection in nrf_qspi.c, and flash access works...
Can someone from Nordic comment on a) why this wasn't done already, and b) can this be a durable fix (PR to zephyr)? Maybe its just luck that its working right now...
  • Hi,

    Since these are vacation times and you have a workaround for this, I will not get to this right away.

    In January I will make a ticket to our devs who added this workaround to ask about this, then I will get back to you

    Thanks for the thorough report!

    Regards,
    Sigurd Hellesvik

  • Hello

    Any update on this. I have some code that was using 2.6.2, but when I move to 2.7.0 I get the same error as reported above.

    Regards

  • Devs say:

    "
    QSPI is the only user of the 192 MHz clock, so the divider for this clock can be changed by the QSPI driver without affecting any other part of the system. But this is not the case for the CPU clock. That's why changing of its divider was not added to the QSPI driver and the application that uses the QSPI driver needs to take care of avoiding the anomaly 159 conditions, i.e. to make sure the CPU clock is switched back to the default 64 MHz when a QSPI operation is requested. I think this could be modified, i.e. the responsibility for changing appropriately the CPU clock divider could be put on the QSPI driver but not by default, so for example only when a dedicated Kconfig option is enabled. Such behavior of the QSPI driver would have to be requested explicitly to ensure that its user is aware of the consequences and accepts them.

    Perhaps the reporter is not aware that he can define NRF53_ERRATA_159_ENABLE_WORKAROUND=0 to get rid of this mechanism if he does not care about the anomaly.

    "

    Does this help?

  • Well, I understand the reasoning presented for why the QSPI driver did not alter the CPU clock.

    Perhaps the reporter is not aware that he can define NRF53_ERRATA_159_ENABLE_WORKAROUND=0 to get rid of this mechanism if he does not care about the anomaly.

    Sure, I see that, but given that the anomaly (or bug as we used to call such things) exists, and the consequences are major for the operation of the application, I don't see why I would want disable this mechanism? Rather, I want it to be fully transparent to my app!

    Such behavior of the QSPI driver would have to be requested explicitly to ensure that its user is aware of the consequences and accepts them.

    This is bit I don't understand : what exactly are the consequences of the QSPI driver modifying the CPU clock speed during the QPSI access? Given that the driver can restore it to the previous state after, so the application doesn't see any impact, it seems like there are only consequences if you DONT change the speed!

    Maybe the devs can explain exactly what the risk for the application would be to incorporate the CPU clock changing code?

  • I would assume in more complex applications where the application is continuously changing hfclk for power/performance optimizations there could be a race condition between two contexts where an unaware app context would set the clock to 64mhz during a qspi transfer, after qspi_clock_div_change saves the 128div but before qspi_clock_div_restore, resulting in 128div being applied using the previously saved value.

    However my application leaves 128mhz on always so this fix works for me.

Related