qspi_nor: Failed to schedule device sleep: -16

I'm using nrf toolchain/sdk 2.5.2.

I've got a strange problem with my QSPI NOR flash interface. When I run flash_erase I get the error message "qspi_nor: Failed to schedule device sleep: -16" pretty much immediately, but the flash does seem to erase correctly.

In qspi_erase() in nrf_qspi_nor.c, if I put a breakpoint on line 689, which calls qspi_device_uninit(dev), and wait for 20+ seconds, I do not receive the error. This corresponds roughly to how long a full flash erase takes for my mx25r0835f flash chip. It looks like the QSPI driver isn't waiting for the flash to actually finish erasing before deinit?

qspi_wait_for_completion returns immediately, if that is relevant.

Any idea how I can resolve this issue?

Regards

Robert

Call to flash_erase, where flash_dev points to the DTS device below, address = 0 and size = 1048576:

return flash_erase(flash_dev, address, size);

Relevant section of my dts:

&qspi {
    compatible = "nordic,nrf-qspi";
    status = "okay";
    pinctrl-0 = <&qspi_default>;
    pinctrl-1 = <&qspi_sleep>;
    pinctrl-names = "default", "sleep";
    mx25r08: mx25r0835f@0 {
        compatible = "nordic,qspi-nor";
        reg = <0>;
        sck-frequency = <50000000>;
        jedec-id = [ c2 28 14 ];
        size = <0x0800000>; /* flash capacity in bits */
        has-dpd;
        t-enter-dpd = <10000>;
        t-exit-dpd = <35000>;
    };
};
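The `size` property above is given in bits, so 0x0800000 bits works out to 1048576 bytes, the same value passed to flash_erase. A quick host-side check of that arithmetic, as a sketch only (the helper names are made up for illustration and are not part of the driver):

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a devicetree `size` value given in bits to bytes. */
uint32_t flash_bits_to_bytes(uint32_t size_bits)
{
	return size_bits / 8u;
}

/* True when an erase request covers the whole device.
 * Assumption: the capacity is compared in bytes. */
bool is_whole_chip_erase(uint32_t erase_bytes, uint32_t dts_size_bits)
{
	return erase_bytes == flash_bits_to_bytes(dts_size_bits);
}
```

With the values from this post, is_whole_chip_erase(1048576, 0x0800000) is true, so, assuming the driver holds the capacity in bytes, a request of this size would be treated as a full chip erase rather than a series of block or sector erases.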

Sigurd Hellesvik over 1 year ago

Hi,

You can see this with a simple sample where you only write/erase from external flash, right?

Do you have a nRF52840DK?
If so, can you test the same code on the nRF52840DK, to see if you see the same error on that?
It would be useful to test if the error is related to the specific external flash or the firmware.

Regards,
Sigurd Hellesvik

robsrick over 1 year ago in reply to Sigurd Hellesvik

Hi,

Using a DK with the built in flash does not result in this error so it seems to be linked to the flash chip I've got on my custom board.

The test code i've got set up is simply erasing and writing to the flash in a single thread.

Regards

Robert

Sigurd Hellesvik over 1 year ago in reply to Sigurd Hellesvik

I heard back from our developers, and they pointed me to some changes in v2.5.1 which could be the cause. However, you also see this on v2.5.0, so it is not likely to be those.

Now, this gives me a dilemma.

We try to take up as little time as possible from our developers, so they can focus on improving our stuff. Ideally, it would be nice if you could test some newer NCS versions (binary search between v2.1.0 and v2.5.0), so we can figure out exactly when the error was introduced and I can give the developers a more detailed report on the issue.

On the other hand, you took a while to get v2.1.0 running, so I don't really want to ask you to do all of that work again either. If I could reproduce this on my side I could do the work myself, but alas I do not have the flash you have.

That leaves me with a third option: Looking at the history from the file we suspect has the issue, and trying to reason my way to what could go wrong.
From the history, these commits are new since v2.1.0:

Out of these, here are my main suspects:

drivers: flash: nrf_qspi_nor: Mark device as busy when locked

drivers: flash: nrf_qspi_nor: Add runtime PM

Just now I had another look at your v2.5.1 code, and I see that some of it is grayed out because CONFIG_PM_DEVICE_RUNTIME is set (there is an #else which I missed before).
Do you need CONFIG_PM_DEVICE_RUNTIME for your project? Can you try to disable this configuration and see if you still get the same error?

robsrick over 1 year ago in reply to Sigurd Hellesvik

Hi Sigurd

My understanding of the power management library is that it's critical for putting devices to sleep when they are not in use. For my use case this is very important because our device has to last for years on a small battery. The whole project is set up to use pinctrl-style definitions in the devicetree, which I think depend on power management, so I don't really want to change this.

Understand you can't debug this directly because you don't have access to my flash chip, but i wonder could you step through the same code on a nrf52840dk and see why it works where mine doesn't?

Presumably, with the flash chip that is on the DK, pm_device_runtime_put() must do something different? That would probably give a useful clue.

Regards

Robert

Sigurd Hellesvik over 1 year ago in reply to robsrick

One of our developers just suggested https://github.com/zephyrproject-rtos/zephyr/pull/66711/files, which adds CONFIG_NORDIC_QSPI_NOR_TIMEOUT_MS. Maybe you could try to add this as a patch and see if you can change the timeout to fix the issue?

robsrick said:
Understand you can't debug this directly because you don't have access to my flash chip, but i wonder could you step through the same code on a nrf52840dk and see why it works where mine doesn't?

That is a good idea, I will do that.

robsrick over 1 year ago in reply to Sigurd Hellesvik

I tried the patch but it did nothing.

I also disabled CONFIG_PM_DEVICE_RUNTIME as a test and it fixed the issue, but my power consumption jumped from 3uA to 1mA, which isn't acceptable for our application.

robsrick over 1 year ago in reply to robsrick

Another update: I tried running my code on the nRF52840 DK and got the same error. Can you reproduce it with CONFIG_PM_DEVICE_RUNTIME enabled, Sigurd Hellesvik?

Sigurd Hellesvik over 1 year ago in reply to robsrick

robsrick said:
I also disabled CONFIG_PM_DEVICE_RUNTIME as a test and it fixed the issue but my power consumption jumped from 3uA to 1mA which isn't acceptable for our application.

Good to know. It narrows down our search and confirms that you need PM here, so we will figure out what goes wrong.

robsrick said:
i tried running my code on the NRF52840DK and got the same error. Can you reproduce if you have CONFIG_PM_DEVICE_RUNTIME enabled? Sigurd Hellesvik

I tried, but I had some issues writing code to reproduce it and faced other errors along the way.
If you are able to share the code that you used for the DK, that would speed up my testing.

Let me know if I should convert this ticket to private for that.

robsrick over 1 year ago in reply to Sigurd Hellesvik

My source is pretty large, but in essence I'm just calling flash_erase as described in my original post, with the include

#include <zephyr/drivers/flash.h>

Is that enough?

Sigurd Hellesvik over 1 year ago in reply to robsrick

I do not see the error if I erase at size 0x1000. But that is quite fast either way, no?

However if I increase the erase size to 0x10000, I get error -5 instead. I do not get -5 if I step through the code, so that is interesting.

Alas, this is not the same error code as you get.
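For reference, and assuming the standard errno numbering that Zephyr follows, -16 is -EBUSY (device or resource busy) and -5 is -EIO (I/O error), so the two results point at different failure paths. A small host-side sketch for decoding them (errno_name and errno_text are made-up helper names, not part of the driver):

```c
#include <errno.h>
#include <string.h>

/* Return the errno name for the negative return codes seen in this thread.
 * Assumption: the usual errno numbering (EIO = 5, EBUSY = 16). */
const char *errno_name(int rc)
{
	switch (-rc) {
	case 0:
		return "OK";
	case EIO:
		return "EIO";
	case EBUSY:
		return "EBUSY";
	default:
		return "unknown";
	}
}

/* Human-readable text for the same code; the exact wording depends on
 * the C library in use. */
const char *errno_text(int rc)
{
	return strerror(-rc);
}
```

Calling errno_name(-16) gives "EBUSY" and errno_name(-5) gives "EIO" on a host that uses this numbering.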

Am I doing anything in my code that you would not expect? Or am I missing anything needed to reproduce your problem?
Here is a sample as simple as I can think of to try to reproduce it:

flash_erase_test.zip

robsrick over 1 year ago in reply to Sigurd Hellesvik

Hi Sigurd,

I wanted to revisit this to see if we can come up with a solution.

I had originally wanted to create a hack for this by adding a sleep to the function qspi_erase in v2.6.0\zephyr\drivers\flash\nrf_qspi_nor.c, but it did not like me using k_sleep() inside that file.

Since then there have been some updates to the nRF SDK, and I can now modify the function as follows:

/* QSPI erase */
static int qspi_erase(const struct device *dev, uint32_t addr, uint32_t size)
{
	const struct qspi_nor_config *params = dev->config;
	int rc, rc2;

	rc = qspi_nor_write_protection_set(dev, false);
	if (rc != 0) {
		return rc;
	}
	while (size > 0) {
		nrfx_err_t res = !NRFX_SUCCESS;
		uint32_t adj = 0;

		if (size == params->size) {
			/* chip erase */
			res = nrfx_qspi_chip_erase();
			adj = size;
		} else if ((size >= QSPI_BLOCK_SIZE) &&
			   QSPI_IS_BLOCK_ALIGNED(addr)) {
			/* 64 kB block erase */
			res = nrfx_qspi_erase(NRF_QSPI_ERASE_LEN_64KB, addr);
			adj = QSPI_BLOCK_SIZE;
		} else if ((size >= QSPI_SECTOR_SIZE) &&
			   QSPI_IS_SECTOR_ALIGNED(addr)) {
			/* 4kB sector erase */
			res = nrfx_qspi_erase(NRF_QSPI_ERASE_LEN_4KB, addr);
			adj = QSPI_SECTOR_SIZE;
		} else {
			/* minimal erase size is at least a sector size */
			LOG_ERR("unsupported at 0x%lx size %zu", (long)addr, size);
			res = NRFX_ERROR_INVALID_PARAM;
		}

		/* Local workaround: wait out a full chip erase (about 20 s on this part) */
		k_sleep(K_MSEC(20000));
		qspi_wait_for_completion(dev, res);
		if (res == NRFX_SUCCESS) {
			addr += adj;
			size -= adj;
		} else {
			LOG_ERR("erase error at 0x%lx size %zu", (long)addr, size);
			rc = qspi_get_zephyr_ret_code(res);
			break; 
		}
	}

	rc2 = qspi_nor_write_protection_set(dev, true);

	return rc != 0 ? rc : rc2;
}

My change is adding k_sleep(K_MSEC(20000)); right before the line

qspi_wait_for_completion(dev, res);
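A fixed 20 s sleep runs on every pass of the loop, so it also delays small sector erases. One alternative would be a bounded poll that returns as soon as the flash reports idle. The following is only a host-side sketch of that idea under stated assumptions: the is_busy and delay_ms hooks are hypothetical stand-ins (in a driver they might read the flash status register and call a kernel sleep), and fake_flash simulates a device that stays busy for a fixed time. None of these names exist in nrf_qspi_nor.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hooks, injected so the sketch can run without hardware. */
struct flash_wait_ops {
	bool (*is_busy)(void *ctx);
	void (*delay_ms)(void *ctx, uint32_t ms);
};

/* Poll until the device is idle. Returns 0 on success, or -1 if the
 * device is still busy once timeout_ms has elapsed. */
int wait_until_idle(const struct flash_wait_ops *ops, void *ctx,
		    uint32_t poll_ms, uint32_t timeout_ms)
{
	uint32_t waited_ms = 0;

	assert(poll_ms > 0);
	while (ops->is_busy(ctx)) {
		if (waited_ms >= timeout_ms) {
			return -1;
		}
		ops->delay_ms(ctx, poll_ms);
		waited_ms += poll_ms;
	}
	return 0;
}

/* Simulated flash that reports busy for a fixed amount of time. */
struct fake_flash {
	uint32_t busy_ms_left;
};

bool fake_is_busy(void *ctx)
{
	return ((struct fake_flash *)ctx)->busy_ms_left > 0;
}

/* Advancing simulated time is what eventually clears the busy state. */
void fake_delay_ms(void *ctx, uint32_t ms)
{
	struct fake_flash *f = ctx;

	f->busy_ms_left = (ms >= f->busy_ms_left) ? 0 : f->busy_ms_left - ms;
}

const struct flash_wait_ops fake_ops = { fake_is_busy, fake_delay_ms };
```

With a simulated 20000 ms erase, wait_until_idle(&fake_ops, &flash, 100, 30000) returns 0, while a 5000 ms timeout returns -1. Whether the real driver can poll this way depends on how it exposes the busy state, which this sketch does not address.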

This is fine for a local build, but we release our firmware using GitHub Actions for CI. This uses the nRF Docker image found at https://github.com/NordicPlayground/nrf-docker for building, so I can't add my fix there because it lives in the SDK files themselves.

So I'm stuck in an annoying position where I have to release a special, locally compiled firmware with the fix, run it on the board once, and then flash the proper firmware that our CI workflow generates.

This makes our production quite complicated and I'd much rather not do it.

Do you have any idea when this bug will be fixed, and are there any workarounds I can use in the meantime?

Sigurd Hellesvik over 1 year ago in reply to robsrick

Hi Rob,

robsrick said:
v2.6.0\zephyr\drivers\flash\nrf_qspi_nor.c

There is a bug in this driver in v2.6.0 that we are looking into internally.
I do not know if it is the same one you refer to here, but we should fix the one I mention first.
I will let you know when a PR is released for this, so you can see what it is about.