qspi_nor: Failed to schedule device sleep: -16

Hi

I'm using nrf toolchain/sdk 2.5.2.

Got a strange problem with my QSPI NOR flash interface.  when i run flash_erase i get error message "qspi_nor: Failed to schedule device sleep: -16" pretty much immediately. but the flash does seem to erase correctly.

in func qspi_erase() in nrf_qspi_nor.c if i put a breakpoint on ln689 which calls qspi_device_uninit(dev) and wait for 20+ seconds i do not receive the error. This corresponds roughly with whole long a full flash erase takes for my mx25r0835f flash chip. It looks like the qspi drivers aren't waiting for the flash to actually erase before deinit?

qspi_wait_for_completion returns immediately if that is relevant?

Any idea how i can resolve this issue?

Regards

Robert

call to flash_erase, where flash_dev points to the dts device below, address=0 and size = 1048576

return flash_erase(flash_dev, address, size);

Relevant section of my dts:

&qspi {
    compatible= "nordic,nrf-qspi";
    status = "okay";
    pinctrl-0 = <&qspi_default>;
    pinctrl-1 = <&qspi_sleep>;
    pinctrl-names = "default", "sleep";
    mx25r08: mx25r0835f@0 {
        compatible = "nordic,qspi-nor";
        reg = <0>;
        sck-frequency = <50000000>;
        jedec-id = [ c2 28 14  ];
        size = <0x0800000>; /* flash capacity in bits */
        has-dpd;
        t-enter-dpd = <10000>;
        t-exit-dpd = <35000>;
    };
};
Parents
  • Hi,

    You can see this with a simple sample where you only write/erase from external flash, right?

    Do you have a nRF52840DK?
    If so, can you test the same code on the nRF52840DK, to see if you see the same error on that?
    It would be useful to test if the error is related to the specific external flash or the firmware.

    Regards,
    Sigurd Hellesvik

  • Hi,

    Using a DK with the built in flash does not result in this error so it seems to be linked to the flash chip I've got on my custom board.


    The test code i've got set up is simply erasing and writing to the flash in a single thread.

    Regards

    Robert

  • robsrick said:
    it looks like you are using 2.5.0 from the log

    Right, yes. Had an off by one error I guess.

    Good to see that you now got to read out the configurations. If it did not help with this issue, hopefully it will iron out some other issues that you now will not meet further down the road.

    Here is what I found looking more at the issue you see:

    Searching for your external flash I found this blog. And it does not mention your issue either, so that is interesting.
    There have been some recent changes in our QSPI drivers, so maybe you can try with v2.1.0 just to see if the issue has been introduced recently?

    Looking at our driver. From your good observations, I agree that the issue seems to be that the device does not wait for the erase to complete before uniniting flash. My prime suspect then would be that something goes wrong in qspi_wait_for_completion().  This does different things depending on CONFIG_MULTITHREADING. What is that in your project? And what happens if you try the other option?

    I think it is most logical to use multithreading in an RTOS, so if I assume that it is on:
    The semaphore we wait for is given by qspi_complete, called by qspi_handler(). I don't know if it will give much, but you could maybe check if that function is triggered too early? And maybe it is triggered again after the flash is erased or something?

  • Took a little while to get 2.1.0 running.. but i have results

    running 2.1.0 works correctly! when it gets to qspi_device_uninit(dev) called right at the end of qspi_erase() it runs this bit of code and will loop here until the flash erase has finished (15 secs or so)

    Running 2.5.1 it instead runs some power management specific code and ignores that previous loop that checks if the flash is busy.

    when i step through pm_device_runtime_put(dev) it eventually calls qspi_nor_pm_action with the action PM_DEVICE_ACTION_SUSPEND. this checks the same nrfx_qspi_mem_busy_check() but instead of waiting for it to stop being busy it just returns error EBUSY which i believe is the source of my problems.

    Do you agree? and can you suggest a fix / when this can be fixed in the sdk

    Regards

    Robert

  • That is interesting indeed. I will ask our QSPI developers about this.

  • I heard back from our developers, and gave me some updates to v2.5.1 which could be the cause. However, you also see this on v2.5.0 so its not likely it is those.

    Now, this gives me a dilemma.

    We try to take up as little as possible time from our developers, so they can focus on improving our stuff. Ideally, it would be nice if you could test for some newer NCS versions (binary search between v2.1.0 and v2.5.0), so we can figure out exactly when the error happens, so I can give the developers a more detailed report on the issue.

    On the other hand, you took a while to get v2.1.0 running, so I don't really want to ask you to do all of that work again either. If I could reproduce this on my side I could do the work myself, but alas I do not have the flash you have.

    That leaves me with a third option: Looking at the history from the file we suspect has the issue, and trying to reason my way to what could go wrong.
    From the history, these commits are new since v2.1.0:

    Out of these, here are my main suspects:

    Just now I had another look at your v2.5.1 code, and I see that some of it is grayed out because CONFIG_PM__DEVICE_RUNTIME is set. (there is an #else which I missed before)
    Do you need CONFIG_PM__DEVICE_RUNTIME for your project? Can you try to disable this configuration and see if you still get the same error?

  • Hi Sigurd

    My understanding of the power management library is it's critical for putting devices to sleep when not used. For my use case this is very important because our device has to last for years on a small battery. The whole project is set up to use pinctrl style definitions in the device tree which i think depend on power management? So i don't really want to change this.

    Understand you can't debug this directly because you don't have access to my flash chip, but i wonder could you step through the same code on a nrf52840dk and see why it works where mine doesn't?

    Presumably with the flash chip that is on the DK the pm_device_runtime_put() must do something different? the this would probably give a useful clue.

    Regards

    Robert

Reply
  • Hi Sigurd

    My understanding of the power management library is it's critical for putting devices to sleep when not used. For my use case this is very important because our device has to last for years on a small battery. The whole project is set up to use pinctrl style definitions in the device tree which i think depend on power management? So i don't really want to change this.

    Understand you can't debug this directly because you don't have access to my flash chip, but i wonder could you step through the same code on a nrf52840dk and see why it works where mine doesn't?

    Presumably with the flash chip that is on the DK the pm_device_runtime_put() must do something different? the this would probably give a useful clue.

    Regards

    Robert

Children
Related