performance issue with multi threadn in ncs with nrf52832

hello Nordic

i am working with nrf52832, with zephyr, ncs v1.7.1,

the device has a sensor that sample at 26khz each sample is 3 channels of 2 bytes data. the sensor use spi1 and has high thread priority (cooperative) the samples are then stream via a double buffer to the flash which uses spi2.

when tying to write a page (256 bytes) to the flash at a test in the beginning of the app it take 1ms to write that page but when trying to write within the app, while the other sensor is sampling, it takes 3ms and also i get a BUS FAULT. here is the log:

now i do have in the app another driver that uses spi (at the same time (getting samples and writing samples to flash), but it uses a different instance of spi so i don't see why it shell create such a delay .. here is my complete .dts file:

(running the app on another board with nrd52840, with nand flash that writes 2k each time, works well, so i don't know if it is a performance issue or something else.. i will also mention that the RAM take on the nrf52832 with zephyr and all is high:

any idea why i get the BUS FAULT ? 

p.s. i don't know if it has any affect but the page data is saved in a heap section previously allocated like so

the other question i will leave for now it is also regarding a try to improve the current spi_nor_write function but anyhow it will not solve the 3ms issue and if it will remain on 1ms per page write as seen in the first test then we will be ok 

we think it is a performance issue (which we don't see when the same code runs on nrf52840 with a nand flash, which also writes 2k each time and not 256 pages)

one of the possible solutions is to change the api of the sensor driver so instead of using many time fetch and get, to get one api for fetching a pointer to a buffer of the samples stored in the driver, another idea is to have the driver use easy dma to put the samples to a buffer in memory with an interrupt after reaching 256 bytes, and then write to flash from this buffer. any idea if there is some support for one of this operations in zephyr/ncs ?,

any ideas as for which is better ?

any ideas of beast way to do preferred one ?

hope to read from you soon

best regards

Ziv

Parents
  • Hi Ziv

    How are you reading the sensor over SPI? 

    Are you using the normal or the async driver?

    Running an SPI transaction every ~38us is not ideal, but it is very odd that the 256 byte write should slow down when it is running on a separate SPI bus. At the lower levels the SPI peripheral should use DMA, which means the 256 byte transaction should go through in the background (unfortunately it would have to be split in 2 transactions, since the maximum length of one transaction is 255 bytes in the nRF52832) . 

    The nRF52840 allows you to send much longer packets in a single transaction, which might be why it is working better. 

    I agree that sampling the sensor in a more automated way would be a better approach. The SPIM module in the nRF52 devices support a so called list feature, which allows you to queue a list of transactions and have them triggered in the background from a hardware event (such as a timer compare event). In this setup you could have the SPIM module run several operations in the background until you have filled the flash buffer, and run the processing and SPI interaction at a less frequent rate. 

    You will need to use the nrfx_spim driver directly in this case, since the generic Zephyr SPI driver doesn't support the list feature of the SPIM peripheral. 

    Best regards
    Torbjørn

  • hi overbekk

    How are you reading the sensor over SPI?

    yes spi1, the flash works on spi2

    Are you using the normal or the async driver?

    i have CONFIG_SPI_ASYNC=y in my .config file under the build/zephyr, so async is how i work.

    In this setup you could have the SPIM module run several operations in the background until you have filled the flash buffer

    maybe i misunderstood your intended use, but 1. i need to be writing while sampling for a second so if my buffers is filling up faster then the are dequeued then its not good enough, also since nrf52832 have 1/4 the ram the nrf52840 has, i need to find a way to reduce my buffers and not put more cause i am already around 90% ram take. 2. there is some unclear issue to work with spim in the flash cause of  CONFIG_SOC_NRF52832_ALLOW_SPIM_DESPITE_PAN_58=y,

    it fails here (on the spi_nrfx_spim.c file on the ncs (1.7.1 v): 

    i had to change to spi

    Running an SPI transaction every ~38us is not ideal,

    the 38us spi call is called via ppi and i was thinking that what slows the flash page write is not the spi itself but available cpu and ram .. but i am not sure anymore and the coming questions are about that: 

    1. i sometimes get this error, and i wonder if it is related to the 90% ram take or is it something else ??

    2. i used ppk with nrf connect power manager desktop application to monitor some gpios and this is the outcome 

    (line 0 - iis3dwb sensor int1 

    line 3 - flash spi chip select 

    line 2 - gpio(0,26) which is up at the beginning of 'spi_nor_cmd_addr_write(...)' and down before the return from it.

    line 4 - gpio(0,27) - the flash_write api function ( 'spi_nor_write(...) ' ), gpio is up at entry and down before return.

    as can be seen in the following code parts:

    (after i moved the ' spi_nor_wait_until_ready(dev); ' from its original place to just before the ' spi_nor_cmd_write(dev, SPI_NOR_CMD_WREN); '  it did go faster but still i see there is a long delay between first read of the flash ready state to the second one, almost 1ms which is a lot in my case .. so why is it and how can i reduce it ??

    3. spi speed is set to 8mhz so passing 256 bytes should take around 250us but, looking at the nrf ppk image above, it looks like it actually takes almost 1ms for chip select to go back up..  what can cause this delay ? the flash itself can work up to 133mhz so it should not be the source for the delay

    4. going in to the  ' spi_nrfx_spi.c ' and ' spi_context.h ' for the implementation of the ' transceive(...) ' i don't understand how the ' transfer_next_chunk( ..) '  is called twice and if i understand it correct then the event handler when first spi is done calls it again, but do no check the count there only later after some if's 

    which also brings to to the efficiency question we have double calls with chunk 0 and chunk 1 for simple WREN command instead of only one dive into functions ? and the "count check" also seems wasteful sorry to say again if i understood it right then a lot of operations can be saved by checking it earlier

    and i wonder if instead of having this structure :

    ,which do save coping of the data to one buffer, is actually more efficient then having one spi_buf struct ? so instead of having a chunk of 4 and another chunk of 256 bytes with 2 separate spi calls, i will have one with 260 bytes ?

    5. in the following function in the spi_context.h file:

    i see that  CONFIG_SPI_COMPLETION_TIMEOUT_TOLERANCE=200ms by default, what should happen if i change it to '0' and is that what maybe slows the re-checking of the 'is flash ready' read command ?  also maybe i am missing something but 8000/8000000 (8Mhz = 8000000) will always come out as '0' cause 'timeout_ms' is not a double or float am i wrong ?

    6. i tried to improve the - spi_nor_cmd_addr_write(devopcodeaddrsrclength)

    with my own specific program page function: 

    p.s. also tried to implement a page program leaner function instead of the  ' spi_nor_cmd_addr_write '  which is here

    but it actually took longer time which i can't understand why ?

     

    sorry it is a long question but i am really trying to get to the bottom of it and see how i can improve and maybe even change configurations to use the spi_nrfx_spi driver in a better way or if i have a way to make modification without changing the ncs or rewriting an spi driver myself (which i don't want to do)

    hope to read from you soon  

    best regards

    Ziv

  • hi Håkon

    Try adjusting it to 1024 (or higher) to see if the fault disappears.

    though increasing it to 1024, seems to help at first look, i still wonder cause we have another board with the same code base that runs on nrf52840 (has larger RAM and FLASH), it has the same value in the CONFIG_LOG_PROCESS_THREAD_STACK_SIZE and we don't get the MPU FAULTS with it so it makes me wonder if that is the solution 

    this is my path for the compiler - /opt/gcc-arm-none-eabi-10.3-2021.10/ and under it there is no file or fir named "arm-none-eabi-addr2line" so again, not sure what is the command

    hope to read you soon

    best regards

    Ziv

  • Hi,

     

    ziv123 said:
    though increasing it to 1024, seems to help at first look, i still wonder cause we have another board with the same code base that runs on nrf52840 (has larger RAM and FLASH), it has the same value in the CONFIG_LOG_PROCESS_THREAD_STACK_SIZE and we don't get the MPU FAULTS with it so it makes me wonder if that is the solution 

    Logging is dynamic, ie. based on what is happening in real-time. A compiled image for 52832 and nrf52840 will differ, and timing on the bluetooth link (if running) also have some randomness thrown into the mix here. 

    ziv123 said:
    this is my path for the compiler - /opt/gcc-arm-none-eabi-10.3-2021.10/ and under it there is no file or fir named "arm-none-eabi-addr2line" so again, not sure what is the command

    I have the exact same version installed, and it's there at my end:

    /opt/gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi-addr2line

     

    If you have gcc (x86/x86-64) installed on your system, you can technically just call "addr2line" instead.

    Kind regards,

    Håkon

  • hi Håkon

    /opt/gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi-addr2line

    any idea whats wrong ?

  • You'll need to provide an .elf file via the -e switch:

    Håkon Alseth said:
    c:\path\to\gnuarmemb\bin\arm-none-eabi-addr2line -e zephyr\zephyr.elf 0xFAULTING_ADDRESSES

     

    Cheers,

    Håkon

  • unfortunately the outcome does not help to find where it fails in my part of the app code

      

Reply
  • unfortunately the outcome does not help to find where it fails in my part of the app code

      

Children
  • Hi,

     

    Could you share your logging configuration?

    With SEGGER RTT, you should have the logging backend in CONFIG_LOG_MODE_DEFERRED, and not LOG_IMMEDIATE. This is to ensure that logging happens in its own thread and not in-place (ie. when you call the printk or LOG_*("some log entry");)

     

    Kind regards,

    Håkon

  • hi 

    specifically the config you mentioned is set as you suggested

    this is my complete .config file:

    hope to read you soon 

    best regards

    Ziv

  • Hi Ziv,

     

    Is it still the logger thread that faults with MPU fault? If yes, then just double/triple the stack size of the logger thread to see if this helps. I see that your segger rtt buffer is quite large, which again will take a chunk out of the logger thread.

     

    Kind regards,

    Håkon

  • hi Håkon

    actually i also got help in the thread :

     how to decipher HARD FAULT / MPU FAULT on zephyr running on nrf52832 

    which helped me to locate the issue

    and you are correct, increasing the CONFIG_IDLE_STACK_SIZE, as was suggested in that thread solved the MPU issue for now

    so we can close this thread i think

    thanks a bunch  Slight smile