Beware that this post is related to an SDK in maintenance mode
More Info: Consider nRF Connect SDK for new designs
This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

App timer (and atomic fifo) no memory

Hi,

I've encountered problem with app timer functions being called from different interrupt levels. Looks like no memory error returned is caused by collision in underlaying atomic fifo library. This seems like unexpected behaviour as there is no information in documentation about such a behaviour.

Setup:
nRF5_SDK_17.1.0
gcc-arm-none-eabi-10-2020-q4-major (tested both windows and IOS)
gcc-arm-none-eabi-9-2019-q4-major (tested on IOS)

nRF52840_XXAA
App timer v2
no rtos

Originally encountered problem:
From main application thread SAADC was configured and started to perform conversion. After triggering SAADC operation app timer was started with purpose of generating timeout event for SAADC (we had previous problem with SAADC driver hangup and not handling interrupt properly). In SAADC driver event handler, app timer stop was called. App timer interrupt level was set to 7 and SAADC to 6. Randomly (due to SD and active BLE connection) when calling app timer start from main thread, NO_MEMORY error was returned. First investigation showed that app timer start and subsequent, underlaying rtc interrupt is again interrupted by SAADC callback. This interrupt of start operation scheduling caused it to return NO_MEMORY error. 
I'm aware that NO_MEMORY error could be returned in case timer library has high load and operation queue is full. One of first solution was increasing timer operation queue from 10 to 20, however it didn't change anything, error was returned similarly frequent. Moreover, enabling app timer profiler (APP_TIMER_WITH_PROFILER=1) completely solved problem, as internally, queue element allocation (which was failing), is surrounded by critical section.
I've created simple project (zip will be attached to this ticket) that can reliably reproduce this issue.
To reproduce this problem slight modification was done in order not to use SAADC and simulate its delay (plus SD interrupt) with something more predictable. Instead of using SAADC, timer will be used, to get constant delay of interrupt. As timers are synchronous to 16MHz clock, we also have to synchronize test program with it. Main thread will be elevated to interrupt level 7, by additional timer. All other interrupts levels will be increased by 1 compared to original problem: app timer (rtc) to 6 and timer simulating SAADC to 5. Delay of simulating timer will be changed in loop and return code from app_timer_start will be monitored. 
I was able to reproduce issue with different compilers and different optimization levels.
I haven't look deeper into it, but looks like reentrancy or other collision in atomic fifo is causing this issue. Or maybe I simply have something incorrectly configured.
Best regards
Tomasz
P.S. Regarding mentioned saadc driver problem, I will also create ticket when I will be able to reproduce it somehow reliably.
  • Hi,

    That's great news.

    I will soon patch sdk and check if now everything works flawlessly.

    Thank you for help.

  • Hi

    I have good and bad news.
    Bad news is that it's causing hard fault, however good news is that I probably managed to repair it.

    So what is happening:

    With original test project that I included in main ticket, it's indeed working flawlessly and as expected. So I tried it in other project that I'm currently working on, and unfortunately I was getting Hard Fault when trying to start app_timer (using underlaying atomic fifo item alloc).
    I've dug deeper and found thet whether hard fault is caused or not, depends on what compiler and what optimization level is set.

    How it looked on different compilers and optimization (though it doesn't matter as came out later):

    gcc-arm-none-eabi-10-2020-q4-major for iOS:
     - O0 --> Hard fault
     - O1 --> OK
     - O2 --> OK
     - O3 --> OK
     - Ofast --> OK

    gcc-arm-none-eabi-9-2019-q4-major for iOS:
     - O0 --> Hard fault
     - O1 --> Hard fault
     - O2 --> Hard fault
     - O3 --> OK
     - Ofast --> OK

    Counterintuitively higher optimization worked better.

    Exact fault was Unaligned access flag returned (USFR.UNALIGNED) which in my case was elevated to hard fault. Below I will post code for nrf_atfifo_wspace_req function as it will need it as reference.

    bool nrf_atfifo_wspace_req(nrf_atfifo_t * const p_fifo, nrf_atfifo_postag_t * const p_old_tail)
    {
        volatile bool ret;
        volatile uint32_t old_tail;
        uint32_t new_tail;
        uint32_t temp;
    
        __ASM volatile(
            /* For more comments see Keil version above */
            "1:                                                             \n"
            "   ldrex %[old_tail], [%[p_fifo], %[offset_tail]]              \n"
            "   uxth %[new_tail], %[old_tail]                               \n"
            "                                                               \n"
            "   ldrh  %[temp], [%[p_fifo], %[offset_item_size]]             \n"
            "   add   %[new_tail], %[temp]                                  \n"
            "   ldrh  %[temp], [%[p_fifo], %[offset_buf_size]]              \n"
            "   cmp   %[new_tail], %[temp]                                  \n"
            "   it    hs                                                    \n"
            "   subhs %[new_tail], %[new_tail], %[temp]                     \n"
            "                                                               \n"
            "   ldrh  %[temp], [%[p_fifo], %[offset_head_wr]]               \n"
            "   cmp   %[new_tail], %[temp]                                  \n"
            "   ittt  eq                                                    \n"
            "   moveq %[ret], %[false_val]                                  \n"
            "   moveq %[new_tail], %[old_tail]                              \n"
            "   beq.n 2f                                                    \n"
            "                                                               \n"
            "   pkhbt %[new_tail], %[new_tail], %[old_tail]                 \n"
            "                                                               \n"
            "   mov %[ret], %[true_val]                                     \n"
            "                                                               \n"
            "2:                                                             \n"
            "   strex %[temp], %[new_tail], [%[p_fifo], %[offset_tail]]     \n"
            "   cmp   %[temp], #0                                           \n"
            "   bne.n 1b                                                    \n"
            : /* Output operands */
                [ret]     "=r"(ret),
                [temp]    "=&r"(temp),
                [old_tail]"=&r"(old_tail),
                [new_tail]"=&r"(new_tail)
            : /* Input operands */
                [p_fifo]          "r"(p_fifo),
                [offset_tail]     "J"(offsetof(nrf_atfifo_t, tail)),
                [offset_head_wr]  "J"(offsetof(nrf_atfifo_t, head) + offsetof(nrf_atfifo_postag_pos_t, wr)),
                [offset_item_size]"J"(offsetof(nrf_atfifo_t, item_size)),
                [offset_buf_size] "J"(offsetof(nrf_atfifo_t, buf_size)),
                [true_val]        "I"(true),
                [false_val]       "I"(false)
            : /* Clobbers */
                "cc");
    
        p_old_tail->tag = old_tail;
        UNUSED_VARIABLE(new_tail);
        UNUSED_VARIABLE(temp);
        return ret;
    }


    Fault was caused in strex instruction (line 33 here). How it was presented in disassembly:

    __ASM volatile(
     0002A3C6   LDR            R3, [R7, #4]
     0002A3C8   LDREX          R1, [R3, #4]
     0002A3CC   UXTH           R2, R1
     0002A3CE   LDRH           R0, [R3, #14]
     0002A3D0   ADD            R2, R0
     0002A3D2   LDRH           R0, [R3, #12]
     0002A3D4   CMP            R2, R0
     0002A3D6   IT             CS
     0002A3D8   SUBCS          R2, R2, R0
     0002A3DA   LDRH           R0, [R3, #8]
     0002A3DC   CMP            R2, R0
     0002A3DE   ITTT           EQ
     0002A3E0   MOVEQ          R3, #0
     0002A3E2   MOVEQ          R2, R1
     0002A3E4   BEQ            0x0002A3EE                    ; <nrf_atfifo_wspace_req>+0x32
     0002A3E6   PKHBT          R2, R2, R1
     0002A3EA   MOV.W          R3, #1
     0002A3EE   STREX          R0, R2, [R3, #4]
     0002A3F2   CMP            R0, #0
     0002A3F4   BNE            0x0002A3C8                    ; <nrf_atfifo_wspace_req>+0xC


    In mentioned strex instruction register R3 should contain address of atomic fifo storage. However, just one line before, R3 register is overwritten with variable ret (in current case equal to true so 1). This 1 is used by strex as pointer and causes unaligned access fault. Depending on which failing combination of compiler and optimization was used, register numbers were different, however situation was exactly same, address of storage was overwritten with return value. Correctly working programs of course had different registers used (it was not overwritten).

    Mu curiosity forced me to look up documentation what those "=", "&" or "r" symbols in output operands means and why [ret] (which is causing issues) is missing "&". I found this:
    ‘&’ - Means (in a particular alternative) that this operand is an 
    earlyclobber operand, which is written before the instruction is 
    finished using the input operands. [...]


    I don't rally know assembler and how to use it in C code, however in this situation definitely after writing value to ret, input operands are still used (p_fifo pointer) yet "&" is missing from output operands declaration for ret.

    What happened:

    Looking at patch you proposed, 
    mov %[ret], %[true_val]

    instruction was moved up in code. In original implementation it was last instruction in assembler part and indeed "&" was not needed. After patching it's not longer at the and looks like "&" is needed, otherwise compiler thinks that input values are no longer needed and registers holding them can be reused.

    Same situation goes for nrf_atfifo_rspace_req function.

    After changing it, all works like charm on both compilers I have installed and all 5 optimization levels for those compilers.

    tl;dr Final solution:
    nrf_atfifo_internal.h --> nrf_atfifo_wspace_req --> ASM output operands
    change:
        [ret]     "=r"(ret), 
    to:
        [ret]     "=&r"(ret),
    
    nrf_atfifo_internal.h --> nrf_atfifo_rspace_req --> ASM output operands
    change:
        [ret]     "=r"(ret), 
    to:
        [ret]     "=&r"(ret),


    Could you please ask developer of proposed patch if it could really be this issue, and if it's correct solution for it?

    Thanks in advance for help and hope to hear back soon.

    Best regards
    Tomasz

  • Hi Tomasz,

    Excellent work! Thanks for sharing your solution. The developer has reviewed your changes now and he thought they were the correct fix for the problem.

    Best regards,

    Vidar

Related