Debug stack corruption problem?

I've got an issue where calling a function overwrites a variable in the calling function. See attached screenshots. In the first, byte is 0x1b. After stepping into the function (without any more code being executed), byte is overwritten with 0x0. I don't know where to start debugging this! Fortunately it is easily reproducible (in my firmware) and happens every time. Any suggestions? I'm using GNU Arm Embedded Toolchain 10-2020-q4-major, debugging here with arm-none-eabi-gdb in CLion. Stepping over instructions, I can see the corruption occurs during the bl instruction:

0x0004f492 <+6>:	ldrb.w	r0, [r0, #512]	; 0x200
=> 0x0004f496 <+10>:	bl	0x28b64 <min_tx_byte> // stepi triggers corruption

However, this doesn't occur every time this function is called, only during a few specific calls. The calling function is in a linked library and the called function is in my main application. I'm not sure whether that makes any difference, but it explains the large delta in the bl instruction. I've checked the SoftDevice RAM allocation (it requested 0x3120 bytes, so RAM size is 0xcee0), and I've defined __HEAP_SIZE=0 and __STACK_SIZE=0x2000.

  • Hi,

    Have you monitored the value of "byte" continuously while single-stepping?

    Have you looked at the assembly code generated for the function?

    Is there any use of pointers inside of min_tx_byte?

    Regards,
    Terje

  • Hi Terje, yes to the first two. From the question:

    => 0x0004f496 <+10>:	bl	0x28b64 <min_tx_byte> // stepi triggers corruption

    So stepping over the bl instruction causes byte to go from 0xb1 to 0x00.

    This occurs before any code in min_tx_byte runs. However, the code for that function is:

    static uint8_t tx_buffer[255];
    static size_t tx_buffer_length;
    
    void min_tx_byte(uint8_t port, uint8_t byte) {
        ASSERT(tx_buffer_length < sizeof(tx_buffer));
        tx_buffer[tx_buffer_length] = byte;
        tx_buffer_length++;
    }

  • Hi,

    Right. And as it is in a linked library, you do not get to single step within that function itself?

    Is there anything fishy going on with regards to optimizations on the caller side of things? E.g. that the caller code is inlined, or that the "byte" value can be determined at build time (ending up as a constant in the code itself), or some other optimization causing the value shown in the debugger to be wrong? Does it happen when optimization is turned off?

    Regards,
    Terje
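
To rule out the possibility that only the value shown in the debugger is wrong, one option is to let the firmware itself compare byte before and after the call. The following is a minimal sketch, not code from the thread: checked_tx_byte and the dbg_* variables are hypothetical names, the standard assert stands in for the SDK's ASSERT macro, and min_tx_byte is repeated only so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the application's min_tx_byte(), repeated so this compiles alone. */
static uint8_t tx_buffer[255];
static size_t tx_buffer_length;

void min_tx_byte(uint8_t port, uint8_t byte) {
    (void)port; /* unused in this stand-in */
    assert(tx_buffer_length < sizeof(tx_buffer));
    tx_buffer[tx_buffer_length++] = byte;
}

/* volatile forces real stores and loads, so these can be inspected in
 * memory (or logged) independently of the debugger's view of "byte". */
static volatile uint8_t dbg_byte_before;
static volatile uint8_t dbg_byte_after;
static volatile uint32_t dbg_mismatch_count;

/* Hypothetical wrapper: snapshot "byte" on both sides of the suspect call. */
void checked_tx_byte(uint8_t port, uint8_t byte) {
    dbg_byte_before = byte;
    min_tx_byte(port, byte);
    dbg_byte_after = byte;
    if (dbg_byte_before != dbg_byte_after) {
        dbg_mismatch_count++; /* the value really changed across the call */
    }
}
```

If dbg_mismatch_count stays at zero while the debugger still shows byte changing, that would suggest a display or debug-info issue rather than real corruption. A non-zero count would point to the value actually being clobbered.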

  • The library is compiled from source, so yes, I can single step inside it, but byte goes to 0x0 before the first line is hit.

    Here is the disassembly of the calling function:

    Dump of assembler code for function stuffed_tx_byte:
       0x0004f484 <+0>:	push	{r3, r4, r5, lr}
       0x0004f486 <+2>:	mov	r4, r0
       0x0004f488 <+4>:	mov	r5, r1
    => 0x0004f48a <+6>:	ldrb.w	r0, [r0, #512]	; 0x200
       0x0004f48e <+10>:	bl	0x28b64 <min_tx_byte>
       0x0004f492 <+14>:	mov	r1, r5
       0x0004f494 <+16>:	add.w	r0, r4, #500	; 0x1f4
       0x0004f498 <+20>:	bl	0x4f2f4 <crc32_step>

    This is with -O0. I usually use -Og for debug builds, but the behaviour is exactly the same.

    I'm wondering if the SoftDevice could be overwriting it. I noticed that the SoftDevice observes the stack, so perhaps it is doing something that overwrites the local variables when the function is called...

  • Hi,

    At least the assembler code that you provided looks sane. The "byte" value received through register r1 is moved to r5, which I must assume min_tx_byte preserves, and that should also be where the debugger takes the value from for displaying it. Have you also confirmed that program execution actually continues with a byte value of 0? I.e. that this is not just a debugger glitch?

    Regarding SoftDevice, I would expect a lot of breakage if it did any changes like this one, either on the stack or in the registers. In any case, are you expecting SoftDevice activity interrupting this function? Do you have any interrupt routines (triggered from SoftDevice or otherwise) that can interfere in any way?

    I do not see how use of the self pointer anywhere else could interfere with the value of byte, but maybe there is a link there. Is the self pointer used anywhere else? I see further down in stuffed_tx_byte() you decrement the self pointer. A bit far-fetched, but if byte (residing in r5) is pushed to the stack inside min_tx_byte (i.e. min_tx_byte uses r5 for some other purpose) and, while it is stored there, an interrupt routine changes the value at that address, then yes. It could be a stack/heap overlap, but as you are not using a heap that does not sound likely...

    Regards,
    Terje
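
For the stack-overlap and interrupt theories, a common technique (nothing in the thread says it was tried) is stack painting: fill the unused part of the stack with a known pattern, then check how much of the pattern survives. Below is a minimal sketch that works on any word array, so it can be tried on a host. stack_paint, stack_unused_words and the 0xDEADBEEF pattern are illustrative choices, not SDK APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xDEADBEEFu /* arbitrary, easy-to-spot fill pattern */

/* Fill a region with the paint pattern. On target this must only cover the
 * part of the stack BELOW the current stack pointer, otherwise it would
 * overwrite live frames; where that region starts is linker-script specific. */
void stack_paint(uint32_t *base, size_t words) {
    assert(base != NULL);
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_PAINT;
    }
}

/* Count still-painted words starting from the lowest address. The Cortex-M
 * stack grows downwards, so the lowest addresses are the last to be used. */
size_t stack_unused_words(const uint32_t *base, size_t words) {
    assert(base != NULL);
    size_t n = 0;
    while (n < words && base[n] == STACK_PAINT) {
        n++;
    }
    return n;
}
```

If the count drops to zero, or to far less than expected for the configured __STACK_SIZE, something has written deeper into the stack region than the application should need, which would make an overflow or a stray write more plausible.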
