This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

NRF9160: Downloading FOTA image crashes with SDK 1.4

Hi,

I've had my share of problems with upgrading from SDK 1.3.1 to 1.4. Now I noticed that my app crashes randomly when I am downloading FOTA image with download_client(running SDK 1.4 app, updating to SDK1.4 app). I managed to catch one crash with debugger and it looks identical to what I saw earlier when SPM crashed to random number generation: https://devzone.nordicsemi.com/f/nordic-q-a/68395/nrf9160-sdk-1-4-crash-in-random-number-generation

That crash was solved and I found the offending operation because it was so deterministic to reproduce. But this time, image download might work 5 times and the crash at 6th try when any% is downloaded, so I have no idea where it crashes.

I am actually quite noob with interpreting the crashdumps and because it looks like e.g. link register contains just gibberish, I don't have clue what to blame/investigate now. Any ideas? Or the idea is to learn how to interpret the crash, but any ideas if this is a known problem already and is there a fix for it. Attached a screenshot of the registers

  • A wild shot, but could you try to increase CONFIG_DOWNLOAD_CLIENT_STACK_SIZE ?

  • Hi,

    I should have mentioned, it is already at max 4096

  • Are you able to locate the exact line that causes the program to crash? Try to enable more logging by setting these in prj.conf:

    CONFIG_DFU_TARGET_LOG_LEVEL_DBG=y
    CONFIG_DOWNLOAD_CLIENT_LOG_LEVEL_DBG=y
    CONFIG_FOTA_DOWNLOAD_LOG_LEVEL_DBG=y

    The value of the link register is not faulty. Usally it is used to hold the value of the address you'll return to after a jump. This is not the case for exceptions, ffffffa0 is an exception return value that says something about what context (interrupt, what stack etc..). Check out this: https://developer.arm.com/documentation/100235/0004/the-cortex-m33-processor/exception-model/exception-entry-and-return

    Best regards,

    Simon

  • Hi,

    No luck with finding the line yet. but I found another bug. This a a bit more straight forward. download_client goes haywire and interprets HTTP headers as payload. See attached log, I commented my analysis to there. Consider this as issue/bug report/what_ever to download_client.

    I think the ultimate fault is that download_client does not use the length fields in HTTP header to decide the lenght of the payload in hand. Let me know your view

    I hit this 90% of times when I try to reproduce the crash problem.

    // There are two ways that we normally receive packets. Here we receive 1556 bytes with one read
    // HTTP header + payload. This works ok
    00> [00:12:36.227,935] <dbg> download_client.download_thread: Receiving up to 2048 bytes at 0x20021774...
    00> [00:12:38.687,927] <dbg> download_client.download_thread: Read 1556 bytes from socket
    00> [00:12:38.688,079] <dbg> download_client.http_header_parse: GET header size: 532
    00> [00:12:38.688,537] <err> download_client: Header
    00> 68 74 74 70 2f 31 2e 31  20 32 30 36 20 70 61 72 |http/1.1  206 par
    ....
    00> [00:12:38.688,873] <err> download_client: Payload
    ....
    00> [00:12:38.688,903] <dbg> download_client.http_parse: Copying 1024 payload bytes
    00> [00:12:38.689,025] <inf> download_client: Downloaded 290816/318212 bytes (91%)
    
    
    // Another way is that we first get the HTTP header and then receive payload separately
    // this also works ok
    00> [00:12:38.710,021] <dbg> download_client.download_thread: Receiving up to 2048 bytes at 0x20021774...
    00> [00:12:39.861,114] <dbg> download_client.download_thread: Read 532 bytes from socket
    00> [00:12:39.861,267] <dbg> download_client.http_header_parse: GET header size: 532
    00> [00:12:39.861,724] <err> download_client: Header
    .... 
    00> [00:12:39.861,877] <dbg> download_client.download_thread: Receiving up to 2048 bytes at 0x20021774...
    00> [00:12:41.959,472] <dbg> download_client.download_thread: Read 1024 bytes from socket
    00> [00:12:41.959,503] <inf> download_client: Downloaded 291840/318212 bytes (91%)
    00> [00:12:41.959,564] <inf> STREAM_FLASH: Erasing page at offset 0x000ca000
    
    // I found a third way. Here we manage to read 2048 bytes from socket with one recv
    // It contains HTTP header + payload + part of next packets HTTP header
    00> [00:12:45.355,895] <dbg> download_client.download_thread: Receiving up to 2048 bytes at 0x20021774...
    00> [00:12:48.273,834] <dbg> download_client.download_thread: Read 2048 bytes from socket
    00> [00:12:48.273,986] <dbg> download_client.http_header_parse: GET header size: 532
    00> [00:12:48.274,414] <err> download_client: Header
    ....           
    00> [00:12:48.274,871] <err> download_client: Payload
    00> 6c 65 64 3a 20 25 64 00  6c 77 6d 32 6d 5f 69 6e |led: %d. lwm2m_in
    ....
    // I print here the whole hexdum that the code interprets as payload.
    // Here we see that the actual payload ends and next HTTP header starts in the middle of the packet
    00> 4c 20 3c 3c 20 28 30 29  29 00 00 00 57 45 53 54 |L << (0) )...WEST
    00> 48 54 54 50 2f 31 2e 31  20 32 30 36 20 50 61 72 |HTTP/1.1  206 Par
    ....
    
    // Code puts rubbish to image payload...
    00> [00:12:48.274,902] <dbg> download_client.http_parse: Copying 1516 payload bytes
    00> [00:12:48.275,054] <inf> download_client: Downloaded 295404/318212 bytes (92%)
    
    // And finally fails when trying to deocode the last few bytes of the HTTP header that was already read as payload last time
    00> [00:12:48.288,208] <dbg> download_client.download_thread: Receiving up to 2048 bytes at 0x20021774...
    00> [00:12:48.288,391] <dbg> download_client.download_thread: Read 40 bytes from socket
    00> [00:12:48.288,421] <dbg> download_client.http_header_parse: GET header size: 40
    00> [00:12:48.288,482] <err> download_client: Header
    00> 39 63 72 7a 6d 70 75 79  2d 75 62 65 79 78 6e 6f |9crzmpuy -ubeyxno
    00> 79 2d 6b 73 61 67 78 71  3d 3d 0d 0a 61 67 65 3a |y-ksagxq ==..age:
    00> 20 36 39 35 0d 0a 0d 0a                          | 695....         
    00> [00:12:48.288,543] <err> download_client: Server did not honor partial content request
    00> [00:12:48.352,416] <err> fota_download: Download client error
    00> [00:12:48.352,447] <inf> dfu_target_mcuboot: MCUBoot image upgrade aborted.
    00> [00:12:48.352,478] <err> main: Received error from fota_download
    

  • Oh crap, I've messed up some versions in git. The error above happens because I forgot the workaround from my fork for setting the timeout to TLS socket bug described in here https://devzone.nordicsemi.com/f/nordic-q-a/68594/nrf9160-download_client-does-not-work-anymore-in-sdk-1-4

    Anyway. The crash hunt continues...

Related