This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

NRF9160: Downloading FOTA image crashes with SDK 1.4

Hi,

I've had my share of problems with upgrading from SDK 1.3.1 to 1.4. Now I noticed that my app crashes randomly when I am downloading FOTA image with download_client(running SDK 1.4 app, updating to SDK1.4 app). I managed to catch one crash with debugger and it looks identical to what I saw earlier when SPM crashed to random number generation: https://devzone.nordicsemi.com/f/nordic-q-a/68395/nrf9160-sdk-1-4-crash-in-random-number-generation

That crash was solved and I found the offending operation because it was so deterministic to reproduce. But this time, image download might work 5 times and the crash at 6th try when any% is downloaded, so I have no idea where it crashes.

I am actually quite noob with interpreting the crashdumps and because it looks like e.g. link register contains just gibberish, I don't have clue what to blame/investigate now. Any ideas? Or the idea is to learn how to interpret the crash, but any ideas if this is a known problem already and is there a fix for it. Attached a screenshot of the registers

Parents
  • A wild shot, but could you try to increase CONFIG_DOWNLOAD_CLIENT_STACK_SIZE ?

  • Hi,

    I should have mentioned, it is already at max 4096

  • Are you able to locate the exact line that causes the program to crash? Try to enable more logging by setting these in prj.conf:

    CONFIG_DFU_TARGET_LOG_LEVEL_DBG=y
    CONFIG_DOWNLOAD_CLIENT_LOG_LEVEL_DBG=y
    CONFIG_FOTA_DOWNLOAD_LOG_LEVEL_DBG=y

    The value of the link register is not faulty. Usally it is used to hold the value of the address you'll return to after a jump. This is not the case for exceptions, ffffffa0 is an exception return value that says something about what context (interrupt, what stack etc..). Check out this: https://developer.arm.com/documentation/100235/0004/the-cortex-m33-processor/exception-model/exception-entry-and-return

    Best regards,

    Simon

  • Hi,

    No luck with finding the line yet. but I found another bug. This a a bit more straight forward. download_client goes haywire and interprets HTTP headers as payload. See attached log, I commented my analysis to there. Consider this as issue/bug report/what_ever to download_client.

    I think the ultimate fault is that download_client does not use the length fields in HTTP header to decide the lenght of the payload in hand. Let me know your view

    I hit this 90% of times when I try to reproduce the crash problem.

    // There are two ways that we normally receive packets. Here we receive 1556 bytes with one read
    // HTTP header + payload. This works ok
    00> [00:12:36.227,935] <dbg> download_client.download_thread: Receiving up to 2048 bytes at 0x20021774...
    00> [00:12:38.687,927] <dbg> download_client.download_thread: Read 1556 bytes from socket
    00> [00:12:38.688,079] <dbg> download_client.http_header_parse: GET header size: 532
    00> [00:12:38.688,537] <err> download_client: Header
    00> 68 74 74 70 2f 31 2e 31  20 32 30 36 20 70 61 72 |http/1.1  206 par
    ....
    00> [00:12:38.688,873] <err> download_client: Payload
    ....
    00> [00:12:38.688,903] <dbg> download_client.http_parse: Copying 1024 payload bytes
    00> [00:12:38.689,025] <inf> download_client: Downloaded 290816/318212 bytes (91%)
    
    
    // Another way is that we first get the HTTP header and then receive payload separately
    // this also works ok
    00> [00:12:38.710,021] <dbg> download_client.download_thread: Receiving up to 2048 bytes at 0x20021774...
    00> [00:12:39.861,114] <dbg> download_client.download_thread: Read 532 bytes from socket
    00> [00:12:39.861,267] <dbg> download_client.http_header_parse: GET header size: 532
    00> [00:12:39.861,724] <err> download_client: Header
    .... 
    00> [00:12:39.861,877] <dbg> download_client.download_thread: Receiving up to 2048 bytes at 0x20021774...
    00> [00:12:41.959,472] <dbg> download_client.download_thread: Read 1024 bytes from socket
    00> [00:12:41.959,503] <inf> download_client: Downloaded 291840/318212 bytes (91%)
    00> [00:12:41.959,564] <inf> STREAM_FLASH: Erasing page at offset 0x000ca000
    
    // I found a third way. Here we manage to read 2048 bytes from socket with one recv
    // It contains HTTP header + payload + part of next packets HTTP header
    00> [00:12:45.355,895] <dbg> download_client.download_thread: Receiving up to 2048 bytes at 0x20021774...
    00> [00:12:48.273,834] <dbg> download_client.download_thread: Read 2048 bytes from socket
    00> [00:12:48.273,986] <dbg> download_client.http_header_parse: GET header size: 532
    00> [00:12:48.274,414] <err> download_client: Header
    ....           
    00> [00:12:48.274,871] <err> download_client: Payload
    00> 6c 65 64 3a 20 25 64 00  6c 77 6d 32 6d 5f 69 6e |led: %d. lwm2m_in
    ....
    // I print here the whole hexdum that the code interprets as payload.
    // Here we see that the actual payload ends and next HTTP header starts in the middle of the packet
    00> 4c 20 3c 3c 20 28 30 29  29 00 00 00 57 45 53 54 |L << (0) )...WEST
    00> 48 54 54 50 2f 31 2e 31  20 32 30 36 20 50 61 72 |HTTP/1.1  206 Par
    ....
    
    // Code puts rubbish to image payload...
    00> [00:12:48.274,902] <dbg> download_client.http_parse: Copying 1516 payload bytes
    00> [00:12:48.275,054] <inf> download_client: Downloaded 295404/318212 bytes (92%)
    
    // And finally fails when trying to deocode the last few bytes of the HTTP header that was already read as payload last time
    00> [00:12:48.288,208] <dbg> download_client.download_thread: Receiving up to 2048 bytes at 0x20021774...
    00> [00:12:48.288,391] <dbg> download_client.download_thread: Read 40 bytes from socket
    00> [00:12:48.288,421] <dbg> download_client.http_header_parse: GET header size: 40
    00> [00:12:48.288,482] <err> download_client: Header
    00> 39 63 72 7a 6d 70 75 79  2d 75 62 65 79 78 6e 6f |9crzmpuy -ubeyxno
    00> 79 2d 6b 73 61 67 78 71  3d 3d 0d 0a 61 67 65 3a |y-ksagxq ==..age:
    00> 20 36 39 35 0d 0a 0d 0a                          | 695....         
    00> [00:12:48.288,543] <err> download_client: Server did not honor partial content request
    00> [00:12:48.352,416] <err> fota_download: Download client error
    00> [00:12:48.352,447] <inf> dfu_target_mcuboot: MCUBoot image upgrade aborted.
    00> [00:12:48.352,478] <err> main: Received error from fota_download
    

  • Oh crap, I've messed up some versions in git. The error above happens because I forgot the workaround from my fork for setting the timeout to TLS socket bug described in here https://devzone.nordicsemi.com/f/nordic-q-a/68594/nrf9160-download_client-does-not-work-anymore-in-sdk-1-4

    Anyway. The crash hunt continues...

  • Could you try setting CONFIG_LOG=y in prj.conf? I did some experiments with a hello world sample and figured out that has to be done in order to get these log printed out:

    Then you should get the faulting instruction address that caused the error, and you can look at the file <..>/build/zephyr/zephyr.lst to figure out what function caused this error. You can also use arm-none-eabi-addr2line.exe like done in this ticket.

    If you can't find it, I would just recommend you to add a bunch of logs (using printk()) everywhere in your program to locate where the hardfault occurs.

    To be honest, I have not handled too many cases regarding hardfaults so there may be better ways of going about this. However, using logs always works for me when trying to locate the last line that was executed before a crash occurs.

    Best regards,

    Simon

Reply Children
No Data
Related