nrfjprog times out with an error when flashing a Thingy:91 - "Did worker process die?"

I'm getting timeout errors when trying to program a Thingy:91 with a J-Link Pro PoE from macOS on an Apple M3 Max (ARM).

Here's the command and error message:

❯ west flash -i 1196000003
-- west flash: rebuilding
[0/26] Performing build step for 'tfm'
ninja: no work to do.
[1/7] Performing build step for 'mcuboot_subimage'
ninja: no work to do.
[2/7] Performing install step for 'tfm'
-- Install configuration: "MinSizeRel"
[6/6] Completed 'mcuboot_subimage'
-- west flash: using runner nrfjprog
-- runners.nrfjprog: Flashing file: /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex
Timed out waiting for progress updates.sing non-volatile memory - block 2 of 2                                         
Did worker process die?
[ #################### ]  14.809s | Erase file - Done erasing                                                          
ERROR: An internal error has occurred, please try again.
NOTE: For additional output, try running again with logging enabled (--log).
NOTE: Any generated log error messages will be displayed.
FATAL ERROR: command exited with status 63: nrfjprog --program /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex --sectorerase --verify -f NRF91 --snr 1196000003

I get the same error when running nrfjprog manually (I've attached the log file):

❯ nrfjprog --program /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex --sectorerase --verify -f NRF91 --snr 1196000003 --log
Timed out waiting for progress updates.sing non-volatile memory - block 2 of 2                                         
Did worker process die?
[ #################### ]  16.435s | Erase file - Done erasing                                                          
ERROR: An internal error has occurred, please try again.

However, running nrfjprog --recover appears to fix whatever is causing the errors:

❯ nrfjprog --recover
Recovering device. This operation might take 30s.
Erasing user code and UICR flash areas.

❯ nrfjprog --log --program /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex --sectorerase --verify -f NRF91 --snr 1196000003
[ #################### ]  13.059s | Erase file - Done erasing                                                          
[ #################### ]   2.836s | Program file - Done programming                                                    
[ #################### ]   2.820s | Verify file - Done verifying
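
For anyone who wants to script the sequence above, here is a minimal sketch of "try to flash, and if that fails, recover once and retry". It only uses the nrfjprog options already shown in this post. The function name flash_with_recover and its argument order are my own invention, and it assumes nrfjprog is on the PATH. Keep in mind that, per the output above, --recover erases the user code and UICR areas, so this is a blunt workaround rather than a fix.

```shell
#!/bin/sh
# Sketch only: flash a hex file, falling back to a single recover-and-retry.
# flash_with_recover is a hypothetical helper name, not part of nrfjprog.
# Usage: flash_with_recover <hex file> <J-Link serial number> [family]
flash_with_recover() {
    hex="$1"
    snr="$2"
    family="${3:-NRF91}"   # default matches the Thingy:91 in this thread

    # First attempt: same options that west flash passed to nrfjprog.
    if nrfjprog --program "$hex" --sectorerase --verify -f "$family" --snr "$snr"; then
        return 0
    fi

    echo "Flash failed; running --recover and retrying once" >&2

    # WARNING: --recover erases user code and UICR on the target.
    nrfjprog --recover -f "$family" --snr "$snr" || return 1

    # Second and final attempt; its exit status is the function's result.
    nrfjprog --program "$hex" --sectorerase --verify -f "$family" --snr "$snr"
}

# Example (needs real hardware attached):
#   flash_with_recover build/zephyr/merged.hex 1196000003
```

The retry is deliberately limited to one attempt so that a genuinely broken setup still fails quickly instead of repeatedly wiping the device.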

Is there a way to get a better understanding of what's going on here? Is there something that my application code is doing that is preventing the J-Link from being able to flash the device reliably? What is nrfjprog --recover doing to "fix" this issue?

3173.log.log

  • Any further insights? I see the same symptoms on an M1 using a J-Link Pro when flashing an nRF5340.

    The --recover flag also resolves it for me, but that can't be used when starting a Debug session inside VS Code, because it triggers a rebuild, which is another issue. In that case I can just run "nrfjprog -e" before clicking Debug.

    But it's a long and painful sequence of steps for such a core function.

    I suspect it's related to the partition manager somehow, and perhaps having a custom pm_static.yml plays a part. The complicated build process that creates and uploads the image for the network coprocessor first also seems implicated, but it's very hard to tease apart.

  • Unfortunately, we weren't able to reproduce it.

    Does it happen on a basic sample like hello world or blinky as well?

    The extension also has an option to attach the debugger to a running target. Could that at least work as a workaround?

    Best regards,

    Michal

  • Okay, I came up with an MRE:

    1. Create a new application based on the Zephyr Bluetooth Beacon sample.
    2. Replace the prj.conf with the attached*. This just adds an assortment of the configs we are working with. I'm sure there's a smaller set, but it's laborious to minimise and this set does the trick.
    3. Create a build configuration for nrf7002dk_nrf5340_cpuapp_ns and build.
    4. Connect to an nRF7002 DK.
    5. Click Flash a few times. In my experience it works the first time after an automatic recover and some very long pauses, but then starts to fail after that with "did worker die?" errors.

    Note that I was able to eliminate the pm_static.yml file from the MRE, so I'm now wondering if it's simply the size of the image that triggers the issue. I couldn't pinpoint which prj.conf addition triggered the failure, but things seem to start tripping up once I added a bunch of them.

    * "attachment": https://pastebin.com/ia4ijcb7
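
Step 5 can also be run from the command line, which makes it easier to count how often flashing fails. This is a minimal sketch: the helper name count_flash_failures is hypothetical, and it assumes the build from step 3 already exists and that west is on the PATH. It simply runs whatever command it is given a fixed number of times and reports how many attempts failed.

```shell
#!/bin/sh
# Sketch only: run a flash command N times and count the failures.
# count_flash_failures is a hypothetical helper name.
# Usage: count_flash_failures <attempts> <command> [args...]
count_flash_failures() {
    attempts="$1"
    shift
    failures=0
    i=1
    while [ "$i" -le "$attempts" ]; do
        # "$@" is the flash command to repeat, e.g. west flash.
        if "$@"; then
            echo "attempt $i: ok"
        else
            echo "attempt $i: FAILED"
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures of $attempts attempts failed"
    # Succeed only if every attempt succeeded.
    [ "$failures" -eq 0 ]
}

# Example (needs the nRF7002 DK attached and an existing build directory):
#   count_flash_failures 5 west flash
```

Running it against a plain sample such as hello world and then against the MRE build would show whether the failure rate really follows the image contents.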
