nrfjprog times out with error flashing Thingy:91 - Did worker process die?

I'm getting timeout errors when trying to program a Thingy:91 with a J-Link Pro PoE on macOS on an ARM Apple M3 Max.

Here's the command and error message:

❯ west flash -i 1196000003
-- west flash: rebuilding
[0/26] Performing build step for 'tfm'
ninja: no work to do.
[1/7] Performing build step for 'mcuboot_subimage'
ninja: no work to do.
[2/7] Performing install step for 'tfm'
-- Install configuration: "MinSizeRel"
[6/6] Completed 'mcuboot_subimage'
-- west flash: using runner nrfjprog
-- runners.nrfjprog: Flashing file: /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex
Timed out waiting for progress updates.sing non-volatile memory - block 2 of 2                                         
Did worker process die?
[ #################### ]  14.809s | Erase file - Done erasing                                                          
ERROR: An internal error has occurred, please try again.
NOTE: For additional output, try running again with logging enabled (--log).
NOTE: Any generated log error messages will be displayed.
FATAL ERROR: command exited with status 63: nrfjprog --program /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex --sectorerase --verify -f NRF91 --snr 1196000003

I get the same error when running nrfjprog manually (I've attached the log file):

❯ nrfjprog --program /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex --sectorerase --verify -f NRF91 --snr 1196000003 --log
Timed out waiting for progress updates.sing non-volatile memory - block 2 of 2                                         
Did worker process die?
[ #################### ]  16.435s | Erase file - Done erasing                                                          
ERROR: An internal error has occurred, please try again.

However, running nrfjprog --recover appears to fix whatever is causing the errors:

❯ nrfjprog --recover
Recovering device. This operation might take 30s.
Erasing user code and UICR flash areas.

❯ nrfjprog --log --program /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex --sectorerase --verify -f NRF91 --snr 1196000003
[ #################### ]  13.059s | Erase file - Done erasing                                                          
[ #################### ]   2.836s | Program file - Done programming                                                    
[ #################### ]   2.820s | Verify file - Done verifying

Is there a way to get a better understanding of what's going on here? Is there something that my application code is doing that is preventing the J-Link from being able to flash the device reliably? What is nrfjprog --recover doing to "fix" this issue?

3173.log.log

Parents
  • BTW, I only get these timeout errors when trying to flash a board that is running an existing firmware image. Flashing a board with empty flash works.

    Also, if I first open the nRF Connect for Desktop Programmer GUI software and do "Erase all", and THEN run west flash, it works every time:

    ❯ west flash -i 1196000003
    -- west flash: rebuilding
    [0/26] Performing build step for 'tfm'
    ninja: no work to do.
    [1/7] Performing build step for 'mcuboot_subimage'
    ninja: no work to do.
    [2/7] Performing install step for 'tfm'
    -- Install configuration: "MinSizeRel"
    [6/6] Completed 'mcuboot_subimage'
    -- west flash: using runner nrfjprog
    -- runners.nrfjprog: Flashing file: /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex
    [ #################### ]  13.381s | Erase file - Done erasing                                                          
    [ #################### ]   2.829s | Program file - Done programming                                                    
    [ #################### ]   2.810s | Verify file - Done verifying                                                       
    Applying system reset.
    Run.
    -- runners.nrfjprog: Board with serial number 1196000003 flashed successfully.

  • Hello,

    Are you using APPROTECT on this board?

    Best regards,

    Michal

  • I had to do something, this is killing me.

    So I found that the script responsible for the problematic commands is here:

    /opt/nordic/ncs/v2.5.2/zephyr/scripts/west_commands/runners/nrfjprog.py

    After some perusing, I ended up commenting out line 68 and inserting this instead:

                elif erase == 'ERASE_PAGES':
                    self.logger.info('Forcing recover mode')
                    cmd.append('--recover')
    #                cmd.append('--sectorerase')

    In other words, I disabled "sectorerase" and replaced it with "recover". Flashing now takes around 10 seconds instead of well over a minute. Crucially, the erase step that used to take 30 seconds or so and timeout, only takes a second or so, and succeeds.

    I can't begin to tell you the change in ability to get things done that occurs when every time you click on "Build" or "Debug", you get a result within 10 seconds instead of bombing out 60+ seconds later.

    So obviously this is a big dirty hack, and I'm far from confident in the result, but I can see big portions of the script are hopeful hacks anyway. At least I can make progress, and start to work on all the other debugging dramas, like no restart function, GBD crashing and RTT not working. Some time in the future I hope to be able to debug my application, rather than the IDE.

  • I think in this case recover does chip erase instead of sector erase, which is indeed faster.

    If you hover over the "Flash" button in the extension, there should be a small button that does "Erase and Flash" to the right of the main "Flash" button. Does it behave the same by chance as your hack?

    Best regards,

    Michal

  • I think that's something else again. The "recover" seems to be what happens when you have a flash failure and you're prompted with a message about the flash being "protected" - you're given the option to "recover". I often wish I could just go straight to that, instead of hoping it fails and I get the prompt. Now I can.

  • I understand, thank you for the information.

    Best regards,

    Michal

  • I'm seeing the same "Did worker process die?" issue constantly.  Sometimes a series of recovers allows it to complete.  Flashing does take a long time.  My colleagues and I are all using Mac silicon and have large applications.  Note that we are using an nRF5340.  This seemed to start happening a few weeks ago, possibly when the application reached a certain size.  Looks like this was reported to Nordic months ago.  Has Nordic reproduced the issue in house? Is there a fix or workaround available?

Reply
  • I'm seeing the same "Did worker process die?" issue constantly.  Sometimes a series of recovers allows it to complete.  Flashing does take a long time.  My colleagues and I are all using Mac silicon and have large applications.  Note that we are using an nRF5340.  This seemed to start happening a few weeks ago, possibly when the application reached a certain size.  Looks like this was reported to Nordic months ago.  Has Nordic reproduced the issue in house? Is there a fix or workaround available?

Children
Related