nrfjprog times out with error flashing Thingy:91 - Did worker process die?

I'm getting timeout errors when trying to program a Thingy:91 with a J-Link Pro PoE on macOS on an ARM Apple M3 Max.

Here's the command and error message:

❯ west flash -i 1196000003
-- west flash: rebuilding
[0/26] Performing build step for 'tfm'
ninja: no work to do.
[1/7] Performing build step for 'mcuboot_subimage'
ninja: no work to do.
[2/7] Performing install step for 'tfm'
-- Install configuration: "MinSizeRel"
[6/6] Completed 'mcuboot_subimage'
-- west flash: using runner nrfjprog
-- runners.nrfjprog: Flashing file: /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex
Timed out waiting for progress updates.sing non-volatile memory - block 2 of 2                                         
Did worker process die?
[ #################### ]  14.809s | Erase file - Done erasing                                                          
ERROR: An internal error has occurred, please try again.
NOTE: For additional output, try running again with logging enabled (--log).
NOTE: Any generated log error messages will be displayed.
FATAL ERROR: command exited with status 63: nrfjprog --program /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex --sectorerase --verify -f NRF91 --snr 1196000003

I get the same error when running nrfjprog manually (I've attached the log file):

❯ nrfjprog --program /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex --sectorerase --verify -f NRF91 --snr 1196000003 --log
Timed out waiting for progress updates.sing non-volatile memory - block 2 of 2                                         
Did worker process die?
[ #################### ]  16.435s | Erase file - Done erasing                                                          
ERROR: An internal error has occurred, please try again.

However, running nrfjprog --recover appears to fix whatever is causing the errors:

❯ nrfjprog --recover
Recovering device. This operation might take 30s.
Erasing user code and UICR flash areas.

❯ nrfjprog --log --program /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex --sectorerase --verify -f NRF91 --snr 1196000003
[ #################### ]  13.059s | Erase file - Done erasing                                                          
[ #################### ]   2.836s | Program file - Done programming                                                    
[ #################### ]   2.820s | Verify file - Done verifying

Is there a way to get a better understanding of what's going on here? Is there something that my application code is doing that is preventing the J-Link from being able to flash the device reliably? What is nrfjprog --recover doing to "fix" this issue?

3173.log.log

Parents
  • BTW, I only get these timeout errors when trying to flash a board that is running an existing firmware image. Flashing a board with empty flash works.

    Also, if I first open the nRF Connect for Desktop Programmer GUI software and do "Erase all", and THEN run west flash, it works every time:

    ❯ west flash -i 1196000003
    -- west flash: rebuilding
    [0/26] Performing build step for 'tfm'
    ninja: no work to do.
    [1/7] Performing build step for 'mcuboot_subimage'
    ninja: no work to do.
    [2/7] Performing install step for 'tfm'
    -- Install configuration: "MinSizeRel"
    [6/6] Completed 'mcuboot_subimage'
    -- west flash: using runner nrfjprog
    -- runners.nrfjprog: Flashing file: /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex
    [ #################### ]  13.381s | Erase file - Done erasing                                                          
    [ #################### ]   2.829s | Program file - Done programming                                                    
    [ #################### ]   2.810s | Verify file - Done verifying                                                       
    Applying system reset.
    Run.
    -- runners.nrfjprog: Board with serial number 1196000003 flashed successfully.

  • Hello,

    Are you using APPROTECT on this board?

    Best regards,

    Michal

  • FWIW, I recently upgraded to J-Link v7.96h which prompted me to upgrade my J-Link firmware from compiled "Mar 29 2024", to "compiled May 15 2024". The result is the same, but the log might be a bit more informative:

    Flashing build to 176000446
    west flash -d beacon/build --skip-rebuild --dev-id 176000446

    -- west flash: using runner nrfjprog
    -- runners.nrfjprog: Flashing file: beacon/build/zephyr/merged_domains.hex
    -- runners.nrfjprog: beacon/build/zephyr/merged_domains.hex targets both nRF53 coprocessors; splitting it into: beacon/build/zephyr/GENERATED_CP_NETWORK_merged_domains.hex and beacon/build/zephyr/GENERATED_CP_APPLICATION_merged_domains.hex
    [ ###### ] 0.000s | Erase file - Check image
    [ ##### ] 0.000s | Check image validity - Initialize device info
    [ ########## ] 0.000s | Check image validity - Check region 0 settings
    [ ############### ] 0.000s | Check image validity - block 1 of 2
    [ #################### ] 0.037s | Check image validity - Finished
    [ ############# ] 0.000s | Erase file - Erasing
    [ ########## ] 0.000s | Erasing non-volatile memory - block 1 of 1
    [ #################### ] 0.000s | Erasing non-volatile memory - Erase successful
    [ #################### ] 9.519s | Erase file - Done erasing

    [ ###### ] 0.000s | Program file - Checking image
    [ ##### ] 0.000s | Check image validity - Initialize device info
    [ ########## ] 0.000s | Check image validity - Check region 0 settings
    [ ############### ] 3.236s | Check image validity - block 1 of 2
    [ #################### ] 0.006s | Check image validity - Finished
    [ ############# ] 0.000s | Program file - Programming
    [ ########## ] 0.000s | Programming image - block 1 of 1
    [ #################### ] 0.000s | Programming image - Write successful
    [ #################### ] 1.097s | Program file - Done programming

    [ ###### ] 0.000s | Verify file - Check image
    [ ##### ] 0.000s | Check image validity - Initialize device info
    [ ########## ] 0.000s | Check image validity - Check region 0 settings
    [ ############### ] 3.634s | Check image validity - block 1 of 2
    [ #################### ] 0.009s | Check image validity - Finished
    [ ############# ] 0.000s | Verify file - Verifying
    [ ########## ] 0.000s | Verifying image - block 1 of 1
    [ #################### ] 0.000s | Verifying image - Verify successful
    [ #################### ] 1.097s | Verify file - Done verifying

    [ ###### ] 0.000s | Erase file - Check image
    [ ## ] 0.000s | Check image validity - Initialize device info
    [ ##### ] 0.000s | Check image validity - Check region 0 settings
    [ ####### ] 0.000s | Check image validity - block 1 of 6
    [ ########## ] 0.036s | Check image validity - block 2 of 6
    [ ############ ] 0.027s | Check image validity - block 3 of 6
    [ ############### ] 0.017s | Check image validity - block 4 of 6
    [ ################# ] 0.009s | Check image validity - block 5 of 6
    [ #################### ] 0.088s | Check image validity - Finished
    [ ############# ] 0.000s | Erase file - Erasing
    [ ### ] 0.000s | Erasing non-volatile memory - block 1 of 5
    [ ###### ] 0.272s | Erasing non-volatile memory - block 2 of 5
    [ ########## ] 0.501s | Erasing non-volatile memory - block 3 of 5
    [ ############# ] 0.510s | Erasing non-volatile memory - block 4 of 5
    [ ################ ] 0.225s | Erasing non-volatile memory - block 5 of 5
    Timed out waiting for progress updates.
    Did worker process die?
    [ #################### ] 0.000s | Erasing non-volatile memory - Erase successful
    [ #################### ] 21.276s | Erase file - Done erasing

    ERROR: An internal error has occurred, please try again.
    NOTE: For additional output, try running again with logging enabled (--log).
    NOTE: Any generated log error messages will be displayed.
    FATAL ERROR: command exited with status 63: nrfjprog --program beacon/build/zephyr/GENERATED_CP_APPLICATION_merged_domains.hex --sectorerase --verify -f NRF53 --coprocessor CP_APPLICATION --snr 176000446

    * The terminal process terminated with exit code: 63.



    So in this case it seems the flashing the NET image succeeds, but the process gives up on flashing the APP image because it takes too long, even though it eventually succeeds!

  • I had to do something, this is killing me.

    So I found that the script responsible for the problematic commands is here:

    /opt/nordic/ncs/v2.5.2/zephyr/scripts/west_commands/runners/nrfjprog.py

    After some perusing, I ended up commenting out line 68 and inserting this instead:

                elif erase == 'ERASE_PAGES':
                    self.logger.info('Forcing recover mode')
                    cmd.append('--recover')
    #                cmd.append('--sectorerase')

    In other words, I disabled "sectorerase" and replaced it with "recover". Flashing now takes around 10 seconds instead of well over a minute. Crucially, the erase step that used to take 30 seconds or so and timeout, only takes a second or so, and succeeds.

    I can't begin to tell you the change in ability to get things done that occurs when every time you click on "Build" or "Debug", you get a result within 10 seconds instead of bombing out 60+ seconds later.

    So obviously this is a big dirty hack, and I'm far from confident in the result, but I can see big portions of the script are hopeful hacks anyway. At least I can make progress, and start to work on all the other debugging dramas, like no restart function, GBD crashing and RTT not working. Some time in the future I hope to be able to debug my application, rather than the IDE.

  • I think in this case recover does chip erase instead of sector erase, which is indeed faster.

    If you hover over the "Flash" button in the extension, there should be a small button that does "Erase and Flash" to the right of the main "Flash" button. Does it behave the same by chance as your hack?

    Best regards,

    Michal

  • I think that's something else again. The "recover" seems to be what happens when you have a flash failure and you're prompted with a message about the flash being "protected" - you're given the option to "recover". I often wish I could just go straight to that, instead of hoping it fails and I get the prompt. Now I can.

  • I understand, thank you for the information.

    Best regards,

    Michal

Reply Children
No Data
Related