nrfjprog times out with error flashing Thingy:91 - Did worker process die?

I'm getting timeout errors when trying to program a Thingy:91 with a J-Link Pro PoE on macOS on an ARM Apple M3 Max.

Here's the command and error message:

❯ west flash -i 1196000003
-- west flash: rebuilding
[0/26] Performing build step for 'tfm'
ninja: no work to do.
[1/7] Performing build step for 'mcuboot_subimage'
ninja: no work to do.
[2/7] Performing install step for 'tfm'
-- Install configuration: "MinSizeRel"
[6/6] Completed 'mcuboot_subimage'
-- west flash: using runner nrfjprog
-- runners.nrfjprog: Flashing file: /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex
Timed out waiting for progress updates.sing non-volatile memory - block 2 of 2                                         
Did worker process die?
[ #################### ]  14.809s | Erase file - Done erasing                                                          
ERROR: An internal error has occurred, please try again.
NOTE: For additional output, try running again with logging enabled (--log).
NOTE: Any generated log error messages will be displayed.
FATAL ERROR: command exited with status 63: nrfjprog --program /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex --sectorerase --verify -f NRF91 --snr 1196000003

I get the same error when running nrfjprog manually (I've attached the log file):

❯ nrfjprog --program /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex --sectorerase --verify -f NRF91 --snr 1196000003 --log
Timed out waiting for progress updates.sing non-volatile memory - block 2 of 2                                         
Did worker process die?
[ #################### ]  16.435s | Erase file - Done erasing                                                          
ERROR: An internal error has occurred, please try again.

However, running nrfjprog --recover appears to fix whatever is causing the errors:

❯ nrfjprog --recover
Recovering device. This operation might take 30s.
Erasing user code and UICR flash areas.

❯ nrfjprog --log --program /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex --sectorerase --verify -f NRF91 --snr 1196000003
[ #################### ]  13.059s | Erase file - Done erasing                                                          
[ #################### ]   2.836s | Program file - Done programming                                                    
[ #################### ]   2.820s | Verify file - Done verifying

Is there a way to get a better understanding of what's going on here? Is there something that my application code is doing that is preventing the J-Link from being able to flash the device reliably? What is nrfjprog --recover doing to "fix" this issue?

3173.log.log

Parents
  • BTW, I only get these timeout errors when trying to flash a board that is running an existing firmware image. Flashing a board with empty flash works.

    Also, if I first open the nRF Connect for Desktop Programmer GUI software and do "Erase all", and THEN run west flash, it works every time:

    ❯ west flash -i 1196000003
    -- west flash: rebuilding
    [0/26] Performing build step for 'tfm'
    ninja: no work to do.
    [1/7] Performing build step for 'mcuboot_subimage'
    ninja: no work to do.
    [2/7] Performing install step for 'tfm'
    -- Install configuration: "MinSizeRel"
    [6/6] Completed 'mcuboot_subimage'
    -- west flash: using runner nrfjprog
    -- runners.nrfjprog: Flashing file: /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex
    [ #################### ]  13.381s | Erase file - Done erasing                                                          
    [ #################### ]   2.829s | Program file - Done programming                                                    
    [ #################### ]   2.810s | Verify file - Done verifying                                                       
    Applying system reset.
    Run.
    -- runners.nrfjprog: Board with serial number 1196000003 flashed successfully.

  • Hello,

    Are you using APPROTECT on this board?

    Best regards,

    Michal

  • Sorry, I was away for a while unfortunately.

    I will have to check with one of my colleagues with access to one of the new Macs if they could test that.

    Best regards,

    Michal

  •  what version of nRF Command Line Tools are you using? I just updated to the latest 10.24.2 build for macOS and I'm not getting the timeout errors I was getting earlier. If you are using an older version, can you try updating to the latest macOS build from https://www.nordicsemi.com/Products/Development-tools/nRF-Command-Line-Tools/Download and see if that fixes it?

  • I was only on 10.24.0. I verified the issue was still present using the MRE above, quit VSCode, disconnected the dev kit, installed 10.24.2, confirmed the new version number for nrfjprog, and ran the MRE again. The behaviour was identical, as far as I could tell: a slow but successful flash the first time after an auto-recovery, then fails on the very next time with "did worker process die".

    I think there was one subtle change - the "did worker process die" notification is no longer sticky, auto-dismissing shortly after it appears. The log however, and the fatal error, look the same.

    Worth a shot, but alas, no cigar!

  • FWIW, I recently upgraded to J-Link v7.96h which prompted me to upgrade my J-Link firmware from compiled "Mar 29 2024", to "compiled May 15 2024". The result is the same, but the log might be a bit more informative:

    Flashing build to 176000446
    west flash -d beacon/build --skip-rebuild --dev-id 176000446

    -- west flash: using runner nrfjprog
    -- runners.nrfjprog: Flashing file: beacon/build/zephyr/merged_domains.hex
    -- runners.nrfjprog: beacon/build/zephyr/merged_domains.hex targets both nRF53 coprocessors; splitting it into: beacon/build/zephyr/GENERATED_CP_NETWORK_merged_domains.hex and beacon/build/zephyr/GENERATED_CP_APPLICATION_merged_domains.hex
    [ ###### ] 0.000s | Erase file - Check image
    [ ##### ] 0.000s | Check image validity - Initialize device info
    [ ########## ] 0.000s | Check image validity - Check region 0 settings
    [ ############### ] 0.000s | Check image validity - block 1 of 2
    [ #################### ] 0.037s | Check image validity - Finished
    [ ############# ] 0.000s | Erase file - Erasing
    [ ########## ] 0.000s | Erasing non-volatile memory - block 1 of 1
    [ #################### ] 0.000s | Erasing non-volatile memory - Erase successful
    [ #################### ] 9.519s | Erase file - Done erasing

    [ ###### ] 0.000s | Program file - Checking image
    [ ##### ] 0.000s | Check image validity - Initialize device info
    [ ########## ] 0.000s | Check image validity - Check region 0 settings
    [ ############### ] 3.236s | Check image validity - block 1 of 2
    [ #################### ] 0.006s | Check image validity - Finished
    [ ############# ] 0.000s | Program file - Programming
    [ ########## ] 0.000s | Programming image - block 1 of 1
    [ #################### ] 0.000s | Programming image - Write successful
    [ #################### ] 1.097s | Program file - Done programming

    [ ###### ] 0.000s | Verify file - Check image
    [ ##### ] 0.000s | Check image validity - Initialize device info
    [ ########## ] 0.000s | Check image validity - Check region 0 settings
    [ ############### ] 3.634s | Check image validity - block 1 of 2
    [ #################### ] 0.009s | Check image validity - Finished
    [ ############# ] 0.000s | Verify file - Verifying
    [ ########## ] 0.000s | Verifying image - block 1 of 1
    [ #################### ] 0.000s | Verifying image - Verify successful
    [ #################### ] 1.097s | Verify file - Done verifying

    [ ###### ] 0.000s | Erase file - Check image
    [ ## ] 0.000s | Check image validity - Initialize device info
    [ ##### ] 0.000s | Check image validity - Check region 0 settings
    [ ####### ] 0.000s | Check image validity - block 1 of 6
    [ ########## ] 0.036s | Check image validity - block 2 of 6
    [ ############ ] 0.027s | Check image validity - block 3 of 6
    [ ############### ] 0.017s | Check image validity - block 4 of 6
    [ ################# ] 0.009s | Check image validity - block 5 of 6
    [ #################### ] 0.088s | Check image validity - Finished
    [ ############# ] 0.000s | Erase file - Erasing
    [ ### ] 0.000s | Erasing non-volatile memory - block 1 of 5
    [ ###### ] 0.272s | Erasing non-volatile memory - block 2 of 5
    [ ########## ] 0.501s | Erasing non-volatile memory - block 3 of 5
    [ ############# ] 0.510s | Erasing non-volatile memory - block 4 of 5
    [ ################ ] 0.225s | Erasing non-volatile memory - block 5 of 5
    Timed out waiting for progress updates.
    Did worker process die?
    [ #################### ] 0.000s | Erasing non-volatile memory - Erase successful
    [ #################### ] 21.276s | Erase file - Done erasing

    ERROR: An internal error has occurred, please try again.
    NOTE: For additional output, try running again with logging enabled (--log).
    NOTE: Any generated log error messages will be displayed.
    FATAL ERROR: command exited with status 63: nrfjprog --program beacon/build/zephyr/GENERATED_CP_APPLICATION_merged_domains.hex --sectorerase --verify -f NRF53 --coprocessor CP_APPLICATION --snr 176000446

    * The terminal process terminated with exit code: 63.



    So in this case it seems the flashing the NET image succeeds, but the process gives up on flashing the APP image because it takes too long, even though it eventually succeeds!

Reply
  • FWIW, I recently upgraded to J-Link v7.96h which prompted me to upgrade my J-Link firmware from compiled "Mar 29 2024", to "compiled May 15 2024". The result is the same, but the log might be a bit more informative:

    Flashing build to 176000446
    west flash -d beacon/build --skip-rebuild --dev-id 176000446

    -- west flash: using runner nrfjprog
    -- runners.nrfjprog: Flashing file: beacon/build/zephyr/merged_domains.hex
    -- runners.nrfjprog: beacon/build/zephyr/merged_domains.hex targets both nRF53 coprocessors; splitting it into: beacon/build/zephyr/GENERATED_CP_NETWORK_merged_domains.hex and beacon/build/zephyr/GENERATED_CP_APPLICATION_merged_domains.hex
    [ ###### ] 0.000s | Erase file - Check image
    [ ##### ] 0.000s | Check image validity - Initialize device info
    [ ########## ] 0.000s | Check image validity - Check region 0 settings
    [ ############### ] 0.000s | Check image validity - block 1 of 2
    [ #################### ] 0.037s | Check image validity - Finished
    [ ############# ] 0.000s | Erase file - Erasing
    [ ########## ] 0.000s | Erasing non-volatile memory - block 1 of 1
    [ #################### ] 0.000s | Erasing non-volatile memory - Erase successful
    [ #################### ] 9.519s | Erase file - Done erasing

    [ ###### ] 0.000s | Program file - Checking image
    [ ##### ] 0.000s | Check image validity - Initialize device info
    [ ########## ] 0.000s | Check image validity - Check region 0 settings
    [ ############### ] 3.236s | Check image validity - block 1 of 2
    [ #################### ] 0.006s | Check image validity - Finished
    [ ############# ] 0.000s | Program file - Programming
    [ ########## ] 0.000s | Programming image - block 1 of 1
    [ #################### ] 0.000s | Programming image - Write successful
    [ #################### ] 1.097s | Program file - Done programming

    [ ###### ] 0.000s | Verify file - Check image
    [ ##### ] 0.000s | Check image validity - Initialize device info
    [ ########## ] 0.000s | Check image validity - Check region 0 settings
    [ ############### ] 3.634s | Check image validity - block 1 of 2
    [ #################### ] 0.009s | Check image validity - Finished
    [ ############# ] 0.000s | Verify file - Verifying
    [ ########## ] 0.000s | Verifying image - block 1 of 1
    [ #################### ] 0.000s | Verifying image - Verify successful
    [ #################### ] 1.097s | Verify file - Done verifying

    [ ###### ] 0.000s | Erase file - Check image
    [ ## ] 0.000s | Check image validity - Initialize device info
    [ ##### ] 0.000s | Check image validity - Check region 0 settings
    [ ####### ] 0.000s | Check image validity - block 1 of 6
    [ ########## ] 0.036s | Check image validity - block 2 of 6
    [ ############ ] 0.027s | Check image validity - block 3 of 6
    [ ############### ] 0.017s | Check image validity - block 4 of 6
    [ ################# ] 0.009s | Check image validity - block 5 of 6
    [ #################### ] 0.088s | Check image validity - Finished
    [ ############# ] 0.000s | Erase file - Erasing
    [ ### ] 0.000s | Erasing non-volatile memory - block 1 of 5
    [ ###### ] 0.272s | Erasing non-volatile memory - block 2 of 5
    [ ########## ] 0.501s | Erasing non-volatile memory - block 3 of 5
    [ ############# ] 0.510s | Erasing non-volatile memory - block 4 of 5
    [ ################ ] 0.225s | Erasing non-volatile memory - block 5 of 5
    Timed out waiting for progress updates.
    Did worker process die?
    [ #################### ] 0.000s | Erasing non-volatile memory - Erase successful
    [ #################### ] 21.276s | Erase file - Done erasing

    ERROR: An internal error has occurred, please try again.
    NOTE: For additional output, try running again with logging enabled (--log).
    NOTE: Any generated log error messages will be displayed.
    FATAL ERROR: command exited with status 63: nrfjprog --program beacon/build/zephyr/GENERATED_CP_APPLICATION_merged_domains.hex --sectorerase --verify -f NRF53 --coprocessor CP_APPLICATION --snr 176000446

    * The terminal process terminated with exit code: 63.



    So in this case it seems the flashing the NET image succeeds, but the process gives up on flashing the APP image because it takes too long, even though it eventually succeeds!

Children
  • I had to do something, this is killing me.

    So I found that the script responsible for the problematic commands is here:

    /opt/nordic/ncs/v2.5.2/zephyr/scripts/west_commands/runners/nrfjprog.py

    After some perusing, I ended up commenting out line 68 and inserting this instead:

                elif erase == 'ERASE_PAGES':
                    self.logger.info('Forcing recover mode')
                    cmd.append('--recover')
    #                cmd.append('--sectorerase')

    In other words, I disabled "sectorerase" and replaced it with "recover". Flashing now takes around 10 seconds instead of well over a minute. Crucially, the erase step that used to take 30 seconds or so and timeout, only takes a second or so, and succeeds.

    I can't begin to tell you the change in ability to get things done that occurs when every time you click on "Build" or "Debug", you get a result within 10 seconds instead of bombing out 60+ seconds later.

    So obviously this is a big dirty hack, and I'm far from confident in the result, but I can see big portions of the script are hopeful hacks anyway. At least I can make progress, and start to work on all the other debugging dramas, like no restart function, GBD crashing and RTT not working. Some time in the future I hope to be able to debug my application, rather than the IDE.

  • I think in this case recover does chip erase instead of sector erase, which is indeed faster.

    If you hover over the "Flash" button in the extension, there should be a small button that does "Erase and Flash" to the right of the main "Flash" button. Does it behave the same by chance as your hack?

    Best regards,

    Michal

  • I think that's something else again. The "recover" seems to be what happens when you have a flash failure and you're prompted with a message about the flash being "protected" - you're given the option to "recover". I often wish I could just go straight to that, instead of hoping it fails and I get the prompt. Now I can.

  • I understand, thank you for the information.

    Best regards,

    Michal

Related