nrfjprog times out with error flashing Thingy:91 - Did worker process die?

I'm getting timeout errors when trying to program a Thingy:91 with a J-Link Pro PoE from macOS on an Apple M3 Max (ARM).

Here's the command and error message:

❯ west flash -i 1196000003
-- west flash: rebuilding
[0/26] Performing build step for 'tfm'
ninja: no work to do.
[1/7] Performing build step for 'mcuboot_subimage'
ninja: no work to do.
[2/7] Performing install step for 'tfm'
-- Install configuration: "MinSizeRel"
[6/6] Completed 'mcuboot_subimage'
-- west flash: using runner nrfjprog
-- runners.nrfjprog: Flashing file: /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex
Timed out waiting for progress updates.sing non-volatile memory - block 2 of 2                                         
Did worker process die?
[ #################### ]  14.809s | Erase file - Done erasing                                                          
ERROR: An internal error has occurred, please try again.
NOTE: For additional output, try running again with logging enabled (--log).
NOTE: Any generated log error messages will be displayed.
FATAL ERROR: command exited with status 63: nrfjprog --program /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex --sectorerase --verify -f NRF91 --snr 1196000003

I get the same error when running nrfjprog manually (I've attached the log file):

❯ nrfjprog --program /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex --sectorerase --verify -f NRF91 --snr 1196000003 --log
Timed out waiting for progress updates.sing non-volatile memory - block 2 of 2                                         
Did worker process die?
[ #################### ]  16.435s | Erase file - Done erasing                                                          
ERROR: An internal error has occurred, please try again.

However, running nrfjprog --recover appears to fix whatever is causing the errors:

❯ nrfjprog --recover
Recovering device. This operation might take 30s.
Erasing user code and UICR flash areas.

❯ nrfjprog --log --program /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex --sectorerase --verify -f NRF91 --snr 1196000003
[ #################### ]  13.059s | Erase file - Done erasing                                                          
[ #################### ]   2.836s | Program file - Done programming                                                    
[ #################### ]   2.820s | Verify file - Done verifying
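
For anyone who wants to script this workaround, here is a minimal sketch that tries a normal flash first and only falls back to --recover if that fails. It uses only the nrfjprog flags already shown in this thread; the function names (flash_once, flash_with_recover) are made up for illustration, and passing -f/--snr to --recover is an assumption about how you want to target the probe. Note that --recover erases the user flash and UICR, as the output above says.

```shell
# Sketch only: function names are hypothetical, flags are taken from the
# commands shown in this thread.

flash_once() {
    # $1 = hex file, $2 = J-Link serial number, $3 = device family
    nrfjprog --program "$1" --sectorerase --verify -f "$3" --snr "$2"
}

flash_with_recover() {
    local hex="$1" snr="$2" family="${3:-NRF91}"

    # First attempt: the same command that west flash runs.
    if flash_once "$hex" "$snr" "$family"; then
        return 0
    fi

    echo "Flashing failed; recovering device $snr and retrying once." >&2
    # WARNING: --recover erases user code and UICR on the target.
    nrfjprog --recover -f "$family" --snr "$snr" || return 1

    # Second and last attempt after the recover.
    flash_once "$hex" "$snr" "$family"
}
```

Usage would look like `flash_with_recover build/zephyr/merged.hex 1196000003`. This only automates the workaround; it does not explain why the first attempt times out.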

Is there a way to get a better understanding of what's going on here? Is there something that my application code is doing that is preventing the J-Link from being able to flash the device reliably? What is nrfjprog --recover doing to "fix" this issue?

3173.log.log

0 cgnd_chris over 1 year ago

BTW, I only get these timeout errors when trying to flash a board that is running an existing firmware image. Flashing a board with empty flash works.

Also, if I first open the nRF Connect for Desktop Programmer GUI software and do "Erase all", and THEN run west flash, it works every time:

❯ west flash -i 1196000003
-- west flash: rebuilding
[0/26] Performing build step for 'tfm'
ninja: no work to do.
[1/7] Performing build step for 'mcuboot_subimage'
ninja: no work to do.
[2/7] Performing install step for 'tfm'
-- Install configuration: "MinSizeRel"
[6/6] Completed 'mcuboot_subimage'
-- west flash: using runner nrfjprog
-- runners.nrfjprog: Flashing file: /Users/chris/cgnd/clients/golioth/thingy91-golioth-workspace/app/build/zephyr/merged.hex
[ #################### ]  13.381s | Erase file - Done erasing                                                          
[ #################### ]   2.829s | Program file - Done programming                                                    
[ #################### ]   2.810s | Verify file - Done verifying                                                       
Applying system reset.
Run.
-- runners.nrfjprog: Board with serial number 1196000003 flashed successfully.
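
A rough command-line stand-in for the "Erase all" then west flash sequence above might look like the sketch below. It assumes that nrfjprog --eraseall (the long form of the -e flag mentioned later in this thread) is close enough to what the Programmer GUI does, which I have not verified; the function name erase_then_flash is hypothetical, and it should be run from the application directory so that west flash finds the build folder.

```shell
# Sketch only: erase the whole chip first, then flash with west.
# erase_then_flash is a hypothetical helper name.

erase_then_flash() {
    local snr="$1" family="${2:-NRF91}"

    # --eraseall wipes the user flash and UICR, so any existing image is gone.
    nrfjprog --eraseall -f "$family" --snr "$snr" || return 1

    # Same west invocation as above, now against an empty flash.
    west flash -i "$snr"
}
```

For example: `erase_then_flash 1196000003`. If the erase itself fails, the function stops without calling west flash.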

0 Michal over 1 year ago in reply to cgnd_chris

Hello,

Are you using APPROTECT on this board?

Best regards,

Michal

0 Michal over 1 year ago in reply to cgnd_chris

Could you send me your .config file? It should be in build/zephyr/.config

Best regards,

Michal

0 cgnd_chris over 1 year ago in reply to Michal

Please see attached

4035.dot_config.txt

0 Michal over 1 year ago in reply to cgnd_chris

I have forwarded the question to the experts and will let you know when I have more information.

Best regards,

Michal

0 Michal over 1 year ago in reply to Michal

We are still looking into it.

In the meantime, are you able to check if it works with an x86 computer?

Best regards,

Michal

0 liteyear over 1 year ago in reply to Michal

Any further insights? I see the same symptoms on an M1 Mac using a J-Link Pro when flashing an nRF5340.

The --recover flag also resolves it for me, but it can't be used when starting a debug session inside VS Code, because it triggers a rebuild, which is another issue. In that case I can just run "nrfjprog -e" before clicking Debug.

But it's a long and painful sequence of steps for such a core function.

I suspect it's related to the Partition Manager somehow, and perhaps to having a custom pm_static.yml. The complicated build process that creates the network coprocessor image and uploads it first also seems implicated, but it's very hard to tease apart.

0 Michal over 1 year ago in reply to liteyear

Unfortunately, we weren't able to reproduce it.

Does it also happen with a basic sample like hello world or blinky?

The extension also has an option to attach the debugger to a running target. Could that at least work as a workaround?

Best regards,

Michal

0 liteyear over 1 year ago in reply to Michal

Okay, I came up with an MRE:

1. Create a new application based on the Zephyr Bluetooth Beacon sample.
2. Replace the prj.conf with the attached one*. This just adds an assortment of the configs we are working with. I'm sure a smaller set would do, but it's laborious to minimise and this set does the trick.
3. Create a build configuration for nrf7002dk_nrf5340_cpuapp_ns and build.
4. Connect to an nRF7002 DK.
5. Click Flash a few times. In my experience it works the first time, after an automatic recover and some very long pauses, but then starts to fail with "did worker die?" errors.
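
Steps 3 to 5 can also be approximated from a terminal, which makes it easier to count how often flashing fails. The sketch below assumes it is run from the application directory after the prj.conf has been replaced, and that west flash from the command line behaves like the Flash button in the extension (which may not be exactly true). The function names and the default of five attempts are arbitrary.

```shell
# Sketch only: build the MRE and flash it several times in a row.
# build_mre and flash_repeatedly are hypothetical helper names.

build_mre() {
    # Board target taken from step 3 above.
    west build -b nrf7002dk_nrf5340_cpuapp_ns
}

flash_repeatedly() {
    local attempts="${1:-5}" failures=0 i=1

    while [ "$i" -le "$attempts" ]; do
        if west flash; then
            echo "attempt $i: ok"
        else
            echo "attempt $i: FAILED"
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done

    echo "$failures of $attempts flash attempts failed"
    # Succeed only if every attempt worked.
    [ "$failures" -eq 0 ]
}
```

Running `build_mre && flash_repeatedly 5` prints one line per attempt and a summary at the end.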

Note that I was able to eliminate the pm_static.yml file from the MRE, so I'm now wondering if it's simply the size of the image that triggers the issue. I couldn't pinpoint which prj.conf addition triggered the failure, but things seem to start tripping up once I added a bunch of them.

* "attachment": https://pastebin.com/ia4ijcb7

0 liteyear over 1 year ago in reply to Michal

Michal, were you able to reproduce this using my MRE?

0 Michal over 1 year ago in reply to liteyear

Sorry, I was away for a while unfortunately.

I will have to ask a colleague who has access to one of the new Macs whether they can test that.

Best regards,

Michal

0 cgnd_chris over 1 year ago in reply to Michal

liteyear, what version of nRF Command Line Tools are you using? I just updated to the latest 10.24.2 build for macOS and I'm no longer getting the timeout errors I was seeing earlier. If you are using an older version, can you try updating to the latest macOS build from https://www.nordicsemi.com/Products/Development-tools/nRF-Command-Line-Tools/Download and see if that fixes it?
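
To check quickly whether an installation is older than 10.24.2, something like the sketch below could work. It assumes that nrfjprog --version prints a line of the form "nrfjprog version: 10.24.2 external"; if the output on your system looks different, the sed pattern needs adjusting. The helper names are hypothetical, and the comparison expects dotted versions with up to three numeric fields.

```shell
# Sketch only: compare the installed nrfjprog version against 10.24.2.

installed_nrfjprog_version() {
    # ASSUMPTION: output contains a line like "nrfjprog version: 10.24.2 external".
    nrfjprog --version | sed -n 's/^nrfjprog version: *\([0-9][0-9.]*\).*/\1/p'
}

version_at_least() {
    # Succeeds if dotted version $1 >= $2, comparing major, minor, patch in turn.
    local have="$1" want="$2" h w i=1
    while [ "$i" -le 3 ]; do
        h=$(echo "$have" | cut -d. -f"$i")
        w=$(echo "$want" | cut -d. -f"$i")
        h=${h:-0}
        w=${w:-0}
        if [ "$h" -gt "$w" ]; then return 0; fi
        if [ "$h" -lt "$w" ]; then return 1; fi
        i=$((i + 1))
    done
    return 0
}

check_nrfjprog_version() {
    local have
    have=$(installed_nrfjprog_version)
    if [ -z "$have" ]; then
        echo "could not determine the nrfjprog version"
        return 2
    fi
    if version_at_least "$have" "10.24.2"; then
        echo "nrfjprog $have is 10.24.2 or newer"
    else
        echo "nrfjprog $have is older than 10.24.2"
        return 1
    fi
}
```

Calling `check_nrfjprog_version` prints which case applies and returns non-zero for an older or unknown version.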