nRF Connect SDK DFU over BLE fails with bad State: Remote error: Unknown(1)?

I am having trouble setting up DFU over BLE in nRF Connect SDK. I am using NCS 1.9.1.

I have followed these instructions with my own project on both the nrf52840 devkit and a custom board: https://devzone.nordicsemi.com/guides/nrf-connect-sdk-guides/b/software/posts/ncs-dfu

Side note: The DFU UUID in this guide did not work for me: I used the UUID that is shown in the iOS nRF Connect Device Manager App when no devices are found:

BT_UUID_128_ENCODE(0x8d53dc1d, 0x1db7, 0x4cd3, 0x868b, 0x8a527460aa84)
With this UUID, I was able to connect with the device manager. However, when I start the image upgrade with "erase application settings" enabled and mode "test and confirm", I get this message: "State: Remote error: Unknown(1)". If I try again after that, the state says "UPLOADING..." until it times out. (restarting the app recreates the unknown(1) error).
When I try uploading the same firmware image to the same device via the nRF Connect for iOS app, I get more details. The log looks good until I get a message "Length of data to send exceeds MTU". This seems straightforward, but I am failing to get it resolved.
Is 252 (as stated in the above guide) the correct MTU length? If not, how do I change it? I've tried changing several of the CONFIG_ values, but they don't appear to take effect: the nRF Connect iOS app still shows the default MTU of 23.
Any help appreciated, thank you!
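For readers comparing the UUID in the side note above against what a scanner shows: the sketch below reproduces, in plain C, the little-endian byte layout that a 128-bit Bluetooth UUID takes when written as five fields. The helper name `uuid128_encode_le` is hypothetical and is not part of the SDK; it only illustrates the byte order, assuming the five arguments are the groups of the canonical 8-4-4-4-12 string form.

```c
#include <stdint.h>

/* Hypothetical helper (not an SDK function): writes the 16 bytes of a
 * 128-bit UUID in little-endian order, given the five groups of its
 * canonical string form (8-4-4-4-12 hex digits). */
void uuid128_encode_le(uint32_t w32, uint16_t w1, uint16_t w2, uint16_t w3,
                       uint64_t w48, uint8_t out[16])
{
    int i;

    /* Last string group (48 bits) comes first, least significant byte first. */
    for (i = 0; i < 6; i++) {
        out[i] = (uint8_t)(w48 >> (8 * i));
    }
    /* Then the three 16-bit groups, walking the string right to left. */
    out[6] = (uint8_t)(w3 & 0xff);
    out[7] = (uint8_t)(w3 >> 8);
    out[8] = (uint8_t)(w2 & 0xff);
    out[9] = (uint8_t)(w2 >> 8);
    out[10] = (uint8_t)(w1 & 0xff);
    out[11] = (uint8_t)(w1 >> 8);
    /* First string group (32 bits) ends up in the last four bytes. */
    for (i = 0; i < 4; i++) {
        out[12 + i] = (uint8_t)(w32 >> (8 * i));
    }
}
```

Calling it with `0x8d53dc1d, 0x1db7, 0x4cd3, 0x868b, 0x8a527460aa84` fills the array so that it starts with `0x84` and ends with `0x8d`, which corresponds to the string form 8d53dc1d-1db7-4cd3-868b-8a527460aa84.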
EDIT:

for anyone else who comes across this problem:
- "Length of data to send exceeds MTU" is an expected message and doesn't indicate a problem.
- The issue ended up being that I had set CONFIG_HEAP_MEM_POOL_SIZE=256, which is too small for the DFU to allocate the buffer it needs to write to flash. Values I tested successfully are 4096 and 0 (with 0, a static buffer is used instead of allocating from the heap). The optimal size is probably sizeof(struct flash_img_context).
  • Hi,

    I've tried the guide myself and I believe I was able to recreate the issue you're experiencing (albeit with an Android device and app). I am adding a description of my issue so you can compare it to your case, as well as my fix for it:

    Description:

    After building and flashing with the erase setting, connecting with nRF Connect, adding the update image and pressing "test and confirm", the app starts to download the image. The DK then crashes and resets, connects again, and repeats this until it times out. Let me know if I've understood correctly that this is similar to your experience.

    Fix

    Increase CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE in prj.conf from 4096 to 8192: "CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=8192"

    This should allow you to download and update the image without the DK crashing.
    Let me know if this fixes your issue, or if you've tried this already.

    Kind regards, 
    Andreas
  • Thank you for checking it out, however I'm not getting the same symptoms as you. My device does not crash and reset, in fact it is completely functional and recoverable after every attempt I've tried so far. I did increase the workq stack size but it had no effect, in the iOS nRF Connect app I still get the "Length of data to send exceeds MTU" even though on the device's advertisement page the MTU is set at 252 bytes.

    All right, then I have one more potential solution to this issue before digging deeper.

    Simon referred me to this other similar issue where an iOS user was experiencing the same data-length issue. The conclusion there was that this is to be expected: the message only means that the iOS DFU library will send the data in more than one transmission, and it will not cause any problems for you.

    If this is correct, it should be possible to compare the build timestamps, as described in the testing step of the guide, to verify whether the DFU went through and the new build is running.

    Could you try to perform the testing steps for DFU over Bluetooth and compare the two serial terminal outputs to see if it is the first or second build that boots up after pressing test and confirm? 



    AHaug said:
    Fix

    Post note: This seems to be an issue when running the guide with NCS v1.9.1 and was not present in v1.8.0.

    Kind regards,
    Andreas

  • That does look like a similar issue, but for me the DFU never completes, progress bar never fills up, the device doesn't reset and the build timestamp on the device does not update with the new image. My nRF Connect iOS app shows the exact same error log as the issue you linked, but stops at the MTU message.

    If the MTU message isn't a concern, maybe there is something else that is causing this issue. 

    Ok, now I have noticed that on my first attempt at flashing the image via DFU, I get the message "DFU failed: this image has already been flashed (hash value of the previous image matches current one.)". At first I thought I was just receiving this error because I was trying to flash the same image over and over again and it was maybe a caching issue within the app, but I also receive this error when I'm trying to flash a completely new image. I checked that the binaries that I'm testing are distinct. Is this normal?

    In the Device Manager app, under Image>Advanced I read the current image on the device as: 
    Split status: 0
    Image: 0
    - Slot: 0
    - Version: 0.0.0
    - Hash: XXXXXXXX(long string of hex, different from what is shown when I select an image to flash)
    - Flags: Bootable, Confirmed, Active


    Under stats I get "No status found." both before and after attempting to flash an image. 

    Are these values normal? It seems pretty empty to me 

  • I stripped all other functionality out of my code and applied the DFU setup to the button example, the project zip file is attached. This produces the same problem on my end. I will attempt the DFU from an android phone tomorrow, please let me know if you find anything that may be wrong with my setup.

    button_dfu_example.zip

  • Hi,

    dev_giraffe said:
    I stripped all other functionality out of my code and applied the DFU setup to the button example, the project zip file is attached.

    Great! Thanks for supplying the project zip. I will set up a similar test with this project to see if I am able to reproduce the error.

    dev_giraffe said:
    At first I thought I was just receiving this error because I was trying to flash the same image over and over again and it was maybe a caching issue within the app, but I also receive this error when I'm trying to flash a completely new image.

    I must admit that the thought struck me. But if you get the same issue with a new image it should not be the case. Just to confirm: The file you're uploading to the device manager to perform the DFU is the app_update.bin file located in build->zephyr, right?

    dev_giraffe said:
    In the Device Manager app, under Image>Advanced I read the current image on the device as: 

    It seems like the iOS app is somewhat different from the Android app (which is not really a surprise), so I will try to find someone who can perform the same tests with iOS. I can only see the file name, size and hash under the advanced tab here. The file hashes between two different builds are distinct, as shown in the image.

    dev_giraffe said:
    Are these values normal?

    I will ask the iOS app developers if they know the answer to this. 

    I'll come back with my progress as soon as I can verify with the developers, and with results from reproducing the issue with your project, no later than the end of this working day.

    Kind regards,
    Andreas

  • Just to confirm: The file you're uploading to the device manager to perform the DFU is the app_update.bin file located in build->zephyr, right?

    Correct

    Thank you Andreas, I'll continue to look into it on my end

  • Hi,

    Updating with the results from today

    I was able to reproduce your errors, but sadly I have found no fix so far. The symptoms are that the update starts and is stuck at state "uploading" until it times out, at which point I get the state "Transaction 102 timed out without receiving a response". This leads me to believe (though I have not verified it) that the DFU implementation is lacking something.

    Did you have any more luck on your end?

    If you follow the guide to the letter, does the DFU work for you with the modified peripheral_lbs sample?

    Also, another question regarding precisely how you performed the update steps just to exclude any mishaps: Is it correct that you performed the following steps when testing?
    1) Build and flash the application to the DK/Custom board
    2) Build the application again and add the app_update.bin binary from build/zephyr to the device with Device Manager
    3) Select app_update.bin on the image tab
    4) Press start -> Test and confirm

    I have asked the app team for more information regarding the rest of the questions as well as for input on details regarding the fault messages.

    Kind regards,
    Andreas

  • I tried the DFU using the zip'd project I uploaded and ran into the same error. 

    I followed the guide to the letter and succeeded in DFU. What's more, I didn't get the error message "DFU failed: this image has already been flashed (hash value of the previous image matches current one.)". I am trying to figure out what the difference is between my code and the peripheral LBS but struggling to find something that makes my app's DFU work.

    I am beginning to suspect that the app refuses to send the DFU because it believes that the hashes are identical, but I can't figure out why the app thinks so. 
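The check being suspected here can be pictured as a plain byte comparison of two image hashes. The sketch below is only an illustration of that idea: the function name `image_already_present` and the 32-byte length (assuming a SHA-256 digest) are assumptions, and the Device Manager app's actual logic is not shown anywhere in this thread.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the image hash is a SHA-256 digest, i.e. 32 bytes. */
#define IMAGE_HASH_LEN 32

/* Hypothetical check: returns true only when the hash reported by the
 * device and the hash of the file selected for upload are byte-for-byte
 * identical. */
bool image_already_present(const uint8_t device_hash[IMAGE_HASH_LEN],
                           const uint8_t new_hash[IMAGE_HASH_LEN])
{
    return memcmp(device_hash, new_hash, IMAGE_HASH_LEN) == 0;
}
```

Under this picture, two builds whose hashes differ in even a single byte would not be treated as the same image, so a "hash matches" message for genuinely distinct binaries points at something other than the comparison itself.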

  • Hi,

    I got a reply from the apps team:

    1. Q: When testing according to this DFU guide, they first get the message "State: Remote error: Unknown(1)", and then get stuck at the state "UPLOADING" until it times out. I was able to recreate this issue with Android as well, with the addition "Transaction 102 timed out without receiving a response". His logs state "Length of data to send exceeds MTU", which I've come to understand (from this case) to mean that the app needs multiple transmissions to perform the update.

      A:
       Yesterday we released a new version of the mcu manager library for iOS. As the previous one was maintained by JuulLabs, we had to create a new library instead of just incrementing the version. The library is located at the same GitHub location, but (if the user is using the CocoaPods dependency manager) its name changed from "McuManager" to "iOSMcuManagerLibrary". The current version is 1.1.0. If they are using SMP, or some other way of including the code, this may not be important.

      The Unknown (1) error is sent from the remote device. The meaning depends on the method throwing this error. For example, it may be thrown from: https://github.com/nrfconnect/sdk-zephyr/blob/79b77b4f6130d0db539a6f6655b3e03818ab98a6/subsys/mgmt/mcumgr/lib/cmd/img_mgmt/src/img_mgmt.c#L427 but it depends on the version of their firmware.

      The "Length of data to send exceeds MTU" message is always thrown in the library at the beginning: the manager does not know the MTU of the transport layer, so it tries to send a very large packet initially, gets the error with the correct MTU, and continues with smaller chunks. That is expected and not an error.

      The "Transaction 102 timed out without receiving a response" should not happen. What "Number of mcumgr buffers" are they using? As far as I know one buffer is needed for responses, so for pipelining we use 3 in the app, assuming the value on the device is set to 4 (default). After setting it to 4 in the app, some packets may get lost. Try decreasing this value (it will make DFU slower). But even with this error, the upload should be successful. The device sometimes does not have a buffer to send a notification, or the notification is lost somewhere. Usually, the fw is confirmed by the notification for the following chunk, so the data have been sent correctly.

    2. Q: The customer also states that they get the message "DFU failed: this image has already been flashed (hash value of the previous image matches current one.)", which also happens with a completely new image. Is it possible for this to occur when the images are distinct (different hash values), or is it more likely that they uploaded the same app_update image to the Device Manager for testing?

      A: This can only happen if the new fw has the same hash as the one on the device. Most prob they're trying to send the same fw. This should not fail the upload, but make it actually succeed immediately. Please, update to latest library version and try again.

    3. Q: Under the "Stats"-tab, they see "No status found" both before and after attempting to flash the image. Is this to be expected?

      A: This depends on the fw on the device. Whatever it returns, the app displays. The SMP Server sample from NCS returns number of ticks since boot, as far as I remember. Other samples may just throw error 8 (not supported).
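The "continues with smaller chunks" behaviour from the first answer above can be sketched as simple arithmetic. This is a simplified illustration, not the library's code: the function names are hypothetical, and `payload_per_packet` stands for however many image bytes fit in one packet once the MTU is known (protocol header overhead is deliberately ignored).

```c
#include <stddef.h>

/* Hypothetical helper: how many packets are needed to send image_len bytes
 * when each packet carries at most payload_per_packet bytes of image data. */
size_t chunk_count(size_t image_len, size_t payload_per_packet)
{
    if (payload_per_packet == 0) {
        return 0; /* nothing can be sent */
    }
    /* Round up so that a partial final chunk still counts as one packet. */
    return (image_len + payload_per_packet - 1) / payload_per_packet;
}

/* Hypothetical helper: length of the chunk that starts at byte `offset`. */
size_t next_chunk_len(size_t image_len, size_t offset,
                      size_t payload_per_packet)
{
    size_t remaining = (offset < image_len) ? (image_len - offset) : 0;

    return (remaining < payload_per_packet) ? remaining : payload_per_packet;
}
```

For example, a 1000-byte image with 252 bytes of payload per packet needs four packets, the last one carrying the remaining 244 bytes. A smaller MTU only increases the packet count; it does not by itself make the upload fail.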

    What I see from this answer is that the issue is not necessarily a mobile app issue, but rather caused by the DFU implementation. I will keep investigating to see if I can spot something that might cause this.

    Let me know if the answer above clarifies things for you!

    Kind regards,
    Andreas

  • The Unknown (1) error is sent from the remote device. The meaning depends on the method throwing this error.

    I was able to track down the source of MGMT_ERR_EUNKNOWN (which is the error code for a lot of conditions) to img_mgmt.c line 499. It appears to fail at alloc_ctx() within img_mgmt_impl_write_image_data and causes the errstr "img_mgmt_err_str_flash_write_failed".

    On a hunch based off of alloc_ctx()'s function brief, I increased:

    CONFIG_HEAP_MEM_POOL_SIZE=256
    to 
    CONFIG_HEAP_MEM_POOL_SIZE=4096 (the default for a lot of cases)
    ... and the DFU succeeded!
    It looks like CONFIG_HEAP_MEM_POOL_SIZE=0 also works (in which case it looks like alloc_ctx() uses a statically allocated buffer).
    Not sure why I had originally set it to 256; I think it may have been a configuration I saw in an example while I was trying to get SPI to work. Frustrating, but glad it works now. The error cases could have been a bit more descriptive: I would have thought that the alloc_ctx() failure should return MGMT_ERR_ENOMEM.
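To make the failure mode concrete, here is a minimal stand-alone sketch of why a 256-byte pool cannot satisfy the allocation while a 4096-byte pool can. Everything in it is hypothetical: `demo_img_context` is a stand-in with an assumed 512-byte block buffer (the real struct flash_img_context has a different layout and its size depends on the configuration), and `demo_pool_alloc` is a toy bump allocator, not the Zephyr heap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the flash-writing context. The 512-byte block
 * buffer is an assumption for illustration only. */
struct demo_img_context {
    uint8_t buf[512];
    size_t bytes_written;
};

/* Toy fixed-size pool: a bump allocator that ignores alignment and never
 * frees, which is enough to show the size check. */
struct demo_pool {
    uint8_t *storage;
    size_t capacity;
    size_t used;
};

/* Returns a pointer into the pool, or NULL when the request does not fit,
 * mirroring how a heap allocation fails when the pool is too small. */
void *demo_pool_alloc(struct demo_pool *pool, size_t size)
{
    void *p;

    if (size > pool->capacity - pool->used) {
        return NULL;
    }
    p = pool->storage + pool->used;
    pool->used += size;
    return p;
}
```

With a 256-byte pool, requesting `sizeof(struct demo_img_context)` returns NULL because the context alone is larger than the pool; with 4096 bytes the same request succeeds. This is consistent with the observation above that the write path failed before any data reached flash.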