FOTA over BLE fails after upgrade to NCS 2.8.0

Hi,

I'm trying to implement firmware updates over BLE with MCUboot as the bootloader.

I got it working with NCS 2.6.1, but since upgrading to NCS 2.8.0, nRF Connect for iOS reports the error: Sending the request timed out.

This is the part of my configuration that I think is relevant to the issue:

sysbuild.conf:

SB_CONFIG_BOOTLOADER_MCUBOOT=y
SB_CONFIG_BOOT_SIGNATURE_TYPE_ECDSA_P256=y
SB_CONFIG_BOOT_SIGNATURE_KEY_FILE="${APP_DIR}/priv.pem"

SB_CONFIG_PARTITION_MANAGER=y

SB_CONFIG_DFU_ZIP=y
SB_CONFIG_DFU_ZIP_APP=y

prj.conf:

CONFIG_BT_SMP=y

# FOTA configuration
# Configuration is basically the expansion of:
# CONFIG_NCS_SAMPLE_MCUMGR_BT_OTA_DFU=y
# CONFIG_NCS_SAMPLE_MCUMGR_BT_OTA_DFU_SPEEDUP=y
CONFIG_MCUMGR=y
CONFIG_IMG_ERASE_PROGRESSIVELY=y
CONFIG_NET_BUF=y
CONFIG_ZCBOR=y
CONFIG_CRC=y
CONFIG_MCUMGR_TRANSPORT_BT=y
CONFIG_MCUMGR_TRANSPORT_BT_CONN_PARAM_CONTROL=y
CONFIG_MCUMGR_TRANSPORT_BT_PERM_RW_ENCRYPT=y
CONFIG_IMG_MANAGER=y
CONFIG_STREAM_FLASH=y
CONFIG_FLASH_MAP=y
CONFIG_FLASH=y
CONFIG_MCUMGR_GRP_IMG=y
CONFIG_MCUMGR_GRP_OS=y
CONFIG_MCUMGR_GRP_OS_BOOTLOADER_INFO=y
CONFIG_MCUMGR_GRP_OS_MCUMGR_PARAMS=y
CONFIG_MCUMGR_TRANSPORT_BT_REASSEMBLY=y
CONFIG_BT_L2CAP_TX_MTU=498
CONFIG_BT_BUF_ACL_RX_SIZE=502
CONFIG_BT_BUF_ACL_TX_SIZE=251
CONFIG_BT_CTLR_DATA_LENGTH_MAX=251
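
As a sanity check on the last four values (a sketch, not part of my project): assuming the standard 4-byte L2CAP basic header and the 3-byte ATT header used by write and notification PDUs, the sizes relate as shown below. The helper names are made up for illustration.

```c
/* Protocol constants (assumptions stated in the text above). */
#define L2CAP_HDR_SIZE 4u /* 2-byte length + 2-byte channel ID */
#define ATT_HDR_SIZE   3u /* 1-byte opcode + 2-byte attribute handle */

/* ACL RX buffer size needed to hold one L2CAP PDU carrying `mtu` bytes. */
unsigned acl_rx_size_for_mtu(unsigned mtu)
{
    return mtu + L2CAP_HDR_SIZE;
}

/* Largest payload that fits in a single ATT write or notification. */
unsigned att_payload_for_mtu(unsigned mtu)
{
    return mtu - ATT_HDR_SIZE;
}

/* Number of link-layer PDUs of `ll_payload` bytes needed for one full packet. */
unsigned fragments_for_mtu(unsigned mtu, unsigned ll_payload)
{
    unsigned total = mtu + L2CAP_HDR_SIZE;

    return (total + ll_payload - 1u) / ll_payload; /* round up */
}
```

With the values above, acl_rx_size_for_mtu(498) gives 502, att_payload_for_mtu(498) gives 495, and fragments_for_mtu(498, 251) gives 2, so a full-size packet fits exactly into two 251-byte link-layer PDUs.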

Apart from moving a few configs from prj.conf to sysbuild.conf, the only change is that I needed to add 'CONFIG_MCUMGR_TRANSPORT_BT_PERM_RW_ENCRYPT' to allow access to the SMP characteristics, because CONFIG_BT_SMP is enabled but I do not have MITM protection.

I can use nRF Connect for iOS to list the images, get bootloader info and reboot the device, so the basic SMP commands seem to work. Why does it error out when performing the actual update? Did I miss something while migrating to NCS 2.8.0?

Thanks in advance.

Kind regards,

Remco Poelstra

  • Hi,

    Which version of the nRF Connect for iOS application do you use?

    Have you tried the Device Manager mobile application instead? Do you see any difference in the result compared to nRF Connect for iOS?
    You can follow these testing steps when performing a FOTA update on the nRF52 device.

    Remco Poelstra said:
    Apart from shuffling a few config's from prj.conf to sysbuild.conf, the only change is that I needed to add 'CONFIG_MCUMGR_TRANSPORT_BT_PERM_RW_ENCRYPT' to allow access to the SMP characteristics as CONFIG_BT_SMP is enabled but I do not have MITM protection.

    What happens if you do not add CONFIG_MCUMGR_TRANSPORT_BT_PERM_RW_ENCRYPT? Do you get the same error?

    Best regards,
    Dejan

  • Hi,

    I'm using nRF Connect for iOS version 2.7.13(40).

    I just tried Device Manager (1.8.1(3)) and it seems to have the same problem. It gets stuck on UPLOADING... after I press 'Start'. Other functions seem to work fine.

    If I remove CONFIG_MCUMGR_TRANSPORT_BT_PERM_RW_ENCRYPT, I can't use the SMP functions at all. The device disconnects and reports an error (over RTT) that the requested authentication level can't be reached (which is an understandable error in my situation).

    During the last tests I had a logging-intensive task running, and I noticed that during the upload the device (also) logs 'mcuboot_util: Image index: 0, Swap type: none', keeps logging the other task for a short while, and then stops completely. The processor seems to hang: there is no further output, and running the firmware in the debugger doesn't trigger a breakpoint or anything similar. Everything simply stops.

  • Hi,

    If you have a newer Android phone, you could test whether the same problem appears with the corresponding Android applications.

    Best regards,
    Dejan

  • I'm sorry, I don't have a recent Android device.

    I think the problem is on the firmware side, as I can't explain the hanging of the firmware otherwise. That would explain why the mobile device can't continue the upload, though.

  • Hi,

    Remco Poelstra said:
    I just tried Device Manager (1.8.1(3)) and it seems to have the same problem. It gets stuck on UPLOADING... after I press 'Start'. Other functions seem to work fine.
    Remco Poelstra said:
    During the last tests I had some logging intensive task running and I noticed that during the upload the device (also) logs 'mcuboot_util: Image index: 0, Swap type: none' and keeps logging the other task for a short while and then completely stops. It seems the processor completely hangs. There is no other output and running the firmware in the debugger also doesn't trigger a breakpoint or something. It simply all stops.

    What happens if you remove all logging intensive tasks? 

    Could you provide full logs when using both nRF Connect for iOS and Device Manager?

    Best regards,
    Dejan

  • Hi,

    Thanks for suggesting that I disable the logging-intensive task. This allowed me to track the issue down to the following call:

    err = adc_read_async(adc, &sequence, &adc_signal);

    It doesn't matter whether I pass &adc_signal or NULL; the firmware upload still times out either way. So it seems that running the ADC code results in some kind of deadlock in the system.

    I've included the complete ADC reading code below; it's basically a timer that calls adc_read_async at 100 Hz. I'm unsure why this should cause a deadlock when used together with the mcumgr code.

    #include <zephyr/kernel.h>
    #include <zephyr/drivers/adc.h>
    #include <zephyr/logging/log.h>
    LOG_MODULE_REGISTER(adc, LOG_LEVEL_DBG);
    
    #include "../common.h"
    
    #ifdef CONFIG_BOARD_T502534
    #define ADC_NODE DT_NODELABEL(adc) // ADC node from the device tree
    static const struct device *adc = DEVICE_DT_GET(ADC_NODE); //Data of ADC device specified in device tree
    static const struct adc_channel_cfg channel_cfgs[] = {
        DT_FOREACH_CHILD_SEP(ADC_NODE, ADC_CHANNEL_CFG_DT, (,))
    };
    #define CHANNEL_COUNT ARRAY_SIZE(channel_cfgs)
    #endif
    
    /* Define a variable of type adc_sequence and a buffer of type int16_t */
    static int16_t buf[CHANNEL_COUNT];
    static struct adc_sequence sequence = {
        .buffer = buf,
        .buffer_size = sizeof(buf),
        .resolution = 12,
        .oversampling = 0,
        // Optional
        .calibrate = true, //TODO: Will this cause the ADC to calibrate for each sample?? We don't want that probably
    };
    
    static struct k_poll_signal adc_signal;
    static struct k_poll_event adc_event;
    
    /**
     * ADC main loop called by the timer
     */
    static void adc_timer_handler(struct k_timer *timer_id)
    {
    #ifdef CONFIG_BOARD_T502534
        int err;
        err = adc_read_async(adc, &sequence, &adc_signal);
        if (err < 0) {
            LOG_ERR("Failed to initialize ADC read: %d", err);
            return;
        }
    #endif    
    }
    
    void adcInit(void) {
    #ifdef CONFIG_BOARD_T502534
        ASSERT_ERROR_AND_REBOOT(!device_is_ready(adc));
    
        for (size_t i = 0; i < CHANNEL_COUNT; i++) {
            sequence.channels |= BIT(channel_cfgs[i].channel_id);
            ASSERT_ERROR_AND_REBOOT(adc_channel_setup(adc, &channel_cfgs[i]));
        }
    
        // initialize the signal and poll event before the first read can fire
        k_poll_signal_init(&adc_signal);
        k_poll_event_init(&adc_event, K_POLL_TYPE_SIGNAL, K_POLL_MODE_NOTIFY_ONLY, &adc_signal);
    
        // use a timer to precisely trigger ADC reads at 100 Hz (every 10 ms)
        static struct k_timer adc_timer;
        k_timer_init(&adc_timer, adc_timer_handler, NULL);
        k_timer_start(&adc_timer, K_MSEC(10), K_MSEC(10));
    #endif
    }
    
    static void adc_report_error(int result) {
        switch(result) {
            case 0:
                // Success, nothing to report
                break;
            case -EINVAL:
                LOG_WRN("Parameter with an invalid value has been provided");
                break;
            case -ENOMEM:
                LOG_WRN("Provided buffer is too small");
                break;
            case -ENOTSUP:
                LOG_WRN("Requested mode of operation is not supported");
                break;
            case -EBUSY:
                LOG_WRN("Another sampling was triggered while the previous one was still in progress");
                // try a longer interval
                break;
            case -EAGAIN:
                LOG_WRN("ADC read returned EAGAIN, conversion not ready");
                // try blocking reads instead
                break;
            default:
                LOG_ERR("Error reading ADC (%d)", result);
                break;
        }
    }
    
    int getPositionBlocking(int16_t val[2], k_timeout_t timeout) {
        // wait for adc read to complete
        int ret = k_poll(&adc_event, 1, timeout);
        if(ret != 0) {
            // LOG_ERR("k_poll failed (%d)", ret);
            return -EAGAIN;
        }
    
        // check if adc has error flags
        if (0 != adc_event.signal->result) {
            adc_report_error(adc_event.signal->result);
            return adc_event.signal->result;
        }
    
        // inform callers of updated value, available at &buf
        val[0] = buf[0];
        val[1] = buf[1];
    
        // housekeeping at end of loop
        adc_event.signal->signaled = 0;
        adc_event.state = K_POLL_STATE_NOT_READY;
    
        return 0; // return success
    }
    
    void getPosition(int16_t val[2]) {
        val[0] = buf[0];
        val[1] = buf[1];
    }
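
One thing that may be worth checking, although nothing in this thread confirms it as the cause: a k_timer expiry function runs in interrupt context, and in many Zephyr ADC drivers the read call first takes an internal lock and can wait if a previous conversion is still in progress. I have not verified this for the driver used here. A common way to avoid blocking inside the timer handler is to let the handler only flag the request and do the actual read from a thread. The sketch below is a plain-C model of that pattern, not Zephyr code; every name in it is a hypothetical stand-in.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool read_pending; /* set by the timer, cleared by the worker */
static int completed_reads;      /* number of reads the worker has performed */

/* Hypothetical stand-in for a driver call that may block. */
static int blocking_read_stub(void)
{
    completed_reads++;
    return 0;
}

/* Models the timer expiry function: never blocks, only flags the request. */
void timer_tick(void)
{
    atomic_store(&read_pending, true);
}

/* Models one pass of a worker thread: performs the read if one was requested. */
bool worker_run_once(void)
{
    if (!atomic_exchange(&read_pending, false)) {
        return false; /* nothing requested since the last pass */
    }
    (void)blocking_read_stub();
    return true;
}

/* Accessor so callers can observe how many reads actually ran. */
int completed_read_count(void)
{
    return completed_reads;
}
```

In Zephyr terms, timer_tick() would roughly correspond to calling k_work_submit() from the timer handler, and worker_run_once() to the work handler running on the system work queue. Submitting a work item that is already pending does not queue it twice, which matches the coalescing of several ticks into one read shown here.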
