BLE failures with EIO

I'm developing a BLE application using several connections, and of course I'm advertising and scanning to establish those connections. I have up to 5 connections at a time and running both peripheral and central roles.

Occasionally I get an error when trying to do a BLE operation and it's not clear to me what the error means for how I should respond. In particular I sometimes get -EIO when calling bt_le_ext_adv_start() or bt_conn_disconnect(). A cursory look through the code in these functions doesn't show an obvious reason why EIO occurs, except that it may be caused by an error when sending an HCI command to the controller in bt_hci_cmd_send_sync(). Looking at the logs, I do sometimes see a warning like "bt_hci_core: opcode 0x200e status 0x0c" which is indeed generated in bt_hci_cmd_send_sync(). Since the EIO error only occurs occasionally I suspect it may be occurring due to a timing window where the host and controller are out of sync, or some other transient condition. In some cases retrying the operation later works eventually, sometimes not (advertising seems to often fail many times in a row and then succeed; disconnecting failures seem to never succeed with retries).

At the end of the day I just want my application to robustly handle error conditions like this. But, with the error code EIO I'm not sure how to respond. If it's dependent on the specific opcode/status code, the application doesn't have access to this information (only EIO). Can you give me any insight into what I should do in situations like this? Is there any way for the application to know more specific error codes so it can respond appropriately?

Details: I am using NCS 2.4.1 on the nRF5340 with the net core running the soft device controller. I think these are the relevant parts of my project configuration.

App core:

CONFIG_BT=y
CONFIG_BT_PERIPHERAL=y
CONFIG_BT_CENTRAL=y

CONFIG_BT_MAX_CONN=5

CONFIG_BT_EXT_ADV=y
CONFIG_BT_EXT_ADV_MAX_ADV_SET=2

CONFIG_BT_GATT_DM=y

CONFIG_BT_BUF_ACL_TX_SIZE=251
CONFIG_BT_BUF_ACL_TX_COUNT=20
CONFIG_BT_BUF_ACL_RX_SIZE=251

CONFIG_BT_FILTER_ACCEPT_LIST=y
CONFIG_BT_USER_PHY_UPDATE=y
CONFIG_BT_USER_DATA_LEN_UPDATE=y
CONFIG_BT_SMP=y
CONFIG_BT_DEVICE_NAME="ABC"

CONFIG_BT_ATT_PREPARE_COUNT=2
CONFIG_BT_GATT_CLIENT=y
CONFIG_BT_GAP_AUTO_UPDATE_CONN_PARAMS=n

CONFIG_BT_L2CAP_TX_MTU=247
CONFIG_BT_L2CAP_DYNAMIC_CHANNEL=y

CONFIG_BT_SCAN=y
CONFIG_BT_SCAN_FILTER_ENABLE=y
CONFIG_BT_SCAN_UUID_CNT=1

Net core:

CONFIG_BT_MAX_CONN=5

CONFIG_BT_BUF_ACL_TX_SIZE=251
CONFIG_BT_BUF_ACL_TX_COUNT=20
CONFIG_BT_BUF_ACL_RX_SIZE=251

CONFIG_BT_CTLR=y
CONFIG_BT_CTLR_ADV_DATA_LEN_MAX=255
CONFIG_BT_CTLR_ADV_EXT=y
CONFIG_BT_CTLR_ADV_SET=2
CONFIG_BT_CTLR_DATA_LENGTH_MAX=251
CONFIG_BT_CTLR_PHY_2M=y
CONFIG_BT_CTLR_PHY_CODED=y
CONFIG_BT_CTLR_RX_BUFFERS=10
CONFIG_BT_CTLR_TX_PWR_DYNAMIC_CONTROL=y

CONFIG_BT_CTLR_SDC_PERIPHERAL_COUNT=1

Parents
  • Hello Steve,

    Thank you for your patience.

    The function bt_le_ext_adv_start() calls the bt_le_adv_set_enable_ext() function, which in turn calls the bt_hci_cmd_send_sync() function, and that is returning the -EOI.

    The function bt_hci_cmd_send_sync() is used for sending a HCI command synchronously. The function will block until a Command Status or a Command Complete event is returned. If either of these have a non-zero status, which happens to be the case you are experiencing, then the function returns a negative error code.


    As per the code (in hci_core.c), the function will return 0 if successful, or -ENOBUS (if buffer creation was unsuccessful), or -ECONNREFUSED (if status=0x09), or -EOI for all other values of the status. So yes, I agree with you that -EOI does not provide much information.


    However, the status which in your case is 0x0c means the HCI Error is that the Command is DisAllowed (BT_HCI_ERR_CMD_DISALLOWED). As per bluetooth core specification, the Command Disallowed error code indicates that the command requested cannot be executed because the Controller is in a state where it cannot process this command at this time.

    Regards, 
    Naeem

  • Thanks Naeem, that is encouraging as it matches what I could learn from reading the code.

    I'm not sure where to go from here, though. Why might the controller not be able to process the command? From my perspective I have a valid connection handle that I want to disconnect, so I'm not sure what state the controller might be in that would disallow this. In my experience retrying the command doesn't help - I continue to get the same error. I'm not sure what else to do, though. Is there a way for me to find out what the root cause might be? What would you recommend I do to continue investigating this?

  • Was there any resolution to this?

    I am running into a similar issue were I get a "BT_HCI_ERR_CMD_DISALLOWED" error, which propagates as a -EIO error when I attempt to start scanning.

    Once I get the error the stack appears to freeze, I cannot scan or advertise. We are using a nRF5340.

  • Hi MMater,

    What NCS version are you using? If you are using a version earlier than 2.5.0, then there is a known issue. You can find more details in this DevZone case:

     nRF5340 - bt_hci_cmd_send_sync hangs with error -11  

    Hieu

  • I think I resolved this by turning off ACL flow control with `CONFIG_BT_HCI_ACL_FLOW_CONTROL=n`. I believe there is an issue with the flow control that was worked around in 2.5.0, but if you are on 2.4.2 (like me) then the only fix was to disable flow control. I didn't see any negative side effects from turning it off either.

  • I attempted to disable flow control with no success.

    We are on 2.5.2 of the SDK

  • Hmm, not sure then. Only other thing I can recall that might be relevant is that EIO can come up if you 'overallocate' the radio. If you ask for more than 100% of the time to be spent scanning, for example asking for 80% duty cycle scan on 1M PHY and 50% duty cycle scan on coded PHY. That might be a different root error code though and it shouldn't show up intermittently.

Reply
  • Hmm, not sure then. Only other thing I can recall that might be relevant is that EIO can come up if you 'overallocate' the radio. If you ask for more than 100% of the time to be spent scanning, for example asking for 80% duty cycle scan on 1M PHY and 50% duty cycle scan on coded PHY. That might be a different root error code though and it shouldn't show up intermittently.

Children
  • Here is a snippet of the error log. Email me for the full file.

    [00:01:03.733,306] .[0m<dbg> bt_hci_driver: bt_rpmsg_evt_recv: len 33.[0m
    [00:01:03.733,367] .[0m<dbg> bt_hci_driver: bt_rpmsg_rx: Calling bt_recv(0x2001e4c8).[0m
    [00:01:03.733,367] .[0m<dbg> bt_hci_core: bt_recv: buf 0x2001e4c8 len 35.[0m
    [00:01:03.733,428] .[0m<dbg> bt_hci_core: rx_work_handler: Getting net_buf from queue.[0m
    [00:01:03.733,428] .[0m<dbg> bt_hci_core: rx_work_handler: buf 0x2001e4c8 type 1 len 35.[0m
    [00:01:03.733,459] .[0m<dbg> bt_hci_core: hci_event: event 0x3e.[0m
    [00:01:03.733,489] .[0m<dbg> bt_hci_core: hci_le_meta_event: subevent 0x0d.[0m
    [00:01:03.733,489] .[0m<dbg> bt_scan: bt_hci_le_adv_ext_report: Adv number of reports 1.[0m
    [00:01:03.733,612] .[0m<dbg> bt_scan: le_adv_recv: E9:D5:00:82:00:07 (public) event 0, len 7, rssi -91 dBm.[0m
    [00:01:03.733,703] .[0m<dbg> bt_hci_driver: bt_rpmsg_rx: RX buf payload:.[0m
    .[1;31m--- 60 messages dropped ---
    .[0m[00:01:03.898,803] .[0m<dbg> bt_scan: le_adv_recv: E1:81:EB:FD:FF:33 (random) event 3, len 8, rssi -96 dBm.[0m
    .[1;31m--- 26 messages dropped ---
    .[0m[00:01:03.905,517] .[0m<dbg> bt_hci_core: rx_work_handler: Getting net_buf from queue.[0m
    .[1;31m--- 42 messages dropped ---
    .[0m[00:01:03.934,997] .[0m<dbg> bt_hci_core: bt_recv: buf 0x2001e4c8 len 47.[0m
    .[1;31m--- 67 messages dropped ---
    .[0m[00:01:03.935,699] .[0m<dbg> bt_hci_driver: bt_rpmsg_rx: RX buf payload:.[0m
    [00:01:03.936,309] .[0m<dbg> bt_hci_driver: bt_rpmsg_rx: RPMsg data:
    04 3e 39 0d 01 10 00 01 c8 ca de d8 ae 27 01 00 |.>9..... .....'..
    ff 7f a4 00 00 ff 00 00 00 00 00 00 1f 1e ff 06 |........ ........
    00 01 09 20 22 93 f1 15 1f e7 da 81 d7 95 13 72 |... "... .......r
    6d 79 70 8b 9e 26 49 76 b8 4f 7f 26 |myp..&Iv .O.& .[0m
    [00:01:03.936,340] .[0m<dbg> bt_hci_driver: bt_rpmsg_evt_recv: len 57.[0m
    [00:01:03.936,370] .[0m<dbg> bt_hci_driver: bt_rpmsg_rx: Calling bt_recv(0x2001e4c8).[0m
    [00:01:03.936,401] .[0m<dbg> bt_hci_core: bt_recv: buf 0x2001e4c8 len 59.[0m
    [00:01:03.936,431] .[0m<dbg> bt_hci_core: rx_work_handler: Getting net_buf from queue.[0m
    .[1;31m--- 33 messages dropped ---
    .[0m[00:01:04.029,418] .[0m<dbg> bt_hci_core: bt_recv: buf 0x2001ef5c len 6.[0m
    [00:01:04.029,418] .[0m<dbg> bt_hci_core: hci_cmd_complete: opcode 0x2042.[0m
    [00:01:04.029,449] .[0m<dbg> bt_hci_core: hci_cmd_done: opcode 0x2042 status 0x0c buf 0x2001ef5c.[0m                                                                                                                                  ERROR HERE
    [00:01:04.029,510] .[0m<dbg> bt_hci_driver: bt_rpmsg_rx: RX buf payload:
    0c |. .[0m
    [00:01:04.029,541] .[1;33m<wrn> bt_hci_core: opcode 0x2042 status 0x0c.[0m
    .[1;31m--- 16 messages dropped ---
    .[0m[00:01:04.030,303] .[0m<dbg> bt_hci_core: bt_recv: buf 0x2001ef5c len 6.[0m
    .[1;31m--- 24 messages dropped ---
    .[0m[00:01:04.095,581] .[0m<dbg> bt_scan: le_adv_recv: E9:D5:00:82:00:07 (public) event 0, len 7, rssi -89 dBm.[0m
    .[1;31m--- 32 messages dropped ---
    .[0m[00:01:04.100,433] .[0m<dbg> bt_hci_driver: bt_rpmsg_rx: RX buf payload:.[0m
    .[1;31m--- 21 messages dropped ---
    .[0m[00:01:04.125,396] .[0m<dbg> bt_hci_driver: bt_rpmsg_rx: RX buf payload:.[0m
    .[1;31m--- 128 messages dropped ---
    .[0m[00:01:04.136,383] .[0m<dbg> bt_hci_core: hci_event: event 0x3e.[0m
    [00:01:04.136,413] .[0m<dbg> bt_hci_core: hci_le_meta_event: subevent 0x0d.[0m
    [00:01:04.136,413] .[0m<dbg> bt_scan: bt_hci_le_adv_ext_report: Adv number of reports 1.[0m
    [00:01:04.136,535] .[0m<dbg> bt_scan: le_adv_recv: 4C:2E:91:B8:A1:E3 (random) event 0, len 20, rssi -86 dBm.[0m
    [00:01:04.136,596] .[0m<dbg> bt_host_crypto: bt_encrypt_le: key affc5f8d5dd5bfe9374a42ee5b4a4c94.[0m
    .[1;31m--- 11 messages dropped ---
    .[0m[00:01:04.139,404] .[0m<dbg> bt_scan: bt_hci_le_adv_ext_report: Adv number of reports 1.[0m
    [00:01:04.139,526] .[0m<dbg> bt_scan: le_adv_recv: 35:12:DE:56:AB:35 (random) event 3, len 31, rssi -70 dBm.[0m
    [00:01:04.139,617] .[0m<dbg> bt_hci_driver: bt_rpmsg_rx: RX buf payload:.[0m
    [00:01:04.178,497] .[0m<dbg> bt_hci_core: bt_hci_cmd_create: opcode 0x2042 param_len 6.[0m
    [00:01:04.178,527] .[0m<dbg> bt_hci_core: bt_hci_cmd_create: buf 0x2001ef5c.[0m
    .[1;31m--- 14 messages dropped ---
    .[0m[00:01:04.179,138] .[0m<dbg> bt_hci_core: bt_recv: buf 0x2001ef5c len 6.[0m
    [00:01:04.179,168] .[0m<dbg> bt_hci_core: hci_cmd_complete: opcode 0x2042.[0m
    [00:01:04.179,199] .[0m<dbg> bt_hci_core: hci_cmd_done: opcode 0x2042 status 0x0c buf 0x2001ef5c.[0m
    [00:01:04.179,229] .[0m<dbg> bt_hci_driver: bt_rpmsg_rx: RX buf payload:
    0c |. .[0m

  • Hi MMater,

    My apology for the late follow up. We have been short-handed/overloaded for a few days due to various sicknesses and the Easter holiday.

    In your log, the issue is OpCode 0x2042 resulted in 0x0c. 0x2042 is the OpCode for LE Set Extended Scan Enable, and 0x0c stands for Command Disallowed.

    With the SoftDevice Controller, currently that happens in a few scenarios:

    1. Trying to enable scanner when it is already running
    2. Trying to disable scanner when initiator is running
    3. Trying to disable scanner when it is not running

    #1 and #3 are not forbidden as per the Bluetooth Core Specification, but is not allowed in the current SoftDevice Controller. We have noted this internally and will address it.

    Could you please check if your code is causing one of those cases? The fix would then to just avoid it.

    For info about what the OpCode and Status codes mean, refer to:  Log warning message: bt_hci_core: opcode 0x200a status 0x0d .

    Hieu

  • Hieu,

    Thanks for checking back. We believe the issue was having EXT_ADV enabled on the appcore and not on the netcore.

  • MMater,

    You will need it on both cores. I hope things work well now?

Related