High Failure Rate in START ENCRYPTION Sequence: 2 Failures per 10 Trials (Baseband or Link Layer Issue)

Hello Nordic Semiconductor tech support team,

HW : nRF52840 ( Build code : nRF52840-CKAA-F )

SW : ncs 2.6.0

Sample Application :  zephyr/samples/bluetooth/hci_spi sample with SoftDevice Controller


While verifying our software design, we frequently encounter connection errors. Could you help us identify the possible cause of this issue and suggest a resolution? Right after the Enhanced Connection Complete event, we expect an Encryption Change event from the SoftDevice Controller. Instead, a Disconnect Complete (0x05) event occurs with the reason 'Connection Terminated due to MIC Failure (0x3d)'.

Device A : Android Mobile Phone (central), (iOS case is also reported)

Device B: nRF52840 (Peripheral) 

Frame 17605: 7 bytes on wire (56 bits), 7 bytes captured (56 bits) on interface Fake IF, Import from Hex Dump, id 0 (inbound)
Bluetooth
Bluetooth HCI H4
Bluetooth HCI Event - Disconnect Complete
Event Code: Disconnect Complete (0x05)
Parameter Total Length: 4
Status: Success (0x00)
Connection Handle: 0x00cd
Reason: Connection Terminated due to MIC Failure (0x3d)
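
For readers who want to check such a frame by hand, here is a minimal sketch of decoding an H4-framed Disconnect Complete event. The raw bytes are an assumption: they are reconstructed from the decoded fields above (7 bytes on wire, handle 0x00cd, reason 0x3d) and are not copied from the original capture. The function name and the small error table are hypothetical helpers, not part of any Nordic or Zephyr API.

```python
import struct

# Small, non-exhaustive subset of HCI error codes, for illustration only.
HCI_ERRORS = {
    0x08: "Connection Timeout",
    0x13: "Remote User Terminated Connection",
    0x16: "Connection Terminated by Local Host",
    0x3D: "Connection Terminated due to MIC Failure",
}


def parse_disconnect_complete(frame: bytes) -> dict:
    """Decode one H4-framed HCI Disconnect Complete event (7 bytes)."""
    if len(frame) != 7:
        raise ValueError("expected 7 bytes: H4 type + event header + 4 parameters")
    h4_type, event_code, param_len = frame[0], frame[1], frame[2]
    if h4_type != 0x04:
        raise ValueError("not an H4 event packet")
    if event_code != 0x05 or param_len != 4:
        raise ValueError("not a Disconnect Complete event")
    # Parameters are little-endian: status (1), connection handle (2), reason (1).
    status, handle, reason = struct.unpack("<BHB", frame[3:7])
    return {
        "status": status,
        "handle": handle & 0x0FFF,  # the handle uses the low 12 bits
        "reason": reason,
        "reason_text": HCI_ERRORS.get(reason, "unknown"),
    }


# Assumed raw bytes, rebuilt from the decoded fields in the trace above.
event = parse_disconnect_complete(bytes.fromhex("04 05 04 00 cd 00 3d"))
print(event)
```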

(highlighted marked HCI events)

  • Hello,

    I get the slight impression that the timing is marginal here, or that the wrong keys are provided.

    Do you see the same behavior if you relax the SPI clock speed, and specifically if you relax the timing between slave select and the first clock pulse (e.g. >10 us for a test)?

    Kenneth

  • Hello Kenneth, 

    Thank you for your prompt response and update. Based on the HCI log from device B (nRF52840 SoftDevice controller), I have ruled out one possible cause: the wrong LTK case. During this test, we consistently used the same LTK.

    Could you please investigate this case from another perspective? Specifically, could you identify the scenario in which the SoftDevice Controller sends out the (0x3d) reason during a Disconnection Complete event, in collaboration with your BLE Core development team?

    According to our logs, the error occurred when the connection began encrypting immediately after it was created, rather than during heavy traffic to/from the SPI bus.

    Thanks,

    Charles

  • Hi Charles,

    Can you also provide an on-air sniffer log?

    Also, if you have a sniffer, it would be helpful to enable the CONFIG_BT_LOG_SNIFFER_INFO option to print the LTK. It is needed to decrypt the encrypted traffic.

    One of the reasons for a disconnect during the encryption start procedure is receiving a Data Physical Channel PDU. This follows from the Core Spec requirement below:

    If, at any time during the encryption start procedure after the Peripheral has received
    the LL_ENC_REQ PDU or the Central has received the LL_ENC_RSP PDU, the Link
    Layer of the Central or the Peripheral receives an unexpected Data Physical Channel
    PDU from the peer Link Layer, it shall immediately exit the Connection state, and
    shall transition to the Standby state. The Host shall be notified that the link has been
    disconnected with the error code Connection Terminated Due to MIC Failure (0x3D).

    So, for instance, if the SoftDevice Controller (SDC) receives a non-empty data PDU during the encryption start procedure, it will disconnect with the 0x3D error code. Without a sniffer trace it is difficult to say whether this explains this particular case, but it is one possible explanation.

    Kenneth
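
The rule quoted above can be sketched as a toy state machine, seen from the Peripheral side. This is a simplified illustration and not how the SoftDevice Controller is implemented: the class, the state names, and the set of PDUs treated as "expected" are all assumptions made for the example.

```python
from enum import Enum, auto

MIC_FAILURE = 0x3D


class State(Enum):
    CONNECTED = auto()
    ENCRYPTION_START = auto()  # entered once LL_ENC_REQ has been received
    ENCRYPTED = auto()
    STANDBY = auto()


# Assumption: an illustrative set of PDUs tolerated during the procedure,
# not the exact list from the Core Spec.
EXPECTED_DURING_START = {
    "LL_START_ENC_RSP",
    "LL_TERMINATE_IND",
    "LL_REJECT_IND",
    "LL_REJECT_EXT_IND",
    "EMPTY",
}


class PeripheralLinkLayer:
    """Toy model of the Peripheral during the encryption start procedure."""

    def __init__(self) -> None:
        self.state = State.CONNECTED
        self.disconnect_reason = None

    def on_pdu(self, pdu: str) -> State:
        if self.state is State.CONNECTED and pdu == "LL_ENC_REQ":
            self.state = State.ENCRYPTION_START
        elif self.state is State.ENCRYPTION_START:
            if pdu == "LL_START_ENC_RSP":
                self.state = State.ENCRYPTED
            elif pdu not in EXPECTED_DURING_START:
                # Unexpected Data Physical Channel PDU: leave the Connection
                # state and report MIC Failure (0x3D) to the Host.
                self.state = State.STANDBY
                self.disconnect_reason = MIC_FAILURE
        return self.state


# Normal path: encryption starts and completes.
ok = PeripheralLinkLayer()
ok.on_pdu("LL_ENC_REQ")
ok.on_pdu("LL_START_ENC_RSP")

# Failure path: a non-empty data PDU arrives in the middle of the procedure.
bad = PeripheralLinkLayer()
bad.on_pdu("LL_ENC_REQ")
bad.on_pdu("DATA")
print(ok.state, bad.state, hex(bad.disconnect_reason))
```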

  • Hello Kenneth, 

    Thank you for your valuable input. This case was reported by our integration test team. To verify it from my side, I tried to replicate the issue with other Android phones (Galaxy A12, A13, and Pixel 6). Up until now, I haven’t been able to replicate this case on these three phone models.

    So, I tried to get more information about the central device, which is a Samsung Galaxy A15 (Android); its remote version information is shown below. At the same time, I'm trying to get an A15 phone for myself. Once I have the A15 phone and the sniffer logs, I will share them with you.

    Bluetooth HCI Event - Read Remote Version Information Complete
        Event Code: Read Remote Version Information Complete (0x0c)
        Parameter Total Length: 8
        Status: Success (0x00)
        Connection Handle: 0x0004
        LMP Version: 5.3 (0x0c)
    
        Manufacturer Name: MediaTek, Inc. (0x0046)
    
        LMP Subversion: 0

    In the meantime, could you help me get the answers below?

    1) From my reading of the NCS code, CONFIG_BT_LOG_SNIFFER_INFO is only applicable to the Zephyr host configuration. Could you check whether CONFIG_BT_LOG_SNIFFER_INFO also works for the Zephyr controller-only build (zephyr/samples/bluetooth/hci_spi)?

    2) Could you please ask your Application Engineer team to query if the nRF52840 SoftDevice Controller has a known issue with LMP Version: 5.3 (0x0c), Manufacturer Name: MediaTek, Inc. (0x0046), and LMP Subversion: 0 while starting encryption?

    Thanks,

    Charles 

  • Hello again,

    1. Your understanding is correct: CONFIG_BT_LOG_SNIFFER_INFO is only applicable to the Zephyr host, so you will need to share the LTK some other way.

    2. We didn't find any similar issues mentioned before.

    Please share the sniffer log when you are able to replicate the issue.

    Kenneth

  • Hello Kenneth,

    Thank you for your patience. I received the sniffer log from the test team. This test case involves around 50 reconnection tests. I observed that 13 of them failed with the same pattern.

    	15,645		2	0xbc85f063	0x000a	LL_START_ENC_REQ		43	 00:00:00.097501125	8/16/2024 8:25:22.270591865 AM	
    	18,487		2	0xda25e476	0x000c	LL_START_ENC_REQ		43	 00:00:00.195464125	8/16/2024 8:26:24.134999740 AM	
    	25,526		2	0x30b62793	0x0010	LL_START_ENC_REQ		43	 00:00:00.097501000	8/16/2024 8:29:22.513598365 AM	
    	35,393		2	0x4b0f9672	0x0010	LL_START_ENC_REQ		43	 00:00:00.097500250	8/16/2024 8:33:28.167792615 AM	
    	38,324		2	0x545d9ae1	0x0012	LL_START_ENC_REQ		43	 00:00:00.097503000	8/16/2024 8:34:39.343057990 AM	
    	44,843		2	0x11572d49	0x0014	LL_START_ENC_REQ		43	 00:00:00.146252500	8/16/2024 8:37:23.194317365 AM	
    	47,776		2	0x5f123729	0x0010	LL_START_ENC_REQ		43	 00:00:00.146251625	8/16/2024 8:38:44.314594615 AM	
    	54,035		2	0x7b38cd16	0x000f	LL_START_ENC_REQ		43	 00:00:00.195000750	8/16/2024 8:41:23.046934865 AM	
    	67,019		2	0x4a9b76df	0x000f	LL_START_ENC_REQ		43	 00:00:00.097502000	8/16/2024 8:46:33.977749740 AM	
    	151,397		2	0x69835adb	0x0010	LL_START_ENC_REQ		43	 00:00:00.195001625	8/16/2024 9:22:41.194609865 AM	
    	154,291		2	0xcfa5441f	0x000e	LL_START_ENC_REQ		43	 00:00:00.146251375	8/16/2024 9:23:47.153576490 AM	
    	183,910		2	0x9825d1c4	0x000f	LL_START_ENC_REQ		43	 00:00:00.097501125	8/16/2024 9:35:43.885001240 AM	
    	190,851		2	0x987d7cd5	0x0009	LL_START_ENC_REQ		43	 00:00:00.097501250	8/16/2024 9:38:40.265265615 AM	

    According to the sniffer logs, the nRF52840 somehow does not accept LL_START_ENC_RSP. For example, in the first failure, the LL of Device A (MediaTek, Inc., subversion: 0x0000) keeps sending LL_START_ENC_RSP, but no LL_START_ENC_RSP comes back. Most likely, the nRF52840 determined that the Message Integrity Check (MIC) failed on a received LL_START_ENC_RSP packet.

    Could you verify whether the encrypted data is valid, and share how we can check whether the encrypted MIC (0x04ec92c6) is correct?

    LTK  : A1 B8 AB 72 D9 4E D7 AD 32 39 23 CF 68 B9 B5 03 (0x03B5B968CF233932ADD74ED972ABB8A1)
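
As a starting point for checking the MIC offline, the sketch below covers only the parts that need no crypto library: confirming the two LTK byte orders above agree, and building the CCM nonce and additional authenticated data as I understand them from the Core Spec (Vol 6, Part E). The function names are hypothetical and the IV is a placeholder. The real IV (IVm followed by IVs) and the SKD come from the LL_ENC_REQ and LL_ENC_RSP PDUs in the sniffer trace. The remaining steps are not shown here: deriving the session key by AES-128 encrypting the SKD with the LTK, and running AES-CCM with a 4-byte MIC.

```python
def ltk_msb_first(ltk_lsb_first_hex: str) -> str:
    """Reverse an LTK written least-significant octet first."""
    return bytes.fromhex(ltk_lsb_first_hex)[::-1].hex().upper()


def ccm_nonce(packet_counter: int, central_to_peripheral: bool, iv: bytes) -> bytes:
    """Build the 13-byte CCM nonce: 39-bit counter, direction bit, 8-byte IV."""
    if not 0 <= packet_counter < (1 << 39):
        raise ValueError("packet counter must fit in 39 bits")
    if len(iv) != 8:
        raise ValueError("IV must be 8 bytes (IVm followed by IVs)")
    counter = packet_counter.to_bytes(5, "little")
    # Bit 7 of octet 4 carries the direction: 1 for Central to Peripheral.
    octet4 = counter[4] | (0x80 if central_to_peripheral else 0x00)
    return counter[:4] + bytes([octet4]) + iv


def ccm_aad(header_first_octet: int) -> bytes:
    """Additional authenticated data: header octet with NESN, SN, MD cleared."""
    return bytes([header_first_octet & 0xE3])


ltk = ltk_msb_first("A1 B8 AB 72 D9 4E D7 AD 32 39 23 CF 68 B9 B5 03")

# Placeholder IV, NOT taken from the capture.
placeholder_iv = bytes(range(8))

# The Central's LL_START_ENC_RSP is its first encrypted PDU, so counter 0.
nonce = ccm_nonce(0, central_to_peripheral=True, iv=placeholder_iv)

# Control PDU (LLID = 0b11) with NESN, SN and MD all set, as an example.
aad = ccm_aad(0x1F)
print(ltk, nonce.hex(), aad.hex())
```

With the nonce, the AAD, and the derived session key in hand, any AES-CCM implementation that supports a 4-byte tag can be used to recompute the MIC and compare it with the value in the trace.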

    To read the attached sniffer logs, I had to install the wps4.00_24.6.34658.34822.exe version. Please find the Pump Reconnection-withLTK.zip file. The unzip password is: devzone114123.

    Pump Reconnection-withLTK.zip

    Thanks,

    Charles

  • Hi Charles,

    I have forwarded the details internally. Will let you know when I learn more or they need further details.

    Kenneth

  • Hello again,

    The team has taken a look, and this does indeed look like an issue in the SoftDevice Controller. The team will start working on a fix and is targeting v2.8.0, which is scheduled for the end of next month.

    Sorry for this issue.

    Kenneth
