Supervision timeout does not expire during 120 Hz data streaming

Hi,

I am using two nRF5340s with hci_ipc enabled, one acting as central and one as peripheral, built on v2.9.0.

The issue I am having is that the supervision timeout doesn't expire when I hard reset the peripheral whilst sending data (len 91) via notifications at 120 Hz.

My current workaround is to set CONFIG_BT_ATT_LOG_LEVEL_DBG=y, after which the supervision timeout event is no longer missed. This doesn't feel like a good long-term solution to whatever timing issue this is. Please could you help me?

Problem flow from the central's perspective:

  1. Scan and connect
  2. Characteristic and CCC descriptor discovery
  3. Enable notifications
  4. Central MTU updated: TX 247, RX 247 bytes
  5. PHY updated: TX PHY 2M, RX PHY 2M
  6. Connection params: interval 7.50 ms, latency 1, timeout 100 ms
  7. Start streaming data over notifications at 120 Hz
  8. Wait 3 seconds
  9. Hit reset on the peripheral
  10. Central hangs indefinitely with no supervision timeout (the disconnected callback never fires; see the sketch below)
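
Here is roughly how the disconnect handling is registered on the central, as a minimal sketch using the standard Zephyr connection callback API (not my exact code); I expect the supervision timeout to arrive as a disconnected event with reason 0x08 (BT_HCI_ERR_CONN_TIMEOUT):

    #include <zephyr/bluetooth/bluetooth.h>
    #include <zephyr/bluetooth/conn.h>
    #include <zephyr/sys/printk.h>

    static void connected_cb(struct bt_conn *conn, uint8_t err)
    {
        printk("Connected (err 0x%02x)\n", err);
    }

    static void disconnected_cb(struct bt_conn *conn, uint8_t reason)
    {
        /* After the peripheral is reset, this should fire once the 100 ms
         * supervision timeout expires, with reason 0x08 (connection timeout).
         * In my case it never does. */
        printk("Disconnected (reason 0x%02x)\n", reason);
    }

    /* Register the callbacks with the Bluetooth host at build time. */
    BT_CONN_CB_DEFINE(conn_callbacks) = {
        .connected = connected_cb,
        .disconnected = disconnected_cb,
    };
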
  • If I use CONFIG_LOG_MODE_MINIMAL it all works fine; however, if I use CONFIG_LOG_MODE_DEFERRED I can see that messages are being dropped:

    [00:00:43.546,661] <dbg> bt_att: bt_att_recv: Received ATT chan 0x2000f1d8 code 0x1b len 91
    --- 2 messages dropped ---
    [00:00:43.554,168] <dbg> bt_att: bt_att_recv: Received ATT chan 0x2000f1d8 code 0x1b len 91
    --- 3 messages dropped ---
    [00:00:43.569,213] <dbg> bt_att: att_notify: chan 0x2000f1d8 handle 0x0012
    --- 2 messages dropped ---
    [00:00:43.586,059] <dbg> bt_att: att_notify: chan 0x2000f1d8 handle 0x0012
    --- 3 messages dropped ---
    [00:00:43.599,151] <dbg> bt_att: att_notify: chan 0x2000f1d8 handle 0x0012
    --- 3 messages dropped ---
    [00:00:43.666,656] <dbg> bt_att: bt_att_recv: Received ATT chan 0x2000f1d8 code 0x1b len 91
    --- 1 messages dropped ---
    [00:00:43.689,178] <dbg> bt_att: bt_att_recv: Received ATT chan 0x2000f1d8 code 0x1b len 91
    --- 5 messages dropped ---



    Why exactly does using the minimal log stop messages from being dropped? 

  • Hi,

    chris.c said:
    The issue I am having is that the supervision timeout doesn't expire when I hard reset the peripheral whilst sending data (len 91) via notifications at 120 Hz.

    If the peripheral disconnects after a reset, you should always get a supervision timeout. If you don't see it, perhaps it is related to the logging issue you asked about? Or perhaps your code does not handle it, or something else happens that masks it? Have you done any debugging to learn the state of your application after the supervision timeout should have occurred?

    chris.c said:
    Why exactly does using the minimal log stop messages from being dropped? 

    With deferred logging, log messages are not processed in place but handed off to a low-priority thread. If you log a lot (as with BT debug logging enabled), you will quickly fill the buffer before the logs are processed. Does it help to set CONFIG_LOG_BUFFER_SIZE=8192 (or another high value) in your prj.conf?
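
    For example, something like this in the central's prj.conf (the buffer size is just a starting point to experiment with, not a recommended value):

    # Keep deferred logging, but give the logging subsystem a larger buffer so
    # that bursts of BT debug messages are less likely to be dropped before the
    # low-priority logging thread gets to process them.
    CONFIG_LOG_MODE_DEFERRED=y
    CONFIG_LOG_BUFFER_SIZE=8192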

  • Would this explain why I'm not seeing the supervision timeout, though?
    Or why slowing the process down by enabling minimal logging + ATT debug would make it work correctly?

    I'm also still confused why I don't see the supervision timeout but you do!

  • Yes, it is odd that I never reproduce the same behaviour on my end. But if there are memory corruption issues (which is just a hunch for now, but I have no better suggestion at the moment), they can manifest themselves in many different ways. And even if you build the same project with a different toolchain or platform, you will typically see different behaviours (if, for instance, an array or buffer is written out of bounds, and whatever is next to it in RAM changes, the behaviour will change).
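
    To illustrate what I mean, here is a generic sketch (made-up names, not taken from your project) of how a single missing bounds check can silently corrupt whatever happens to be placed next in RAM:

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    static uint8_t notify_buf[16];
    static uint8_t neighbour[4]; /* whatever the linker happens to place after notify_buf */

    void fill_notify_buf(const uint8_t *data, size_t len)
    {
        /* Missing bounds check: if len > sizeof(notify_buf), the copy runs past
         * the end of the buffer and overwrites 'neighbour' (or other unrelated
         * state), so the symptom depends on memory layout, toolchain and build
         * options rather than on the code that is actually at fault. */
        memcpy(notify_buf, data, len);
    }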

  • I’m not really sure where to go from here with it. Is this something you are still going to look at?

    I know you don't see the same issue as me, but what part of streaming the notifications at a high data rate would break the disconnect callback?

  • Hi,

    Yes, I can continue to look into this, but I suggest that you first go over the code to make sure you have control over array boundaries, etc., and other potential sources of memory corruption. This is time-consuming but required in this case.

  • Hi,

    I'm currently going through the code; nothing has helped so far, but I will continue looking.

    Sorry if this is a stupid question, but isn't the network core entirely separate, and doesn't it handle the disconnect callbacks? How could I corrupt the memory on the network core when all my arrays are on the application core?
