Supervision timeout does not expire during 120hz data streaming

Hi,

I am using an two nrf5340 with hci_ipc enabled one acting as central and one as peripheral built on v2.9.0

The issue I am having is that the supervision timeout doesn't expire when I hard reset the peripheral whilst sending data (len 91) via notifications at 120hz .

My current fix is to set CONFIG_BT_ATT_LOG_LEVEL_DBG=y which then doesn't miss a supervision timeout event. This doesn't feel like a good long term solution to fixing whatever timing issue this is. Please could you help me.  

Problem Flow from Central perspective:

  1. Scan and connect
  2. Characteristic and CCC descriptor discovery
  3. Enable Notifications
  4. Central MTU Updated TX: 247 RX: 247 bytes
  5. PHY updated: TX PHY 2M, RX PHY 2M
  6. Connection Params:interval 7.50 ms, latency 1, timeout 100 ms
  7. start stream data over notifications at 120hz
  8. wait 3 seconds
  9. Hit reset on peripheral 
  10. Central Hangs indefinitely with no supervision timeout 
Parents
  • If I use CONFIG_LOG_MODE_MINIMAL it all works fine however if I use CONFIG_LOG_MODE_DEFERRED I can see that messages are being dropped:

    [00:00:43.546,661] <dbg> bt_att: bt_att_recv: Received ATT chan 0x2000f1d8 code 0x1b len 91
    --- 2 messages dropped ---
    [00:00:43.554,168] <dbg> bt_att: bt_att_recv: Received ATT chan 0x2000f1d8 code 0x1b len 91
    --- 3 messages dropped ---
    [00:00:43.569,213] <dbg> bt_att: att_notify: chan 0x2000f1d8 handle 0x0012
    --- 2 messages dropped ---
    [00:00:43.586,059] <dbg> bt_att: att_notify: chan 0x2000f1d8 handle 0x0012
    --- 3 messages dropped ---
    [00:00:43.599,151] <dbg> bt_att: att_notify: chan 0x2000f1d8 handle 0x0012
    --- 3 messages dropped ---
    [00:00:43.666,656] <dbg> bt_att: bt_att_recv: Received ATT chan 0x2000f1d8 code 0x1b len 91
    --- 1 messages dropped ---
    [00:00:43.689,178] <dbg> bt_att: bt_att_recv: Received ATT chan 0x2000f1d8 code 0x1b len 91
    --- 5 messages dropped ---



    Why exactly does using the minimal log stop messages from being dropped? 

  • Hi,

    The issue I am having is that the supervision timeout doesn't expire when I hard reset the peripheral whilst sending data (len 91) via notifications at 120hz .

    If the peripheral disconnects after a reset, you should always get a supervision timeout. If yo don't see it, perhaps it is related to the logging issue you asked about? Or that your code does not handle it or something else happens that masks it? Have you done any debugging to learn the state of your application after the supervision timeout shoudl have occured?

    chris.c said:
    Why exactly does using the minimal log stop messages from being dropped? 

    With deferred logging, logging is not procesed in place, but deferred to a low priority thread. If you log a lot (as with BT debug loggign enabled), you will quickly fill the buffer before the logs are processed when using deferred logging. Does it help to set CONFIG_LOG_BUFFER_SIZE=8192 (or another high value) in your prj.conf?

  • It's started intermittently working on my end now too 

    any chance you could run it few more time to make sure 

  • Hi,

    I tested some more times now, but still don't see the issue with missing disconnected event (I do hovere sometimes see a fault in BT RX WQ thread after connection which I also saw yesterday.

    [00:01:57.255,279] <err> os: ***** BUS FAULT *****
    [00:01:57.255,279] <err> os:   Precise data bus error
    [00:01:57.255,279] <err> os:   BFAR Address: 0x0
    [00:01:57.255,310] <err> os: r0/a1:  0x00000000  r1/a2:  0x00000014  r2/a3:  0x00000013
    [00:01:57.255,310] <err> os: r3/a4:  0x01010002 r12/ip:  0x0000000e r14/lr:  0x00036bd9
    [00:01:57.255,340] <err> os:  xpsr:  0x21000000
    [00:01:57.255,340] <err> os: s[ 0]:  0x00000020  s[ 1]:  0x00041471  s[ 2]:  0x20009420  s[ 3]:  0x0004159d
    [00:01:57.255,371] <err> os: s[ 4]:  0x00000001  s[ 5]:  0x00036993  s[ 6]:  0x00000000  s[ 7]:  0x0002dbcd
    [00:01:57.255,371] <err> os: s[ 8]:  0x0002db75  s[ 9]:  0x00000000  s[10]:  0x000410f7  s[11]:  0x2000e1f0
    [00:01:57.255,371] <err> os: s[12]:  0x00000000  s[13]:  0x0002fcad  s[14]:  0x200120e8  s[15]:  0x0003f683
    [00:01:57.255,401] <err> os: fpscr:  0x0000002a
    [00:01:57.255,401] <err> os: Faulting instruction address (r15/pc): 0x000127e6
    [00:01:57.255,432] <err> os: >>> ZEPHYR FATAL ERROR 25: Unknown error on CPU 0
    [00:01:57.255,462] <err> os: Current thread: 0x2000bb68 (BT RX WQ)
    [00:01:57.639,190] <err> os: Halting system

    With addr2line I see it is in Bluetooth_Central/src/bluetooth.c:674, your notify_func().

    This is a separate issue from what you are describing though, but maybe it could be related to some of the many log related warnings you get when building the project, so I suggest cleaning up the logging and making sure the project builds without warnings.

  • Hi Einar,

    I have removed all the build errors and pushed to the git repo.

    I now don't even get it to intermittently work!

    I have also never seen that fault occurs after connection which is strange

    I do get these 3 problems:

  • Hi,

    I see the same behaviour as before when I check out your updated code. Typically I am able to connect, start streaming and reset the peripheral and re-connect 3 times. After that I get a fault. Interestingly, if I do changes the fault moves around (as seen with addr2line on the faulting address). This erratic behaviour indicate a memory corruption issue to me (perhaps writing to invalid memory by corrupt pointers, writing out of bound to arrays, or similar), but I have not found the root cause of it.

  • Would this explain why I’m not seeing supervision timeout though? 
    Or why slowing down the process down by enabling log minimal + att debug would enable it to work correctly? 

    I’m also still confused why I don’t see supervision timeout but you do! 

Reply Children
  • Yese, it is odd that I never reproduce the same behaviour on my end. But if there are memory corruption issues (which is just a hunch for now, but I have no better suggestion at the moment), it can manifest itself in many different ways. And even if buildign teh same project but with different toolchain or platform, you will typically see different behaviours (if for instance there is an array or buffer being written out of bonds, and whatever is next to it in RAM chagne, the behaviour will change).

Related