Is the More Data (MD) bit not being set when it should be?

I am seeing cases where the More Data (MD) bit is not set in an LLID Start packet even though it is followed by an LLID Continuation packet. This happens when the max packet data length (maxTxOctets and maxRxOctets) is 251 bytes and the ATT payload is greater than 251 bytes, requiring at least one packet of 251 bytes to be transmitted. I do not see this happen if I simply lower the max octets to 250 bytes. Instead, at 250 bytes, I see an ATT payload excessively broken up into 250 bytes + 1 byte + the remaining bytes (for example, 18 remaining bytes).

For the data below (a rough config sketch follows the list):

  • NCS 2.0
  • Nordic Bluetooth controller
  • Custom board with nRF5340 acting as a peripheral/GATT server
  • PHY 1M
  • Connection Interval 30 msec.
  • MTU 500 bytes. 
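
A rough prj.conf sketch of this setup, for reference (these symbols are my assumption of the relevant Zephyr/NCS Kconfig options, not copied from our actual config files):

    # LL Data PDU payload length (maxTxOctets/maxRxOctets); 250 in the second log below
    CONFIG_BT_CTLR_DATA_LENGTH_MAX=251
    # ATT MTU of 500 bytes (the L2CAP MTU used by the fixed ATT channel)
    CONFIG_BT_L2CAP_TX_MTU=500
    # Peripheral / GATT server role
    CONFIG_BT_PERIPHERAL=y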

In the screenshot below of an X240 sniffer capture, we are interested in the peripheral transmission (Side 2), consisting of GATT Notifications.

  • Frame 537 (highlighted) is an LLID Start of 251 bytes. Its MD bit is not set, as shown in the MD column.
  • Frame 539 is an LLID Continuation of 18 bytes, and concludes the upper layer transmission (in this case, a GATT Notification requiring 251+18 bytes).

Because frame 537 does not have the MD bit set, the central closes the connection event, causing frame 539 (the LLID Continuation) to be deferred to the next connection interval and reducing throughput. We typically see the central allow the connection event to continue when the MD bit is set this far into the connection event.

Question 1: Shouldn’t frame 537 have the MD bit set since it’s followed by an LLID Continuation?

On the other hand,

  • Frame 543 is another LLID Start of 251 bytes followed by an LLID Continuation. In this case, the MD bit is set, as expected, and both frames are sent in the same connection interval.

 

With the max data length set to 250 bytes (instead of 251), from spot checking, I think all LLID Starts that are followed by an LLID Continuation do indeed have the MD bit set. But I see peculiar behavior, as shown below from another log:

  • Frame 4060 (highlighted) is an LLID Start of 250 bytes. Its MD bit is set, but the central chose to end the connection event here anyway (a typical stopping point, time-wise, for the central being used).
  • Frame 4062 is an LLID Continuation of just 1 byte, followed by yet another LLID Continuation of 18 bytes in frame 4064.

Question 2:  Why are frames 4062 and 4064 broken up? Isn’t this inefficient for throughput?

 

  • Hi,

    variant said:
    I can provide the complete autoconf.h files for both cores, if necessary.  However, to do so, could I create a separate private ticket for that transfer?

    Yes, please make a private ticket and upload the configs there, and refer to this thread.

    variant said:
I can see in the BT Spec that an LL Control PDU has a max length of 250 bytes. But I don't see any limit for an LL Data PDU. Am I misunderstanding something here?

The spec says 251 ("The Payload shall be less than or equal to 251 octets in length.", from page 2700 of core spec version 5.3), but there seems to be a lower limit in the SoftDevice Controller. The team is looking more into this now and I will update here when I have something.

    variant said:
    Regardless, I assume you would like me to continue to come up with a way for you to reproduce the missing MD bit, right?  That was the first question in this ticket.

    Yes, that is right. It would be good to be able to reproduce here so that we can understand more about the issue.

  • I uploaded the conf files and two X240 logs in Case ID: 296806.  It's still not clear to me what triggers the case where MD is not set in an LLID Start which is followed by an LLID Continuation.

As an aside: on page 2701 of core spec v5.3, the LL Control PDU section explicitly shows a max length of 250 bytes. The LL Data PDU section immediately above it shows no max length; instead, the max length for LL Data PDUs is covered by the general case you pointed to on page 2700. Having the explicit length in one spot in the spec but not the other led me to ask the embarrassing question of where the 251 came from.  :)

  • Regarding the missing MD, it looks like this is because the controller (LL) does not get data fast enough from the host.

This is made more configurable in NCS 2.1.0, which you can see from this commit. So other than migrating to 2.1.0, you could consider cherry-picking this change and setting higher values for BT_CTLR_SDC_TX_PACKET_COUNT and BT_CTLR_SDC_RX_PACKET_COUNT.
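
    As a sketch (illustrative values, not a recommendation; these are controller options, so on the nRF5340 they belong in the network core image configuration):

        # Requires NCS 2.1.0 or the cherry-picked commit
        CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT=10
        CONFIG_BT_CTLR_SDC_RX_PACKET_COUNT=10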

  • I will certainly try pulling in this change and increasing the two packet counts, but I have concerns about the wording of the Kconfig description:

    The number Link Layer ACL TX packets reserved per connection.
    With the default count, the application is able to refill the buffers during
    a connection event. That is, non-default values should only be used if
    reduced throughput is accepted, or when the CPU utilization is so high
    that the application is not able to provide data fast enough during
    connection events.

The use of the phrase "the application" seems inappropriate here, as the application just provides the GATT Notify data and does not deal with the LL data packets. Your reply used the term "host", which makes more sense. Then there is the warning about reduced throughput: I hope an increase in BT_CTLR_SDC_TX_PACKET_COUNT does not reduce throughput, as we are trying to increase it.

It seems plausible that an increase in the number of buffers could help. However:

    1. What things do we need to worry about as we increase this number (aside from the obvious increase in memory use)?
    2. Does any relationship need to be maintained between CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT and CONFIG_BT_BUF_ACL_TX_COUNT?
    3. What do I have to lose (other than memory) by just setting CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT to the max of 20?
  • Hi,

    variant said:
Then there is the warning about reduced throughput: I hope an increase in BT_CTLR_SDC_TX_PACKET_COUNT does not reduce throughput, as we are trying to increase it.

The warning applies only if you reduce the number of buffers, not if you increase them; reducing the buffer count is what these configs were originally intended for. Due to IPC there is additional overhead before data gets from the host layer (on the app core) to the link layer (on the net core), which is why additional buffers are useful in this case.

    variant said:
    What things do we need to worry about as we increase this number (aside from the obvious increase in memory use)?

The only downside of increasing the buffers is the increased memory consumption.

    variant said:
    Does any relationship need to be maintained between CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT and CONFIG_BT_BUF_ACL_TX_COUNT?

There is no need to maintain any ratio or relationship between CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT and CONFIG_BT_BUF_ACL_TX_COUNT. Also, CONFIG_BT_BUF_ACL_TX_COUNT is only used by the Zephyr LL. (Generally, the difference between the two is that BT_CTLR_SDC_TX_PACKET_COUNT is per connection while BT_BUF_ACL_TX_COUNT is shared. This does not matter for a single connection, though.)
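
    As a sketch (hypothetical values), the two can therefore be set independently of each other:

        # Reserved per connection (SoftDevice Controller)
        CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT=10
        # Shared pool; only used by the Zephyr LL, as noted above
        CONFIG_BT_BUF_ACL_TX_COUNT=4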

    variant said:
What do I have to lose (other than memory) by just setting CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT to the max of 20?

    Nothing.
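
    That is, if the memory budget allows it, going straight to the maximum is fine:

        CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT=20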
