Is the More Data (MD) bit not being set when it should be?

I am seeing cases where the More Data (MD) bit is not set in an LLID Start packet that is followed by an LLID Continuation packet. This happens when the max packet data length (maxTxOctets and maxRxOctets) is 251 bytes and the ATT payload is greater than 251 bytes, requiring at least one 251-byte packet to be transmitted. I do not see this happen if I simply lower the maxOctets to 250 bytes. Instead, at 250 bytes, I see an ATT payload excessively broken up into 250 bytes + 1 byte + the remaining bytes (for example, 18 remaining bytes).

For the data below:

  • NCS 2.0
  • Nordic Bluetooth controller
  • Custom board with nRF5340 acting as a peripheral/GATT server
  • PHY 1M
  • Connection Interval 30 msec.
  • MTU 500 bytes. 

 In the screenshot below of an X240 sniffer capture, we are interested in the peripheral transmission, Side 2, consisting of GATT Notifications. 

  • Frame 537 (highlighted) is an LLID Start of 251 bytes. Its MD bit is not set, as shown in the MD column.
  • Frame 539 is an LLID Continuation of 18 bytes, and concludes the upper layer transmission (in this case, a GATT Notification requiring 251+18 bytes).

Because frame 537 does not have the MD bit set, the central stops the connection event, causing frame 539 (LLID Continuation) to be deferred to the next connection interval, reducing throughput.  We typically see the central allow the connection event to continue if the MD bit is set at this duration into the connection event.

Question 1: Shouldn’t frame 537 have the MD bit set since it’s followed by an LLID Continuation?

On the other hand,

  • Frame 543 is another LLID Start of 251 bytes followed by an LLID Continuation.  In this case, MD bit is set, as expected, and both frames are able to be sent in the same connection interval.

 

With the max data length set to 250 bytes (instead of 251), from spot checking, I think all LLID Starts that are followed by an LLID Continuation do indeed have the MD bit set. But I see peculiar behavior, as shown below from another log:

  • Frame 4,060 (highlighted) is an LLID Start of 250 bytes. Its MD bit is set, but the central chose to end the connection event here anyway (a typical stopping point, timewise, for the central being used).
  • Frame 4,062 is an LLID Continuation of just 1 byte, followed by yet another LLID Continuation of 18 bytes in frame 4,064.

Question 2:  Why are frames 4062 and 4064 broken up? Isn’t this inefficient for throughput?

 

  • Hi,

    I need to check with the SoftDevice controller team about this and will get back to you.

  • My preliminary finding is that I don't see those two items happening when I use the Zephyr BT controller: I see the MD bit set when I expect it, and I don't see the break-up of the continuation. I will continue to check, but it's not as easy to prove that something is not happening.

  • Question 2:  Why are frames 4062 and 4064 broken up? Isn’t this inefficient for throughput?

    We have managed to reproduce this now:

    This is the result of an unfortunate host/link-layer interaction. The host uses LE Read Buffer Size (le_read_buffer_size) to determine how it splits the transfer across the HCI boundary. In this case the LL is configured to have a buffer size of 251 (CONFIG_BT_CTLR_DATA_LENGTH_MAX), so the host splits the data into 251-byte and 25-byte buffers in this example. However, the maximum data the LL is allowed to transfer is only 250 bytes, so it transmits the first part of the first buffer (250 bytes) in the first event, followed by the last part of the first buffer (1 byte), and finally the second buffer (25 bytes). The link layer makes no attempt at optimizing this and does not copy data around in buffers (the RADIO peripheral also has no gather functionality, so we cannot use the HW to merge this either).
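    The two-stage splitting described above can be modeled in a few lines (a hypothetical sketch; the helper name is made up, and L2CAP/ATT header bookkeeping is ignored):

```python
def ll_fragments(payload_len, hci_buf_size, ll_max_pdu):
    """Model how the host and link layer split an ACL transfer.

    The host first chops the payload into HCI buffers of at most
    hci_buf_size bytes; the LL then sends each buffer in PDUs of at
    most ll_max_pdu bytes, never merging data across buffer boundaries.
    """
    fragments = []
    for start in range(0, payload_len, hci_buf_size):
        remaining = min(hci_buf_size, payload_len - start)
        while remaining > 0:
            pdu = min(ll_max_pdu, remaining)
            fragments.append(pdu)
            remaining -= pdu
    return fragments

# 269-byte transfer (the 251+18 notification from the captures above),
# 251-byte HCI buffers, LL limited to 250 bytes:
print(ll_fragments(269, 251, 250))  # [250, 1, 18]
# Same transfer with the LL allowed 251 bytes:
print(ll_fragments(269, 251, 251))  # [251, 18]
```

    With a 250-byte HCI buffer size as well, the same transfer splits cleanly into 250 + 19 bytes, with no 1-byte fragment.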

    If we change the above CONFIG_BT_CTLR_DATA_LENGTH_MAX to 250 then we get the following which is probably what you expect:

  • Hi, and thanks for the reply.

    Here are some relevant network core config settings that we are using. 

    CONFIG_BT_BUF_ACL_TX_SIZE=251  (for both cores)
    CONFIG_BT_BUF_ACL_RX_SIZE=251
    CONFIG_BT_CTLR_DATA_LENGTH_MAX=251
    • Not surprisingly, these are the same settings as in your Throughput sample app.
    • I can provide the complete autoconf.h files for both cores, if necessary.  However, to do so, could I create a separate private ticket for that transfer?
    Ideally, we would just like to stay with 251 bytes. The runtime 250-byte data length case I showed was just to see what happens to the MD issue if we switch from 251 to 250. But sure enough, it looks like 250 brings up a separate issue.
    -----
    The 250 issue, you said, was related to "The maximum data that the LL is allowed to transfer is only 250 bytes." I can see in the BT Spec that an LL Control PDU has a max length of 250 bytes. But I don't see any limit for an LL Data PDU. Am I misunderstanding something here?
    • That is, where does the 250 number that you mentioned come from, and why does that number apply to the sending of a GATT Notify?
    • Also, why does this break up not occur with the Zephyr BT controller?
    Regardless, I assume you would like me to continue to come up with a way for you to reproduce the missing MD bit, right?  That was the first question in this ticket.
  • Hi,

    variant said:
    I can provide the complete autoconf.h files for both cores, if necessary.  However, to do so, could I create a separate private ticket for that transfer?

    Yes, please make a private ticket and upload the configs there, and refer to this thread.

    variant said:
    I can see in the BT Spec that an LL Control PDU length has a max length of 250 bytes.  But I don't see any limit for an LL Data PDU.  Am I misunderstanding something here? 

    The spec says 251 ("The Payload shall be less than or equal to 251 octets in length." from page 2700 in the core spec version 5.3), but there seems to be a limit in the SoftDevice controller. The team is looking more into this now and I will update here when I have something.

    variant said:
    Regardless, I assume you would like me to continue to come up with a way for you to reproduce the missing MD bit, right?  That was the first question in this ticket.

    Yes, that is right. It would be good to be able to reproduce here so that we can understand more about the issue.

  • I uploaded the conf files and two X240 logs in Case ID: 296806.  It's still not clear to me what triggers the case where MD is not set in an LLID Start which is followed by an LLID Continuation.

    As an aside: On page 2701 of core spec v5.3, the LL Control PDU explicitly shows a max length of 250 bytes.  In the section immediately above it, the LL Data PDU section shows no max length.  Instead, the max length for LL Data PDU is covered by the general case you pointed to on page 2700.  Having the explicit length in one spot in the spec but not the other led me to ask the embarrassing question where the 251 came from.  :)

  • Regarding the missing MD bit, it looks like this is because the controller (LL) does not get data fast enough from the host.

    This is made more configurable in NCS 2.1.0, as you can see from this commit. So, other than migrating to 2.1.0, you could consider cherry-picking this change and setting higher values for BT_CTLR_SDC_TX_PACKET_COUNT and BT_CTLR_SDC_RX_PACKET_COUNT.

  • I will certainly try pulling in this change and increasing the two packet counts, but I have concerns about the wording in the Kconfig description:

    The number Link Layer ACL TX packets reserved per connection.
    With the default count, the application is able to refill the buffers during
    a connection event. That is, non-default values should only be used if
    reduced throughput is accepted, or when the CPU utilization is so high
    that the application is not able to provide data fast enough during
    connection events.

    The use of the phrase "the application" seems inappropriate here, as the application just provides the GATT Notify data and does not deal with the LL data packets. Your reply used the term "host", which makes more sense. Then there is the warning about reduced throughput. I hope an increase in BT_CTLR_SDC_TX_PACKET_COUNT does not reduce the throughput, as we are trying to increase the throughput.

    It seems plausible that an increase in the number of buffers could help.  However,

    1. What things do we need to worry about as we increase this number (aside from the obvious increase in memory use)?
    2. Does any relationship need to be maintained between CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT and CONFIG_BT_BUF_ACL_TX_COUNT?
    3. What do I have to lose (other than memory) by just setting CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT to the max of 20?
  • Hi,

    variant said:
    here is the warning about reduced throughput.  I hope an increase in the BT_CTLR_SDC_TX_PACKET_COUNT does not reduce the throughput, as we are trying to increase the throughput.

    The warning applies only if you reduce the number of buffers, not if you increase them; reducing is what these configs were originally intended for. Due to IPC, there is additional overhead before data comes from the host layer (on the app core) to the link layer (on the net core), which is why additional buffers are useful in this case.

    variant said:
    What things do we need to worry about as we increase this number (aside from the obvious increase in memory use)?

    The only down-side of increasing the buffers is the increased memory consumption.
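    As a rough sketch of that cost (assuming each extra buffer costs about CONFIG_BT_BUF_ACL_TX_SIZE, i.e. 251 bytes, and ignoring any controller-internal per-buffer overhead):

```python
def extra_ram_bytes(old_count, new_count, buf_size=251, connections=1):
    """Rough RAM increase from raising the per-connection LL TX packet count.

    buf_size approximates the cost per buffer; the controller's real
    per-buffer bookkeeping overhead is internal and not counted here.
    """
    return (new_count - old_count) * buf_size * connections

# Going from the default of 3 buffers to the max of 20:
print(extra_ram_bytes(3, 20))  # 4267, i.e. roughly 4.2 KB per connection
```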

    variant said:
    Does any relationship need to be maintained between CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT and CONFIG_BT_BUF_ACL_TX_COUNT?

    There is no need to maintain any ratio or relationship between CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT and CONFIG_BT_BUF_ACL_TX_COUNT. Also, the latter is only used by the Zephyr LL. (Generally, the difference between BT_BUF_ACL_TX_COUNT and BT_CTLR_SDC_TX_PACKET_COUNT is that the latter is per connection while the former is shared. This does not matter for a single connection, though.)

    variant said:
    What do I have to lose (other than memory) by just setting CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT to the max of 20?

    Nothing.

  • I finally got back to this subject.

    By increasing CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT to 9 (the default was 3), I got to the point where I could no longer detect cases where More Data was 0 when it could have been 1.

    In my case, I was only concerned with Notifies, so the value of BT_CTLR_SDC_RX_PACKET_COUNT didn't matter. Still, I went ahead and doubled that count from 2 to 4 for more margin.

    In both cases, I did notice an increase in RAM use for the network core.
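    For reference, the values described above would look like this in the network-core configuration (assuming the NCS 2.1.0 Kconfig symbol names from the commit mentioned earlier):

```conf
# Network-core prj.conf: raise the LL packet counts so the host can
# keep the controller fed across the IPC boundary (values from this thread)
CONFIG_BT_CTLR_SDC_TX_PACKET_COUNT=9
CONFIG_BT_CTLR_SDC_RX_PACKET_COUNT=4
```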
