Optimizing BLE-MIDI with regard to timing

MIDI is a well-known protocol used by musical instruments to communicate. A typical modern application is connecting a musical keyboard to a DAW, letting you record directly into software and edit your notes. MIDI devices do not transmit sound, but rather small commands that tell the receiver, for example, that a key has been pressed or released, or that a controller knob has been turned.

MIDI can be transmitted over multiple transports, the most common today being USB. It was originally designed for serial connections over 5-pin DIN connectors. MIDI is also used virtually within, and between, software applications. Bluetooth Low Energy is an emerging transport for MIDI, and there are already several products on the market using the relatively new BLE-MIDI specification. The wireless nature of Bluetooth LE opens up a lot of new possibilities for musical instruments as wearable and portable devices.

There are good resources available online for setting up a BLE-MIDI service. However, we have yet to come across any good resources on how the actual data should be handled or how to deal with the timestamps to reduce jitter and optimize for low latency. In this student project we have made the sample described in this blogpost as an attempt to set up a fully functional serial/BLE MIDI converter. It is by no means a perfect solution yet, but we hope it can be helpful. 

Resources

The sample code can be found here:

https://github.com/BLE-MIDI/NCS-MIDI

Resources we have found to be useful: 

Useful tool for BLE-MIDI development:

WebMIDI:

In addition, we developed a web application that prints incoming BLE-MIDI messages. This was primarily made in order to learn more about webMIDI, and was designed to be used for demonstration at a Maker Faire, but it is also a simple and useful tool for debugging that requires no additional applications.

Overview

The samples are serial MIDI/BLE-MIDI converters. They take a normal serial MIDI stream (via UART), parse the data into separate MIDI messages, and attach a timestamp to each message. The messages are then encoded into BLE-MIDI packets before they are transmitted. Received BLE-MIDI data is decoded and parsed, and the attached timestamps are interpreted before the messages are transmitted as a serial MIDI stream again.

The samples consist of a central and a peripheral with the BLE-MIDI service and I/O characteristic as described in the BLE-MIDI spec. We will not go into depth about how to set up the service, as that has been described before. Our sample is bidirectional, and the peripheral and central do not differ in how the MIDI data is handled.

Parser

The purpose of the parser is to analyze a stream of MIDI data, and output separated messages. The parser can be found in the lib folder. The parser analyzes MIDI data received both in BLE-MIDI packets and as a serial MIDI stream. The size of the messages varies mainly between 1 and 3 bytes, except for System Exclusive messages.

Channel Voice and System Common messages, such as note on, note off and control/mode change, are the most common messages. They are parsed by the main parsing function, midi_parse_byte(), which is executed for each byte. The midi_parser_t struct holds the state of the parser: it keeps track of running status and the current message type, and it holds the buffer of the message currently being parsed. When a message is complete, the parser returns a pointer to the message buffer. The specifics of how a general MIDI parser works can be found in the resources given above.

System Real-Time messages (RTM) are short, time-sensitive messages like start, stop and timing clock. They may show up anywhere in a serial stream, even in the middle of another message, and should be dealt with immediately. midi_parse_byte() also supports RTM.
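To make the parsing rules above concrete, here is a minimal sketch of a byte-wise parser with running status and immediate handling of real-time bytes. It is not the sample's code: sketch_parser_t, sketch_msg_len() and sketch_parse_byte() are hypothetical names, and the actual midi_parser_t and midi_parse_byte() may be organized differently. Like the sample, the sketch ignores System Exclusive data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parser state; the sample's midi_parser_t may differ. */
typedef struct {
    uint8_t running_status; /* last channel status byte, 0 if none */
    uint8_t msg[3];         /* message currently being assembled */
    uint8_t len;            /* bytes stored in msg so far */
    uint8_t expected;       /* total length of the current message */
    uint8_t rt;             /* storage for a one-byte real-time message */
} sketch_parser_t;

/* Total length (status + data bytes) for a status byte below 0xF8.
 * Returns 0 for SysEx start/end and undefined statuses, which are ignored. */
static uint8_t sketch_msg_len(uint8_t status)
{
    if (status < 0xF0) {
        uint8_t type = status & 0xF0;
        /* Program change and channel pressure carry one data byte. */
        return (type == 0xC0 || type == 0xD0) ? 2 : 3;
    }
    switch (status) {
    case 0xF1: /* MTC quarter frame */
    case 0xF3: /* song select */
        return 2;
    case 0xF2: /* song position pointer */
        return 3;
    case 0xF6: /* tune request */
        return 1;
    default:   /* 0xF0, 0xF4, 0xF5, 0xF7 */
        return 0;
    }
}

/* Feed one byte. Returns a pointer to a complete message and stores its
 * length in *out_len, or returns NULL if no message was completed. */
static const uint8_t *sketch_parse_byte(sketch_parser_t *p, uint8_t b,
                                        uint8_t *out_len)
{
    if (b >= 0xF8) {
        /* Real-time byte: emit at once, leave any partial message intact. */
        p->rt = b;
        *out_len = 1;
        return &p->rt;
    }
    if (b & 0x80) {
        /* Status byte: start a new message. Only channel messages set
         * running status; system common messages clear it. */
        p->expected = sketch_msg_len(b);
        p->running_status = (b < 0xF0) ? b : 0;
        p->len = 0;
        if (p->expected == 0) {
            return NULL;
        }
        p->msg[p->len++] = b;
    } else {
        if (p->len == 0) {
            if (p->running_status == 0) {
                return NULL; /* stray data byte (e.g. SysEx body): drop */
            }
            /* Running status: reuse the previous status byte. */
            p->msg[0] = p->running_status;
            p->expected = sketch_msg_len(p->running_status);
            p->len = 1;
        }
        p->msg[p->len++] = b;
    }
    if (p->len == p->expected) {
        *out_len = p->len;
        p->len = 0;
        return p->msg;
    }
    return NULL;
}
```

Feeding the bytes 0x90 0x3C 0xF8 0x64 yields the timing clock first and the complete note on afterwards, which is the ordering the timestamp handling later in this post has to deal with.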

Active sense (AS) messages are designed to detect the loss of a serial connection and prevent stuck or inadvertently sustained notes. The MIDI specification does not require AS to be implemented; most vendors do not use it, but some do. These messages are essentially empty messages that confirm the presence of the serial connection: they make sure that at least one message is sent every 300ms, filling the silence, if you will.

Active sensing is redundant for Bluetooth LE connections; forwarding these messages only adds unnecessary data, complexity and chances of failure. However, we thought it could be useful to implement it as a method of signaling the state of the Bluetooth LE connection to a connected serial device. Active sense messages are therefore transmitted and tracked on the serial connection, but not forwarded in the BLE-MIDI packets.

When an AS message is received, a work routine is scheduled 330ms into the future. Any message received after that reschedules the routine to 330ms into the future again. If the work routine executes, the serial connection can be considered lost. Typically, this sort of routine would do things like turning off all notes currently playing.

Transmitting active sense messages is done in a similar fashion. A work routine is rescheduled whenever data is transmitted; if the work routine executes, an AS message is transmitted. Loss of the Bluetooth LE connection could then terminate transmission of active sense.
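The rescheduling logic can be sketched without any RTOS by comparing deadlines against a millisecond counter. In the sample this is done with work routines; the as_state_t type, the function names and the 270ms transmit interval below are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define AS_RX_TIMEOUT_MS  330u /* receive timeout described in the text */
#define AS_TX_INTERVAL_MS 270u /* hypothetical; must stay below 300 ms */

typedef struct {
    bool     rx_active;   /* true once an active sense byte has been seen */
    uint32_t rx_deadline; /* local time (ms) at which the link counts as lost */
    uint32_t tx_deadline; /* local time (ms) at which to send active sense */
} as_state_t;

/* Call for every received serial message; status is its first byte. */
static void as_on_rx(as_state_t *s, uint8_t status, uint32_t now_ms)
{
    if (status == 0xFE) {
        s->rx_active = true; /* sender uses active sensing: start tracking */
    }
    if (s->rx_active) {
        s->rx_deadline = now_ms + AS_RX_TIMEOUT_MS; /* "reschedule" */
    }
}

/* True if the timeout expired, i.e. the serial connection is considered
 * lost. The signed difference keeps this correct across counter wrap. */
static bool as_rx_lost(const as_state_t *s, uint32_t now_ms)
{
    return s->rx_active && (int32_t)(now_ms - s->rx_deadline) >= 0;
}

/* Call whenever any serial data is transmitted. */
static void as_on_tx(as_state_t *s, uint32_t now_ms)
{
    s->tx_deadline = now_ms + AS_TX_INTERVAL_MS;
}

/* True if nothing was sent for a full interval and 0xFE should go out. */
static bool as_tx_due(const as_state_t *s, uint32_t now_ms)
{
    return (int32_t)(now_ms - s->tx_deadline) >= 0;
}
```

A caller would poll as_rx_lost() and as_tx_due() from a timer, or replace both with delayed work items as the sample does, and stop calling as_on_tx() related logic when the Bluetooth LE link drops so that the serial device notices the silence.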

System exclusive messages have been excluded in this sample and are currently ignored.

Encoding

The next step after parsing serial data is to encode messages into BLE-MIDI packets, as defined by the BLE-MIDI specification. One packet may contain any number of messages and is only limited by the MTU size. The BLE-MIDI specification does not limit the number of packets sent per connection interval either; that number is limited by the Bluetooth LE stack, the chipset and the connection interval. A very common implementation we have come across uses one packet per message. This impairs throughput, as each message requires a whole Bluetooth LE packet with all its overhead, so that MIDI data represents only 13-23% of the total Bluetooth LE packet, depending on the packet format and message content. According to our calculations, a typical triad chord needs roughly 619µs more when divided into 3 packets, much of that time being occupied by the Inter Frame Space (IFS). Low throughput causes jitter if many simultaneous MIDI events are triggered. Some of it can be reduced, e.g. by using the 2M PHY, but reducing the number of packets per connection event is crucial.
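As a rough illustration of the overhead argument, the sketch below counts bytes on air for full 3-byte messages. It assumes an unencrypted link on the LE 1M PHY with 17 bytes of framing per packet (preamble 1, access address 4, link layer header 2, L2CAP header 4, ATT opcode and handle 3, CRC 3) and no running status; the function names are hypothetical. It deliberately leaves out the IFS and the acknowledging packets, so it does not reproduce the 619µs figure, which comes from our own calculations.

```c
#include <stdint.h>

/* Assumed framing per packet on LE 1M, unencrypted:
 * preamble 1 + access address 4 + LL header 2 + L2CAP header 4
 * + ATT opcode and handle 3 + CRC 3 = 17 bytes. */
#define FRAMING_BYTES 17u

/* BLE-MIDI payload for n full 3-byte messages in one packet: one header
 * byte, then one timestamp byte plus three MIDI bytes per message. */
static uint32_t ble_midi_payload_bytes(uint32_t n_msgs)
{
    return 1u + n_msgs * (1u + 3u);
}

/* Bytes on air when n_msgs messages are spread evenly over n_packets
 * packets. n_msgs is assumed to be divisible by n_packets. */
static uint32_t air_bytes(uint32_t n_msgs, uint32_t n_packets)
{
    uint32_t per_packet = n_msgs / n_packets;

    return n_packets * (FRAMING_BYTES + ble_midi_payload_bytes(per_packet));
}
```

Under these assumptions a single note on sent alone occupies 22 bytes on air, of which 3 bytes (about 14%) are raw MIDI data, or 5 bytes (about 23%) if the BLE-MIDI header and timestamp bytes are counted as well. A triad costs air_bytes(3, 3) = 66 bytes as three packets but only air_bytes(3, 1) = 30 bytes as one packet, before the IFS and acknowledgements for the two extra packets are even added.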

BLE-MIDI Encoding is done by ble_midi_encode_work_handler() which is scheduled as soon as messages are parsed. MIDI messages are passed through a FIFO and formatted, with the attached timestamps, into a BLE-MIDI packet in accordance with the BLE-MIDI specification.  
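The packing step can be sketched as follows. This is an illustration of the packet layout only, not the body of ble_midi_encode_work_handler(): sketch_midi_msg_t and sketch_ble_midi_encode() are hypothetical names, running status is not used, and the messages are assumed to arrive already sorted by timestamp and close enough together to share one header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed message as it might come out of the FIFO. */
typedef struct {
    uint16_t timestamp_ms; /* only the low 13 bits are used */
    uint8_t  len;          /* 1..3 bytes */
    uint8_t  bytes[3];     /* status byte first */
} sketch_midi_msg_t;

/* Packs as many of the n messages as fit into buf (capacity cap).
 * Returns the number of bytes written and stores the number of messages
 * consumed in *consumed. Returns 0 if not even one message fits. */
static size_t sketch_ble_midi_encode(const sketch_midi_msg_t *msgs, size_t n,
                                     uint8_t *buf, size_t cap,
                                     size_t *consumed)
{
    size_t pos = 0;
    size_t i = 0;

    *consumed = 0;
    if (n == 0 || cap < 3) {
        return 0; /* need header + timestamp + at least one MIDI byte */
    }

    /* Header: bit 7 set, bit 6 reserved (0), bits 5..0 = timestamp high. */
    buf[pos++] = (uint8_t)(0x80 | ((msgs[0].timestamp_ms >> 7) & 0x3F));

    for (i = 0; i < n; i++) {
        size_t need = 1u + msgs[i].len; /* timestamp byte + MIDI bytes */

        if (pos + need > cap) {
            break; /* the rest goes into the next packet */
        }
        /* Timestamp byte: bit 7 set, bits 6..0 = timestamp low. */
        buf[pos++] = (uint8_t)(0x80 | (msgs[i].timestamp_ms & 0x7F));
        for (uint8_t k = 0; k < msgs[i].len; k++) {
            buf[pos++] = msgs[i].bytes[k];
        }
    }

    *consumed = i;
    return (i == 0) ? 0 : pos;
}
```

With a 20-byte buffer (the payload available at the default ATT MTU of 23), a note on stamped 291ms and a note off stamped 293ms encode to nine bytes: 0x82 0xA3 0x90 0x3C 0x64 0xA5 0x80 0x3C 0x00.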

Radio notifications are used to signal when packets should be completed. Messages should be sent at the earliest possible connection event, but ideally only one packet should be sent per interval. This is achieved by using radio notifications, which are events that trigger an ISR at a given time before the radio turns on. However, ble_midi_send() cannot be called from an ISR, and doing so would also risk sending an incomplete packet if the encoding work routine were interrupted by the ISR. To counter these issues, we made the radio notification ISR schedule yet another work routine on the same work queue as the encoder work routine. The encoder routine and the routine that calls ble_midi_send() then work in a cooperative manner.

This illustrates some of the challenges we suspect to be the reasons for the suboptimal implementations we have seen in BLE-MIDI apps and software. We rely on the radio notifications to tell exactly when the radio will turn on for a transmission, so that we know when to transmit a packet. This type of information might not be available in other operating systems.  

Timestamps

The purpose of timestamping MIDI events is to make sure the spacing between events at the input is preserved at the output, eliminating the jitter induced by the connection interval. Dealing with BLE-MIDI timestamps can be challenging, and the BLE-MIDI specification is sparse on details regarding how they should be handled by a receiver. Here, too, we have come across many implementations that either ignore the timestamps completely or set them to zero.

In our sample the timestamp is set when the first byte of a message is received. It is then passed along with the parsed message to the encoder, which formats the timestamp and message into a BLE-MIDI packet. BLE-MIDI timestamps are 13 bits wide and have millisecond resolution. The six most significant bits of the first message's timestamp are placed in the packet header, and each message in the packet is preceded by a timestamp byte carrying the seven least significant bits of its timestamp.

Timestamps in a BLE-MIDI packet must be incremental. This creates a challenge when dealing with System Real-Time messages (RTM), as they can interrupt any MIDI message. Both messages are kept, but since the RTM is parsed before the interrupted message, the RTM is placed first in line. The interrupted message may have a lower timestamp than the RTM but is placed in the FIFO buffer after it. Therefore, if an RTM interrupts another message, the interrupted message has its timestamp replaced by the timestamp of the RTM.
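On the receiving side the two parts have to be joined again. The sketch below rebuilds the full 13-bit value for each message in a packet, including the case where the low seven bits wrap between two messages, in which the high part is taken to have advanced by one. The ts_decoder_t type and function names are hypothetical and not taken from the sample.

```c
#include <stdint.h>

/* Hypothetical per-packet decoder state. */
typedef struct {
    uint8_t high;     /* current upper 6 bits of the timestamp */
    uint8_t prev_low; /* lower 7 bits of the previous message's timestamp */
} ts_decoder_t;

/* Start a new packet: take the upper 6 bits from the header byte. */
static void ts_decoder_start(ts_decoder_t *d, uint8_t header)
{
    d->high = header & 0x3F;
    d->prev_low = 0; /* nothing can be lower, so the first message never wraps */
}

/* Returns the full 13-bit timestamp (0..8191 ms) for the next message,
 * given its timestamp byte. */
static uint16_t ts_decoder_next(ts_decoder_t *d, uint8_t ts_byte)
{
    uint8_t low = ts_byte & 0x7F;

    if (low < d->prev_low) {
        /* Low part went backwards: it overflowed, so bump the high part
         * (which itself wraps at 6 bits). */
        d->high = (uint8_t)((d->high + 1) & 0x3F);
    }
    d->prev_low = low;
    return (uint16_t)(((uint16_t)d->high << 7) | low);
}
```

For a header of 0x82 followed by timestamp bytes 0xFE and 0x81, this gives 382ms and then 385ms rather than 257ms, keeping the timestamps within the packet incremental.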

The real challenge, and what the specification leaves to the developer, is how to interpret and use the timestamps at the receiving end. There are multiple ways to solve this. We will now discuss three approaches: 

Ignore method: By ignoring timestamps, all messages are passed on as soon as possible, as illustrated above. This keeps latency to a bare minimum, but at the cost of inducing jitter. The jitter is proportional to the connection interval, so the shorter the connection interval, the more sense this method makes, but it is not really acceptable until the interval goes below 7.5ms.

Event spacing method: By storing previous timestamps and finding the difference between the current and the previous timestamp, we can calculate how long to wait before the current message should be played. This has the advantage of maintaining the interval between events, reducing the BLE-connection-interval induced jitter, but at the cost of added latency, because latency might build up over time. This can be corrected for, and this is likely to be a viable method. We developed a solution like this but have not yet refined it. 
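One possible reading of the event spacing method is sketched below. It is not our unrefined implementation: spacing_t and spacing_schedule() are hypothetical, and the rule of never scheduling a message in the past is an assumption. That rule is also what lets latency build up, since a late message pushes everything that follows it.

```c
#include <stdbool.h>
#include <stdint.h>

#define TS_MASK 0x1FFFu /* BLE-MIDI timestamps are 13 bits wide */

typedef struct {
    bool     have_prev;   /* false until the first message has been seen */
    uint16_t prev_ts;     /* timestamp of the previous message */
    uint32_t prev_out_ms; /* local time at which it was (or will be) output */
} spacing_t;

/* Returns the local time (ms) at which the message stamped ts should be
 * output, given that it is available at local time now_ms. */
static uint32_t spacing_schedule(spacing_t *s, uint16_t ts, uint32_t now_ms)
{
    uint32_t out = now_ms;

    if (s->have_prev) {
        /* Spacing at the sender, modulo the 13-bit timestamp range. */
        uint32_t delta = ((uint32_t)ts - (uint32_t)s->prev_ts) & TS_MASK;
        uint32_t target = s->prev_out_ms + delta;

        /* Wait if the target lies in the future; otherwise send now. */
        if ((int32_t)(target - now_ms) > 0) {
            out = target;
        }
    }
    s->have_prev = true;
    s->prev_ts = ts & TS_MASK;
    s->prev_out_ms = out;
    return out;
}
```

Two messages stamped 100 and 103 that arrive in the same packet at local time 1000 are output at 1000 and 1003. After a pause longer than the accumulated latency the target falls in the past and the message goes out immediately, which is one simple way the built-up latency can be corrected.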

Clock synchronization method: This method finds the offset between a local clock and the incoming timestamps. This is done by defining a window in which timestamps are expected to occur. The window starts at the local time when the packet was received plus the offset, and ends one interval after its start. Whenever a received timestamp falls outside this window of expected timestamps, the offset is adjusted. The offset thereby stabilizes at the real offset between the clocks of the sender and the receiver. This method has the advantage of being simple and efficient, and it is the one used in the sample.

In a perfect world, the last two of these methods would be the perfect tool for reducing jitter while keeping latency as low as possible, as shown in the illustration above. In the real world, however, these methods are prone to errors. All forms of synchronization based on timestamps compliant with the BLE-MIDI specification have inaccuracies, especially when dealing with unknown devices. We’ve found three limiting factors when dealing with timestamps: packet loss, timestamp resolution and message length.

The millisecond resolution of the timestamps induces rounding differences of up to 1ms, meaning that two consecutive messages can have a time difference 1ms shorter or longer than at the input. For larger connection intervals this jitter is negligible, but at shorter intervals it is more prominent. When combined with Low Latency Packet Mode (LLPM), synchronization increases latency with nearly no obvious advantage. The resolution also creates an artifact on the serial output: given a perfect synchronization method, each message will be aligned to a 1ms time grid.

The variable length of MIDI messages and the baud rate of serial MIDI affect the largest theoretical difference between timestamps in the same Bluetooth LE packet. This timestamp window can be skewed by as much as 625µs, depending on the precision of the serial transmitter's baud rate and on when timestamps are generated. This can be compensated for, if wanted, by adding latency to messages based on their length. However, this is only applicable if it is known when the timestamp was created, that is, whether it was generated at the first byte received, the last byte, or at any other time. It can also be avoided if the sender adjusts the timestamps when they are generated.

Packet loss results in retransmissions, so messages arrive too late. This is the largest contributor to problems with synchronization, as it makes the synchronization unstable. Because the synchronization method is dynamic, a lost packet causes jitter not only for the late message, but also for subsequent messages until the offset value stabilizes again. Detecting lost packets and mitigating their effects on synchronization might be possible, but we have not implemented this yet.
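The window logic can be sketched as below. This is an interpretation of the description, not the sample's code: sync_t, ts_diff() and sync_delay_ms() are hypothetical names, the window length is taken to be one connection interval rounded to whole milliseconds, and a timestamp outside the window is handled by moving the nearest window edge onto it.

```c
#include <stdint.h>

#define TS_MOD  8192u /* 13-bit timestamps wrap every 8192 ms */
#define TS_MASK (TS_MOD - 1u)

/* Signed difference a - b on the 13-bit circle, in [-4096, 4095]. */
static int32_t ts_diff(uint32_t a, uint32_t b)
{
    uint32_t d = (a - b) & TS_MASK;

    return (d >= TS_MOD / 2) ? (int32_t)d - (int32_t)TS_MOD : (int32_t)d;
}

typedef struct {
    uint32_t offset;      /* window start minus local time, mod 8192 */
    uint32_t interval_ms; /* window length (assumed: connection interval) */
} sync_t;

/* Returns how many ms from now the message stamped ts should be output.
 * Adjusts the offset whenever ts falls outside the expected window. */
static uint32_t sync_delay_ms(sync_t *s, uint16_t ts, uint32_t now_ms)
{
    uint32_t start = (now_ms + s->offset) & TS_MASK;
    int32_t pos = ts_diff(ts, start); /* position of ts inside the window */

    if (pos < 0) {
        /* Before the window: pull the window start back onto ts. */
        s->offset = (s->offset + (uint32_t)pos) & TS_MASK;
        pos = 0;
    } else if ((uint32_t)pos > s->interval_ms) {
        /* After the window: push the window end forward onto ts. */
        s->offset = (s->offset + (uint32_t)pos - s->interval_ms) & TS_MASK;
        pos = (int32_t)s->interval_ms;
    }
    return (uint32_t)pos;
}
```

Starting from offset 0 with an 8ms window, a first message stamped 5000 received at local time 1000 moves the offset to 3992 and is delayed by 8ms. A message stamped 5004 received at 1008 then lands inside the window and is delayed by 4ms without touching the offset. A single late packet moves the offset in the same way, which is why subsequent messages are affected until it settles again.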
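The size of this effect follows from the serial MIDI byte time. At the standard 31250 baud, with one start bit, eight data bits and one stop bit, each byte takes 320µs on the wire, so a 3-byte message finishes 640µs later than a 1-byte message that started at the same moment, which is of the same order as the 625µs mentioned above. The sketch below does that arithmetic; the compensation function is a hypothetical illustration that assumes timestamps are taken at the first byte and that 3 bytes is the longest message of interest.

```c
#include <stdint.h>

#define MIDI_BAUD      31250u /* standard serial MIDI baud rate */
#define BITS_PER_BYTE  10u    /* start bit + 8 data bits + stop bit */
#define BYTE_TIME_US   (BITS_PER_BYTE * 1000000u / MIDI_BAUD) /* 320 us */
#define MAX_MSG_BYTES  3u     /* longest message handled (no SysEx) */

/* Time needed to receive n_bytes back to back on a serial MIDI line. */
static uint32_t midi_serial_time_us(uint32_t n_bytes)
{
    return n_bytes * BYTE_TIME_US;
}

/* Extra delay to add to a message of msg_len bytes (1..3) so that every
 * message sees the same delay from first byte to completion.
 * Assumes the timestamp was taken when the first byte arrived. */
static uint32_t length_compensation_us(uint32_t msg_len)
{
    return midi_serial_time_us(MAX_MSG_BYTES - msg_len);
}
```

A single-byte message would be held back by 640µs, a 2-byte message by 320µs, and a 3-byte message not at all.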

Test Results

None of the developers are trained pianists, and it would have been interesting to see how our sample feels to a trained pianist, preferably in a double-blind test. Still, to our untrained fingers and ears the latency is not noticeable. That is not to say it is not there, as our measurements show.

The test setup consists of two development kits with MIDI shields, a keyboard with MIDI out, a logic analyzer, and a modified version of the sample. The keyboard is connected, via a MIDI shield, to an nRF52840 DK flashed with the central sample. The Bluetooth LE central transmits BLE-MIDI to an nRF52832 DK flashed with the peripheral sample. The BLE peripheral outputs the serial MIDI received from the nRF52840 via a MIDI shield. A logic analyzer is connected to the UART Rx pin on the nRF52840 and the UART Tx pin on the nRF52832. Since the available keyboard had active sensing, and we are trying to compare input and output, the sample code has been modified to forward incoming active sense messages. Active sensing is also turned off at the UART Tx port.

The testing itself consists of one developer hammering keys on the keyboard, using the available buttons, keys, knobs, and levers. Ideally the test data should have been a Standard MIDI File (SMF) transmitted from an audio device. Unfortunately, all available devices used running status, which creates differences between output and input. To work within these limits, the test subject hammered on for a minimum of 6 minutes, creating at least 30000 bytes of data. The 6 minutes were divided into the following playing styles, 1 minute per technique.

  1. Simple melody: test subject attempts to play a nursery rhyme, with one hand. 
  2. Chord and melody: test subject attempts to play a song with one hand playing chords and the other playing the melody. 
  3. "I don't like Mondays" routine: the test subject plays glissando in black keys, with both hands; fast. As if playing the intro to "I don't like Mondays", a song by Boomtown Rats. 
  4. Ragtime imposer: Test subject attempts to look like he's playing a ragtime tune. 
  5. Seagull technique: Test subject smashes his flat hands on the keys, as if a seagull were to play the keyboard. 
  6. Freestyle technique: the test subject uses all the techniques above and the bender/modulation lever. 

Result: 7.5ms connection interval with ignored timestamps

The first test shows the result of ignoring timestamps at the receiver. Each byte passed through the test setup is plotted as a blue point. As the results show, the latency varies between about 2ms and 9.5ms, a spread of exactly one connection interval. The running average latency, the blue line, is just below 6.5ms. There is also a clearly defined, repeating line visible; this line consists of active sense messages. They arrive every 270.01ms, resulting in the time until the next connection event being offset by 0.01ms every 270ms. These messages consist of a single byte, and the effect of variable message length shows in the measurements: AS messages have an average latency 625µs lower than the 3-byte messages.

Result: 7.5ms with synchronization

In the next test we introduce synchronization. The jitter is drastically reduced, and most messages have a latency between 7.5ms and 9.5ms, with a clear 1ms thick band where most of the messages end up. The thickness of this band is a result of the millisecond resolution of the timestamps. The band is slowly descending, or expanding downwards, for which we have yet to find a reasonable cause. The band also makes sudden jumps, which correlate with AS messages; this could be the result of the synchronization not being able to stabilize. These jumps of 1ms up and down are somewhat negligible compared to the bytes measured with a latency of up to 17.5ms. Our current conclusion is that these are a result of lost packets, causing the messages to arrive one connection interval too late. The number of lost packets clearly correlates with the data rate, illustrated by the red line and points.

Result: Low Latency Packet Mode with ignored timestamps

Now, what if you could have both low jitter and low latency? By implementing Low Latency Packet Mode, the connection interval is reduced to 1ms, which results in the expected dramatic decrease in latency. The timestamp resolution and connection interval are equal; therefore, synchronization only adds unnecessary latency and has been omitted from this test. Active sense messages are more prominent in this test, and the spacing between them is shorter than in the 7.5ms test results. There is also a noticeable decrease in lost packets compared to the other tests.

Result: LLPM with µs timestamp resolution

In the last test, we got experimental. We reintroduced synchronization and changed the timestamp resolution from milliseconds to microseconds. This is of course not compliant with the specification, but it shows some interesting results. Notice that the synchronization counters the difference between active sense messages and the rest of the test data. But also notice that all the data now follows the same sawtooth pattern. Our main hypothesis as to the cause of the pattern is the scheduling resolution of the kernel, which is about 31µs.

Comparing the results

Lastly, we’ve gathered all measurements in the same graphic to compare them to each other. The ideal result would be a vertical line, placed all the way to the left. 

Known issues and improvements:

The sample contains some known issues and bugs, which at this date are:

  • Increased latency when left idle for long periods (2 hours). 
  • Packet loss is a major issue. Losses often occur in conjunction with large amounts of data. A strong connection is recommended for MIDI applications. Further investigation is needed into software precautions that reduce losses and make the sample less affected by them.
  • Artifacts in the test results that we are not able to explain fully. These need to be investigated further, and more tests need to be done. Further calibration is needed. 

Features that can be added to the code, other than use case specific features, could be: 

  • Better disconnect handling: send all notes off, on all MIDI channels, to all affected outputs.
  • Universal SysEx parsing; currently the parser ignores SysEx messages.
  • MIDI Reset messages should reset the device to initial conditions. No running status, all notes off, no active sensing, and correction in the convert-timestamp function. 
  • MIDI CI Support. 
  • Support for MIDI 2.0, and translations between versions.

Conclusion

This blog post covers the result of a student project, researching and developing BLE-MIDI for the nRF Connect SDK platform. The project started out just as MIDI 2.0 was announced, so a lot of new specifications are expected to be published in the future. Hopefully we will see the BLE-MIDI specification developed further to address some of the challenges we have pointed out in this blogpost, regarding synchronization, throughput and lost packet treatment. Perhaps new Bluetooth LE features such as LLPM and multilink broadcasting could be incorporated into it as well!

Have a look at the BLE-MIDI GitHub organization, and feel free to follow and contribute to it. There you can also find the aforementioned webMIDI application, and a repository adding USB-MIDI support for nRF5 SDK 17. The plan is to keep adding tools, samples and features related to BLE-MIDI development on any platform in this organization. Let us know in the comments if there is anything you would want to see!

Lastly, we'd like to thank Nordic Semiconductor for supporting this project.

All the best,
Magnus Elkjær Stentsøe & Ole R. Bjerkemo