On nRF Connect for iOS and its Unnecessary BitField Collection in Swift

On nRF Connect for iOS and its Unnecessary BitField Collection in Swift

For the longest time we’ve had this idea, of turning one of our most-used Swift Collection Types into a simple one-instruction comparison call, down at the assembly level, to check whether an element belongs to or not in a Set-like data structure.  

This is the tale of how and why nRF Connect for iOS uses said modified Set-like Collection Type, and the rabbit hole we went down to put that idea to rest.

The Start 

As we have mentioned in other texts, nRF Connect for iOS 2.x follows the Unidirectional Data Flow (UDF) paradigm. We do this without the use of Apple’s own tools, such as Combine, because thus far we have chosen to support all iOS devices running on iOS 9 and newer. What it boils down to, is that nRF Connect is built around one or more Single Source(s) of Truth, called Data Store(s), which handle specific parts of app’s logic, away from all of the User Interface complexity. These stores are all Grand Central Dispatch-protected entities, availalbe to our UIViewController(s) and UIView(s) through an AppContext struct. 

The Data Store(s) need to be saved to disk at regular intervals, because they contain important information that we don’t want to recreate using default values, such as Advertisers and Settings. And to keep the code as simple and reusable as possible, we elected to use Swift’s Codable protocol for automatic storage in JSON format. Whilst easy and simple to use, this JSON format has its drawbacks; namely, that it can cause the amount of data we have to write to balloon in size pretty quickly.

Rise of the OptionSet

One of the things all of us do in our apps is to check for elements belonging to a Collection, for which the obvious solution would be the use of a Set data structure, which provides us with O(1) constant-time to verify whether an element is present.  

The following state, tracking a device's @appearance property, belongs to one of our Data Store(s). We use this, for example, to determine whether a device passes a Filter, such as having DFU capabilities. Hence, why we'd need to store this information in a Set to make this operation fast. This is how a Set<Bluetooth.Appearance> would be saved to disk in JSON form: 

appearance: [ 

    thingy, dfu, mcumgr 

]

This is acceptable, of course. But we were not satisfied with it. Multiple structures in nRF Connect used a similar data representation based on enum values, and it felt like a small-enough problem to tweak and play with that no matter how much we screwed up, we could find our way out of it. Which, we agree, is the very definition of fixing something that isn’t broken. Furthermore, no matter what we choose to do to optimise this use case, the inherit verboseness of the JSON format would render all of our efforts irrelevant. After all, what difference does it make to save a few characters in one section, when in another, the Codable protocol will we auto-generate 5 times the number of boilerplate text? 

If you think about it in this way, the path we’re embarking on does not make any sense. But, programming is also an art form. And in our opinion, as a programmer we should also attempt to have a bit of fun and learn new things along the way. So we started looking for ways to do better. One way we could reduce the number of characters required to represent a Set of elements, was to make their RawRepresentable type an Int. This would yield the following kind of change: 

appearance: [ 

    1, 3, 4 

]

Not bad. However, we decided to go one-step further (the theme of this blog post), and inspired by NSHipster’s OptionSet data structure, we went ahead and defined an OptionSet as a struct containing a Set of elements conforming to the new Option protocol. 

protocol Option: RawRepresentable, Hashable, CaseIterable {} 

struct OptionSet<T: Option>: Hashable, Codable, ExpressibleByArrayLiteral { 

    private var set: Set<T> 

} 

In both Swift’s definition of an OptionSet and NSHipster’s, an OptionSet is defined as a collection of values that we choose to represent via a bit extension of its @rawValue property. So if the thingy enum case’s @rawValue is ‘1’, in the OptionSet it’d be represented as 2 << 1, meaning ‘2’. Applying NSHipster’s OptionSet without any more changes, would lead to the following result after being serialized as JSON: 

appearance: [ 

    2, 8, 16 

]

Baring some extra computation, this is exactly the same result as Swift’s Set. But we can do better. Inspired by how a Status register works in a CPU, we can use a bit to represent whether an element is present or not in the Set. And because we are representing each possible element’s @rawValue as a power of two value, no two elements will overlap, because each one is represented as a bit being set in an integer value, like a CPU’s Status register, where each bit designates a certain condition, with ‘0’ meaning the absence of such condition, and ‘1’ its presence.

Source: Curly Brace Coder

This means it should be possible to reduce the appearance field in JSON like this: 

appearance: 26 // 2^1 (1 << 1) + 2^3 (1 << 3) + 2^4 (1 << 4)

This is how: 

extension Set where Element: Option { 

    var rawValue: Int { 

        var rawValue = 0 

        for (index, element) in Element.allCases.enumerated() { 

            if self.contains(element) { 

                rawValue |= (1 << index) 

            } 

        } 

        return rawValue 

    } 

}

That is, we’re taking the @rawValue of each Option and adding it to form the @rawValue we encode as JSON. With this extension, and a little bit of Codable Protocol trickery involving a custom init() to reverse this effect, we can get an OptionSet to encode and decode itself from a rawValue in this manner. 

There are caveats, of coursewe need to guarantee Unsigned Integer (UInt) RawRepresentable values, or else this bit-logic does not make a lot of sense. And also, we must take care that in each OptionSet, its Generic Type has as many possible values as bits on every CPU register (CPU word size) or less. Otherwise, we’d be stepping into the world of double-word operations, which are more expensive. 

Note: If you need a quick refresher on how a CPU works, this video might be useful.

If this works, what else is there to do, then? 

For the longest time, we were happy with this arrangement. The code worked, and it was solid to a fault. The use of OptionSet was extended across previous features when we first implemented it, then onto new features. Bugs and crashes were found, but none ever caused, or even remotely related, to our custom OptionSet implementation. 

However, we still had this dangling idea on our minds. “If our OptionSet was so efficient, why did we have to rely on a Set for its implementation?” 

Think about it. If we are mimicking’s the CPU’s behavior, why do we need a higher-level abstraction type, such as a Set, to check whether an element is present or not? We can just do it the same way the CPU checks its own Status register: using an Unsigned Integer, we can represent each value being a part of the Set by its corresponding bit being set to ‘1’. We already have the code to represent an element as a bit, and it works. And to check whether an element is present, all we have to do is perform a bitwise AND operation to check whether its corresponding bit is enabled. Excusing the overhead introduced by a higher-level programming language like Swift instead of C/C++, the comparison check itself should be reduced to a single instruction - no overhead ! 

In other words, we’d go from this: 

func contains(_ member: T) -> Bool { 

    return set.contains(member) 

}

To this: 

typealias RegisterValue = UInt 

extension Option where RawValue == RegisterValue {    

var bitwiseValue: RegisterValue { 1 << rawValue } } struct BitField<T: Option>: Hashable, Codable, ExpressibleByArrayLiteral where T.RawValue == RegisterValue { private var bitField: RegisterValue … func contains(_ member: T) -> Bool { return bitField & member.bitwiseValue == member.bitwiseValue } … }

As for item insertion and removal, we just need to add and substract the power-of-two value of the element’s @bitwiseValue, which is the same as turning on and off that bit. Like so: 

mutating func insert(_ newMember: T) { 

    precondition(newMember.rawValue < RegisterValue.bitWidth) 

    guard !contains(newMember) else { return } 

    bitField += newMember.bitwiseValue 

}  

mutating func remove(_ member: T) { 

    precondition(member.rawValue < RegisterValue.bitWidth) 

    guard contains(member) else { return } 

    bitField-= member.bitwiseValue 

} 

Wait a minute! That’s not a lot of fun now, is it? If we need to turn on and off a single bit, why don’t we just use the XOR operation instead?

mutating func insert(_ newMember: T) { 

    guard !contains(newMember) else { return } 

flip(newMember) } mutating func remove(_ member: T) { guard contains(member) else { return } flip(member) } mutating func flip(_ member: T) {
precondition(member.rawValue < RegisterValue.bitWidth) bitField = bitField ^ member.bitwiseValue }

And with these ingredients put together, we are free to kill the Set used as the internal data structure, and just use a CPU word - a UInt which we aliased as RegisterValue - as storage. This use of an Integer of CPU Register/Word size (64 bits in a 64-bit machine architecture), to represent types of known bit-width, all fitting within the CPU’s Word size, is known as a BitField. 

In theory, this change should give us multiple advantages. For one, the code should be faster, because Set is designed to work with any Generic type conforming to the Hashable protocol. Since we only need bit-wise comparisons, we are getting rid of a lot of extra code and API we don’t use from the Swift Standard Library Set. Also, it’s possible LLVM could reduce the in-memory size of our entire BitField to a single CPU register word size: 64 bits in the case of modern Macs and iPhones/iPads (ARM included). We don’t know how much added complexity the Swift Standard Library’s Set means in terms of memory use. 

We are not going to lie, though: completing this transition was not that easy. Not because of the logic pertaining checking for insertion, removal and existence of an element, but because we’d used our internal Set to make OptionSet conform to the Collection protocol. When it was time to adapting BitField for Collection conformance, it did not work. Most of it was because we did not understand the API regarding Collection indices, so we had to learn, and a change tactics towards  copy/pasting some blind code from StackOverflow (kidding). And as usual, headbutting ourselves against when and how the endIndex should return for the Sequence/Collection protocol to understand when a BitField was over, or when it was empty, took an hour of slamming our expensive gaming keyboard on our table, plus 15 seconds of mental brilliance in which we not only solved the entire issue, but also deleted a lot of unnecessary code.

And You Did All of This For...?  

Wait. Just wait until you see this. With our new BitField structure working, with its Unit Tests verifying it, plus all of nRF Connect's massive suite of Integration Tests confirming most of the app’s features worked as expected, we cracked our nuckles and typed in the following in LLDB: 

print(MemoryLayout<BitField<Bluetooth.Appearance>>.size) // 8 bytes (64 bits)

print(MemoryLayout<BitField<Bluetooth.Appearance>>.stride) // 8 bytes (64 bits)

print(MemoryLayout<BitField<Bluetooth.Appearance>>.alignment) // 8 bytes (64 bits)

Aha! Success! Now, let's rollback to a previous commit, so we can compare this to the old and busted OptionSet:

print(MemoryLayout<OptionSet<Bluetooth.Appearance>>.size) // 8 bytes (64 bits)

print(MemoryLayout<OptionSet<Bluetooth.Appearance>>.stride) // 8 bytes (64 bits)

print(MemoryLayout<OptionSet<Bluetooth.Appearance>>.alignment) // 8  bytes (64 bits)

Uhhhh. That’s not what we were expecting 

But all of this is true. We reached the purported end of the rabbit hole, only to discover our new implementation of OptionSet as a BitField is as efficient memory-wise as the Swift Standard Library tools. When Apple engineers insist on Swift being optimised for our perusal, and that we shouldn’t be worrying about unnecessary tricks, they’re not lying. They mean it. 

Saving Grace

There was one more thing we were curious about regarding the BitField, and that is, how is our BitField translated into assembly code by the compiler? Does it really turn into a single bit-comparison instruction, like we pictured it to be? Or were we also wrong about that? 

To verify, we opted for the easiest and simplest path we could think of. iOS apps are not only encripted when compiled and packaged for release on the AppStore, they also have their symbols stripped or mangled, making them harder to reverse-engineer and inspect with tools such as IDA Pro. So we took a different route - we know Apple is transitioning to ARM, but we’re more comfortable with x86-64 assembly, so using this little checkbox that automagically transforms an iOS App into a Mac App, we produced a Mac Release build of nRF Connect, and inspected the code executed for BitField’s contains() function using Hopper: 

00000001001ad9f0         push       rbp                                         ; CODE XREF=_$s11nRF_Connect6BitFieldV13bitFieldMembers33_7ECDA5C4BCAAD9C5789D6122AEC1A18ALLSayxGyFSbxXEfU_TA+17 

00000001001ad9f1         mov        rbp, rsp 

00000001001ad9f4         push       r15 

00000001001ad9f6         push       r14 

00000001001ad9f8         push       r13 

00000001001ad9fa         push       r12 

00000001001ad9fc         push       rbx 

00000001001ad9fd         push       rax 

00000001001ad9fe         mov        r15, rdx 

00000001001ada01         mov        r14, rsi 

00000001001ada04         mov        r13, rdi 

00000001001ada07         mov        r12, qword [rcx+0x18] 

00000001001ada0b         lea        rax, qword [rbp+var_30] 

00000001001ada0f         mov        rdi, rdx 

00000001001ada12         mov        rsi, r12 

00000001001ada15         call       imp___stubs__$sSY8rawValue03RawB0QzvgTj     ; dispatch thunk of Swift.RawRepresentable.rawValue.getter : A.RawValue 

00000001001ada1a         mov        rcx, qword [rbp+var_30] 

00000001001ada1e         xor        eax, eax 

00000001001ada20         xor        edx, edx 

00000001001ada22         cmp        rcx, 0x40 

00000001001ada26         setg       al 

00000001001ada29         seta       dl 

00000001001ada2c         test       rcx, rcx 

00000001001ada2f         cmovs      eax, edx 

00000001001ada32         test       al, al 

00000001001ada34         je         loc_1001ada3a 

  

                     loc_1001ada36: 

00000001001ada36         xor        ebx, ebx                                    ; CODE XREF=_$s11nRF_Connect6BitFieldV8containsySbxF+83 

00000001001ada38         jmp        loc_1001ada4d 

  

                     loc_1001ada3a: 

00000001001ada3a         test       rcx, rcx                                    ; CODE XREF=_$s11nRF_Connect6BitFieldV8containsySbxF+68 

00000001001ada3d         js         loc_1001adaa7 

  

00000001001ada3f         cmp        rcx, 0x40 

00000001001ada43         jge        loc_1001ada36 

  

00000001001ada45         mov        ebx, 0x1 

00000001001ada4a         shl        rbx, cl 

  

                     loc_1001ada4d: 

00000001001ada4d         lea        rax, qword [rbp+var_30]                     ; CODE XREF=_$s11nRF_Connect6BitFieldV8containsySbxF+72 

00000001001ada51         mov        rdi, r15 

00000001001ada54         mov        rsi, r12 

00000001001ada57         call       imp___stubs__$sSY8rawValue03RawB0QzvgTj     ; dispatch thunk of Swift.RawRepresentable.rawValue.getter : A.RawValue 

00000001001ada5c         mov        rcx, qword [rbp+var_30] 

00000001001ada60         xor        eax, eax 

00000001001ada62         xor        edx, edx 

00000001001ada64         cmp        rcx, 0x40 

00000001001ada68         setg       al 

00000001001ada6b         seta       dl 

00000001001ada6e         test       rcx, rcx 

00000001001ada71         cmovs      eax, edx 

00000001001ada74         test       al, al 

00000001001ada76         je         loc_1001ada7c 

  

                     loc_1001ada78: 

00000001001ada78         xor        eax, eax                                    ; CODE XREF=_$s11nRF_Connect6BitFieldV8containsySbxF+149 

00000001001ada7a         jmp        loc_1001ada8f 

  

                     loc_1001ada7c: 

00000001001ada7c         test       rcx, rcx                                    ; CODE XREF=_$s11nRF_Connect6BitFieldV8containsySbxF+134 

00000001001ada7f         js         loc_1001adaa9 

  

00000001001ada81         cmp        rcx, 0x40 

00000001001ada85         jge        loc_1001ada78 

  

00000001001ada87         mov        eax, 0x1 

00000001001ada8c         shl        rax, cl 

  

                     loc_1001ada8f: 

00000001001ada8f         and        rbx, r14                                    ; CODE XREF=_$s11nRF_Connect6BitFieldV8containsySbxF+138 

00000001001ada92         cmp        rbx, rax 

00000001001ada95         sete       al 

00000001001ada98         add        rsp, 0x8 

00000001001ada9c         pop        rbx 

00000001001ada9d         pop        r12 

00000001001ada9f         pop        r13 

00000001001adaa1         pop        r14 

00000001001adaa3         pop        r15 

00000001001adaa5         pop        rbp 

00000001001adaa6         ret 

                        ; endp 

  

                     loc_1001adaa7: 

00000001001adaa7         ud2                                                    ; CODE XREF=_$s11nRF_Connect6BitFieldV8containsySbxF+77 

                        ; endp 

  

                     loc_1001adaa9: 

00000001001adaa9         ud2                                                    ; CODE XREF=_$s11nRF_Connect6BitFieldV8containsySbxF+143 

                        ; endp 

00000001001adaab         align      16 

This is a lot to take in. But we can focus only in what we’re interested in. The first batch of push instructions load the registers with the necessary parameters, so we can skip those. Then, as seen by the comment given to us by Hopper, we can read it’s a call to get the rawValue of the Option Generic, which is the ‘bit’ in the register we need to check against. Following that, there’s the code calculating the @bitwiseValue representative of that Option that we want to AND against, and then, there’s the magic we were waiting to see: 

00000001001ada8f         and        rbx, r14                                    ; CODE XREF=_$s11nRF_Connect6BitFieldV8containsySbxF+138 

00000001001ada92         cmp        rbx, rax 

00000001001ada95         sete       al

This section is pretty much a mirror image of our code: 

func contains(_ member: T) -> Bool { 

    return bitField & member.bitwiseValue == member.bitwiseValue 

} 

We perform a bitwise-AND, and then compare it. The RBX register contains the BitField’s value, whilst r14 is a parameter pushed onto the stack, so this is the member’s @bitwiseValue already calculated in the register (1 << rawValue), which is AND’ed, and its result stored back into RBX. RAX is also a parameter pushed onto the stack at the start of the call, and judging by the code, we can presume it contains the same value r14 contained, meaning member.bitwiseValue, which is 1 << rawValue’s result. The results of the AND (RBX) are compared (substracted) from RAX (member.bitwiseValue) via the CMP instruction, and if they are equal, the Flags register inside the CPU must get its ZERO bit set to ‘1’ if said element is contained in the bitField, or zero otherwise. A quick StackOverflow search reveals the SETE instruction writes the value of the ZERO flag to the given register, meaning we’re writing into AL the result of the call to the contains() function. 

TL;DR: Yes. Our theory regarding LLVM translating our code in assembly down to an AND instruction is true. We haven’t checked if the same thing applies to our XOR operation for insertion and deletion, but it’s safe to presume that’d be the case. Now however, is time to check how the previous iteration of BitField, when it was called OptionSet and relied on an internal Swift Standard Library Set, implements the same functionality: 

 

The first problem we encounter is thatwe cannot find the contains() procedure inside the OptionSet. It’s not there. When we looked for its implementation in BitField, we noticed the assembly procedures were defined in the same order as in the source file, so we know where to look. So where is the code? It must exist somwhere. And it does. What we think the Swift compiler is doing is folding the OptionSet into internal the Set itself when compiled, which can explain how the OptionSet struct doesn’t seemingly add any extra baggage in its in-memory representation due to its hidden internal data structure. 

So what we can do, instead, is check for the Swift.Set’s contains() procedure, which surely, is the call executed. Searching for a Swift.Set.contains() procedure in Hopper, reveals multiple implementations, or specializations in Swift parlance, of the contains() procedure for each possible Generic Type it's used with, so we’re on the right track. If we check the code of a generic that we use with the OptionSet, like Bluetooth.Appearance whose possible enum values we’ve used as an example throughout the text, we should be looking at the code we want: 

int _$sSh8containsySbxF11nRF_Connect9BluetoothO10AppearanceO_Tg5(int arg0, int arg1, int arg2, int arg3) { 

    rsi = arg1; 

    rdi = arg0; 

    COND = *(rsi + 0x10) == 0x0; 

    if (COND) goto loc_100157732; 

  

loc_100157678: 

    rbx = rsi; 

    r14 = rdi; 

    Swift.Hasher.init(*(rsi + 0x28)); 

    Swift.Hasher._combine(r14 & 0xff); 

    rax = Swift.Hasher._finalize(); 

    rdx = !(0xffffffffffffffff << *(int8_t *)(rbx + 0x20)); 

    rax = rax & rdx; 

    if (COND) goto loc_100157732; 

  

loc_1001576c1: 

    rsi = *(rbx + 0x30); 

    COND = *(int8_t *)(rsi + rax) == r14; 

    if (COND) goto loc_100157736; 

  

loc_1001576cb: 

    rax = rax + 0x1 & rdx; 

    if (COND) goto loc_100157732; 

  

loc_1001576e3: 

    COND = *(int8_t *)(rsi + rax) == r14; 

    if (COND) goto loc_100157736; 

  

loc_1001576e9: 

    rax = rax + 0x1 & rdx; 

    if (COND) goto loc_100157732; 

  

loc_100157710: 

    COND = *(int8_t *)(rsi + rax) == r14; 

    if (COND) goto loc_100157736; 

  

loc_100157716: 

    rax = rax + 0x1 & rdx; 

    if (COND) goto loc_100157710; 

  

loc_100157730: 

    rcx = 0x0; 

    goto loc_100157738; 

  

loc_100157738: 

    rax = rcx; 

    return rax; 

  

loc_100157736: 

    rcx = 0x1; 

    goto loc_100157738; 

  

loc_100157732: 

    rcx = 0x0; 

    goto loc_100157738; 

} 

Now, this is not the assembly code of the contains() function. We’ve gone a different route here, because the assembly code for this function is nearly twice the length as our BitField's. Instead, we're looking at the pseudocode interpretation of this procedure, which is a handy feature of Hopper. Here we can summarise the code as being split into two parts: at the start, we get the parameters, same as our BitField assembly, and then, Set’s contains() calculates the hash value of the value we want to check against. What follows is a series of if-statements and conditions, wherein loc_100157738 represents the code segment to call once the final result is known in the RCX register. Before returning from the procedure, the RCX value is copied (MOVed) into RAX, and then we exit. 

TL;DR: yes. It looks like our BitField is shorter, and simpler (less branches & less instructions) when compiled down to assembly. At least in the x86-64 variant as built by Xcode 11.6 under macOS 10.15. 

But, is implementing a BitField worth it?  

We cannot answer that. This is an area of knowledge you might not want to pursue as a Swift developer, which is fine. We on the other hand, like to be more conscious about the hardware we’re running on, for no other reason that modern processors and compilers won’t make our code run faster if the way we’ve written our software doesn’t match their expectations. But again, this is a choice not all teams can afford to make. 

Conclusions

Learning is always fun. And also, next time you open nRF Connect for iOS, and notice that the version number is 2.3.2 or newer, know that yes, besides all the changes listed in theWhat’s New’ page, your Apple processor is executing a few instructions less, thanks to our BitField implementation. 

 

Parents Comment Children
No Data