Exposing NSMutableArray

I’ve always wondered how NSMutableArray works internally. Don’t get me wrong, immutable arrays certainly provide enormous benefits: not only are they thread safe but also copying them is essentially free. It doesn’t change the fact that they’re quite dull – their contents can’t be modified. I find the actual memory manipulation details fascinating and this is why this article focuses on mutable arrays.

Since I more or less describe the complete process I used to investigate NSMutableArray, this post is fairly technical. There is an entire section discussing the ARM64 assembly, so if you find that boring then do not hesitate to skip it. Once we’re through the low level details, I present the non obvious characteristics of the class.

Implementation details of NSMutableArray are private for a reason. They’re subject to change at any time, both in terms of underlying subclasses and their ivar layouts, as well as underpinning algorithms and data structures. Regardless of those caveats, it’s worth peeking under the hood of NSMutableArray and figuring out how it works and what can be expected of it. The following study is based on iOS 7.0 SDK.

As usual, you can find the accompanying Xcode project on my GitHub.

The Problem of Plain Old C Arrays

Every self-respecting programmer knows how C arrays work. It boils down to a continuous segment of memory that can be easily read from and written into. While arrays and pointers are not the same (see “Expert C Programming” or this article), it’s not much of an abuse to consider “malloc-ed block of memory” as an array.

One of the most obvious drawbacks of using linear memory is that inserting an element at index 0 requires shifting all the other elements via means of memmove:

Inserting to C array at index 0

Inserting to C array at index 0

Similarly, removing the first element requires shifting action as well, assuming one wants to keep the same memory pointer as an address of the first element:

Removing from C array at index 0

Removing from C array at index 0

For very large arrays this quickly becomes a problem. Obviously, direct pointer access is not necessarily the highest level of abstraction in the world of arrays. While C-style arrays are often useful, the daily bread and butter of Obj-C programmers wanting a mutable, indexed container is NSMutableArray.

NSMutableArray

Diving in

Even though Apple publishes source code for many libraries, Foundation and its NSMutableArray is not open sourced. However, there are some tools that make unrevealing its mysteries a little bit easier. We begin our journey at the highest possible level, reaching into the lower layers to get otherwise inaccessible details.

Getting & Dumping Class

NSMutableArray is a class cluster – its concrete implementations are actually subclasses of NSMutableArray itself. Instances of which class does actually +[NSMutableArray new] return? With LLDB we don’t even have to write any code to figure that out:

(lldb) po [[NSMutableArray new] class]
__NSArrayM

With class name in hand we reach for class-dump. This handy utility forges the class headers obtained by analyzing provided binaries. Using the following one-liner we can extract the ivar layout we’re interested in:

./class-dump --arch arm64 /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS7.0.sdk/System/Library/Frameworks/CoreFoundation.framework/CoreFoundation | pcregrep -M "^[@\w\s]*__NSArrayM[\s\w:{;*]*}"

I suck at regular expressions so the one used above is probably not a masterpiece, but it provides fruitful results:

@interface __NSArrayM : NSMutableArray
{
    unsigned long long _used;
    unsigned long long _doHardRetain:1;
    unsigned long long _doWeakAccess:1;
    unsigned long long _size:62;
    unsigned long long _hasObjects:1;
    unsigned long long _hasStrongReferences:1;
    unsigned long long _offset:62;
    unsigned long long _mutations;
    id *_list;
}

The bitfields in the original output are typed with unsigned int, but obviously you can’t fit 62 bits into a 32-bit integer – class-dump hasn’t been patched yet to parse ARM64 libraries correctly. Despite those minor flaws, one can tell quite a lot about the class just by looking at its ivars.

Disassembling Class

The most important tool in my investigation was Hopper. I’m in love with this disassembler. It’s an essential tool for curious souls who have to know how everything works. One of the best features of Hopper is that it generates C-like pseudocode that very often is clear enough to grasp the gist of the implementation.

The crucial method for understanding __NSArrayM is - objectAtIndex:. While Hopper does great job providing pseudocode for ARMv7, the feature doesn’t work for ARM64 yet. I figured it would be an excellent exercise to do this manually, with some hints from its ARMv7 counterpart.

Dissecting the Method

With ARMv8 Instruction Set Overview (registration required) in one hand and a bunch of educated guesses in another I think I’ve deciphered the assembly correctly. However, you should not treat the following analysis as an ultimate source of wisdom. I’m new at this.

Passing Arguments

As a start point let’s note that every Obj-C method is actually a C function with two additional parameters. The first one is self which is a pointer to the object being receiver of the method call. The second one is _cmd which represents the current selector.

One might say that the equivalent C-style declaration of the - objectAtIndex: function is:

id objectAtIndex(NSArray *self, SEL _cmd, NSUInteger index);

Since on ARM64 these types of parameters are passed in the consecutive registers, we can expect the self pointer to be in x0 register, the _cmd in x1 register and the index of object in x2 register. For details of parameter passing please reference the ARM Procedure Call Standard, noting that Apple’s iOS version has some divergences.

Analyzing Assembly

This will look scary. Since analyzing a large chunk of assembly at once is not sane, we’re going to walk through the following code step by step, figuring out what each line does.

0xc2d4         stp        x29, x30, [sp, #0xfffffff0]!
0xc2d8         mov        x29, sp
0xc2dc         sub        sp, sp, #0x20
0xc2e0         adrp       x8, #0x1d1000
0xc2e4         ldrsw      x8, [x8, #0x2c]
0xc2e8         ldr        x8, [x0, x8]
0xc2ec         cmp        x8, x2
0xc2f0         b.ls       0xc33c

0xc2f4         adrp       x8, #0x1d1000
0xc2f8         ldrsw      x8, [x8, #0x30]
0xc2fc         ldr        x8, [x0, x8]
0xc300         lsr        x8, x8, #0x2
0xc304         adrp       x9, #0x1d1000
0xc308         ldrsw      x9, [x9, #0x34]
0xc30c         ldr        x9, [x0, x9]
0xc310         add        x9, x2, x9, lsr #2
0xc314         cmp        x8, x9
0xc318         csel       x8, xzr, x8, hi
0xc31c         sub        x8, x9, x8
0xc320         adrp       x9, #0x1d1000
0xc324         ldrsw      x9, [x9, #0x38]
0xc328         ldr        x9, [x0, x9]
0xc32c         ldr        x0, [x9, x8, lsl #3]
0xc330         mov        sp, x29
0xc334         ldp        x29, x30, [sp], #0x10
0xc338         ret  

The Setup

We begin with what seems to be an ARM64 function prologue. We’re saving x29 and x30 registers on the stack then we move the current stack pointer sp to x29 register:

0xc2d4         stp        x29, x30, [sp, #0xfffffff0]!
0xc2d8         mov        x29, sp

We make some room on the stack (subtracting, since stack grows downwards):

0xc2dc         sub        sp, sp, #0x20

The path code we’re interested in doesn’t seem to be using this space. However, the “out of bounds” exception throwing code does call some other functions thus the prologue has to facilitate both options.

Fetching Count

The next two lines perform program counter relative addressing. The specifics of addresses encoding are quite complicated and the literature is scarce, however, Hopper automatically calculates more sane offsets:

0xc2e0         adrp       x8, #0x1d1000
0xc2e4         ldrsw      x8, [x8, #0x2c]

The two lines will fetch contents of memory located at 0x1d102c and store it into x8 register. What’s over there? Hopper is kind enough to help us:

_OBJC_IVAR_$___NSArrayM._used:
0x1d102c         dd         0x00000008

This is the offset of _used ivar inside the __NSArrayM class. Why go through the trouble of additional fetch, instead of simply putting the value 8 into the assembly? It’s because of fragile base class problem. The modern Objective-C runtime handles this by giving itself an option to overwrite the value at 0x1d102c (and all the other ivar offsets for that matter). If either NSObject, or NSArray, or NSMutableArray adds new ivars, the old binaries will still work.

The runtime can modify the ivars’ offsets dynamically without breaking compatibility

The runtime can modify the ivars’ offsets dynamically without breaking compatibility

While the CPU has to do an additional memory fetch, this is a beautiful solution as explained in more details on Hamster Emporium and Cocoa with Love.

At this point we know the offset of _used within the class. And since the Obj-C objects are nothing more than structs, and we’ve got pointer to this struct in x0, all we have to do is fetch the value:

0xc2e8         ldr        x8, [x0, x8]

The C equivalent of the above code would be:

    unsigned long long newX8 = *(unsigned long long *)((char *)(__bridge void *)self + x8);

I think I prefer the assembly version. Quick analysis of disassembled - count method of __NSArrayM reveals that the _used ivar contains the number of elements in __NSArrayM, and as of this moment we have this value in x8 register.

Checking Bounds

Having requested index in x2 and count in x8 the code compares the two:

0xc2ec         cmp        x8, x2
0xc2f0         b.ls       0xc33c

When the value of x8 is lower or same as x2 then we jump to the code at 0xc33c, which handles the exception throwing. This is essentially the bounds checking. If we fail the test (count is lower or equal to index), we throw exception. I’m not going to discuss those parts of disassembly since they don’t really introduce anything new. If we pass the test (count is greater than index) then we simply keep executing instructions in order.

Calculating Memory Offset

We’ve seen this before, this time we fetch the offset of _size ivar located at 0x1d1030:

0xc2f4         adrp       x8, #0x1d1000
0xc2f8         ldrsw      x8, [x8, #0x30]

Then we retrieve its contents and shift it to the right by two bits:

0xc2fc         ldr        x8, [x0, x8]
0xc300         lsr        x8, x8, #0x2

What’s the shift for? Let’s take a look at the dumped header:

unsigned long long _doHardRetain:1;
unsigned long long _doWeakAccess:1;
unsigned long long _size:62;

It turns out, all three bitfields share the same storage, so to get the actual value of _size, we have to shift value to the right, discarding the bits for _doHardRetain and _doWeakAccess. Ivar offsets for _doHardRetain and _doWeakAccess are exactly the same, but their bit accessing code is obviously different.

Going further it’s the same drill, we get the contents of _offset ivar (at 0x1d1034) into x9 register:

0xc304         adrp       x9, #0x1d1000
0xc308         ldrsw      x9, [x9, #0x34]
0xc30c         ldr        x9, [x0, x9]

In the next line we add the requested index stored in x2 to the 2-bits-right-shifted _offset (it’s also a 62 bits-wide bitfield) and then we store it all back in x9. Isn’t assembly just amazing?

0xc310         add        x9, x2, x9, lsr #2

The next three lines are the most important ones. First of all, we compare the _size (in x8) with _offset + index (in x9):

0xc314         cmp        x8, x9

Then we conditionally select value of one register based on the result of previous comparison:

0xc318         csel       x8, xzr, x8, hi

This is more or less equal to ?: operator in C:

x8 = hi ? xzr : x8;      /* csel       x8, xzr, x8, hi */

The xzr register is a zero register which contains value 0, and the hi is the name of condition code the csel instruction should inspect. In this case we’re checking whether the result of comparison was higher (if value of x8 was greater than that of x9).

Finally we subtract the new value of x8 from the _offset + index (in x9) and we store it in x8 again:

0xc31c         sub        x8, x9, x8

So what just happened? First of all let’s look into equivalent C code:

int tempIndex = _offset + index;        /* add        x9, x2, x9, lsr #2   */
BOOL isInRange = _size > tempIndex;     /* cmp        x8, x9   */
int diff = isInRange ? 0 : _size;       /* csel       x8, xzr, x8, hi   */
int fetchIndex = tempIndex - diff;      /* sub        x8, x9, x8   */

In the C code we don’t have to shift neither _size nor _offset to the right, since compiler does this automatically for bitfields access.

Getting the Data

We’re almost there. Let’s fetch the contents of _list ivar (0x1d1038) into x9 register:

0xc320         adrp       x9, #0x1d1000
0xc324         ldrsw      x9, [x9, #0x38]
0xc328         ldr        x9, [x0, x9]

At this moment the x9 points to the beginning of memory segment containing the data.

Finally, shift the value of fetch index stored in x8 to the left by 3 bits, add it to the x9 and put the contents of memory at that location into x0:

0xc32c         ldr        x0, [x9, x8, lsl #3]

Two things are important here. First of all, every data offset is in bytes. Shifting a value to the left by 3 is equivalent to multiplying it by 8, which is the size of a pointer on a 64-bit architecture. Secondly, the results goes into x0 which is the register which stores the return value of a function returning NSUInteger.

At this point we are done. We have fetched the correct value stored in array.

Function Epilog

There rest are some boilerplate operations that restore the state of registers and stack pointer before the call. We’re reversing the function’s prologue and returning:

0xc330         mov        sp, x29
0xc334         ldp        x29, x30, [sp], #0x10
0xc338         ret        

Putting it all together

I explained what the code does, but the question we will now answer is why?

The Meaning of ivars

Let’s do a quick summary of what each ivar means:

  • _used is count
  • _list is a pointer to the buffer
  • _size is size of the buffer
  • _offset is an index of the first element of the array stored in the buffer

The C code

With ivars in mind and the disassembly analyzed we now can write an equivalent Objective-C code that performs the very same actions:

- (id)objectAtIndex:(NSUInteger)index
{
    if (_used <= index) {
        goto ThrowException;
    }
    
    NSUInteger fetchOffset = _offset + index;
    NSUInteger realOffset = fetchOffset - (_size > fetchOffset ? 0 : _size);
    
    return _list[realOffset];
    
ThrowException:
    // exception throwing code
}

Assembly was definitely much more long-winded.

Memory Layout

The most crucial part is deciding whether the realOffset should be equal to fetchOffset (subtracting zero) or fetchOffset minus _size. Since staring at dry code doesn’t necessarily paint the perfect picture, let’s consider two examples of how object fetching works.

_size > fetchOffset

In this example, the offset is relatively low:

A simple example

A simple example

To fetch object at index 0 we calculate the fetchOffset as 3 + 0. Since _size is larger than fetchOffset the realOffset is equal to 3 as well. The code returns value of _list[3]. Fetching object at index 4 makes fetchOffset equal to 3 + 4 and the code returns _list[7].

_size <= fetchOffset

What happens when offset is large?

A more difficult example

A more difficult example

Fetching object at index 0 makes fetchOffset equal to 7 + 0 and the call returns _list[7] as expected. However, fetching object at index 4 makes fetchOffset equal to 7 + 4 = 11, which is larger than _size. Obtaining realOffset requires subtracting _size value from fetchOffset which makes it 11 - 10 = 1 and the method returns _list[1].

We’re essentially doing modulo arithmetic, circling back to the other end of the buffer when crossing the buffer boundary.

Data Structure

As you might have guessed, __NSArrayM makes use of circular buffer. This data structure is extremely simple, but a little bit more sophisticated than a regular array/buffer. The contents of a circular buffer can wrap around when either end is reached.

Circular buffers have some very cool properties. Notably, unless the buffer is full, insertion/deletion from either end doesn’t require any memory to be moved. Let’s analyze how the class utilizes circular buffer to be superior in its behavior in comparison to a C array.

__NSArrayM Characteristics

While reverse-engineering the rest of the disassembled methods would provide the definite explanation of __NSArrayM internals, we can employ the discovered points of data to investigate the class at a higher level.

Inspecting at Runtime

To inspect __NSArrayM at runtime we can’t simply paste in the dumped header. First of all, the test application won’t link without providing at least empty @implementation block for the __NSArrayM class. Adding this @implementation block is probably not a good idea. While the app builds and actually runs, I’m not entirely sure how does the runtime decide which class to use (if you do know, please let me know). For the sake of safety I’ve renamed the class name to something unique - BCExploredMutableArray.

Secondly, ARC won’t let us compile the id *_list ivar without specifying its ownership. We’re not going to write into the ivar, so prepending id with __unsafe_unretained should not interfere with ARC’s memory management. However, I’ve chosen to declare the ivar as void **_list and it will soon become clear why.

Printout code

We can create a category on NSMutableArray that will print the contents of ivars as wells as the list of all the pointers contained within the array:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
- (NSString *)explored_description
{
    assert([NSStringFromClass([self class]) isEqualToString:@"__NSArrayM"]);

    BCExploredMutableArray *array = (BCExploredMutableArray *)self;

    NSUInteger size = array->_size;
    NSUInteger offset = array->_offset;
    
    NSMutableString *description = [NSMutableString stringWithString:@"\n"];
    
    [description appendFormat:@"Size: %lu\n", (unsigned long)size];
    [description appendFormat:@"Count: %llu\n", (unsigned long long)array->_used];
    [description appendFormat:@"Offset: %lu\n", (unsigned long)offset];
    [description appendFormat:@"Storage: %p\n", array->_list];
    
    for (int i = 0; i < size; i++) {
        [description appendFormat:@"[%d] %p\n", i, array->_list[i]];
    }

    return description;
}

The Results

Insertion and deletion is fast at both ends

Let’s consider a very simple example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
NSMutableArray *array = [NSMutableArray array];

for (int i = 0; i < 5; i++) {
    [array addObject:@(i)];
}

[array removeObjectAtIndex:0];
[array removeObjectAtIndex:0];

NSLog(@"%@", [array explored_description]);

The output shows that removing object at index 0 twice simply clears the pointers and moves the _offset ivar accordingly:

Size: 6
Count: 3
Offset: 2
Storage: 0x178245ca0
[0] 0x0
[1] 0x0
[2] 0xb000000000000022
[3] 0xb000000000000032
[4] 0xb000000000000042
[5] 0x0

Here’s the visual interpretation of what’s going on:

Removing object at index 0 twice

Removing object at index 0 twice

What about adding objects? Let’s run another test on a brand new array:

1
2
3
4
5
6
    NSMutableArray *array = [NSMutableArray array];

    for (int i = 0; i < 4; i++) {
        [array addObject:@(i)];
    }
    [array insertObject:@(15) atIndex:0];

Inserting object at index 0 uses the circular buffer magic to put the newly inserted object at the end of the buffer:

Size: 6
Count: 5
Offset: 5
Storage: 0x17004a560
[0] 0xb000000000000002
[1] 0xb000000000000012
[2] 0xb000000000000022
[3] 0xb000000000000032
[4] 0x0
[5] 0xb0000000000000f2

And visually:

Adding object at index 0

Adding object at index 0

This is fantastic news! It means that __NSArrayM can be processed from either side. You can use __NSArrayM as either stack or queue without any performance hits.

On a side note, you can see how on 64-bit architectures NSNumber uses tagged pointers to do its storage.

Non integral growth factor

OK, for this one I’ve cheated a little bit. While I’ve done some empirical testing as well, I wanted to have the exact value and I’ve peeked into the disassembly of insertObject:atIndex:. Whenever the buffer gets full, it is reallocated with 1.625 times larger size. I was quite astonished for it not to be equal to 2.

Update: Mike Curtiss has provided a very good explanation of why making resizing factor equal to 2 is suboptimal.

Once grown, doesn’t shrink

This is a shocker – __NSArrayM never reduces its size! Let’s run the following test code:

1
2
3
4
5
6
NSMutableArray *array = [NSMutableArray array];

for (int i = 0; i < 10000; i++) {
    [array addObject:[NSObject new]];
}
[array removeAllObjects];

Even though at this point the array is empty, it still keeps the large buffer:

Size: 14336

This is not something you should worry about unless you use the NSMutableArray to load in some enormous amounts of data and then clear the array with the intention of freeing the space.

Initial capacity almost doesn’t matter

Let’s allocate new arrays with initial capacity set to consecutive powers of two:

1
2
3
for (int i = 0; i < 16; i++) {
    NSLog(@"%@", [[[NSMutableArray alloc] initWithCapacity:1 << i] explored_description]);
}

Surprise surprise:

Size:2     // requested capacity - 1
Size:2     // requested capacity - 2
Size:4     // requested capacity - 4
Size:8     // requested capacity - 8
Size:16    // requested capacity - 16
Size:16    // requested capacity - 32
Size:16    // requested capacity - 64
Size:16    // requested capacity - 128
... // Size:16 all the way down

It doesn’t clean up its pointers on deletion

This is barely important, but I found it nonetheless interesting:

1
2
3
4
5
6
7
8
NSMutableArray *array = [NSMutableArray array];

for (int i = 0; i < 6; i++) {
    [array addObject:@(i)];
}
[array removeObjectAtIndex:1];
[array removeObjectAtIndex:1];
[array removeObjectAtIndex:1];

And the output:

Size: 6
Count: 3
Offset: 3
Storage: 0x17805be10
[0] 0xb000000000000002
[1] 0xb000000000000002
[2] 0xb000000000000002
[3] 0xb000000000000002
[4] 0xb000000000000042
[5] 0xb000000000000052

The __NSArrayM doesn’t bother clearing previous space when moving its objects forward. However, the objects do get deallocated. It’s not the NSNumber doing its magic either, NSObject acts correspondingly.

This explains why I’ve opted for defining _list ivar as void **. If the _list was declared as id * then the following loop would crash on the object assignment:

for (int i = 0; i < size; i++) {
    id object = array->_list[i];
    NSLog("%p", object);
}

ARC implicitly inserts a retain/release pair causing it to access deallocated objects. While prepending id object with __unsafe_unretained fixes the problem, I absolutely didn’t want anything to call any methods on this bunch of stray pointers. That’s my void ** reasoning.

Worst case scenario is adding/removing from the middle

In these two examples we will remove elements from roughly the middle of the array:

1
2
3
4
5
6
NSMutableArray *array = [NSMutableArray array];

for (int i = 0; i < 6; i++) {
    [array addObject:@(i)];
}
[array removeObjectAtIndex:3];

In output we see that the top part shifts down, where down are lower indexes (notice the stray pointer at [5])

[0] 0xb000000000000002
[1] 0xb000000000000012
[2] 0xb000000000000022
[3] 0xb000000000000042
[4] 0xb000000000000052
[5] 0xb000000000000052
Removing object at index 3

Removing object at index 3

However, when we call [array removeObjectAtIndex:2] instead, the bottom part shifts up, where up are higher indexes:

[0] 0xb000000000000002
[1] 0xb000000000000002
[2] 0xb000000000000012
[3] 0xb000000000000032
[4] 0xb000000000000042
[5] 0xb000000000000052
Removing object at index 2

Removing object at index 2

Inserting object in the middle has very similar results. The reasonable explanation is __NSArrayM tries to minimize the amount of memory moved, thus it will move at most half of its elements.

Being a Good Subclassing Citizen

As discussed in NSMutableArray Class Reference every NSMutableArray subclass must implement the following seven methods:

  • - count
  • - objectAtIndex:
  • - insertObject:atIndex:
  • - removeObjectAtIndex:
  • - addObject:
  • - removeLastObject
  • - replaceObjectAtIndex:withObject:

Unsurprisingly, __NSArrayM fulfills this requirement. However, the list of all the methods implemented by __NSArrayM is quite short and doesn’t contain 21 additional methods listed in NSMutableArray header. Who’s responsible for executing those methods?

It turns out they’re all part of the NSMutableArray class itself. This is extremely convenient – any subclass of NSMutableArray can implement only the seven most rudimentary methods. All the other higher-level abstractions are built on top of them. For instance - removeAllObjects method simply iterates backwards, calling - removeObjectAtIndex: one by one. Here’s pseudo code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
// we actually know this is safe, since count is stored on 62 bits
// and casting to NSInteger will *not* overflow
NSInteger count = (NSInteger)[self count];
if (count == 0) {
	return;                          
}

count--;
do {
	[self removeObjectAtIndex:count];     
	count--;
} while (count >= 0);

However, where it makes sense __NSArrayM does reimplement some of its superclasses’ methods. For instance, while NSArray provides the default implementation for - countByEnumeratingWithState:objects:count: method of NSFastEnumeration protocol, __NSArrayM has its own code-path as well. Knowing its internal storage __NSArrayM can provide much more efficient implementation.

Foundations

I’ve always had this idea of Foundation being a thin wrapper on CoreFoundation. My argument was simple – there is no need to reinvent the wheel with brand new implementations of NS* classes when the CF* counterparts are available. I was shocked to realize neither NSArray nor NSMutableArray have anything in common with CFArray.

CFArray

The best thing about CFArray is that it’s open sourced. This will be a very quick overview, since the source code is publicly available, eagerly waiting to be read. The most important function in CFArray is _CFArrayReplaceValues. It’s called by:

Basically, CFArray moves the memory around to accommodate the changes in the most efficient fashion, similarly to how __NSArrayM does its job. However, the CFArray does not use a circular buffer! Instead it has a larger buffer padded with zeros from both ends which makes enumeration and fetching the correct object much easier. Adding elements at either end simply eats up the remaining padding.

Final Words

Even though CFArray has to serve slightly more general purposes, I find it fascinating that its internals don’t work the same way as __NSArrayM does. While I suppose it would make sense to find common ground and make a single, canonical implementation, perhaps there are some other factors that influenced the separation.

What those two have in common? They’re the concrete implementation of abstract data type known as deque. Despite its name, NSMutableArray is an array on steroids, deprived of C-style counterpart’s drawbacks.

Personally, I’m most delighted with constant-time performance of insertion/deletion from either end. I no longer have to question myself using NSMutableArray as queue. It works perfectly fine.

Drawing Bézier Curves

A few months ago I’ve released my latest iPad app – Revolved. The core idea of Revolved is super simple – you draw curves on the right hand side of the screen and they get revolved around the axis to create a 3D model.

In this article I’m going to walk through the process of displaying Bézier curves on the iPad screen as it happens in the app. Since the line drawing engine powering Revolved is OpenGL based, presenting curves on the display is not as easy as calling some Core Graphics functions here and there.

To make the story more interesting I made a few interactive demos that should make understanding the presented concepts much easier. Here’s the small part of Revolved in your browser. This is what we’re going to build:

Note: the demos are made in HTML5 using the canvas API which (unfortunately) is not a match for the full-fledged OpenGL based rendering. While some small glitches and artifacts may occur in the browser, Revolved doesn’t suffer from those problems.

The OpenGL code powering the line drawing in Revolved is available on my GitHub. As an added bonus I’ve also published the canvas/JS version that runs the demos in this article.

Bézier Curves

I’m absolutely fascinated with Bézier curves. Not only are they ubiquitous in computer graphics, but also they’re just so much fun to play with. Dragging the control points and watching the curve wiggle is an experience on its own.

The most popular Bézier curves out there are cubic Bézier curves. They’re defined in terms of four control points. While the first and last control points specify locations of the endpoints of the drawn curve, the second and the third control points influence the tangency of the curve.

If you’re interested in mastering Bézier curves, I highly recommend diving into A Primer on Bézier Curves, which is the resource on the matter. However, this article doesn’t require any meaningful background apart from intuitive understanding of how the curves work.

Having decided to utilize the curves in Revolved, I’ve faced the challenge of rendering smooth cubic Bézier curves on the GPU, using the OpenGL primitives.

Subdivisions

One of the easiest way to approximate any curvy shape is to split it into line segments. We’re going to divide a Bézier curve into a set of connected linear components, with the intention that large enough number of segments will look sufficiently smooth for the user. The most important problem we have to solve is finding the connection points for those segments.

Fortunately, Bézier curves are parametric that is their x and y components are defined in terms of a parameter t, which varies from 0 to 1. For the value of t equal to 0, we land on the first end point of the curve and for the value of t equal to 1, we land on the last end point of the curve. For in-between values of t we get some points that lie on the curve in-between those end points.

Bézier curve parametrization

Bézier curve parametrization

To approximate a shape of a Bézier curve using 2 line segments, calculate the position of 3 connection points for t equal to 0, 0.5 and 1. Using 10 segments requires 11 points for t values of 0, 0.1, 0.2, …, 0.9 and 1.0 respectively.

Subdivision in Action

The little simulator below depicts the method. Try dragging both end points and control points. You can adjust the number of segments with a slider. The faint gray line is the exact (hopefully) Bézier curve drawn with HTML5 canvas primitives.

A very nice feature of Bézier curves is that the subdivision points accumulate near areas of large curvature. Try making a V-shaped curve and notice how little red knots tend to stick together near the tip, they don’t divide the curve into equal-length segments. Moreover, even for a small number of linear segments the drawn shape is not far off from the exact curve.

Drawing Lines

Even though OpenGL does support line drawing, I didn’t want to use this feature for two reasons. First of all, the line width can only be set once per draw call. Animating curves’ thicknesses would require separating the drawing commands into different batches thus potentially requiring multiple draw calls to render all the segments. This is unnecessarily complex.

The most important factor is that the lines rendered by OpenGL are plain wrong:

This is actual slanted line drawn by OpenGL

This is actual slanted line drawn by OpenGL

The “algorithm” seems to clone pixels in either horizontal or vertical direction, making lines look like parallelograms instead of rotated rectangles. Needless to say, this doesn’t fulfill my quality requirements. While simple line drawing would work for very thin lines, I wanted the curves in Revolved to be bold and prominently presented on the screen.

Adding Width

Having nailed approximating the shape of Bézier curves with line segments, we should focus on making them wider. Rejecting OpenGL line drawing method forces us to jump into the usual GPU suspect – a triangle. A single line segment will consist of two triangles that will form a quadrilateral. We just have to calculate 4 corner points for a given segment and connect them with edges:

  1. Start with a linear segment.
  2. Extend it in a perpendicular direction.
  3. Create vertices at new endpoints.
  4. Connect them to create two triangles.

While this high-level overview sounds relatively simple, the devil, as usual, is in the details.

The Perpendicular

Given a vector, finding a vector perpendicular to it is one of the easiest trick in vector math. Just exchange the coordinates and negate one of them. That’s it:

Perpendicular vector has its coordinates flipped and one of them is negated

Perpendicular vector has its coordinates flipped and one of them is negated

Having the perpendicular direction makes it easy to calculate the positions of vertices that will get further used for triangles rendering.

The Bad Tangent

We’re just one step short of rendering the Bézier curve with some width. Unfortunately, making the straight line segments wider by applying the above-mentioned perpendicularity wouldn’t work well. Try making V-shape again and notice the discontinuities:

The problem is we’re treating the linear segments as independent entities, forgetting that they are part of the curve.

Two rectangular segments create cracks

Two rectangular segments create cracks

While we could somehow merge the vertices in the midpoint, this wouldn’t be perfect. Since Bézier curves are defined in pristine mathematical terms we can get the vertex position much more precisely.

The Good Tangent

What’s wrong in the above approach is that we’re ignoring the fact that a curve is, well, a curve. As mentioned before, to get direction perpendicular to the direction of the curve we have to calculate the actual local direction of the curve itself. In other words, if we were to draw an arrow at the given point of the curve showing its local direction, which way should the arrow point? What’s the tangent at this point?

A tangent vector

A tangent vector

Derivatives come to the rescue. Since a Bézier curve is parametrized in terms of t in a form of x(t) and y(t), we can simple derive the equations with respect to t. I won’t bother you with the equations, you can look up the exact solution directly in the source code.

With the tangent vector in hand it’s just the matter of generating the correct positions of vertices:

Correct segments have their vertices connected

Correct segments have their vertices connected

Having derived the exact direction of a curve at a given point, we can generate the exact perpendicular direction finally receiving the exact position of the vertex. Awesome!

Notice that for small number of segments the tight curves look really weird. This is actually easy to explain – for sufficiently long segments, the tangent vector may vary greatly, thus the surplus width may be added in completely different directions. However, when the segments are small enough it’s really difficult for the tangent vector to mess things up.

Auto-Segmentation

At this point we’ve almost recreated the line drawing technique used in Revolved. For the final step we’re not going to add anything new to our drawing routine. Instead, we’re going to reduce the interaction complexity. We’re going to get rid of the slider.

The purpose of the slider is to define how many segments should the curve be split into. We don’t want to stick in a single large value and call it a day. Rendering a few dozen triangles for even the shortest segment is surely a waste. Instead, we’re going to figure out a way to automatically calculate the number of required segments.

Estimating Length

If you look closely at a Bézier curve you’ll notice that its shape always resembles the shape of line connecting all four control points.

Curve&rsquo;s shape resembles the line connecting control points

Curve’s shape resembles the line connecting control points

The length of the curve is never longer than the length of the connecting polyline. Calculating the total length of AB, BC, and CD segments we derive an upper bound for curve’s length:

float estimatedLength = (GLKVector2Length(GLKVector2Subtract(b, a)) +
                         GLKVector2Length(GLKVector2Subtract(c, b)) +
                         GLKVector2Length(GLKVector2Subtract(d, c)));

As you might have noticed, I’m using Apple’s GLKit for the vector operations. While most parts of GLKit quickly become too primitive for any serious OpenGL programming, the math part of GLKit is wonderful. The only real drawback is its verbosity, but this is something most Cocoa devs should be used to.

Estimating number of segments

Having calculated the estimated length it’s just a matter of getting the number of segments required to display a curve. While we could simply divide the estimated length and take the ceil of the value, it would mean that for very short curves we could end up with only one or two segments. Instead, we should opt for slightly biased function that will boost smaller input values. We’re going to pick hyperbola – it’s just so awesome:

Hyperbola

Hyperbola

For small input values hyperbola bumps the output value, for large input values the function is asymptotically similar to plain old linear function. A few minutes spend on parameters tweaking crafts the final solution. Notice how the number of knots changes when the curve gets longer or shorter:

Block Based Subdivider

While writing Revolved I’ve found yet another great application for Objective-C blocks. Instead of passing the model object through the rendering system, I’ve opted for using a subdivider block. Given a value of t (the same one that parametrizes the curve), it returns the SegmentSubdivision which is a struct that defines both position and perpendicular vector at a given point of the curve. Naturally, the block is a typedef. I don’t seem to be the only one who has problems remembering block syntax.

typedef struct SegmentSubdivision {
    GLKVector2 p;
    GLKVector2 n;
} SegmentSubdivision;

typedef SegmentSubdivision (^SegmentDivider)(float t);

The line mesh generator can then utilize the passed block to generate the vertex positions:

for (int seg = 0; seg <= segmentsCount; seg++) {
	
	SegmentSubdivision subdivision = subdivider((double)seg/(double)segmentsCount);
	v[0] = GLKVector2Add(subdivision.p, GLKVector2MultiplyScalar(subdivision.n, +_lineSize /2.0f));
	v[1] = GLKVector2Add(subdivision.p, GLKVector2MultiplyScalar(subdivision.n, -_lineSize /2.0f));
	...	
}

Why not Core Graphics?

Having read all that you might be wondering: why bother with OpenGL drawing at all? The obvious choice for any seasoned iOS developer would be enrolling Core Graphics and drawing the curves using its functions. Let’s discuss the options I could have used.

For the stage setting purpose I should note that Revolved is currently limited to 60 curves and the drawing area of the app has dimensions of 448x768 points (896x1536 pixels on a retina device).

A Single Layer for all the Curves

The first Quartz-based solution is quite obvious – a single view with custom - drawRect: implementation. Whenever user drags anything, the entire view would have to be redrawn. Segment deletion? Redraw the entire thing. A single animation? Redraw the entire thing, 60 times per second.

Good luck pulling off redrawing that many segments that fast and that often. Did I mention that the rasterization of Core Graphics calls happens on the CPU, not on the GPU (Apple Dev account required)?

One Layer per Curve

A single view is definitely not a good idea. Alternatively I could dedicate a single view per segment.

First of all, let’s consider the memory requirements. Each pixel has four 8-bits components (RGBA). Storing the data drawn in - drawRect: for each curve would take a total of 60*896*1536*4 bytes, that is 315 MB! Even though modern iPads are capable of handling those huge RAM demands, we still have to draw a 3D model. Adding space for OpenGL buffers doesn’t really help.

Secondly, the overdraw is huge. Since layers have to be transparent, every pixel on the right side of the screen is overdrawn up to 60 times. In total, the GPU has to push over 26 full-screens worth of pixels, at 60 frames per second. This is not something you want in your app.

Optimizations

I tested these approaches on the latest & greatest iPad Air and the stuttering was absolutely abysmal.

The performance of both methods could be improved by implementing various optimizations, mostly in terms of keeping the redrawn areas small. This would require huge amounts of code that would dynamically resize the views, clip the Bézier paths and do other crazy tricks, just with an expectation of making things better. All those efforts would still become worthless for curves covering the entirety of the drawing area.

Drawing dozens of animated curves, as it exists in Revolved, is a case in which the abstractions starts to leak, forcing the programmer to fight against the clean, high-level API to do the work that could be done on a lower level much more efficiently.

Throwing Stuff at GPU

Revolved features 3D models, so it was obvious that some parts of the app would rely on OpenGL. Generating models’ meshes required converting the abstract, mathematical notion of Bézier curves into concrete set of vertices and faces. This got me thinking – if I already have to go through the trouble of transforming the Béziers into the vertex representation of 3D models, why not to enroll the same process for 2D lines? This was an important reason for going the GPU way.

The crucial factor for choosing OpenGL is that it made the animation and interaction possibilities endless. Some of the line effects I’ve implemented and/or had in mind would simply be next to impossible to craft using Core Graphics.

Final Words

Cushioned with easy to use, high level API one often ignores the underlying concepts and nitty-gritty details of how stuff works. Consciously depriving myself of Core Graphics conveniences forced me to think about drawing Bézier curves in terms of “how?”. Understanding the mathematical foundations guiding the drawn shapes is something worth going through, even just for the sake of curiosity.

The presented method is not the state of the art. There are actually more complex and robust methods of drawing the curves, but the solution implemented in Revolved fits the purpose completely (the fact that I found the linked resource just a few days ago certainly didn’t help).

As for the rendering implementation, OpenGL was the right choice. While some less important elements of Revolved like buttons, tutorial and credits panels are UIKit based, the essential part of the application runs entirely in OpenGL. This is still not a full Brichter, but writing an app this way was an absolute joy.

If you get bored dissecting the Bézier curves I encourage you to try playing with them, this time in 3D, on your iPad.