__NSDictionaryI objectForKey:
This is a companion article for my blog post on NSDictionary
. It contains the disassembly analysis for objectForKey:
method of __NSDictionaryI
class. The disassembly was generated using Hopper and it’s based on iOS 7.1 SDK targeting ARM64 architecture.
The Code
Here’s what we’re going to digest:
0x14e90 stp x29, x30, [sp, #0xfffffff0]!
0x14e94 mov x29, sp
0x14e98 stp x20, x19, [sp, #0xfffffff0]!
0x14e9c stp x22, x21, [sp, #0xfffffff0]!
0x14ea0 stp x24, x23, [sp, #0xfffffff0]!
0x14ea4 mov x19, x2
0x14ea8 mov x8, x0
0x14eac movz x0, #0x0
0x14eb0 adrp x9, #0x1d1000
0x14eb4 ldrsw x9, [x9, #0x60]
0x14eb8 ldrb w9, [x8, x9]
0x14ebc lsr w21, w9, #0x2
0x14ec0 cbz w21, 0x14f80
0x14ec4 mov x0, x8
0x14ec8 bl imp___stubs__object_getIndexedIvars
0x14ecc mov x20, x0
0x14ed0 adrp x8, #0x1ad000
0x14ed4 add x8, x8, #0x768
0x14ed8 ldr x1, [x8]
0x14edc mov x0, x19
0x14ee0 bl imp___stubs__objc_msgSend
0x14ee4 mov x8, x0
0x14ee8 movz x0, #0x0
0x14eec and w9, w21, #0xff
0x14ef0 cmp w9, #0x3f
0x14ef4 b.eq 0x14f80
0x14ef8 movz x23, #0x0
0x14efc adrp x9, #0x156000
0x14f00 add x9, x9, #0xf38
0x14f04 add x9, x9, w21, uxtb #3
0x14f08 ldr x22, [x9]
0x14f0c udiv x9, x8, x22
0x14f10 msub x24, x9, x22, x8
0x14f14 adrp x8, #0x1ad000
0x14f18 add x8, x8, #0x770
0x14f1c ldr x21, [x8]
0x14f20 add x8, x20, x24, lsl #4
0x14f24 ldr x0, [x8]
0x14f28 cmp x0, #0x0
0x14f2c ccmp x0, x19, #0x4, ne
0x14f30 b.eq 0x14f68
0x14f34 mov x1, x21
0x14f38 mov x2, x19
0x14f3c bl imp___stubs__objc_msgSend
0x14f40 tbnz w0, #0x0, 0x14f68
0x14f44 movz x0, #0x0
0x14f48 add x8, x24, #0x1
0x14f4c cmp x8, x22
0x14f50 csel x9, xzr, x22, lo
0x14f54 sub x24, x8, x9
0x14f58 add x23, x23, #0x1
0x14f5c cmp x23, x22
0x14f60 b.lo 0x14f20
0x14f64 b 0x14f80
0x14f68 movz x0, #0x0
0x14f6c lsl x8, x24, #0x1
0x14f70 cmp x8, x22, lsl #1
0x14f74 b.hs 0x14f80
0x14f78 orr x8, x8, #0x1
0x14f7c ldr x0, [x20, x8, lsl #3]
0x14f80 ldp x24, x23, [sp], #0x10
0x14f84 ldp x22, x21, [sp], #0x10
0x14f88 ldp x20, x19, [sp], #0x10
0x14f8c ldp x29, x30, [sp], #0x10
0x14f90 ret
This is quite a lot, but we’re going to take it chunk by chunk and I will more or less describe what each instruction does.
The Setup
We begin the procedure by with ARM64 function prolog. We’re also storing x19
, x20
, x21
, x22
, x23
and x24
registers on the stack. Those are callee-saved registers and since the function is going to make use of those, we have to store their previous value, thus fulfilling caller’s expectations of those register maintaining their value:
0x14e90 stp x29, x30, [sp, #0xfffffff0]!
0x14e94 mov x29, sp
0x14e98 stp x20, x19, [sp, #0xfffffff0]!
0x14e9c stp x22, x21, [sp, #0xfffffff0]!
0x14ea0 stp x24, x23, [sp, #0xfffffff0]!
When the function is called, the x0
register contains self
pointer and x2
register has a pointer to key
. Since we’re going to make a few calls from the function itself, and those registers are used for arguments passing, we need to keep their current value aside. We move the values from x2
and x0
registers to x19
and x8
register respectively:
0x14ea4 mov x19, x2
0x14ea8 mov x8, x0
We clear out the x0
register, effectively making it store nil
value:
0x14eac movz x0, #0x0
Fetching Size Index
As it’s been discussed previously, the next three instructions simply fetch the one byte contents of _szidx
ivar into w9
register:
0x14eb0 adrp x9, #0x1d1000
0x14eb4 ldrsw x9, [x9, #0x60]
0x14eb8 ldrb w9, [x8, x9]
Notice that we’ve fetched 8 bits of memory, but _szidx
is defined as 6-bits bitfield. We have to shift the w9
register 2 bits to the right, discarding the unwanted bits, and storing the result into w21
register:
0x14ebc lsr w21, w9, #0x2
At this point we can compare whether the value of _szidx
is equal to 0
, if so, we jump to instruction at offset of 0x14f80
, which is a function epilog. At this point we still have 0
in x0
so the function will return nil
:
0x14ec0 cbz w21, 0x14f80
As we will see very soon, it’s rare for _szidx
to be equal to zero, so we can safely keep interpreting instructions one by one.
Dynamic Linking
We restore the self
pointer kept in x8
back into x0
. It will be passed as the first argument to the imp___stubs__object_getIndexedIvars
function. After the function returns its result is kept in x0
, but since this register will be used heavily, it is moved into the x20
:
0x14ec4 mov x0, x8
0x14ec8 bl imp___stubs__object_getIndexedIvars
0x14ecc mov x20, x0
One might wonder why we’re calling the imp___stubs__object_getIndexedIvars
function instead of object_getIndexedIvars
. It all boils down to dynamic linking. At the point of compilation the compiler merely knows that when the program is run, there will be an object_getIndexedIvars
function available in the system. It creates a very short stub function that fetches a memory address and jumps to it.
On the first run this memory address points to a helper function that will run the dyld machinery to figure out where the function is and then the memory address gets overwritten to point to the correct location of object_getIndexedIvars
. This process is called lazy binding and provides very fast booting (since the application doesn’t have to link anything that hasn’t been yet needed) at the cost of very slight call overhead.
The Hash
Another round of program counter relative memory fetching. This time the hash
selector lands in x1
:
0x14ed0 adrp x8, #0x1ad000
0x14ed4 add x8, x8, #0x768
0x14ed8 ldr x1, [x8]
We’re moving back the key
pointer from x19
into x0
and with hash
selector in x1
we’re calling everyone’s favorite objc_msgSend
(through a stub), then we move the hash
’s return value from x0
to x8
:
0x14edc mov x0, x19
0x14ee0 bl imp___stubs__objc_msgSend
0x14ee4 mov x8, x0
Using pseudo C we might summarize those few instructions with:
x8 = [key hash];
We conveniently put nil
into the x0
register:
0x14ee8 movz x0, #0x0
I’m not entirely sure why it’s needed, but we clear all but last 8 bits of w21
(which contains the value of _szidx
) by binary ANDing it with 0xff
. Then we compare the value with 0x3f
. In case the values are equal, we jump to the function epilog, returning a nil
value stored previously in x0
register:
0x14eec and w9, w21, #0xff
0x14ef0 cmp w9, #0x3f
0x14ef4 b.eq 0x14f80
Very soon we’ll use x23
register as a loop counter, so it makes sense to zero it out:
0x14ef8 movz x23, #0x0
A quick fetch of some address into x9
:
0x14efc adrp x9, #0x156000
0x14f00 add x9, x9, #0xf38
What’s at 0x156f38?
Hopper comes to help:
___NSDictionarySizes:
0000000000156f38 db 0x00 ;
0000000000156f39 db 0x00 ;
...
I’ve explained what __NSDictionarySizes
contains in the main article, but for the purpose of assembly analysis suffice it to say, it’s an array of some values.
With _szidx
still in w21
, we left-shift it by 3, which is the same as multiplying by 8 which is the size of NSUInteger
on 64-bit architectures. Finally we fetch the contents of memory at that location into x22
register:
0x14f04 add x9, x9, w21, uxtb #3
0x14f08 ldr x22, [x9]
While the fetching arithmetics looks quite complicated, the C equivalent is super simple:
x22 = __NSDictionarySizes[_szidx];
Division
Here comes the fun part! With return value of hash
in x8
and some kind of size in x22
we divide the former by the latter, storing the result in x9
. Then we use multiply-subtract instruction which in this case subtracts x9 * x22
from x8
storing the result in x24
:
0x14f0c udiv x9, x8, x22
0x14f10 msub x24, x9, x22, x8
What just happened? Here’s the C pseudocode:
x9 = x8 / x22;
x24 = x8 - x9 * x22;
Treating the code algebraically one might argue that the value of x24
is simply 0
, since values of x22
in both nominator and denominator cancel each other out. However, the udiv
instruction does integer division, discarding the remainder. It turns out, there is an operator in C that does the very same calculation in a more compact way:
x24 = x8 % x22;
That’s it. We’re just doing a modulo calculation in a form of hash % size
. As you might suspect, we’re checking which slot does the hash fall into. We can expect x22
to contain some kind of limit of the storage size.
The Most Important Selector
We’re just one step shy of fetching value for a key, all that’s missing is the isEqual:
selector that the next three instructions fetch into x21
register:
0x14f14 adrp x8, #0x1ad000
0x14f18 add x8, x8, #0x770
0x14f1c ldr x21, [x8]
The Loop
At this point we have the address of indexed ivars in x20
register and the modulo calculation result in x24
from now on called index. We’re going to shift it to the left by four bits. The three bit shift is responsible for making the index equal to 64-bit pointer size (8 bytes), but the one addition shift multiplies the index by two. The fetch result is stored in x0
:
0x14f20 add x8, x20, x24, lsl #4
0x14f24 ldr x0, [x8]
Here’s the conditional comparison magic of ARM64. At first we compare if the previously fetched value is equal to 0
. The conditional compare instruction works as follow: if the result of the previous operation was not equal, then the NZCV flags (negative, zero, carry, overflow) will contain the comparison result of x0
and x19
, otherwise the NZCV flags will be equal to 0x4 which is 0100 in binary. If you look at the pattern you might notice that only the Zero flag will be set. The b.eq
instruction actually checks the Z
flag jumping to the 0x14f68
address only if it’s set to 1:
0x14f28 cmp x0, #0x0
0x14f2c ccmp x0, x19, #0x4, ne
0x14f30 b.eq 0x14f68
Let’s go through this once again. The b.eq
will jump if Z
flag is set. The Z
flag will be set either when x0
is equal to 0
(cmp
instruction also sets Z
to 1 when true) or when x0
is equal to key
stored in x19
. Here’s the equivalent C pseudocode:
if (x0 == 0 || x0 == key)
goto 0x14f68;
In other words, if the value fetched somewhere from within index ivars is nil
, or it is equal to key, then we’re done. This is super important, it means that we’ve actually fetched the stored key and we’re comparing it with the key passed into objectForKey:
call.
We’re going to go into 0x14f68
shortly, but for now we can assume that the fetched key is not nil
and is not equal (as in pointer equality) to the query key, so let’s keep going. We’re getting ready for another Objective-C method call. With fetched key
in x0
(will map to self
), isEqual:
selector in x21
(will map to _cmd
) and query key in x19
(will map to object
argument), it’s just the matter of preparing x1
and x2
registers and executing the call:
0x14f34 mov x1, x21
0x14f38 mov x2, x19
0x14f3c bl imp___stubs__objc_msgSend
In other words we’re doing:
x0 = [fetchedKey isEqualTo:queryKey];
If the method call returns YES (the 0th bit is not zero), we’re jumping into the 0x14f68
.
0x14f40 tbnz w0, #0x0, 0x14f68
Here’s where we are at this point: we’ve fetched a non-nil key, that is not equal to query key in terms of both pointer and isEqual:
comparison.
We clear out x0
:
0x14f44 movz x0, #0x0
We add one to x24
(which keeps the modulo operation result) at store it into x8
:
0x14f48 add x8, x24, #0x1
We do a bounds check. If x8
is lower than the size limit kept in x22
then we store 0
(kept in xzr
zero register) into x9
, otherwise we feed it with x22
:
0x14f4c cmp x8, x22
0x14f50 csel x9, xzr, x22, lo
0x14f54 sub x24, x8, x9
The equivalent C code is less complicated:
x9 = x8 < x22 ? 0 : x22;
x24 = x8 - x9;
This is a very simple iteration that advances the index one by one, but when we hit the end it’s wrapped around back to the beginning.
At 0x14ef8
we cleared the loop counter x23
and now it finally gets used. We increment it and compare to the limit in x22
. If we haven’t yet crossed the iteration limit we jump back to the beginning of the loop:
0x14f58 add x23, x23, #0x1
0x14f5c cmp x23, x22
0x14f60 b.lo 0x14f20
This check prevents the dictionary from iterating over its contents endlessly.
Getting Past the Loop
The only way we got to this point is when we’ve iterated over all the keys stored in the indexed ivars, every key was non-nil and no key was equal to query key. We haven’t find what we’ve been looking for and at this point with 0
still in x0
register we can jump to the function prolog, returning the nil
from the function:
0x14f64 b 0x14f80
Here comes the long awaited 0x14f68
part. As a reminder, at this point the x24
register contains the index, that when multiplied by two, can be used to fetch the key that is either nil
or is equal to the query key.
Next comes the very convenient x0
clearing, we’ll see why it’s handy in a second:
0x14f68 movz x0, #0x0
We shift the fetch index to the left and store the result in x8
:
0x14f6c lsl x8, x24, #0x1
Then we compare this to 1-bit-left-shifted value of x22
register (x22
contains the storage limit). If the doubled fetch index is larger than doubled limit then we jump to the epilog at 0x14f80
:
0x14f70 cmp x8, x22, lsl #1
0x14f74 b.hs 0x14f80
This is why we needed the x0
clearing at 0x14f68
. If we succeed the test, i.e. when the fetch index is out of bounds, then we jump to the epilog at 0x14f80
, and we will safely return nil
from the function.
Assuming we are within the bounds we can finally digest the last two crucial instructions.
The Object Fetch
We’re doing a bitwise OR operation on the fetch index in x8
, but since x8
has been recently shifted to the left (multiplied by two) then this is no different than adding one to x8
. With address of indexed ivars in x20
we add it to the 3-bits-left shifted x8
(pointer size shift) and then we dereference the entire thing into x0
.
0x14f78 orr x8, x8, #0x1
0x14f7c ldr x0, [x20, x8, lsl #3]
We’ve just fetched the object corresponding to the query key into x0
.
Epilog
We’re done at this point. With the return value safely stored in x0
we can clean up the stack and return:
0x14f80 ldp x24, x23, [sp], #0x10
0x14f84 ldp x22, x21, [sp], #0x10
0x14f88 ldp x20, x19, [sp], #0x10
0x14f8c ldp x29, x30, [sp], #0x10
0x14f90 ret