Low Level Etude One – Hello Worlds (Part 2)
Hello World - puts or printf?
bl - branch with link
Let's get back on track and learn about bl. Consider the following simple program:
.globl      _start
.p2align    2
say_hello:
            mov x0, #1
            adrp x1, msg@PAGE
            add x1, x1, msg@PAGEOFF
            adrp x2, msg_sz@PAGE
            add x2, x2, msg_sz@PAGEOFF
            ldr x2, [x2]
            mov x16, #4
            svc 0x80
_start:
            b say_hello
            mov x16, #1
            svc 0x80
.data
msg:        .asciz  "Hello World!"
msg_sz:     .quad   .-msg-1         ; 8 bytes, to match the 64 bit ldr x2, [x2]
If you build and start this, it will print Hello World forever. The b (branch) instruction jumps to say_hello. After the svc in say_hello, execution simply continues with the next instruction, which is that same b say_hello at _start, so the whole thing repeats forever.
So we need to change b to bl - branch with link and at the end of the say_hello block we add a ret instruction. Now the execution will continue right after the bl say_hello instruction. This happens because bl saves the address of the next instruction into the lr register and ret jumps to the address saved in the lr register.
But! What if we overwrite the lr register's content with another bl instruction? Let's add the following:
.globl      _start
.p2align    2
print_newline:
            mov x0, #1
            adrp x1, newline@PAGE
            add x1, x1, newline@PAGEOFF
            mov x2, #1
            mov x16, #4
            svc 0x80
            ret
say_hello:
            mov x0, #1
            adrp x1, msg@PAGE
            add x1, x1, msg@PAGEOFF
            adrp x2, msg_sz@PAGE
            add x2, x2, msg_sz@PAGEOFF
            ldr x2, [x2]
            mov x16, #4
            svc 0x80
            bl print_newline
            ret
_start:
            bl say_hello
            mov x16, #1
            svc 0x80
.data
msg:        .asciz  "Hello World!"
msg_sz:     .quad   .-msg-1         ; 8 bytes, to match the 64 bit ldr x2, [x2]
.align 4
newline:    .asciz  "\n"
Can you already see the problem? With bl print_newline we save another address to the lr register and overwrite what was already saved there. So once print_newline returns, we land on the ret instruction at the end of say_hello. But lr still holds the address of that very ret, so it jumps to itself, again and again. We're in an endless loop.
The easy fix is to save the content of the lr register before the bl instruction and restore it before the ret instruction:
say_hello:
            mov x0, #1
            adrp x1, msg@PAGE
            add x1, x1, msg@PAGEOFF
            adrp x2, msg_sz@PAGE
            add x2, x2, msg_sz@PAGEOFF
            ldr x2, [x2]
            mov x16, #4
            svc 0x80
            mov x3, lr
            bl print_newline
            mov lr, x3
            ret
Beware that this will only work if the code that we jump into will not fiddle with the x3 register that we used to save the content of the lr register.
Function call convention
So what is the right way to make a proper function call in arm64 assembler? Stephens Book has a nice summary:
For the calling routine:
- Save registers x0 - x18 if you use them.
- Move the first eight parameters into the registers x0 - x7. Functions with variadic parameters might be handled differently, we'll come to that.
- Push additional parameters on the stack.
- Use bl to call the function.
- Evaluate the return code in x0.
- Restore x0 - x18, if needed.
For the called function:
- Push lr and x19 - x30 onto the stack if used in the routine.
- Do the work.
- Put return code in x0.
- Pop lr and x19 - x30 if pushed in step 1.
- Use ret instruction.
So that's not quite what we have been doing. Let's double check what clang did for us, again:
.LC0:
        .string "Hello World!"
main:
        stp     x29, x30, [sp, -16]!
        mov     x29, sp
        adrp    x0, .LC0
        add     x0, x0, :lo12:.LC0
        bl      puts
        ldp     x29, x30, [sp], 16
        ret
The first line stores a pair (stp - store pair) of registers on the stack after subtracting 16 from sp. sp is the stack pointer, which holds the current top of the stack. Since the stack grows towards lower addresses, subtracting 16 makes room for the contents of two 8 byte registers.
Let's check what the two registers are. Start lldb, enter b start and r, then type re r (short for register read).
General Purpose Registers:
        x0 = 0x0000000000000001
        x1 = 0x000000016fdff618
        x2 = 0x000000016fdff628
        x3 = 0x000000016fdff778
        [...]
       x27 = 0x0000000000000000
       x28 = 0x0000000000000000
        fp = 0x000000016fdff5f0
        lr = 0x000000010000d08c  dyld`start + 520
        sp = 0x000000016fdff4b0
        pc = 0x0000000100003f8c  zero`start
      cpsr = 0x60001000
So x29 is the frame pointer and x30 is the link register. Let's explore this with a minimal sample program:
.globl      _start
.p2align    2
_start:
            stp fp, lr, [sp, -16]!
            mov fp, sp
            ; work
            ldp fp, lr, [sp], 16
            ret
Build the program and start it in the debugger breaking on start. Let's explore the stack:
(lldb) re r sp
      sp = 0x000000016fdff460
(lldb) m read 0x000000016fdff460
0x16fdff460: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x16fdff470: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Execute the first instruction and check again:
(lldb) re r sp
      sp = 0x000000016fdff450
(lldb) m read 0x000000016fdff450
0x16fdff450: a0 f5 df 6f 01 00 00 00 8c d0 00 00 01 00 00 00  ...o............
0x16fdff460: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Let's check if that looks like it should:
(lldb) re r fp
      fp = 0x000000016fdff5a0
(lldb) re r lr
      lr = 0x000000010000d08c  dyld`start + 520
Looks pretty good. Indeed the stack pointer got decremented by 16 bytes making room for the two 8 byte values saved in fp and lr and then they got pushed onto the stack. (In little endian byte order!) Next we move the fp - frame pointer to the new position of the stack pointer so that the called function can construct a stack frame that can hold its local variables if needed. After the work is done we pop back fp and lr and are safe to call ret.
So let's rewrite our last hello world example a little bit:
say_hello:
            stp fp, lr, [sp, -16]!
            mov fp, sp
            mov x0, #1
            adrp x1, msg@PAGE
            add x1, x1, msg@PAGEOFF
            adrp x2, msg_sz@PAGE
            add x2, x2, msg_sz@PAGEOFF
            ldr x2, [x2]
            mov x16, #4
            svc 0x80
            bl print_newline
            
            ldp fp, lr, [sp], 16
            ret
So far so good!
Variadic parameters
Now let's write a printf driven Hello World program. printf takes variadic parameters, and for those the call convention on Darwin differs: only the fixed first parameter, the format string, goes into a register (x0). We cannot use the registers x1 - x7 for the variadic parameters; they go on the stack. Let's see how this is done:
.globl            _start
.p2align          2
                  .equ variadic_param_1, 0
say_hello:
                  stp fp, lr, [sp, #-16]!
                  sub sp, sp, #16
                  mov fp, sp
                  adrp x0, format_str@PAGE
                  add x0, x0, format_str@PAGEOFF
                  adrp x1, msg@PAGE
                  add x1, x1, msg@PAGEOFF
                  str x1, [fp, #variadic_param_1]
                  bl _printf
                  add sp, sp, #16
                  ldp fp, lr, [sp], #16
                  ret
_start:
                  stp fp, lr, [sp, #-16]!
                  mov fp, sp
                  bl say_hello
                  ldp fp, lr, [sp], #16
                  ret
format_str:       .asciz      "%s\n"
.data
msg:              .asciz      "Hello World!"
The equ directive gives a symbolic name to a numeric constant. We reserve some space on the stack for the variadic parameters, and the first one goes in the first slot, hence the offset 0. After storing fp and lr on the stack we move the stack pointer another 16 bytes down and point our frame pointer at the new position. This gives us room for two 64 bit values. We only need one, but sp needs to stay 16 byte aligned on Darwin. After loading the address of msg into the x1 register we save it to our stack frame. Since that is where the stack pointer points, that's also where printf will look for its first variadic parameter if the format string requires one.
You can play with a second variadic parameter. Either make another symbolic name with .equ variadic_param_2, 8 and store the value using str reg, [fp, #variadic_param_2], or just write the offset directly: str reg, [fp, #8].