x86-64 Assembly

In this lecture, we give a brief overview of x86-64 assembly. A brief reference is found here (from Stanford University).

x86-64/32 Registers

64-bit Register Name	Rnn Name	Name (Description)	32-bit register name
RAX	R0	Accumulator (Return value)	EAX
RCX	R1	Counter (frequently used as `i` in iterations/loops)	ECX
RDX	R2	Data (frequently used as the 3rd Argument)	EDX
RBX	R3	Base (frequently used as base address or counter)	EBX
RSP	R4	Stack Pointer (keeps track of the top of the CPU stack)	ESP
RBP	R5	Base Pointer (keeps track of the base of the current stack frame)	EBP
RSI	R6	Source Index, 2nd Argument	ESI
RDI	R7	Destination Index, 1st Argument	EDI
R8-R15	R8-R15	Additional general-purpose registers, available only in 64-bit mode	R8D-R15D

In addition, recall that RIP (the instruction pointer) contains the address of the next instruction to be executed.

Hello World!

Consider the following simple C program hello.c.


#include <stdio.h>

int main()
{
  printf("Hello World!\n");
  return 0x1234;
}

We will use GDB to see what the assembly code for this program looks like, and use it to understand x86-64.

Remember to compile the code with the -g option to enable GDB debugging.

gcc -g hello.c -o hello

GDB

As the GDB log below shows, you can use the list, break, and run commands to pause the execution right before line 5.

The disassemble command shows the actual assembly code.

lea (load effective address)

We will start with an easy one: the lea instruction at <+4>.

lea    0x9f(%rip),%rdi

This works as follows:

When the instruction is executed, $rip contains the address of the next instruction, <+11>, which is 0x8000645.
So rip+0x9f evaluates to 0x8000645 + 0x9f = 0x80006e4.
So rdi will contain 0x80006e4. As the hd (hexdump) command in the GDB log shows, the string "Hello World!" resides at this address.

Overall, this instruction sets the rdi to contain the address of "Hello World!".

callq

Now let's consider the next instruction.

callq   0x8000510 <puts@plt>

This instruction sets $rip to 0x8000510, so execution continues with the code at that address.

Important. Before jumping to the target function address, the callq instruction also pushes the return address onto the stack. The return address is the address of the instruction after the call, here 0x800064a.

GDB kindly tells us that 0x8000510 is where the function puts@plt is.

puts is a standard C function that prints a string on the screen.
plt stands for Procedure Linkage Table. This technique is used to call external procedures/functions whose address isn't known at link time and is left to be resolved by the dynamic linker at run time.
The function being called is located in another module (typically, libc.so.x), so its actual address must be provided when the program is loaded for execution.

$ gdb hello
...
(gdb) list
1       #include <stdio.h>
2
3       int main()
4       {
5         printf("Hello World!\n");
6         return 0x1234;
7       }
(gdb) break 5
Breakpoint 1 at 0x63e: file hello.c, line 5.
(gdb) run
Starting program: .../hello

Breakpoint 1, main () at hello.c:5
5         printf("Hello World!\n");
(gdb) disassemble
Dump of assembler code for function main:
   0x000000000800063a <+0>:     push   %rbp
   0x000000000800063b <+1>:     mov    %rsp,%rbp
=> 0x000000000800063e <+4>:     lea    0x9f(%rip),%rdi    # 0x80006e4
   0x0000000008000645 <+11>:    callq  0x8000510 <puts@plt>
   0x000000000800064a <+16>:    mov    $0x1234,%eax
   0x000000000800064f <+21>:    pop    %rbp
   0x0000000008000650 <+22>:    retq
End of assembler dump.
(gdb) hd 0x80006e4 40
0x80006e4: 48 65 6C 6C       H e l l
0x80006e8: 6F 20 57 6F       o . W o
0x80006ec: 72 6C 64 21       r l d !
0x80006f0: 00 00 00 00       . . . .
0x80006f4: 01 1B 03 3B       . . . ;
0x80006f8: 38 00 00 00       8 . . .
0x80006fc: 06 00 00 00       . . . .
0x8000700: 0C FE FF FF       . . . .
0x8000704: 84 00 00 00       . . . .
0x8000708: 2C FE FF FF       , . . .
(gdb) quit

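Overall, the lea and call instructions together implement the C code printf("Hello World!\n");. As you probably guessed, the puts function reads the register rdi to get the address of its string argument. Note that the compiler replaced printf with puts: the format string contains no conversions and ends in a newline, and puts appends a newline itself, so the stored string is just "Hello World!".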

The two instructions before lea and call

Now, let's look at the two instructions at the top.

push

The push instruction takes the content of a given register as input (in this case, rbp) and pushes it onto the stack.

To see better what's going on, we set a breakpoint at the address of the first instruction (i.e., 0x800063a).

Note: If you want to set a breakpoint at an address, you have to add * in front of the address (see the log below).

(gdb) break *0x800063a
Breakpoint 3 at 0x800063a: file hello.c, line 4.
(gdb) c
Continuing.
Hello World!
[Inferior 1 (process 185) exited with code 064]
(gdb) run
Starting program: ../hello

Breakpoint 3, main () at hello.c:4
4       {
(gdb) disassemble
Dump of assembler code for function main:
=> 0x000000000800063a <+0>:     push   %rbp
   0x000000000800063b <+1>:     mov    %rsp,%rbp
   0x000000000800063e <+4>:     lea    0x9f(%rip),%rdi   # 0x80006e4
   0x0000000008000645 <+11>:    callq  0x8000510 <puts@plt>
   0x000000000800064a <+16>:    mov    $0x1234,%eax
   0x000000000800064f <+21>:    pop    %rbp
   0x0000000008000650 <+22>:    retq
End of assembler dump.

(gdb) p $rbp
$3 = (void *) 0x8000660 <__libc_csu_init>
(gdb) hd $rsp 40
0x7ffffffee048: F7 1B 02 FF       . . . .
0x7ffffffee04c: FF 7F 00 00       . . . .
0x7ffffffee050: 01 00 00 00       . . . .
0x7ffffffee054: 00 00 00 00       . . . .
0x7ffffffee058: 28 E1 FE FF       ( . . .
0x7ffffffee05c: FF 7F 00 00       . . . .
0x7ffffffee060: 00 80 00 00       . . . .
0x7ffffffee064: 01 00 00 00       . . . .
0x7ffffffee068: 3A 06 00 08       : . . .
0x7ffffffee06c: 00 00 00 00       . . . .
(gdb) p $rsp
$4 = (void *) 0x7ffffffee048

The log above shows the value of rbp and the stack contents. Note that the stack starts at 0x7f...48, which is also the value of rsp.

In the log below, the GDB command stepi executes one machine instruction, which is the push instruction.

Note that the hexdump of the stack shows that the value of $rbp (0x8000660) has been pushed onto the top of the stack, stored in little-endian byte order. The push also decreased $rsp by 8, to 0x7ff...40.

(gdb) stepi
0x000000000800063b      4       {
(gdb) hd $rsp 40
0x7ffffffee040: 60 06 00 08       ` . . .
0x7ffffffee044: 00 00 00 00       . . . .
0x7ffffffee048: F7 1B 02 FF       . . . .
0x7ffffffee04c: FF 7F 00 00       . . . .
0x7ffffffee050: 01 00 00 00       . . . .
0x7ffffffee054: 00 00 00 00       . . . .
0x7ffffffee058: 28 E1 FE FF       ( . . .
0x7ffffffee05c: FF 7F 00 00       . . . .
0x7ffffffee060: 00 80 00 00       . . . .
0x7ffffffee064: 01 00 00 00       . . . .
(gdb) p $rsp
$5 = (void *) 0x7ffffffee040

mov

The next instruction is as follows:

mov    %rsp,%rbp

This instruction copies the value of $rsp into $rbp ($rbp ← $rsp). See the GDB log below to see how $rbp changed.

(gdb) stepi

Breakpoint 1, main () at hello.c:5
5         printf("Hello World!\n");
(gdb) p $rbp
$6 = (void *) 0x7ffffffee040
(gdb) p $rsp
$7 = (void *) 0x7ffffffee040

Overall, what's going on? Setting up a new stack frame

In essence, right before the actual code of the main function is executed, the above two instructions set up a new stack frame for the main function.

Recall that $rbp holds the address of the bottom of the current stack frame.
$rbp ← $rsp means that the top of the stack of the old stack frame ($rsp) becomes the bottom of the new stack frame.
When the function finishes and returns, the program should clean up the stack frame and restore $rbp to the old bottom of the stack. This is why $rbp was pushed onto the stack: so that it can be recovered right before returning.

The three instructions after lea and call

Breakpoint 5, main () at hello.c:6
6         return 0x1234;
(gdb) disassemble
Dump of assembler code for function main:
   0x000000000800063a <+0>:     push   %rbp
   0x000000000800063b <+1>:     mov    %rsp,%rbp
   0x000000000800063e <+4>:     lea    0x9f(%rip),%rdi    # 0x80006e4
   0x0000000008000645 <+11>:    callq  0x8000510 <puts@plt>
=> 0x000000000800064a <+16>:    mov    $0x1234,%eax
   0x000000000800064f <+21>:    pop    %rbp
   0x0000000008000650 <+22>:    retq
End of assembler dump.

mov

mov    $0x1234,%eax

This instruction moves the value 0x1234 into register eax (the 32-bit portion of rax). See the GDB log below, taken before and after executing the instruction.

(gdb) p $eax
$9 = 13
(gdb) stepi
7       }
(gdb) p $eax
$10 = 4660

pop

pop    %rbp

This pop instruction does two things:

It reads the value at the top of the stack and moves it into register rbp.
It pops the stack, that is, it adds 8 to rsp.

(gdb) p $rbp
$11 = (void *) 0x7ffffffee040
(gdb) stepi
0x0000000008000650      7       }
(gdb) p $rbp
$12 = (void *) 0x8000660 <__libc_csu_init>
(gdb) p $rsp
$13 = (void *) 0x7ffffffee048

retq

This instruction is used to return to the caller.

The retq instruction pops the return address from the stack into %rip, thus resuming at the saved return address.

What's going on?

These three instructions implement the C code: return 0x1234;. In particular:

Set the return value 0x1234 by storing it in eax.
Clean up the stack frame for main by restoring rbp to its old value (recall that it was pushed onto the stack).
Return to the caller with retq, which pops the return address into rip.