Stack Smashing I

Shellcode Injection Attack

To launch a shellcode injection attack against a target program, we need to do two things:

Inject a shellcode: If the target program reads input with functions such as scanf or fread, we can feed it data containing a shellcode.
Have the program execute the shellcode.
However, the injected shellcode lies somewhere in memory. The main challenge is how to change the program's control flow so that the injected shellcode gets executed.

Changing the program control by modifying the return address

We will take advantage of the following aspect of programs to achieve the second task above:

Main Attack Idea

When a function is done, it returns to the place from which it was called.
Interestingly, this return address is stored in memory!
This means that if we can change the return address in memory, we can make the program return to a different place!

Therefore, we first need to know in detail what happens when a function is called and when it returns. Specifically, we need to know where the return address is stored in memory.

Review

rip, rbp, rsp

The register $rip contains the address of the next instruction the program will execute. We sometimes call $rip the "program counter".
The register $rsp contains the address of the top of the stack. We sometimes call it the "stack pointer".
The register $rbp contains the address of the bottom of the current function's stack frame (i.e., the function "func_foo" in the right figure). We sometimes call it the "base pointer" or "frame pointer".

We show the memory layout in the picture on the right:

`callq` and `ret` instructions

callq fn. When a function fn is called through the callq instruction, the following takes place.
1. push %rip: Push %rip onto the stack. At this moment %rip already points to the instruction right after the callq, so this step stores the return address on the stack.
2. jmp to fn: Jump to where fn is so that in the next CPU cycle, the function fn may be executed.
ret. When a function returns to its caller via the ret instruction, the following takes place.
1. pop %rip: What's on top of the stack is popped into %rip. Of course, what's on top of the stack should be the return address, so that %rip contains the correct return address and, in the next CPU cycle, control goes back to the caller.
2. Note: we will see how we can modify this return address on the stack so that shellcode can be executed!

Sample Source Code

We will see what's going on in more detail by using GDB. In particular, let's consider the following program:

Download: callstack.c

Compile:

 > gcc -g -fno-stack-protector -o callstack callstack.c

(gdb) l
...
9       int main()
10      {
(gdb)
11          char buf_main[] = "in main";
12          printf("%s\n", buf_main);
13          f1(10);
14          return 0;
15      }
16
17      int f1(int n)
18      {
19          char buf_f1[] = "in f1";
20          printf("%s: n= %d\n", buf_f1, n);
21          return 0;
22      }
...

Let's see how the program works

(gdb) r
Starting program: .../callstack
in main
in f1: n= 10
[Inferior 1 (process 30869) exited normally]

Let's set breakpoints to see the detailed memory dumps. In particular, we set the breakpoints right before and after the function call, and also right before returning.

// in main: right before calling f1 in main
(gdb) b 13
Breakpoint 1 at 0x400588: file callstack.c, line 13.
// in f1: right before returning to main
(gdb) b 21
Breakpoint 2 at 0x4005ca: file callstack.c, line 21.
// in main: after f1 is done
(gdb) b 14
Breakpoint 3 at 0x400592: file callstack.c, line 14.

Breakpoints 1 -> 2 -> 3: Stack and Registers

Below we show the hexdump of the stack at each of the above three breakpoints. To see how the stack changes, compare the lines that show the same address across the dumps. Notice the different fonts and background colors.

Yellow background: This is about the return address.
Aqua background: This is about the %rbp for the main function.
Blue font: This is about the stack frame of the main function.
Red font: This is about the stack frame of the f1 function.
Purple font: This is about variable n in the f1 function.
Green font: This is about variable buf_f1 in the f1 function.

(gdb) p &main
$1 = (int (*)()) 0x400566 <main>
(gdb) p &f1
$2 = (int (*)(int)) 0x400599 <f1>
(gdb) r
Starting program: .../callstack 
in main

Breakpoint 1, main () at callstack.c:13
13          f1(10);
(gdb) p $rip
$3 = (void (*)()) 0x400588 <main+34>
(gdb) p $rsp
$4 = (void *) 0x7fffffffe7c0
(gdb) p $rbp
$5 = (void *) 0x7fffffffe7d0
(gdb) hd $rsp $rbp-$rsp
0x7fffffffe7c0: 69 6E 20 6D       i n . m
0x7fffffffe7c4: 61 69 6E 00       a i n .
0x7fffffffe7c8: 00 00 00 00       . . . .
0x7fffffffe7cc: 00 00 00 00       . . . .
(gdb) c
Continuing.

Breakpoint 2, f1 (n=10) at callstack.c:21
21          return 0;
(gdb) p $rip
$6 = (void (*)()) 0x4005ca <f1+49>
(gdb) p $rsp
$7 = (void *) 0x7fffffffe790
(gdb) p $rbp
$8 = (void *) 0x7fffffffe7b0
(gdb) p &n
$9 = (int *) 0x7fffffffe79c
(gdb) p &buf_f1
$10 = (char (*)[6]) 0x7fffffffe7a0
(gdb) hd $rsp 64
0x7fffffffe790: C0 E7 FF FF       . . . .
0x7fffffffe794: FF 7F 00 00       . . . .
0x7fffffffe798: FA D7 A7 F7       . . . .
0x7fffffffe79c: 0A 00 00 00       . . . .
0x7fffffffe7a0: 69 6E 20 66       i n . f
0x7fffffffe7a4: 31 00 00 00       1 . . .
0x7fffffffe7a8: D0 E7 FF FF       . . . .
0x7fffffffe7ac: FF 7F 00 00       . . . .
0x7fffffffe7b0: D0 E7 FF FF       . . . .
0x7fffffffe7b4: FF 7F 00 00       . . . .
0x7fffffffe7b8: 92 05 40 00       . . @ .
0x7fffffffe7bc: 00 00 00 00       . . . .
0x7fffffffe7c0: 69 6E 20 6D       i n . m
0x7fffffffe7c4: 61 69 6E 00       a i n .
0x7fffffffe7c8: 00 00 00 00       . . . .
0x7fffffffe7cc: 00 00 00 00       . . . .
(gdb) c
Continuing.

Breakpoint 3, main () at callstack.c:14
14          return 0;
(gdb) p $rip
$11 = (void (*)()) 0x400592 <main+44>
(gdb) p $rsp
$12 = (void *) 0x7fffffffe7c0
(gdb) p $rbp
$13 = (void *) 0x7fffffffe7d0
(gdb) hd $rsp $rbp-$rsp
0x7fffffffe7c0: 69 6E 20 6D       i n . m
0x7fffffffe7c4: 61 69 6E 00       a i n .
0x7fffffffe7c8: 00 00 00 00       . . . .
0x7fffffffe7cc: 00 00 00 00       . . . .
(gdb) c
Continuing.
[Inferior 1 (process 31033) exited normally]

Remarks

[callq instruction] The return address has been written on top of the stack.

Before callq (breakpoint 1):

(gdb) p $rip
$3 = (void (*)()) 0x400588 <main+34>

(gdb) hd $rsp $rbp-$rsp
0x7fffffffe7c0: 69 6E 20 6D       i n . m
0x7fffffffe7c4: 61 69 6E 00       a i n .
0x7fffffffe7c8: 00 00 00 00       . . . .
0x7fffffffe7cc: 00 00 00 00       . . . .

After callq (the same region as seen at breakpoint 2):

0x7fffffffe7b8: 92 05 40 00       . . @ .
0x7fffffffe7bc: 00 00 00 00       . . . .
0x7fffffffe7c0: 69 6E 20 6D       i n . m
0x7fffffffe7c4: 61 69 6E 00       a i n .
0x7fffffffe7c8: 00 00 00 00       . . . .
0x7fffffffe7cc: 00 00 00 00       . . . .

When function f1 is finished, the program needs to jump back to the main function.

At breakpoint 1, the program counter points to line 13 (i.e., 0x400588 <main+34>).

Note that after returning from f1 (i.e., at breakpoint 3), the program counter $rip must point to line 14 (i.e., 0x400592 <main+44>).

Breakpoint 1, main () at callstack.c:13
13          f1(10);
(gdb) p $rip
$3 = (void (*)()) 0x400588 <main+34>

How?

Breakpoint 3, main () at callstack.c:14
14          return 0;
(gdb) p $rip
$11 = (void (*)()) 0x400592 <main+44>

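Note that 92 05 40 00 00 00 00 00 (what's pushed on the stack) represents 0x400592, which is the return address (i.e., line 14, 0x400592 <main+44>). x86-64 is little-endian, so the least significant byte (0x92) is stored at the lowest address.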

[Function prologue of f1] The $rbp of function main is pushed onto the stack. When function f1 is finished and control returns to the main function, the program should go back to using the stack frame of function main. In particular,

Recover $rbp to point to the bottom of main's stack frame. This cannot be done without keeping the old value somewhere in memory. To handle this elegantly, main's $rbp is stored at the bottom of f1's stack frame, at the address that f1's $rbp points to (0x7fffffffe7b0), right next to the return address (data in blue color).

Breakpoint 1, main () at callstack.c:13
13          f1(10);
(gdb) p $rbp
$5 = (void *) 0x7fffffffe7d0

0x7fffffffe7a8: D0 E7 FF FF       . . . .
0x7fffffffe7ac: FF 7F 00 00       . . . .
0x7fffffffe7b0: D0 E7 FF FF       . . . .
0x7fffffffe7b4: FF 7F 00 00       . . . .
0x7fffffffe7b8: 92 05 40 00       . . @ .
0x7fffffffe7bc: 00 00 00 00       . . . .
0x7fffffffe7c0: 69 6E 20 6D       i n . m
0x7fffffffe7c4: 61 69 6E 00       a i n .
0x7fffffffe7c8: 00 00 00 00       . . . .
0x7fffffffe7cc: 00 00 00 00       . . . .

Breakpoint 3, main () at callstack.c:14
14          return 0;
(gdb) p $rip
$11 = (void (*)()) 0x400592 <main+44>
(gdb) p $rsp
$12 = (void *) 0x7fffffffe7c0
(gdb) p $rbp
$13 = (void *) 0x7fffffffe7d0

Note that D0 E7 FF FF FF 7F 00 00 (the eight bytes at 0x7fffffffe7b0 in the dump above) represents 0x7fffffffe7d0, which is the $rbp of main's stack frame.

Recover $rsp to point to the top of the stack frame of function main. The good news is that main's stack frame is right below f1's stack frame in the dump (i.e., at higher addresses). In particular, the top of main's stack frame (i.e., the value $rsp must be restored to) is just 16 bytes past the bottom of f1's stack frame (i.e., $rbp for f1): 8 bytes for the saved $rbp plus 8 bytes for the return address, so 0x7fffffffe7b0 + 16 = 0x7fffffffe7c0. Therefore, $rsp can easily be recovered from the current $rbp while f1 is executing.

New stack frame for f1.
When function f1 is called, a new stack frame for f1 is set up (data in the red color in breakpoint 2). The new stack frame is on top of the old stack frame (i.e., data in the blue color, which is the stack frame for function main).

Going Back To our Goal

Summary of call stacks

Please remember the figure on the right.

What if?

What if the following return address is changed somehow to something else?
```
0x7fffffffe7b8: 92 05 40 00       . . @ .
0x7fffffffe7bc: 00 00 00 00       . . . .
```
For example, what if the above data magically changes to the following?
```
0x7fffffffe7b8: 88 05 40 00       . . @ .
0x7fffffffe7bc: 00 00 00 00       . . . .
```
Answer:
Function f1 will return to line 13 (0x400588) instead of line 14 (0x400592)!

Our goal

Recall we want to achieve the following tasks:

Inject a shellcode: If the target program contains a routine such as strcpy or fread, we can inject data containing the shellcode.
Have the program execute the shellcode.
However, the shellcode lies somewhere on the stack (not in the text area, where the normal program code lies). The main challenge is how to control the program counter (i.e., $rip) so that it points to the shellcode. Now, we have a good idea for achieving this:

Overflow the buffer and modify the return address to point to the shellcode! Then the function will return to the shellcode, and the program will execute the shellcode!

Stack Smashing: General Idea

Here's the general idea of stack smashing.

Overflow the buffer with a payload.

Take advantage of the target program's vulnerability to overflow a buffer.
- The target program has a small buffer. For example, char buf[128].
- The target program uses an unsafe function to read input into the buffer. For example, scanf("%s", buf).
- In this case, note that scanf does not check how long the user input is. It just keeps writing the input into buf. So if the attacker feeds in a long input string, the string fills buf and then spills over into the memory that follows it.
- We call this technique "overflowing the buffer", and the attacker's malicious long input is called a "payload".
Obviously, the payload should contain a shellcode. The attacker wants this shellcode to be executed by the target program.
At the same time, the payload should be crafted to modify the return address to point to the shellcode as well. Then the function will return to the shellcode, and the program will execute the shellcode!

Payload structure

The payload can be constructed as follows (see also the picture above):

Shellcode
Dummy data. This region has no purpose other than filling the gap up to the return address, so that the next part of the payload lands exactly on it. We will just put a bunch of arbitrary bytes, say, 0x90. (You can choose any other value instead of 0x90.)
Address of shellcode. By overwriting the return address with the address of the shellcode, the attacker makes the function return to the shellcode.

Things to consider in constructing the payload

Shellcode: We have it. Easy.
Dummies: We need to figure out how many dummy bytes to put in the payload.
Address of shellcode: It's a memory address that's determined at runtime. So, we need to figure this out, too.