SI110: Steganography

Steganography vs. Encryption

Encryption hides the contents of a message, but not the existence of a message. If you were a spy in a hostile country, merely sending a message back to the U.S. would be incriminating. If that message were encrypted, it'd probably be a whole bunch more incriminating, and things would only get worse when you, the spy, refused to decrypt the message for the authorities. Steganography, which literally means "hidden writing", is about hiding the existence of a message. Often this means hiding a secret message within boring, unsuspicious data.

A Bit of History

Herodotus, a famous ancient Greek historian, tells of a secret message (Persian invasion plans) being tattooed on the shaved head of a slave who, after his hair grew back, was sent to Greece, where his head was shaved and the message revealed. I wonder if he went naturally bald later in life?

A common pre-digital kind of steganography using the message-within-a-message idea is to write a boring piece of text such that pulling out every kth word yields the actual interesting message. The following boring text contains a secret, steganographically-hidden message. If you collect every 4th word, you'll recover the message!

Every person I meet seems very mad at lots of people. Eight out of every ten people are angry. Tonight I am sad.
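
If you'd rather not count words by hand, here is a minimal Python sketch of the every-kth-word idea. The function name every_kth_word is made up for this illustration, and stripping punctuation from the collected words is an assumption about how you'd want the result to look.

```python
def every_kth_word(text: str, k: int) -> str:
    """Return the k-th, 2k-th, 3k-th, ... words of text, joined by spaces."""
    words = text.split()
    # Indices k-1, 2k-1, ... hold the k-th, 2k-th, ... words.
    picked = words[k - 1::k]
    # Drop punctuation that clings to a word, such as a trailing period.
    return " ".join(w.strip(".,;:!?") for w in picked)


cover_text = (
    "Every person I meet seems very mad at lots of people. "
    "Eight out of every ten people are angry. Tonight I am sad."
)
print(every_kth_word(cover_text, 4))
```

Run it with k = 4 on the boring text above and the hidden message drops out.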


Historically, there are all sorts of clever ways people have steganographically hidden messages.

Terminology

Generally, we have some message data to hide, some data to hide it in, and some data combining the two, which is what's sent. The terminology for this will be:

embedded-message — the message we wish to hide
cover-medium — the data we'll hide the embedded-message in
stego-medium — the cover-medium with the embedded message inside it

Digital Steganography

The digital world offers a host of new cover-media (message carriers) and new ways to hide the embedded-message in the cover-medium. One of the most common techniques is to hide the embedded-message in an image file. The idea is that we as people don't notice small variations in color. In an image file, each pixel has a color defined by three 1-byte values: one for red (r), one for green (g), and one for blue (b). If we make small changes in those bytes, the change in the color of that one pixel is too small for us to perceive. Try playing with the following demo to convince yourself of that fact.

So now we know a few useful things:

Whatever our embedded-message is, it's represented by bits, i.e. by 1's and 0's.
The image that will serve as our cover-medium is also represented as bits, i.e. by 1's and 0's.
There are certain bit-positions in the cover-medium file (the least significant bit or bits in each R, G or B byte) that we can toggle between 1 and 0 and not change the image in a way that's discernible to human beings.
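
To make the last point concrete, here is a small Python sketch that toggles the least significant bit of each color byte in one pixel. The pixel values and the name flip_lsb are made up for illustration. Each byte changes by exactly 1 out of a possible range of 0 to 255.

```python
def flip_lsb(byte: int) -> int:
    """Toggle the least significant bit of a value in the range 0..255."""
    return byte ^ 0b00000001


pixel = (200, 120, 37)  # hypothetical (r, g, b) byte values
tweaked = tuple(flip_lsb(c) for c in pixel)

for name, before, after in zip("rgb", pixel, tweaked):
    change = abs(after - before)
    print(f"{name}: {before:08b} -> {after:08b}  (value changes by {change})")
```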

This should suggest a strategy: spread the bits that make up the embedded-message out amongst the least significant bit or bits of each R, G or B byte in the cover-medium image file. The message will be there, but the image will not look any different to the human eye. Now, if the embedded-message is N bytes long and the cover-medium image file consists of at least 8N R, G and B bytes, life is easy. In this case we simply set the least significant bit of each R, G and B byte to the corresponding bit of the embedded-message. However, if the embedded-message is longer, we have to use the two least significant bits of each R, G and B byte; or the three least significant bits of each R, G and B byte; and so on. As we change more bits in each R, G and B byte, the effect starts to become noticeable ... which is bad for steganography! The following figure demonstrates this.
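
Here is a rough Python sketch of the easy case, where the cover-medium has at least 8 color bytes for every byte of the embedded-message. It makes several simplifying assumptions: the cover is just a sequence of raw R, G and B bytes (no image file header), the receiver already knows how many message bytes to expect, and the names embed and extract are invented for this example. It is not a description of how any particular tool does the job.

```python
def message_bits(message: bytes):
    """Yield the bits of message, most significant bit of each byte first."""
    for byte in message:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def embed(cover: bytes, message: bytes) -> bytearray:
    """Hide message in the least significant bit of each cover byte."""
    if len(cover) < 8 * len(message):
        raise ValueError("cover-medium too small: need 8 bytes per message byte")
    stego = bytearray(cover)
    for i, bit in enumerate(message_bits(message)):
        # Clear the lowest bit, then set it to the message bit.
        stego[i] = (stego[i] & 0b11111110) | bit
    return stego


def extract(stego: bytes, n_bytes: int) -> bytes:
    """Rebuild n_bytes of message from the lowest bit of each stego byte."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for b in stego[8 * i: 8 * i + 8]:
            value = (value << 1) | (b & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(100, 148))  # 48 stand-in color bytes
secret = b"hi SI!"              # 6 bytes = 48 bits, so it just fits
stego = embed(cover, secret)

print(extract(stego, len(secret)))
# No color byte moves by more than 1.
print(max(abs(a - b) for a, b in zip(cover, stego)))
```

Using two or three low-order bits per byte follows the same pattern with a wider mask, at the cost of larger changes to each color value.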

Figure: The first N lines of The Complete Works of Shakespeare hidden in an image of a cat, shown for N = 0, 500, 1000 and 1500 lines.

This is an interesting observation, and I hope it makes sense to you, but it's not a big problem for a potential user of steganography. To solve it, just use a bigger image for your cover-medium.
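
To see why a bigger image helps, here is a back-of-the-envelope Python sketch. It assumes 3 color bytes per pixel and ignores any file header; the function name and the image and message sizes are made up for illustration.

```python
def bits_per_color_byte(message_len: int, color_bytes: int) -> int:
    """Fewest low-order bits per color byte needed to hold the message."""
    needed_bits = 8 * message_len
    # Integer ceiling of needed_bits / color_bytes.
    k = (needed_bits + color_bytes - 1) // color_bytes
    if k > 8:
        raise ValueError("message does not fit in this cover-medium at all")
    return max(k, 1)


message_len = 10_000            # hypothetical 10,000-byte embedded-message
small_image = 3 * 100 * 100     # 100 x 100 pixels -> 30,000 color bytes
big_image = 3 * 200 * 200       # 200 x 200 pixels -> 120,000 color bytes

print(bits_per_color_byte(message_len, small_image))  # 3 bits per byte
print(bits_per_color_byte(message_len, big_image))    # 1 bit per byte
```

Doubling the width and height quadruples the number of color bytes, which here is enough to get back down to one changed bit per byte.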

The bmpsteg tool. Dr. Stahl has written a nice little utility called bmpsteg that you can use to hide and reveal messages in .bmp image files. Check out the instructions for the Web Page Steganography Activity for some information on using it to hide messages on your webpages, and look under "steganography" on the SI110 Course Resources page to download it and for installation instructions.

Steganography for Facebook
Check out this Wired article about "Secretbook", an extension for Google Chrome that makes it easy to steganographically hide secret messages in the photos you share on Facebook.

More Digital Steganography

Hopefully the preceding section made sense, i.e. hopefully you understand how we can hide a message using an image file as the cover-medium. There are different schemes for different kinds of cover-media. For many, like sound and movie files, the basic idea is the same. For other kinds of cover-media we must find different ways of hiding our embedded-message. For example, if we were using an HTML file as the cover-medium, we might hide a single bit in each line of HTML code in the following way: if the line has an even number of space characters it represents a 0, otherwise it represents a 1. If we add the occasional spurious space character, who'd ever notice? After all, it doesn't change what's rendered in the browser. If we were using network traffic as a cover-medium, we could artificially send packets in quick bursts or with long delays, and use that to send the embedded-message using something like Morse code.
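
The HTML scheme described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function names are invented, the spurious space is always added at the end of the line, and the receiver knows how many bits to read. A trailing space normally has no effect on what the browser renders, though that would not hold inside something like a pre element.

```python
def embed_bits_in_lines(lines, bits):
    """Give each line an even space count for a 0 bit, odd for a 1 bit."""
    if len(bits) > len(lines):
        raise ValueError("need at least one line of HTML per bit")
    out = list(lines)
    for i, bit in enumerate(bits):
        if out[i].count(" ") % 2 != bit:
            out[i] += " "  # one extra trailing space flips the parity
    return out


def extract_bits_from_lines(lines, n_bits):
    """Read one bit per line: the parity of the line's space count."""
    return [line.count(" ") % 2 for line in lines[:n_bits]]


html = [
    "<html>",
    "<body>",
    "<p>Hello there</p>",
    "</body>",
    "</html>",
]
bits = [1, 0, 1, 1, 0]

stego_lines = embed_bits_in_lines(html, bits)
print(extract_bits_from_lines(stego_lines, len(bits)))
```

At one bit per line this cover-medium has very little capacity, so it only suits short messages or long pages.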

One fun web-based tool is www.spammimic.com, which lets you enter a short text message and then constructs what looks like a typical spam e-mail message from it. You send it to someone, and they can paste the message into www.spammimic.com's decode box, and it'll decode the message.

Steganography and Crypto Tools

Steganography doesn't supplant symmetric encryption (secret key), hashing, asymmetric encryption (public key) or certificates, rather it provides a tool that can be used in combination with these other tools to meet your security goals. What if we wanted to be able to authenticate the sender of a steganographic message we received ... how could we do that? What if we're worried about keeping the contents of a steganographically hidden message secret from other people that know about the steganographic protocol we're following? How do we let the people we want to communicate with know where to look for images (or other cover-media) containing messages and what tool to use to extract the message? Is this last problem reminiscent of a problem we've already seen in our study of cryptography?

Think back to our pillars of IA: Does steganography provide confidentiality? Integrity? Authentication? Non-repudiation? If a criminal mastermind is sending out instructions to his minions as steganographically hidden messages within images posted to a public blog, which IA pillars is he relying on? What can he do to guarantee them?

RGB demo (red, green and blue bytes): The box shows the color defined by the RGB bytes displayed next to it. Change the bits in those bytes and press enter, and the color of the box will update. You should see that changing the least significant bits (i.e. the right-most bits) in each byte has no discernible effect on the color. We know it changes, but by too little for us to perceive.