Digital Data – Bits & Bytes

"Cyber Space" is a name for this global information system consisting of many familiar pieces — like web sites, distributed video games, email — and many less familiar pieces as well, that work behind the scenes. All of these systems do little more than process, store and retrieve digital data. So to even begin to understand Cyber Space, we need to get a handle on what "digital data" is all about. That's what this lesson will do.

Submarine Radio Communications


Ballistic missile submarines remain undetected beneath the ocean's surface, awaiting the order to launch their missiles. The launch order must be sent by radio, but seawater blocks the higher-frequency radio waves typically used for satellite links and long-range radio. To penetrate the ocean and reach a submarine's VLF antenna, very low frequency (VLF) radio waves (3-30 kHz) must be used.

Communicating with submarines while they are completely submerged comes at a cost: VLF radio waves have a severely limited capacity for carrying data. VLF data transmission rates are around 300 bps (bits per second), compared with about 10 Mbps (10 million bits per second) for a 4G wireless phone. Your smart phone is roughly 33,000 times faster than VLF! In other words, downloading one MP3 song (about 3 megabits of data, going by these figures) would take about 2 hours and 47 minutes using submarine VLF communications, but only 0.3 seconds using your 4G phone.
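The comparison above comes down to one division: transfer time = data size ÷ transfer rate. Here is a minimal Python sketch of that arithmetic. The song size of 3,000,000 bits is an assumption, back-computed from the lesson's own figures (about 2 hours 47 minutes at 300 bps), and the helper names are hypothetical.

```python
# Rates from the lesson, in bits per second.
VLF_BPS = 300
PHONE_4G_BPS = 10_000_000

# Assumed song size (about 3 megabits), back-computed from the lesson's figures.
SONG_BITS = 3_000_000


def transfer_seconds(size_bits: int, rate_bps: int) -> float:
    """Seconds needed to send size_bits at rate_bps bits per second."""
    return size_bits / rate_bps


def as_h_m_s(seconds: float) -> str:
    """Format a duration, rounded to whole seconds, as hours/minutes/seconds."""
    total = round(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"


print(PHONE_4G_BPS / VLF_BPS)                          # about 33,333 times faster
print(as_h_m_s(transfer_seconds(SONG_BITS, VLF_BPS)))  # 2h 46m 40s
print(transfer_seconds(SONG_BITS, PHONE_4G_BPS))       # 0.3
```

10,000 seconds is 2 h 46 min 40 s, which rounds to the 2 hours 47 minutes quoted above.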


(Image courtesy of Jim Hawkins)
This is a picture of the VLF antenna array that used to be at Greenbury Point. The three small antennas you see there today are all that's left; the rest were pulled down in the late 1990s.

Bits and Bytes

Digital data consists solely of 0's and 1's. An individual 0 or 1 value is called a bit. So to represent a piece of information digitally, you need to be able to express it as a sequence of 0's and 1's. For the remainder of this lesson, we'll explore how this is done for many different kinds of information. First, however, there's a practical issue to take care of. Modern computers group bits into chunks of eight, and that's usually the smallest unit of data they actually operate on. An 8-bit chunk is called a byte. Keep the difference between a bit and a byte straight: a byte is eight bits, so the two units differ by a factor of 8.

A computer is typically capable of storing and processing an immense number of bits and bytes, so we often speak of kilo, mega, giga and tera bytes or bits. What do those prefixes mean? Normally kilo means thousand, mega means million, giga means billion, and tera means trillion. That's approximately true in the context of digital data, but not exactly. In the context of digital data, each prefix is a power of 2:

kilo = $2^{10}$ = 1,024
mega = $2^{20}$ = 1,048,576
giga = $2^{30}$ = 1,073,741,824
tera = $2^{40}$ = 1,099,511,627,776

So "megabyte" means $2^{20}$ bytes, which is $8 \times 2^{20} = 2^{23}$ bits. You'll often see these abbreviated as K=kilo, M=mega, G=giga, T=tera, and b=bit and B=byte. So Gb means "gigabit" whereas GB means "gigabyte", which is eight times as many bits. Unfortunately, it's not always easy to tell whether the "decimal" or the "binary" interpretation of "kilo", "mega", etc. is meant, especially in marketing material.
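To make the bit/byte and binary/decimal distinctions concrete, here is a small Python sketch. The function name `to_bits` and the two-letter unit strings (such as "MB" or "Gb") are illustrative assumptions, not a standard API.

```python
# Binary interpretation of the prefixes: powers of 2.
BINARY_PREFIX = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}
# Decimal interpretation (powers of 10), common in marketing material.
DECIMAL_PREFIX = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def to_bits(amount: int, unit: str, binary: bool = True) -> int:
    """Convert an amount such as (1, "MB") or (1, "Gb") to a number of bits.

    unit is a prefix letter (K, M, G, T) followed by B (byte) or b (bit).
    """
    if len(unit) != 2 or unit[1] not in ("B", "b"):
        raise ValueError("unit must be a prefix letter followed by B or b")
    prefix, kind = unit[0], unit[1]
    multiplier = BINARY_PREFIX[prefix] if binary else DECIMAL_PREFIX[prefix]
    bits_per_unit = 8 if kind == "B" else 1  # a byte is eight bits
    return amount * multiplier * bits_per_unit


print(to_bits(1, "MB") == 2**23)             # True: 8 x 2^20 = 2^23 bits
print(to_bits(1, "GB") // to_bits(1, "Gb"))  # 8: a gigabyte is 8 gigabits
print(to_bits(1, "MB", binary=False))        # 8000000 under the decimal reading
```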

Binary Numbers


xkcd.com/74/
"There are 10 kinds of people: those who know binary, and those who don't."
On the face of it, it's pretty amazing that all information can somehow be expressed as sequences of bits. It's possible because information can be encoded as numbers, and numbers can be expressed as sequences of bits. A number expressed as a sequence of 0's and 1's is called a binary number, and the idea is no different from how we use sequences of decimal digits to represent numbers. Recall how that works: when we write 467 we mean $4\times10^2 + 6\times10^1 + 7\times10^0$. In a binary number we only allow bits as digits, and instead of powers of 10, we have powers of 2. So in binary, 1101 means $1\times2^3 + 1\times2^2 + 0\times2^1 + 1\times2^0 = 8 + 4 + 0 + 1$, which is 13 in decimal. Numbers of any size can be represented by sequences of 0's and 1's, though larger numbers require longer sequences.
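The same positional rule works for decimal and binary; only the base changes. Here is a minimal Python sketch of it. `positional_value` is a hypothetical helper name, it only handles the digits 0-9, and Python's built-in `int(text, 2)` is shown only as a cross-check.

```python
def positional_value(digits: str, base: int) -> int:
    """Value of a digit string in the given base (digits 0-9 only).

    Each digit is multiplied by base**position, where the rightmost
    digit has position 0, and the products are summed.
    """
    total = 0
    for position, digit in enumerate(reversed(digits)):
        d = int(digit)
        if d >= base:
            raise ValueError(f"digit {digit} is not allowed in base {base}")
        total += d * base**position
    return total


print(positional_value("467", 10))  # 467 = 4*10^2 + 6*10^1 + 7*10^0
print(positional_value("1101", 2))  # 13  = 1*2^3 + 1*2^2 + 0*2^1 + 1*2^0
print(int("1101", 2))               # 13, Python's built-in conversion agrees
```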
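In fact, it's easy to compute how many bits you need to represent a number of a specific size. With $k$ bits, you can represent any number from 0 up to and including $2^k - 1$. To represent a positive integer $N$, you need $1 + \lfloor \log_2 N \rfloor$ bits, where $\lfloor x \rfloor$ means $x$ rounded down to a whole number. For example, $\log_2 13 \approx 3.7$, so 13 needs $1 + 3 = 4$ bits, which matches 1101. In a byte, i.e. eight bits, we can represent numbers up to $2^8 - 1 = 256 - 1 = 255$.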
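Here is a minimal Python sketch of the two formulas in this paragraph: the bits needed for a positive integer $N$, $1 + \lfloor \log_2 N \rfloor$, and the largest value that fits in $k$ bits, $2^k - 1$. The function names are hypothetical.

```python
import math


def bits_needed(n: int) -> int:
    """Bits needed for a positive integer N: 1 + floor(log2 N).

    Uses floating-point log2, which is fine for small numbers like these.
    """
    if n < 1:
        raise ValueError("N must be a positive integer")
    return 1 + math.floor(math.log2(n))


def largest_value(k: int) -> int:
    """Largest number representable with k bits: 2^k - 1."""
    return 2**k - 1


print(bits_needed(13))   # 4, matching 1101
print(bits_needed(255))  # 8, fits in one byte
print(bits_needed(256))  # 9, one bit too many for a byte
print(largest_value(8))  # 255
```

For very large numbers, floating-point rounding can make the log2 version off by one; Python's built-in `int.bit_length()` gives the exact count (for example, `(255).bit_length()` is 8).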
Because of the importance of bytes, we will concentrate on being able to write numbers as 8-bit sequences, and being able to interpret an 8-bit sequence as a number. The smallest number we can represent in 8 bits is 0, which is the byte 00000000. The largest is 255, which in binary is 11111111. Of course, anything in between is possible as well.
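One common way to convert a decimal number to binary is repeated division by 2: each remainder is the next bit, starting from the least significant. The Python sketch below uses that method (not necessarily the one taught elsewhere in this course) and always produces exactly eight bits. The function names are hypothetical.

```python
def to_byte_string(n: int) -> str:
    """Write 0 <= n <= 255 as exactly eight bits, most significant bit first.

    Repeatedly dividing by 2 produces the bits from least to most significant,
    so they are reversed at the end.
    """
    if not 0 <= n <= 255:
        raise ValueError("a byte can only hold 0 through 255")
    bits = []
    for _ in range(8):
        bits.append(str(n % 2))  # remainder = next bit
        n //= 2
    return "".join(reversed(bits))


def from_byte_string(bits: str) -> int:
    """Interpret an 8-bit string such as '00001101' as a number."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly eight 0's and 1's")
    return int(bits, 2)


print(to_byte_string(0))             # 00000000
print(to_byte_string(255))           # 11111111
print(to_byte_string(13))            # 00001101
print(from_byte_string("00001101"))  # 13
```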

Hexadecimal

Bytes are all-important in computing, and after a while it becomes cumbersome to write out all eight bits of a byte. So we often write out bytes as two hexadecimal digits. Hexadecimal is actually the base 16 number system, but for our purposes that is irrelevant. The important point is that it gives us a concise representation for bytes, since each hex digit represents a 4-bit pattern. Thus two hex digits represent an 8-bit pattern, i.e. a byte. The following table gives the mapping between the hex digits (0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f) and 4-bit patterns.
hex digit      0     1     2     3     4     5     6     7
4-bit pattern  0000  0001  0010  0011  0100  0101  0110  0111

hex digit      8     9     a     b     c     d     e     f
4-bit pattern  1000  1001  1010  1011  1100  1101  1110  1111
Using this table, you should be able to convert 3cf6 into binary digits, and convert 01101110 into two hex digits.
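Once you've tried the exercise by hand, you can check your answers with this Python sketch, which does nothing more than look up the table above in each direction. The dictionary and function names are hypothetical.

```python
# The table above: each hex digit stands for one 4-bit pattern.
HEX_TO_BITS = {format(v, "x"): format(v, "04b") for v in range(16)}
BITS_TO_HEX = {bits: digit for digit, bits in HEX_TO_BITS.items()}


def hex_to_binary(hex_digits: str) -> str:
    """Replace each hex digit by its 4-bit pattern (spaced for readability)."""
    return " ".join(HEX_TO_BITS[d] for d in hex_digits.lower())


def binary_to_hex(bits: str) -> str:
    """Split a bit string into 4-bit groups and replace each by its hex digit."""
    bits = bits.replace(" ", "")
    if len(bits) % 4 != 0:
        raise ValueError("number of bits must be a multiple of 4")
    return "".join(BITS_TO_HEX[bits[i:i + 4]] for i in range(0, len(bits), 4))


print(hex_to_binary("3cf6"))      # 0011 1100 1111 0110
print(binary_to_hex("01101110"))  # 6e
```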

ASCII Encoding and Text

Encoding is converting data from one representation into another. ASCII is just one encoding scheme; others include base64, Unicode, and UTF-8 (a common way of storing Unicode text as bytes).
Other than numbers, the most fundamental kind of data is plain text. The method for representing text digitally (i.e. as bits and bytes) depends, of course, on the alphabet the text uses. However, in the cyber world, English is the base language and everything else is an add-on. Convenient for us, eh? Basic text is represented using one byte for each character (i.e. one number in the range 0-255, although in practice ASCII only uses 0-127), where the characters allowed and the byte values (i.e. numbers) they correspond to are given by the ASCII Table. So, for example, the letter a has ASCII value 97, which is the byte 0110 0001 (spaced for readability). ASCII values 32-126 are the printable characters, and any sequence of bytes consisting solely of them is considered to be plain text. We might also allow the values 9 (tab), 10 (newline), and 13 (carriage return), which provide limited formatting.
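Here is a minimal Python sketch of the two ideas in this paragraph: looking up a character's ASCII value (using Python's built-in ASCII codec), and checking whether a sequence of bytes counts as plain text under the rule above (values 32-126, plus 9, 10 and 13). The function names are hypothetical.

```python
# Extra control characters allowed in plain text: tab, newline, carriage return.
FORMATTING_CODES = {9, 10, 13}


def ascii_codes(text: str) -> list[int]:
    """ASCII value of each character; raises UnicodeEncodeError for non-ASCII text."""
    return list(text.encode("ascii"))


def is_plain_text(data: bytes) -> bool:
    """True if every byte is printable (32-126) or one of the formatting codes."""
    return all(32 <= b <= 126 or b in FORMATTING_CODES for b in data)


for code in ascii_codes("a"):
    print(code, format(code, "08b"))  # 97 01100001

print(is_plain_text(b"Hello,\tworld!\n"))  # True
print(is_plain_text(bytes([72, 105, 0])))  # False: byte 0 is not printable
```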
You can actually enter ASCII values into the address bar in Google Chrome, although you have to write them in hexadecimal notation rather than decimal or binary. (Hexadecimal is a base 16 number system, rather than base 10 or base 2, whose digits are 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f.) Each value is written as a % sign followed by two hex digits. For example, c has ASCII value 99, which is 63 in hex, so a c can be written in the address bar as %63. Thus, entering %63nn.com in your browser's address bar gets you to cnn.com! BTW: Mozilla Firefox no longer supports this feature for the server name (cnn.com), since there are actually security implications with this.
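The %-notation above is easy to produce and undo in code. In this Python sketch, `percent_encode_all` is a hypothetical helper that writes every character as % plus two hex digits, and `urllib.parse.unquote` is the standard-library function that reverses it. This only illustrates the notation; whether a given browser accepts it in the server name varies, as the Firefox note shows.

```python
from urllib.parse import unquote


def percent_encode_all(text: str) -> str:
    """Write every character as % followed by its ASCII value in two hex digits."""
    return "".join(f"%{code:02x}" for code in text.encode("ascii"))


print(percent_encode_all("c"))    # %63
print(percent_encode_all("cnn"))  # %63%6e%6e
print(unquote("%63nn.com"))       # cnn.com
```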

A sequence of characters is called a string, and what we've just seen is that ASCII gives us a way to encode strings as sequences of bits (or, if you prefer, bytes).
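To see the whole round trip from string to bits and back, here is a minimal Python sketch. It assumes pure ASCII input and writes each byte as exactly eight bits, so decoding just cuts the bit string into 8-bit chunks. The function names are hypothetical.

```python
def string_to_bits(text: str) -> str:
    """Encode text as ASCII, then write each byte as exactly eight bits."""
    return "".join(format(b, "08b") for b in text.encode("ascii"))


def bits_to_string(bits: str) -> str:
    """Cut a bit string into 8-bit bytes and decode them as ASCII."""
    if len(bits) % 8 != 0:
        raise ValueError("number of bits must be a multiple of 8")
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("ascii")


encoded = string_to_bits("Hi")
print(encoded)                  # 0100100001101001
print(bits_to_string(encoded))  # Hi
```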