IC210: Types & Expressions II

Representing data in a computer

Recall that inside a computer everything is 0's and 1's! (A bit is just a 0/1 value.) But how can all of these things - chars, ints, bools, and doubles - be represented by zeros and ones? Our understanding of types will really depend on being able to answer these questions.

Note: We strongly recommend that you review the class on Digital Data from the SY110 website.

You are expected to understand about bits and bytes, binary-to-decimal and decimal-to-binary conversion, and how the ASCII table defines a mapping between characters and bytes. What's here in the notes is just a brief overview of that.

Here's a link to a full ASCII table.

Binary numbers

First we'll look how 0's and 1's suffice to represent any integer number, then we'll look at other types of objects. When we deal with numbers we use the decimal number system, i.e. the base 10 number system. This means that all our numbers (lets look at non-negative integers for now) look like sequences of decimal digits, which are numbers in the range [0,9]. A number like 3027 is short-hand:

3027 → 3*10^3 + 0*10^2 + 2*10^1 + 7*10^0

Or, for another example,

1011 → 1*10^3 + 0*10^2 + 1*10^1 + 1*10^0

In the binary number system we have the same idea, but the base is now 2 rather than 10. So, binary digits are in the range [0,1], and now 1011 has a different interpretation. In binary it is short-hand for:

1011 → 1*2^3 + 0*2^2 + 1*2^1 + 1*2^0 = 2^3 + 2 + 1 = 11 (in decimal)

So, in binary the decimal number 11 is represented as 1011. The binary number 1001 = 2^3 + 1 = 9, for another example. With four bits, i.e. four binary digits, we can represent any number from 0 up to 15 (which is 2^3 + 2^2 + 2^1 + 2^0). With four decimal digits I can represent from 0 up to 9999, i.e. from 0 up to 10000 - 1. So we need more bits than decimal digits, but given enough bits we can represent any number we care to. Using k-bits, we can represent the numbers from 0 up to 2^k - 1.

Bytes - How type depends on interpreting bit-sequences

The memory of a computer is simply one long sequence of bits. However, these bits are organized into chunks of 8 called bytes. To emphasize, a byte consists of 8-bits. In a byte, we can represent the numbers from 0 to 255.

The type bool is just a way of interpreting a byte of memory. If all 8 bits of the byte are zero, the interpretation as a bool is false. Otherwise, the interpretation of a bool is true.

The type char is just a different way of interpreting a byte of memory! For example, the byte 01100001 is interpreted as the character a. This interpretation of bytes as characters is called the ASCII encoding, and this table, for example, shows you the whole thing. Interpreting 01100001 as a number in binary, we get the number 97, and if you look up 97 in the table, you'll see that it corresponds to the character a.

Already we see one of the fundamental ideas behind computing, different types of objects may be represented by treating sequences of 0's and 1's in different ways. That's why C++ needs to keep track of the types of objects, so it knows how to interpret the contents of the chunk of memory associated with each object.

A note on `char`s

In fact, you can look at a char as just being a small integer (I say small because 8-bits only allows us the range [0,255]). This interpretation pretty much tells us what to expect of conversions between chars and ints.

One interesting feature of this match-up between characters and numbers is that statements like 'b' - 'a' make perfect sense. Looking at the ASCII table, we see that 'b' corresponds to the number 98, and 'a' to the number 97. So C++ treats this as the int subtraction problem 98 - 97, which evaluates to 1. In fact, the letters of the alphabet appear in order, so that a is 97, b is 98, ..., z is 122. So, char('b' + 3) is the character e.

Other types

Technically, the int 5 could be represented as

00000000 00000000 00000000 00000101

... or it could be represented as

00000101 00000000 00000000 00000000

... depending on what's referred to as as the "endianness" of the underlying machine. That particular distinction is beyond the scope of this course, but you will encounter it in subsequent CS/IT course.

A full int on your PC consists of 4 bytes, or 32 bits, so it can represent very large numbers. We're not going to get into the question of how negative numbers are represented in binary. Essentially an int looks like the binary number representation we just talked about, but in 32 bits. So, the int 5 is represented in the computer as:

00000000 00000000 00000000 00000101

... where I've broken things up into bytes to make it all a little clearer.

A double takes up 8 bytes, or 64 bits. The format is more complex, however, and we will not go over it here, except to say that it is a binary version of the familiar scientific notation. However, instead of a base of 10, it uses a base of two. (Example: 12 is represented as 1.5 x 2^3.) Let it suffice to say that the double 1.0 is represented by the following 64 bits:

00000011 11111111 11111111 00000000  00000000 00000000 00000000 00000000

Representing data in a computer

Binary numbers

Bytes - How type depends on interpreting bit-sequences

A note on chars

Other types

A note on `char`s