## string

The types we've seen so far (int, double, float, char, bool) are all built-in types. This means that they are part of the core language rather than part of some library. The last type we'll talk about today, the type string, is not a built-in type, it is defined in the library string. It is used for strings of characters. We've already dealt with some string literals, for example "Hello World!", quotes and all, is a string literal. As long as you remember #include <string> and using namespace std;, you can use strings just like any other type. The true name of this type is std::string, but with the using statement we can drop the std::.

 operation(s) Comments + Concatenation. For example, if s is the string "Hello" and t is the string "World", then s + t is the string "HelloWorld".

In fact, there are many, many more types in C++. Even cin and cout are objects with types: cin is an object of type istream and cout is an object of type ostream. It will be a while before we discuss what these types are all about, but most things in C++ are objects with types.

## The Power of Types

One of the easiest ways to see the importance of types in C++ is to watch how they affect cin. When you use cin >> x, the behavior of cin depends on the type of x. In other words, cin uses type information to do the right thing.

 Code User Input Program Output Unread Input char c; cin >> c; cout << c << endl; 23.005% 2 23.005% int k; cin >> k; cout << k << endl; 23.005% 23 23.005% double x; cin >> x; cout << x << endl; 23.005% 23.005 23.005% string s; cin >> s; cout << s << endl; 23.005% 23.005% 23.005% bool b; cin >> b; cout << b << endl; 23.005% 1 This results in an error state for cin. Only values of 1 or 0 with whitespace or end-of-file following will read in correctly. This, of course, doesn't fit the rules for the other type. Sigh.

Look carefully at this table, and make sure that you understand why cin did what it did based on the types of the variables into which it read. NOTE: Any string tab, space, and newline characters is called whitespace. An important feature of cin is that it skips leading whitespace before it starts reading.

## Input and Output as Streams of Characters

Input like

 Jones   3:25
Smith   4:11

seems to be spread over several lines and composed of different elements - numbers, strings, and characters. However, it is in fact just one long line of characters, and by reading data we move through this line of characters called an input stream. Suppose, for example, we ran the code

string str1,str2;
int m1,m2,s1,s2;
char c1,c2;
cin >> str1 >> m1 >> c1 >> s1;
cin >> str2 >> m2 >> c2 >> s2;


with the above as input. What follows shows you this input stream, and demonstrates how we move through the input stream. The ^ carrot, shows what the "next character to be read" is. The tab character is represented by \t, and the newline character by \n.

   J  o  n  e  s  \t 3  :  2  5  \n    S  m  i  t  h  \t 4  :  1  1  \n
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
^

J  o  n  e  s  \t 3  :  2  5  \n    S  m  i  t  h  \t 4  :  1  1  \n
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
^

J  o  n  e  s  \t 3  :  2  5  \n    S  m  i  t  h  \t 4  :  1  1  \n
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
^

J  o  n  e  s  \t 3  :  2  5  \n    S  m  i  t  h  \t 4  :  1  1  \n
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
^

J  o  n  e  s  \t 3  :  2  5  \n    S  m  i  t  h  \t 4  :  1  1  \n
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
^

J  o  n  e  s  \t 3  :  2  5  \n    S  m  i  t  h  \t 4  :  1  1  \n
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
^

J  o  n  e  s  \t 3  :  2  5  \n    S  m  i  t  h  \t 4  :  1  1  \n
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
^

J  o  n  e  s  \t 3  :  2  5  \n    S  m  i  t  h  \t 4  :  1  1  \n
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
^

J  o  n  e  s  \t 3  :  2  5  \n    S  m  i  t  h  \t 4  :  1  1  \n
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
^



## Everything's an expression ... almost

Almost everything in a C++ program is an expression. An expression is textual piece of source code that can be evaluated to produce an object that has a type and value. The most familiar expressions are arithmetical. For example, if k is an int that's been assigned the value 4, (k - 2)*7 is an expression of type int and value 14.

Often the value of an expression cannot be determined until the program is run. For example, in the code below

int k;
cin >> k;
cout << k+7;

... what is the value of the expression k+7? Well, it depends what the user enters for input. That's something that cannot be known until the program is run. However, the type of the expression k+1 is something we do know before running the program: it's int. Most of the time (especially for the subset of C++ we concentrate on) the type of an expression is known at compile time, i.e. when compiling the program, as opposed to running it. Because of this, C++ is said to be a "statically typed" language.

One example of something that is less obviously an expression is k = 5, where k is once again a variable of type int. The type of this expression is int, and the value is 5. In general, an assignment expression has the same type as the object being assigned to, and the same value as that object after the assignment is carried out. Thus, oddly enough, if x is a variable of type double, then (x = 3.4)*2.0 makes perfect sense, it is an expression of type double, and after it is evaluated, it has value 6.8. This is our first explicit example of a side effect. The expression has type and value, but additionally it has the side effect that evaluating the expression changes the value of the variable x.

## What's not an expression?

At this point it may seem like everything's an expression, but that's not true. For example, anything with a ';' (semicolon) at the end is a statement, not an expression. So while k = 4 is an expression as used in (k=4)*3 will evaluate to 12, the statement k = 4; as a line of code ending in a semicolon is not an expression. Declarations of variables, something like int k for example, are not expressions - regardless of whether the ; is there. Still, most things are expressions, and understanding this fact and being able to identify the types and values of expressions are key to understanding C++ ... and most any other programming language.

## When types collide - conversion

Things get interesting when expressions involve different types. For example, what is the type and value of the expression x*k, where x is of type double with value 3.3, and k is of type int with value -2? The answer is type double and value -6.6. The explanation is this:

C++ knows how multiply two int objects, and it knows how to multiply two double objects, it doesn't know how to multiply one of each. However, it understands that an int can be converted to a double and vice versa. So it converts one and performs the multiplication on two objects of the same type. But which way should it go? For arithmetic, types are always converted in the direction that gives the most precision - this is referred to as type promotion - which in this case means that the int is converted (or promoted) to a double, and the operation is performed on two doubles. It wouldn't make nearly as much sense the other way round, would it?

This implicit type conversion (implicit meaning that it happens automatically behind the scenes, without you doing anything directly) happens in other cases. The only one affecting us right now is assignments. You can assign an object of one type to an object of a different type, as long as C++ knows how to do the conversion. If it doesn't, the compiler will let you know. So, for example, x is of type double with value 3.3, and k is of type int, then k = x is an expression of type int with value 3. C++ truncates doubles when converting to ints.

We also have explicit type conversion. Suppose, for example, that m and n are ints, n being the larger. We'd like to print out the value of m/n. Well,

cout << m/n << endl;

will just print out zero! (Make sure you know why!) We'd like to get some fractional value, in other words, we'd like these values treated as doubles. To explicitly convert them to doubles first we'd write:

cout << double(m)/double(n) << endl;

[Challenge: can you explain what type and value you get with m / double(n)?]

Explicit conversion can get tricky later on, but at this stage it's as simple as writing the new type name followed by the old object in ()'s.

 Some Quick Conversion Rules int → double : This does exactly what you'd expect. double → int : This simply truncates the value, meaning that whatever's after the decimal point just gets chopped. You can get in trouble if the double value is too big. bool → int : true goes to 1, false goes to 0. int → bool : 0 goes to false, everything else goes to true; int → char : if the int is in the range 0-127 the char value is determined by the ASCII table; char → int : the int value is determined by the ASCII table;

## Representing data in a computer

Note: I strongly recommend that you review the class on Digital Data from last year's si110 website. You are expected to understand about bits and bytes, binary-to-decimal and decimal-to-binary conversion, and how the ASCII table defines a mapping between characters and bytes. What's here in the notes is just a brief overview of that. Here's a link to a full ASCII table.

You've probably heard terms like bits and bytes used in connection with computers, and you've probably heard people say that inside a computer everything is 0's and 1's. If not, I'll say it now: Inside a computer everything is 0's and 1's! (A bit is just a 0/1 value.) But how can all of these things - chars, ints, bools, and doubles - be represented by zeros and ones? Our understanding of types will really depend on being able to answer these questions.

## Binary numbers

First we'll look how 0's and 1's suffice to represent any integer number, then we'll look at other types of objects. When we deal with numbers we use the decimal number system, i.e. the base 10 number system. This means that all our numbers (lets look at non-negative integers for now) look like sequences of decimal digits, which are numbers in the range [0,9]. A number like 3027 is short-hand:

3027 → 3*10^3 + 0*10^2 + 2*10^1 + 7*10^0


Or, for another example,

1011 → 1*10^3 + 0*10^2 + 1*10^1 + 1*10^0


In the binary number system we have the same idea, but the base is now 2 rather than 10. So, binary digits are in the range [0,1], and now 1011 has a different interpretation. In binary it is short-hand for:

1011 → 1*2^3 + 0*2^2 + 1*2^1 + 1*2^0 = 2^3 + 2 + 1 = 11 (in decimal)


So, in binary the decimal number 11 is represented as 1011. The binary number 1001 = 2^3 + 1 = 9, for another example. With four bits, i.e. four binary digits, we can represent any number from 0 up to 15 (which is 2^3 + 2^2 + 2^1 + 2^0). With four decimal digits I can represent from 0 up to 9999, i.e. from 0 up to 10000 - 1. So we need more bits than decimal digits, but given enough bits we can represent any number we care to. Using k-bits, we can represent the numbers from 0 up to 2^k - 1.

## Bytes - How type depends on interpreting bit-sequences

The memory of a computer is simply one long sequence of bits. However, these bits are organized into chunks of 8 called bytes. To emphasize, a byte consists of 8-bits. In a byte, we can represent the numbers from 0 to 255.

The type bool is just a way of interpreting a byte of memory. If all 8 bits of the byte are zero, the interpretation as a bool is false. Otherwise, the interpretation of a bool is true.

The type char is just a different way of interpreting a byte of memory! For example, the byte 01100001 is interpreted as the character a. This intepretation of bytes as characters is called the ASCII encoding, and this table, for example, shows you the whole thing. Interpreting 01100001 as a number in binary, we get the number 97, and if you look up 97 in the table, you'll see that it corresponds to the character a.

Already we see one of the fundamental ideas behind computing, different types of objects may be represented by treating sequences of 0's and 1's in different ways. That's why C++ needs to keep track of the types of objects, so it knows how to interpret the contents of the chunk of memory associated with each object.

## A note on chars

In fact, you can look at a char as just being a small integer (I say small because 8-bits only allows us the range [0,255]). This interpretation pretty much tells us what to expect of conversions between chars and ints. One interesting feature of this match-up between characters and numbers is that statements like 'b' - 'a' make perfect sense. Looking at the ASCII table, we see that 'b' corresponds to the number 98, and 'a' to the number 97. So C++ treats this as the int subtraction problem 98 - 97, which evaluates to 1. In fact, the letters of the alphabet appear in order, so that a is 97, b is 98, ..., z is 122. So, char('b' + 3) is the character e.

## Other types

A full int on your PC consists of 4 bytes, or 32 bits, so it can represent very large numbers. We're not going to get into the question of how negative numbers are represented in binary. Essentially an int looks like the binary number representation we just talked about, but in 32 bits.

So, The int 5 is represented in the computer as:

00000000 00000000 00000000 00000101
... where I've broken things up into bytes to make it all a little clearer.

A double takes up 8 bytes, or 64 bits. The format is more complex, however, and we will not go over it here, except to say that it is a binary version of the familiar scientific notation. However, instead of a base of 10, it uses a base of two. (Example: 12 is represented as 1.5 x 2^3.) Let it suffice to say that the double 1.0 is represented by the following 64 bits:

00000011 11111111 11111111 00000000  00000000 00000000 00000000 00000000