IC211: Object encapsulation - bundling data & functions

Object Oriented Programming Lesson One

With this lesson we start learning Object Oriented Programming (OOP). Object Oriented Programming as a paradigm has four (or three, depending on how you count) fundamental tenets: encapsulation, data-hiding, inheritance, and polymorphism. Today we kick things off with encapsulation.

Encapsulation is a big deal!

Important! Encapsulation means wrapping up data, together with the functions that operate on that data, into a single package, which we call an "object".

As a first illustration, we consider an implementation in Java of the C++ struct Pos you all remember fondly from IC210/SI204. Recall that we are interested in row/column positions and being able to do things like step in some direction to change our position. If we translate this from C++ into Java in the same procedural way we've done so far, we end up with the two Java files Pos.java and PosFuncs.java as shown in the table below.

[Code table: No encapsulation: data and functions separate | First step towards encapsulation]
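The code in that table is not reproduced here, so the following is only a rough sketch of what the "first step" Pos.java might look like, with the data and the static functions in one class. It makes several assumptions. Pos stores int fields row and col. 'N'/'S' decrease/increase the row and 'W'/'E' decrease/increase the column, which is what the Track sample runs later in this lesson suggest. distance() is Manhattan distance; the original metric is not shown.

```java
import java.util.Scanner;

// Sketch of the "first step": Pos data plus static functions in one class
class Pos {
  int row, col; // assumed: 'S' increases row, 'E' increases col

  // Reads "row col" from sc and creates a new Pos
  static Pos read(Scanner sc) {
    Pos p = new Pos();
    p.row = sc.nextInt();
    p.col = sc.nextInt();
    return p;
  }

  // Moves p one step in direction dir; other characters are ignored
  static void step(Pos p, char dir) {
    switch (dir) {
      case 'N': p.row--; break;
      case 'S': p.row++; break;
      case 'E': p.col++; break;
      case 'W': p.col--; break;
      default: break;
    }
  }

  // Manhattan distance between p and q (assumed metric)
  static int distance(Pos p, Pos q) {
    return Math.abs(p.row - q.row) + Math.abs(p.col - q.col);
  }

  // Formats p as "row col"
  static String toString(Pos p) {
    return p.row + " " + p.col;
  }
}
```

Every call has to name the Pos explicitly, as in Pos.step(p, 'S') or Pos.toString(p).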

This is just a first step towards encapsulation, however. Although we've packaged the Pos data together in the same class as the functions that operate on Pos objects, another aspect of encapsulation is distinguishing methods (aka functions) that belong to the class Pos from methods that are better viewed as belonging to each instance of class Pos. So let's look at that ...

The mechanism: how instance methods (a.k.a. non-static member functions) work

So far we've seen classes used in two ways: collections of functions and collections of data. Of course, we could mix the member data and the static member functions we've been dealing with into a single class, but that wouldn't get us quite to the "objects" of Object Oriented Programming. To get there, we need to understand a new mechanism: non-static member functions, which in Java parlance are called instance methods (as opposed to the static member functions which are called static methods in Java parlance).

First let's recall how instance data works. Imagine you have a class Point defined as follows:

public class Point
{
  public double x, y;
}

... and suppose in some other class file you create two instances like this:

Point A = new Point();
Point B = new Point();
A.x = 10.0; ← which "x" changes? The x belonging to A.  The x belonging to B is unchanged.
B.y = -1.5; ← which "y" changes? The y belonging to B.  The y belonging to A is unchanged.
x = y + .2; ← ERROR!  The names "x" and "y" on their own have no meaning. You must specify which Point you are talking about.

A function defined without the modifier static, called an instance method, works the same way. Suppose we add a method to the class Point like this:

public class Point
{
  public double x, y;
  public void scale(double fact) {
    x *= fact;
    y *= fact;
  }
}

and imagine the same two instances

Point A = new Point();
A.x = 3.0;
A.y = -.5;

Point B = new Point();
B.x = 1.0;
B.y = 1.2;

A.scale(.5);  ← Which "x" is changed inside scale? The notation tells us that we are calling the
                scale() method belonging to A, so when we refer to x inside scale(), it means the x belonging to A.  So
                after this call, A's x and y are 1.5 and -.25.

B.scale(10);  ← Which "x" is changed inside scale? The notation tells us that we are calling the
                scale() method belonging to B, so when we refer to x inside scale(), it means the x belonging to B.  So
                after this call, B's x and y are 10 and 12.

... the expression scale(.5), on its own, makes no sense. Why? Because nobody knows whether you mean the scale() that belongs to the Point A or the scale() that belongs to the Point B. We think of instance methods as belonging to instances, just like instance data, and we specify which instance is intended in exactly the same way: A.scale(.5) versus B.scale(.5). Inside the definition of scale you see the expression x *= fact. You might be tempted to think that it violates what I said earlier: which x? which y? The answer is the x and y that belong to the same instance of Point that this scale() belongs to. In other words, when you make the call

A.scale(.5)

... the x and y in x *= fact and y *= fact are the ones belonging to A. When you make the call

B.scale(.5)

... the x and y in x *= fact and y *= fact are the ones belonging to B.
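To check these claims, the Point class above can be paired with a small driver. ScaleDemo is a hypothetical class name used only for this illustration.

```java
// The Point class from above
class Point {
  public double x, y;

  // Instance method: x and y mean the fields of the Point before the dot
  public void scale(double fact) {
    x *= fact;
    y *= fact;
  }
}

// Hypothetical driver showing which fields each call changes
class ScaleDemo {
  public static void main(String[] args) {
    Point A = new Point();
    A.x = 3.0;
    A.y = -.5;

    Point B = new Point();
    B.x = 1.0;
    B.y = 1.2;

    A.scale(.5);  // only A's x and y change
    B.scale(10);  // only B's x and y change
    System.out.println("A: " + A.x + " " + A.y);  // prints A: 1.5 -0.25
    System.out.println("B: " + B.x + " " + B.y);  // prints B: 10.0 12.0
  }
}
```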

Underneath the hood with instance methods

Conceptually, each instance of class Point in the above example has its own scale() function, just like it has its own x and its own y. In reality, the way Java handles instance methods works a bit differently. In the implementation, the call A.scale(.5) really acts like a call to scale(A,.5), and this is always the case.

A call to instance method foo of the form obj.foo(arg1,arg2,...,argk) ... is actually a call to a static function foo(obj,arg1,arg2,...,argk)
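Here is Point with a hand-written static counterpart of scale(). It illustrates this rewriting but is not literally what the compiler emits. The method name scaleStatic and the parameter name self are hypothetical. self stands in for this, which is a reserved word and cannot be used as an ordinary parameter name.

```java
class Point {
  public double x, y;

  // Instance method: the object before the dot arrives implicitly as this
  public void scale(double fact) {
    x *= fact;
    y *= fact;
  }

  // Hypothetical static equivalent: the object arrives explicitly as self
  public static void scaleStatic(Point self, double fact) {
    self.x *= fact;
    self.y *= fact;
  }
}
```

A.scale(.5) and Point.scaleStatic(A, .5) each halve A's x and y, leaving every other Point untouched.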

In fact, this is so literally true that the implicit parameter has a name, this, and can be used inside the method definition. That parameter is the object on which the method was called, i.e. what's before the dot. For example:

public void addToMe(Point B) { x += B.x; y += B.y; }
  ... is the same as ...
public void addToMe(Point B) { this.x += B.x; this.y += B.y; }

So, to clarify, when we make a call to a non-static member function (instance method in Java parlance), the object we call with, i.e. the object before the ".", is implicitly an extra parameter named "this". Hopefully the illustration below clarifies things.

[Figure: Point.java alongside the call stack during the call a.addToMe(b)]

You are free to use the this reference in your code even if it is not needed. Sometimes you really need it, sometimes it just makes the code a bit easier to read.
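The sketch below shows both situations. addToMe comes from the text. set is a hypothetical method added to show a case where this really is needed: its parameters x and y hide the fields with the same names.

```java
class Point {
  public double x, y;

  // this is optional here; it just makes "my x" explicit
  public void addToMe(Point B) {
    this.x += B.x;
    this.y += B.y;
  }

  // Hypothetical: the parameters shadow the fields, so this.x is the
  // only way to reach the field x (plain x means the parameter)
  public void set(double x, double y) {
    this.x = x;
    this.y = y;
  }
}
```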

Better encapsulation in the Pos class

Returning to our example of the Pos class, we should decide which methods ought to be instance methods and which ought to remain static methods. Let's go through them one-by-one (except main, which has to be static). Note: turning a static method into an instance method means removing one parameter of type Pos. That parameter effectively becomes the this reference.

public static String toString(Pos p)
Remember, instance methods "belong" to the instance. In this case, the Pos instance we want to turn into a String is the parameter p, and it makes sense to think of toString() belonging to p. So we rewrite as:
```
public String toString() { ... }
```
This means that if A is a Pos object, we call the method as A.toString().


public static Pos read(Scanner sc)
This method doesn't have any parameters of type Pos, which means we can't reasonably make it an instance method. The Pos object is created within the function, rather than existing prior to the function call. So read() stays static, and we call it as Pos.read(sc).

public static void step(Pos p, char dir)
The whole purpose of this method is to modify Pos p, so it makes sense to make step an instance method belonging to the Pos instance we want to modify:
public void step(char dir) { ... }

public static int distance(Pos p, Pos q)
This is a tricky case. We have two Pos parameters, either one of which could turn into "this". But there is no obvious sense in which distance() belongs more to p than to q. So what to do? In cases like this, there is no compelling reason to make the method static, but equally no compelling reason to make it an instance method. The standard practice is then to default to an instance method. We'll choose to turn p into "this":
public int distance(Pos q) { ... }


      
      
	
[Code viewer: No encapsulation: data and functions separate | First step towards encapsulation | Second step: use instance methods]
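The original code is not reproduced here, so the following is a rough sketch of what the second-step Pos might look like with those decisions applied. It makes several assumptions. Pos stores int fields row and col. 'N'/'S' decrease/increase the row and 'W'/'E' decrease/increase the column. distance() is Manhattan distance. Because toString() now has the same signature as Object's toString(), it overrides it. Java therefore calls it automatically in string concatenation and in System.out.println(p).

```java
import java.util.Scanner;

class Pos {
  int row, col; // assumed: 'S' increases row, 'E' increases col

  // Stays static: the Pos is created here, not passed in
  static Pos read(Scanner sc) {
    Pos p = new Pos();
    p.row = sc.nextInt();
    p.col = sc.nextInt();
    return p;
  }

  // Was step(Pos p, char dir); p became this. Other characters are ignored.
  void step(char dir) {
    switch (dir) {
      case 'N': row--; break;
      case 'S': row++; break;
      case 'E': col++; break;
      case 'W': col--; break;
      default: break;
    }
  }

  // Was distance(Pos p, Pos q); p became this (Manhattan metric assumed)
  int distance(Pos q) {
    return Math.abs(row - q.row) + Math.abs(col - q.col);
  }

  // Was toString(Pos p); now overrides Object.toString()
  @Override
  public String toString() {
    return row + " " + col;
  }
}
```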

The Object Oriented Paradigm: Part 1

Following the Object Oriented Programming paradigm, a program consists of object instances communicating by calling each other's methods. Thus instead of the function being the fundamental unit of a program, as it is in procedural programming, the object instance is the fundamental unit of a program. Each instance of a class has a well-defined interface: the collection of its method prototypes (i.e. the first line of each method declaration) plus, hopefully, a bit of documentation. It also has a well-defined implementation: the definitions of its methods and its data members (more properly called fields in Java parlance).

So what's the difference from procedural programming? Well, object instances have data-members/fields, so they have memory or, as computer scientists would more formally say, they have state. The upshot is that calls to the same method for the same instance with the same arguments can give different results over time, because the instance has memory/state that can evolve as the program executes. This matches the way things work in the real world: if p is an instance of class Person, p.weight() gives a different answer after a big dinner than it did before. It also matches the way we like to think of many abstractions in software systems.
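Here is a minimal sketch of the Person example. The class, its pounds field, its eat() method, and the StateDemo driver are all hypothetical, invented only to show state.

```java
// Hypothetical Person: pounds is state that persists between calls
class Person {
  double pounds;

  // Hypothetical: eating a meal adds its weight
  void eat(double mealPounds) {
    pounds += mealPounds;
  }

  double weight() {
    return pounds;
  }
}

class StateDemo {
  public static void main(String[] args) {
    Person p = new Person();
    p.pounds = 180.0;
    System.out.println(p.weight()); // prints 180.0
    p.eat(2.5);                     // the big dinner
    System.out.println(p.weight()); // prints 182.5: same call, new answer
  }
}
```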

So, when sitting down to design a program in Java, instead of asking yourself "what functions will I need?", you ask yourself "what classes will I need?". Answering that question will require you to think about collections of methods you want to be able to call for each different type of "thing" in your program.

Object Oriented Design Example

Instead of translating existing code, let's try starting from scratch and using encapsulation to help us design. Remember that our ultimate goal is always perfect separation of interface from implementation, and our number one pitfall to avoid is duplication of code.

The problem: I want a program that reads a current position and a goal position. It lets the user enter sequences of "moves", which are just directions to step to change the current position. After each move sequence it reports the two positions, and it ends when the current position reaches the goal position. Here are some examples:

$ java Track
3 4   2 3
3 4   2 3: NW

$ java Track
3 4   6 6
3 4   6 6: SSS
6 4   6 6: EE

$ java Track
3 4   6 6
3 4   6 6: SE
4 5   6 6: SE
5 6   6 6: S

In Object Oriented Design, we ask "what classes will I need?" and "what methods do I want in each class?" In this case, we would like a class, which I'll call Track, with four methods. A static read() method reads in and creates a new Track object. A makeMoves() method takes a string of moves and executes them. A done() method tells us whether we've arrived at the goal position. A toString() method gives us the current state of the Track object as a String that we can print out. Thus, our design is:

Design   ← Note that "the design" is all interface!  This is what other programmers can use
class Track:
  static Track read(Scanner sc)	  
  void makeMoves(String moves)
  boolean done() 
  String toString()

If we have this interface (never mind how it's implemented), we could write our whole program as:

    Scanner sc = new Scanner(System.in);
    Track T = read(sc);
    while(!T.done()) {
      System.out.print(T.toString() + ": ");
      T.makeMoves(sc.next());
    }

So now that we have our interface pinned down, let's implement it:
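The implementation code is not reproduced here, so what follows is one possible sketch. It assumes a Pos class with a static read(Scanner), instance methods step(char) and distance(Pos), and a toString() producing "row col". A trimmed copy of Pos is included so the block compiles on its own. That copy assumes 'S'/'E' increase row/column and uses Manhattan distance. Track stores two Pos objects, and done() uses only Pos's interface by checking for a distance of 0.

```java
import java.util.Scanner;

// Trimmed Pos (assumed behavior)
class Pos {
  int row, col; // assumed: 'S' increases row, 'E' increases col

  static Pos read(Scanner sc) {
    Pos p = new Pos();
    p.row = sc.nextInt();
    p.col = sc.nextInt();
    return p;
  }

  // Moves this Pos one step; other characters are ignored
  void step(char dir) {
    switch (dir) {
      case 'N': row--; break;
      case 'S': row++; break;
      case 'E': col++; break;
      case 'W': col--; break;
      default: break;
    }
  }

  // Manhattan distance (assumed metric)
  int distance(Pos q) {
    return Math.abs(row - q.row) + Math.abs(col - q.col);
  }

  @Override
  public String toString() {
    return row + " " + col;
  }
}

class Track {
  Pos cur, goal; // implementation detail, invisible to users of the interface

  // Reads the current position, then the goal position
  static Track read(Scanner sc) {
    Track t = new Track();
    t.cur = Pos.read(sc);
    t.goal = Pos.read(sc);
    return t;
  }

  // Steps the current position once per character in moves
  void makeMoves(String moves) {
    for (int i = 0; i < moves.length(); i++) {
      cur.step(moves.charAt(i));
    }
  }

  // True once the current position has reached the goal
  boolean done() {
    return cur.distance(goal) == 0;
  }

  // Current and goal positions, e.g. "3 4   6 6"
  @Override
  public String toString() {
    return cur + "   " + goal;
  }

  public static void main(String[] args) {
    Scanner sc = new Scanner(System.in);
    Track T = read(sc);
    while (!T.done()) {
      System.out.print(T.toString() + ": ");
      T.makeMoves(sc.next());
    }
  }
}
```

With input 3 4 6 6, this sketch follows the second sample run above. After SSS the state is 6 4   6 6, and after EE done() returns true. One design choice here is that makeMoves() executes every move in the string even if the goal is passed partway through. The problem statement does not say what should happen in that case.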

The implementer of "main()" is in the position of someone using our Track code. Notice that they need to know nothing about how we implement the interface we offer. They only need to know/use the four methods read, done, makeMoves, and toString.
Notice that in implementing the Track class, we got to totally reuse the Pos class, without having to duplicate any code. Moreover, in implementing Track, we only needed to use the Pos class's interface (the methods it provided). We didn't need to worry at all about how it was implemented! Life is good, right?

An Extra Object Oriented Design example

Let's consider a simple example. I'd like to write a program that keeps track of batting results in baseball, in order to report players' batting averages. For simplicity, we'll assume that the result of a player stepping up to the plate will either be a walk, a hit, or an out. The formula for batting average is hits/atbats, where an at-bat is an appearance that resulted in a hit or an out — i.e. walks are ignored. In my program, I'd like to be able to handle many players.

In considering an OOP approach, we would identify that a player should be an object in our program. We should have a method that allows us to record the results of one or more plate appearances, and we should have a method that reports the player's current batting average. This leads us to an interface like this:

class Batter
{
  void record(String outcomes) ← records outcomes of plate appearances; outcomes is a string of h/w/o's like "hoowh"
  double average()             ← returns the current batting average, no rounding
}

If we had this kind of interface, we could write code like this:

Batter b = new Batter();
b.record("owhoowoowhhwoohoo");
System.out.println(b.average());
b.record("hhowowohhwohoohoowoooh");
System.out.println(b.average());

Of course, whole teams worth of batters would work the same way. We'd just have arrays or linked lists of Batter objects, each recording and reporting their own batting averages.

So what about implementing this? The implementation has to remember things in order to be able to report a batting average. This means it has to have data-members/fields. What we want to remember is an implementation decision. One option is to keep a count of at-bats and a count of hits. That leaves us with something like the following
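Here is a minimal sketch of that option, keeping just two int counters. Two behaviors are assumptions of this sketch. Characters other than h, w, or o are ignored. average() returns 0.0 before any at-bats, instead of the NaN that 0.0/0 would produce.

```java
// One possible implementation: remember only two counts
class Batter {
  int hits = 0, atBats = 0; // implementation detail

  // outcomes is a string of h/w/o characters, e.g. "hoowh"
  void record(String outcomes) {
    for (int i = 0; i < outcomes.length(); i++) {
      char c = outcomes.charAt(i);
      if (c == 'h') {        // a hit counts as a hit and an at-bat
        hits++;
        atBats++;
      } else if (c == 'o') { // an out is an at-bat but not a hit
        atBats++;
      }
      // 'w' (a walk) and any other character change neither count
    }
  }

  // Current batting average, no rounding (0.0 before any at-bats, assumed)
  double average() {
    return atBats == 0 ? 0.0 : (double) hits / atBats;
  }
}
```

Consider the two record() calls from the example above. The first string contains 4 hits, 4 walks, and 9 outs, so the first average is 4/13. The second string has 7 hits, 4 walks, and 11 outs. After it, the running totals are 11 hits in 31 at-bats, so the second average is 11/31.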

Sample Run

~/$ java Team
A: 0.3333333333333333, B: 0.5
A: 0.2857142857142857, B: 0.42857142857142855