Encapsulation and Information Hiding

We now know that classes can bundle data together with the functions that operate on that data. This is called encapsulation. The data and the functions are grouped together in a capsule.

Encapsulation makes it easier to provide an important property of OOP: information hiding. Information hiding is the idea that two groups of programmers know about a class: the creators of the class and the users of the class.

The creators of the class know how the class works. They know how the data is represented, and they know how the methods that manipulate the data work. The users of the class don't know any of the representation details or how anything works under the hood, but they do know how they're supposed to use it. They know what the arguments are and what the methods should return for those arguments.

But why would we want to hide data? Isn't more information and capability always better?

In my research, I sometimes use simulators of various physical processes. For example, I may use some code that simulates the riding of a bicycle; the user has the option of turning the handlebars or shifting the rider's weight in order to alter the bicycle's motion. Now, inside the Bicycle class is a series of fields that represent the current state of the Bicycle: the bike's position, its angle from the ground, its angular velocity and acceleration, the handlebar's angle, etc. As the user, I should NOT be able to change those directly! I should only be allowed to change them indirectly, by changing the rider's actions; this is what keeps the simulation accurate. Information hiding!
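
The bicycle idea can be sketched in Java roughly as follows. This is only an illustration, not the simulator mentioned above: the class name Bicycle comes from the text, but the field names, the methods turnHandlebars, shiftWeight, step and getLeanAngle, and the toy update rule are all assumptions made up for this example.

```java
// Hypothetical sketch: the "physics" is a made-up update rule, not a real bicycle model.
class Bicycle {
    // Internal state: private, so code outside this class cannot assign to it.
    private double leanAngle = 0.0;        // radians from upright
    private double angularVelocity = 0.0;  // radians per second
    private double handlebarAngle = 0.0;   // radians
    private double weightShift = 0.0;      // rider's sideways offset, in metres

    // The rider's actions are the only way to influence the state.
    public void turnHandlebars(double angle) { handlebarAngle = angle; }
    public void shiftWeight(double offset) { weightShift = offset; }

    // Advance the simulation by dt seconds using the toy dynamics.
    public void step(double dt) {
        double angularAcceleration = 2.0 * weightShift - 0.5 * handlebarAngle;
        angularVelocity += angularAcceleration * dt;
        leanAngle += angularVelocity * dt;
    }

    // Read-only view of one piece of the state.
    public double getLeanAngle() { return leanAngle; }

    public static void main(String[] args) {
        Bicycle bike = new Bicycle();
        bike.shiftWeight(0.1);   // a rider action
        bike.step(0.5);          // the simulator updates its own state
        System.out.println(bike.getLeanAngle());
        // In any other class, "bike.leanAngle = 1.0;" would be rejected by the compiler.
    }
}
```

The user can steer and lean, and can read the resulting lean angle, but every change to the state goes through step(), so the state always follows from the rider's actions.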

In any program we write, we are always members of both groups, creators and users. With the code that we actually write, we are the creators, and we have to know how it works, but with any built-in methods that we call, we are just the users.

Do you know how to take the sine of a number in Java? (Hint: it's Math.sin(angle).) In order to do that, you need to know the specification of the sin() function. What is the type of the argument? (It's a double.) Is the argument in degrees or radians? (It's in radians.) What is the type of the return value? (Also a double.)
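
Here is a short example of using that specification. Math.sin and Math.toRadians are standard library methods; the class name SineSpec and the helper sinOfDegrees are just names invented for this sketch.

```java
class SineSpec {
    // Hypothetical helper: takes degrees, converts to radians, then calls Math.sin.
    static double sinOfDegrees(double degrees) {
        double radians = Math.toRadians(degrees);  // Math.sin expects radians
        return Math.sin(radians);                  // argument and result are both double
    }

    public static void main(String[] args) {
        System.out.println(sinOfDegrees(30.0));  // approximately 0.5
        // Passing degrees straight to Math.sin compiles fine,
        // but it computes the sine of 30 radians instead.
        System.out.println(Math.sin(30.0));
    }
}
```

Everything needed to write this is in the specification: the argument type, the unit, and the return type. Nothing about how the sine is actually computed was required.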

Do you know how the sin() method is implemented? No. Is that a problem? No. In many cases, knowing how to use something is all you need to know. Does it matter to you if the implementation uses a lookup table or a power series or something else? Probably not.

Can you, without looking in a book, a manual, or on the internet, determine how sine is implemented? No. You don't need to know that information, and it is hidden from you. And frankly, you don't care.

So we agree that the creators hiding information from the user doesn't hurt much, but does it help? I.e., why should we bother?

  1. Information hiding relieves users from having to know more than necessary.
  2. Information hiding makes it possible to change the implementation without affecting the user directly.

Can we see some examples?

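  public class Angle {
    double rad;  // the angle, in radians

    public Angle (double r) {
      rad = r;
    }

    // Returns n! as a double, so large factorials do not overflow an int.
    private static double fact(int n) {
      double f = 1;
      for (int k = 2; k <= n; k++) { f *= k; }
      return f;
    }

    // Compute sine using the power series x - x^3/3! + x^5/5! - ...,
    // keeping only the terms whose power is less than terms.
    public double sin(int terms) {
      double s = rad;
      int sign = -1;
      for (int i = 3; i < terms; i += 2) {
        s = s + sign * Math.pow(rad, i) / fact(i);
        sign = -sign;
      }
      return s;
    }
  }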

What if we want our angle to always be between -π and π? We have a problem because our user can initialize rad to anything. So we can put a check in the constructor:

  public Angle (double r) {
    // Wrap r into the range [-pi, pi]. The Java constant is Math.PI.
    while (r > Math.PI) { r -= 2*Math.PI; }
    while (r < -Math.PI) { r += 2*Math.PI; }
    rad = r;
  }

Problem solved! Except, our user can still change it directly...

  Angle a = new Angle(0);
  a.rad = 42;

Our user has ruined our carefully laid plans... Fortunately, we have an encapsulation technique to deal with this: We prevent the user from accessing rad directly, by making it private:

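  public class Angle {
    private double rad;  // the angle, in radians, kept in [-pi, pi]

    public Angle (double r) {
      while (r > Math.PI) { r -= 2*Math.PI; }
      while (r < -Math.PI) { r += 2*Math.PI; }
      rad = r;
    }

    // Returns n! as a double, so large factorials do not overflow an int.
    private static double fact(int n) {
      double f = 1;
      for (int k = 2; k <= n; k++) { f *= k; }
      return f;
    }

    // Compute sine using the power series, keeping powers less than terms.
    public double sin(int terms) {
      double s = rad;
      int sign = -1;
      for (int i = 3; i < terms; i += 2) {
        s = s + sign * Math.pow(rad, i) / fact(i);
        sign = -sign;
      }
      return s;
    }
  }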

What this does is prevent the user from accessing this field at all; in other words, if we have an Angle a, the compiler will no longer allow code outside the class to refer to a.rad. But what if we, the users, want to change the field? We have to use a method called a "setter" to do it:

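  public class Angle {
    private double rad;  // the angle, in radians, kept in [-pi, pi]

    public Angle (double r) {
      while (r > Math.PI) { r -= 2*Math.PI; }
      while (r < -Math.PI) { r += 2*Math.PI; }
      rad = r;
    }

    // Setter: the only way for users to change rad, so the range check always runs.
    public void setRad(double r) {
      while (r > Math.PI) { r -= 2*Math.PI; }
      while (r < -Math.PI) { r += 2*Math.PI; }
      rad = r;
    }

    // Returns n! as a double, so large factorials do not overflow an int.
    private static double fact(int n) {
      double f = 1;
      for (int k = 2; k <= n; k++) { f *= k; }
      return f;
    }

    // Compute sine using the power series, keeping powers less than terms.
    public double sin(int terms) {
      double s = rad;
      int sign = -1;
      for (int i = 3; i < terms; i += 2) {
        s = s + sign * Math.pow(rad, i) / fact(i);
        sign = -sign;
      }
      return s;
    }
  }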

Now, it is impossible for the user to screw up our data. That's pretty big, because users can be pretty stupid. And even if they're not, we've saved them from needing to care what domain of angles we're going to use, or possibly making an avoidable mistake.
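
To see the setter doing its job, here is a small usage sketch. It adds two things that are not in the class above, both assumptions for the sake of the example: a getter getRad() so we can look at the stored value, and a private helper normalize() so the constructor and the setter share one copy of the range check. The class is declared without public only so the sketch fits in a single file.

```java
class Angle {
    private double rad;  // always kept in [-pi, pi]

    public Angle(double r) {
        rad = normalize(r);
    }

    // Setter: every change to rad passes through the range check.
    public void setRad(double r) {
        rad = normalize(r);
    }

    // Hypothetical getter, added so the example can show the stored value.
    public double getRad() {
        return rad;
    }

    // Hypothetical helper: wrap r into the range [-pi, pi].
    private static double normalize(double r) {
        while (r > Math.PI) { r -= 2 * Math.PI; }
        while (r < -Math.PI) { r += 2 * Math.PI; }
        return r;
    }

    public static void main(String[] args) {
        Angle a = new Angle(0);
        a.setRad(42);                    // the user asks for 42 radians...
        System.out.println(a.getRad());  // ...and gets the equivalent angle in [-pi, pi]
    }
}
```

The user who tried a.rad = 42 earlier now has to write a.setRad(42), and the stored value stays in range no matter what is passed in.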

We can use the word private to prevent the user of our class from messing around with our data. The word public means the user can mess with our data, and if you use no word at all, then classes in the same package can mess with the data.
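
The three cases can be put side by side in a small sketch. The classes Reading and ReadingUser and their fields are hypothetical, invented only to show one field of each kind; both classes sit in the same file and therefore in the same package.

```java
class ReadingUser {
    public static void main(String[] args) {
        Reading r = new Reading("lab", 21.5);
        r.label = "office";   // allowed: label is public
        r.offset = 1.0;       // allowed here: no modifier, and we are in the same package
        // r.celsius = 99;    // would not compile: celsius is private
        System.out.println(r.label + " " + r.getCelsius());
    }
}

class Reading {
    private double celsius;  // private: only code inside Reading can touch it
    double offset = 0.0;     // no modifier: any class in the same package can touch it
    public String label;     // public: any class anywhere can touch it

    public Reading(String label, double celsius) {
        this.label = label;
        this.celsius = celsius;
    }

    // The private field is still usable through a public method.
    public double getCelsius() {
        return celsius + offset;
    }
}
```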

Next, let's look at our implementation of sine. It takes an argument that is an upper bound on the power of the terms in the power series. Do we really want to bother the user with that? Do you want to be bothered with that? Of course not; that is bad information hiding. The implementation of this function is not hidden from you. You have to know what a power series is, and how it works, to know what integer you might want as an argument.

The obvious thing is to move the number of terms inside the class:

  double sin() {
  //compute sine using power series
  //(fact(i) is assumed to be a helper that returns i factorial)
    double s = rad;
    int sign = -1;
    for (int i = 3; i < 12; i += 2) {
      s = s + sign * Math.pow(rad, i) / fact(i);
      sign = -sign;
    }
    return s;
  }

Now the detail that this is using power series is hidden from the user, and the user is relieved.
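
As a quick sanity check, the argument-free version can be compared against the built-in Math.sin. This is a sketch under a few assumptions: the helper fact() is one possible way to write the factorial the method relies on, the range check in the constructor is left out to keep the example short, and the class is declared without public so it fits in one file.

```java
class Angle {
    private double rad;

    public Angle(double r) {
        rad = r;  // range wrapping omitted in this sketch
    }

    // Assumed helper: n! as a double, so large factorials do not overflow an int.
    private static double fact(int n) {
        double f = 1;
        for (int k = 2; k <= n; k++) { f *= k; }
        return f;
    }

    // Power series x - x^3/3! + x^5/5! - ... up to the x^11 term.
    public double sin() {
        double s = rad;
        int sign = -1;
        for (int i = 3; i < 12; i += 2) {
            s = s + sign * Math.pow(rad, i) / fact(i);
            sign = -sign;
        }
        return s;
    }

    public static void main(String[] args) {
        Angle a = new Angle(1.0);
        System.out.println(a.sin());        // our power series
        System.out.println(Math.sin(1.0));  // the built-in, for comparison
    }
}
```

For small angles the two values should agree closely; the truncated series gets less accurate as the angle approaches -π or π. Either way, the caller just writes a.sin().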

Now let's say our sin() function is being called A LOT. We might want to make it faster, so we decide to go with a lookup table instead:

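  double sin() {
  //compute sine using precomputed lookup table
  //(assumes table[i] holds the sine of -pi + i/1000.0, covering -pi to pi)
    int index = (int) Math.round((rad + Math.PI) * 1000);
    return table[index];
  }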

The interesting thing here is that even though we've completely changed the implementation, the code that calls sin() doesn't change at all; it's exactly the same. If we had used the version with the terms argument, we would have had to change every single piece of code ever written that used the sin() function, which is horrible.
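
For completeness, here is one way the lookup-table version might be filled in. The details are assumptions, not the only possible design: the table covers -π to π in steps of 0.001, it is filled once using Math.sin, and the names STEP, TABLE and buildTable are invented for this sketch.

```java
class Angle {
    // Assumed layout: entry i holds the sine of (-pi + i * STEP).
    private static final double STEP = 0.001;
    private static final double[] TABLE = buildTable();

    private final double rad;  // always kept in [-pi, pi]

    public Angle(double r) {
        while (r > Math.PI) { r -= 2 * Math.PI; }
        while (r < -Math.PI) { r += 2 * Math.PI; }
        rad = r;
    }

    // Fill the table once, when the class is loaded.
    private static double[] buildTable() {
        int size = (int) Math.round(2 * Math.PI / STEP) + 1;
        double[] t = new double[size];
        for (int i = 0; i < size; i++) {
            t[i] = Math.sin(-Math.PI + i * STEP);
        }
        return t;
    }

    // Look up the nearest precomputed entry instead of computing anything.
    public double sin() {
        int index = (int) Math.round((rad + Math.PI) / STEP);
        return TABLE[index];
    }

    public static void main(String[] args) {
        Angle a = new Angle(1.0);
        System.out.println(a.sin());        // same call as with the power series
        System.out.println(Math.sin(1.0));  // the built-in, for comparison
    }
}
```

The result is only as precise as the table spacing allows, which is the price paid for speed. The line a.sin() in main is identical to what a caller of the power-series version would write.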