Designing a system for handling I/O is a daunting problem for language/library/API developers. These operations are so ubiquitous in programs that getting it wrong means making pretty much every program anyone ever writes more difficult, and getting it right means making pretty much every program anyone ever writes easier. So, let's consider some desirable properties, some design goals, regarding reading, i.e. the "input" half of I/O. In particular, we'll look at stream-oriented input and output.
class InputStream: InputStream is for byte-based input.
    int read();  // should be byte, but uses int
                 // to return -1 to signal EOF
    int read(byte[] b, int off, int len);  // returns the number of bytes read, or -1 at EOF
    void close();

class Reader: Reader is for character-based input.
    int read();  // should be char, but uses int
                 // to return -1 to signal EOF
    int read(char[] b, int off, int len);  // returns the number of chars read, or -1 at EOF
    void close();

class Scanner: Scanner is for token-based input.
    String next();
    int nextInt();
    double nextDouble();
    ...
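Specific sources of bytes give rise to classes that extend InputStream. For example, in the API we have:
              InputStream
               /       \
              /         \
             /           \
FileInputStream     ByteArrayInputStream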
So if, for example, you want to write code to
search for the bytes 0x7F 0x45 0x4C 0x46, which mark the
beginning of a Unix (ELF) executable, you would write your method
to take an InputStream argument. That way the same method works
for both files and byte arrays.
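As a rough illustration of the idea above, here is a minimal sketch of such a search method. The class name FindElfSketch, the return convention (offset of the match, or -1) and the matching logic are assumptions made for this example, not the course's own code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class FindElfSketch {
    // The four bytes 0x7F 'E' 'L' 'F' named in the text.
    private static final int[] MAGIC = {0x7F, 0x45, 0x4C, 0x46};

    // Returns the offset of the first occurrence of MAGIC in the stream,
    // or -1 if the stream ends without one. (Assumed convention.)
    static long findELF(InputStream in) throws IOException {
        int matched = 0;  // how many bytes of MAGIC the most recent input matches
        long pos = 0;     // number of bytes consumed so far
        int b;
        while ((b = in.read()) != -1) {
            pos++;
            if (b == MAGIC[matched]) {
                matched++;
                if (matched == MAGIC.length) return pos - MAGIC.length;
            } else {
                // No byte repeats within MAGIC, so after a mismatch the only
                // way to be partway into a match is if this byte is 0x7F.
                matched = (b == MAGIC[0]) ? 1 : 0;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x00, 0x7F, 0x7F, 0x45, 0x4C, 0x46, 0x02};
        // The source here is a byte array; the method never needs to know.
        System.out.println(findELF(new ByteArrayInputStream(data)));  // prints 2
    }
}
```

Because the parameter type is InputStream, the same method could be handed a FileInputStream opened on some file instead of the ByteArrayInputStream used here, without changing a line of findELF.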
Similarly, specific sources of characters give rise to classes that extend Reader. For example, in the API we have:
               Reader_____
               /    \     \____
              /      \         \
             /        \         \
StringReader   CharArrayReader   InputStreamReader
So, if you wanted to write code to count the number of non-alphabetical characters in text, you would write that method to take a Reader argument. That way the same method works for Strings, arrays of chars and (this is the interesting part!) any InputStream, because we can make an InputStream the source of characters for a Reader via the InputStreamReader class. If you look at the API documentation for InputStreamReader, you will see that its constructors take an InputStream as a parameter.
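To make this concrete, here is a minimal sketch of a character-counting method that takes a Reader and is then called on three different sources. The class name CountNonAlphaSketch and the choice of Character.isLetter as the test for "alphabetical" are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

class CountNonAlphaSketch {
    // Counts the chars delivered by the Reader that are not letters.
    static int countNonAlpha(Reader in) throws IOException {
        int count = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (!Character.isLetter(c)) count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String text = "ab, cd!";
        // Same method, three different ultimate sources of characters.
        System.out.println(countNonAlpha(new StringReader(text)));
        System.out.println(countNonAlpha(new CharArrayReader(text.toCharArray())));
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(countNonAlpha(
            new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)));
    }
}
```

Each call prints 3 (the comma, the space and the exclamation mark). The third call is the interesting one: the characters ultimately come from bytes, with InputStreamReader doing the decoding in between.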
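1. Strictly speaking, the Scanner constructor in question takes not a Reader
but an argument of the Readable interface.
However, Reader
implements Readable, so this constructor works
with Readers, but is in fact a bit more general
than that.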
2. The Scanner constructor that takes an InputStream as an argument is
actually just a convenience thing. You only really need the
constructor that takes a Reader as an argument. Why is that enough?
Class Scanner has constructors that take InputStreams or Readers as arguments. So if you wanted to write code to do something like add all the integers in some text, you would write that method to take a Scanner as an argument. That way it would work with files, byte arrays, char arrays or Strings. Putting this together, if you have a file whose name is "data.txt" and you want to read tokens from it (e.g. ints, doubles, booleans and strings), you would create a Scanner for it like this:
Scanner sc = new Scanner(new InputStreamReader(new FileInputStream("data.txt")));
                                               \_____________________________/
                                               an InputStream whose bytes
                                               come from file data.txt
                         \____________________________________________________/
                         a Reader whose chars come from the bytes in data.txt
             \_________________________________________________________________/
             a Scanner whose tokens are made up of chars whose bytes come from data.txt
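The integer-adding method mentioned above might look something like the following minimal sketch. The class name SumIntsSketch and the decision to silently skip non-integer tokens are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.CharArrayReader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class SumIntsSketch {
    // Adds up every token that parses as an int; other tokens are skipped.
    static int sumInts(Scanner sc) {
        int sum = 0;
        while (sc.hasNext()) {
            if (sc.hasNextInt()) sum += sc.nextInt();
            else sc.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        String text = "3 apples and 4 pears, then 10 more";
        // A String, a char array and a byte array as the ultimate source.
        System.out.println(sumInts(new Scanner(new StringReader(text))));
        System.out.println(sumInts(new Scanner(new CharArrayReader(text.toCharArray()))));
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        System.out.println(sumInts(new Scanner(new ByteArrayInputStream(bytes))));
    }
}
```

All three calls print 17. A Scanner built over a FileInputStream for data.txt, as in the line above, could be passed to the same sumInts unchanged.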
There are some shortcuts to all of this: so-called
"convenience" classes and constructors that make, for example, a Reader directly
from a file name
Reader r = new FileReader("data.txt");
or a Scanner directly from such a FileReader. The following
is equivalent to the big line above:
Scanner sc = new Scanner(new FileReader("data.txt"));
The following program illustrates what this flexibility buys you. It defines three methods, findELF, countNonAlpha and sumInts, that process inputs as streams of bytes, chars, and tokens, respectively. What the program shows is how flexibly each method can be called on a variety of different input sources. For example, sumInts can be called with the ultimate source of data being a file, a byte array, a string or a character array ... and, of course, we could have called it on stdin as well! The countNonAlpha method provides an interesting example. To highlight the difference between bytes and chars in Java, try the input file in1 (save it rather than viewing it in the browser). It is interesting because it contains a non-ASCII Unicode character (a heart). The result is that it is a seven-byte file that contains only four characters.
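The byte/char difference can be seen without the file, too. The following is a minimal sketch that uses a stand-in string of its own (not the contents of in1), with UTF-8 assumed as the encoding; the class and method names are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class BytesVsCharsSketch {
    // Number of bytes an InputStream delivers before EOF.
    static int countBytes(InputStream in) throws IOException {
        int n = 0;
        while (in.read() != -1) n++;
        return n;
    }

    // Number of chars a Reader delivers before EOF.
    static int countChars(Reader in) throws IOException {
        int n = 0;
        while (in.read() != -1) n++;
        return n;
    }

    public static void main(String[] args) throws IOException {
        // 'I', a heart (U+2665), 'U': three chars, but the heart takes three bytes in UTF-8.
        byte[] data = "I\u2665U".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(new ByteArrayInputStream(data)));  // prints 5
        System.out.println(countChars(new InputStreamReader(
            new ByteArrayInputStream(data), StandardCharsets.UTF_8)));   // prints 3
    }
}
```

Read as an InputStream the data is five bytes long; read through an InputStreamReader the very same bytes are three characters.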
The first example is the class BufferedReader. The issue BufferedReader addresses is this: when a call to read() is made for a Reader that has, for example, a file as its ultimate source for data, that call results at some lower level in a system call to fetch that byte. At this low level, however, fetching a byte at a time is tremendously inefficient. It typically takes as much time to fetch something like 1024 or 2048 bytes as it does a single byte. Therefore, it would be nice to have a variant of Reader that would fetch, say, 1024 bytes into a buffer the first time read() is called, then dole those out one at a time for each read() call until the buffer is emptied. Only then would it go back to fetch more bytes from the lower level: another chunk of 1024. That's what the class BufferedReader does. What's kind of funny is that it does it as a wrapper around another Reader. In other words, BufferedReader is a Reader that takes a Reader and wraps it in this buffering scheme. So for example, if you had a file "data.txt" to read tokens (e.g. integers) from, and you were worried about performance, you might create your Scanner like this:
Scanner sc1 = new Scanner(new BufferedReader(new InputStreamReader(new FileInputStream("data.txt"))));
The BufferedReader will make calls like read(buff,0,1024) to
its underlying InputStreamReader, which will make a call like
read(buff,0,1024) to its underlying FileInputStream, which
will result in a lower-level system call to fetch the next
1024 bytes from the file. The object-oriented design of
Java's I/O package makes this possible. By deriving
BufferedReader from Reader, the Java authors provide modified
functionality that can be used anywhere a regular Reader can
be used.
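One way to watch the buffering happen is to put a call-counting Reader underneath. The following is a minimal sketch; CountingReader is a hypothetical helper written for this example (it is not part of the Java API), and the buffer size of 1024 is passed explicitly to match the discussion.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

class BufferingSketch {
    // Hypothetical helper: passes reads through to another Reader and counts them.
    static class CountingReader extends Reader {
        private final Reader inner;
        int calls = 0;
        CountingReader(Reader inner) { this.inner = inner; }
        @Override public int read(char[] buf, int off, int len) throws IOException {
            calls++;
            return inner.read(buf, off, len);
        }
        @Override public void close() throws IOException { inner.close(); }
    }

    // Reads src one char at a time via read(), optionally through a BufferedReader,
    // and returns how many read calls reached the underlying Reader.
    static int underlyingCalls(String src, boolean buffered) throws IOException {
        CountingReader counter = new CountingReader(new StringReader(src));
        Reader r = buffered ? new BufferedReader(counter, 1024) : counter;
        while (r.read() != -1) { /* consume one char per call */ }
        return counter.calls;
    }

    public static void main(String[] args) throws IOException {
        char[] cs = new char[5000];
        Arrays.fill(cs, 'x');
        String src = new String(cs);
        System.out.println("unbuffered: " + underlyingCalls(src, false));
        System.out.println("buffered:   " + underlyingCalls(src, true));
    }
}
```

Without the BufferedReader, every single-char read() reaches the underlying Reader; with it, only a handful of large reads do. Here the underlying source is just a StringReader, so nothing gets faster, but with a file at the bottom each of those calls would be far more expensive.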
The second example to look at is the class LineNumberReader, which is much easier to explain. Sometimes you want to be able to ask what line you're on as you read input. That's an extra piece of functionality you might wish that a Reader had. The class LineNumberReader extends BufferedReader to provide just that one extra piece of functionality. So now we could redo our Scanner definition like this:
LineNumberReader r;
Scanner sc2 = new Scanner(r = new LineNumberReader(new InputStreamReader(new FileInputStream("data.txt"))));
... and whenever you want to know what line number you're on
you can call r.getLineNumber().
Once again, the object-oriented design of
Java's I/O package makes this possible. By deriving
LineNumberReader from BufferedReader which is derived from
Reader, the Java authors provide new functionality
that can be used anywhere a regular Reader can
be used.
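Here is a minimal sketch of getLineNumber() in action, using the LineNumberReader directly with readLine() rather than through a Scanner. The class name LineNumberSketch, the findLine method and the sample text are assumptions for illustration.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;

class LineNumberSketch {
    // Returns the 1-based number of the first line containing target, or -1 if none does.
    static int findLine(LineNumberReader r, String target) throws IOException {
        String line;
        while ((line = r.readLine()) != null) {
            // After readLine() returns a line, getLineNumber() is the count of lines read so far.
            if (line.contains(target)) return r.getLineNumber();
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        LineNumberReader r = new LineNumberReader(new StringReader("alpha\nbeta\ngamma\n"));
        System.out.println(findLine(r, "gamma"));  // prints 3
    }
}
```

One caveat worth checking on your own setup: a Scanner may read ahead of the tokens it has handed out so far, so when a LineNumberReader sits underneath a Scanner, the line number it reports can be ahead of the token you just received.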
$ java Ex3 "3.1 - 5.5 ="
-2.4
$ java Ex3
3.1 - 5.5 =
-2.4
xxd is a cool little unix utility that converts
to and from hex. For example:
$ echo "3.1 - 5.5 =" | xxd
00000000: 332e 3120 2d20 352e 3520 3d0a 3.1 - 5.5 =.
... shows us that the hex version of "3.1 - 5.5 =\n" is
332e 3120 2d20 352e 3520 3d0a
Now suppose we want Ex3 to read its input in hex instead. If I give it
332e 3120 2d20 352e 3520 3d0a
as input, which is hex for
"3.1 - 5.5 =\n" (see the xxd note above), I should get
-2.4 as an answer.
Sounds hard, right? But it turns out that, due to the beautiful
object-oriented design of the Java I/O API, this kind of thing
can be done super cleanly. What we can do is add a new class
that acts as an InputStream feeding bytes to the Scanner sc.
But the way it gets those bytes is to read hex characters from
Reader r and do the conversion.
Now the question is: what do we have to do to create a new kind
of InputStream? The answer is to extend InputStream, and if you
look at the InputStream documentation, you will see that the
only abstract method is:
abstract int read(); // Reads the next byte of data from the input stream.
So that's telling us that all we need to do to make a new InputStream is to extend it and implement our own read() method. So we will create a new class HexInputStream that extends InputStream and, if we do it right, all we will have to do is change one line in Ex3.java that instantiates Scanner sc:
Scanner sc = new Scanner(r); -CHANGE TO→ Scanner sc = new Scanner(new HexInputStream(r));
... so that Scanner sc will be getting its bytes from the
HexInputStream which, in turn, will be building its bytes from
the characters it is getting from Reader r.
Here's what that looks like:
$ echo "3.1 - 2.5 + 0.2 =" | xxd -c 20
00000000: 332e 3120 2d20 322e 3520 2b20 302e 3220 3d0a  3.1 - 2.5 + 0.2 =.
$ java Ex3 "332e 3120 2d20 322e 3520 2b20 302e 3220 3d0a"
0.8
$ java Ex3
332e 3120 2d20 322e 3520 2b20 302e 3220 3d0a
0.8
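For reference, here is a minimal sketch of what a HexInputStream along these lines could look like. The class name comes from the discussion above, but the body is an assumption: in particular, skipping whitespace between digits and throwing an IOException on malformed input are choices made for this example, not necessarily what the course's version does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.util.Scanner;

class HexInputStream extends InputStream {
    private final Reader r;  // source of hex characters

    HexInputStream(Reader r) { this.r = r; }

    // Reads the next hex digit from r, skipping whitespace; returns -1 at end of input.
    private int nextDigit() throws IOException {
        int c;
        while ((c = r.read()) != -1) {
            if (Character.isWhitespace(c)) continue;
            int d = Character.digit(c, 16);
            if (d == -1) throw new IOException("not a hex digit: " + (char) c);
            return d;
        }
        return -1;
    }

    // The one abstract method: builds a byte (0..255) from two hex digits,
    // or returns -1 at end of input.
    @Override public int read() throws IOException {
        int hi = nextDigit();
        if (hi == -1) return -1;
        int lo = nextDigit();
        if (lo == -1) throw new IOException("odd number of hex digits");
        return hi * 16 + lo;
    }

    @Override public void close() throws IOException { r.close(); }

    public static void main(String[] args) {
        Reader r = new StringReader("332e 3120 2d20 352e 3520 3d0a");
        Scanner sc = new Scanner(new HexInputStream(r));
        while (sc.hasNext()) System.out.println(sc.next());
    }
}
```

The main method prints the tokens 3.1, -, 5.5 and =, one per line: the Scanner has no idea its bytes are being assembled from hex characters on the fly.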
Similar to the input case, we have two separate hierarchies
for output: the hierarchy rooted at OutputStream,
which is for byte-oriented output, and the hierarchy rooted
at Writer, which is for character-oriented
output.
The distinction is a bit blurrier than for the input case,
because
the class PrintStream, which is derived
from OutputStream, provides methods for writing
ints, doubles, Strings, etc., as
does PrintWriter, which is derived
from Writer. The distinction has to do with how
characters are encoded as bytes: a PrintStream encodes characters
into bytes itself, using the JVM's default encoding unless told
otherwise, whereas a PrintWriter wrapped around a Writer produces
characters and leaves the encoding to whatever lies underneath, so
the programmer can specify it independently. These are distinctions
we won't go into here.
Note, however, that System.out and System.err are both
PrintStream objects.
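As a small illustration of the two output hierarchies, here is a minimal sketch that formats the same values once through a PrintStream (ending up as bytes) and once through a PrintWriter (ending up as characters). The class and method names are assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.io.StringWriter;

class OutputSketch {
    // Formats the values through a PrintStream (byte-oriented) and returns the bytes written.
    static byte[] viaPrintStream(int n, double d) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(bytes);
        ps.print(n);
        ps.print(' ');
        ps.print(d);
        ps.flush();
        return bytes.toByteArray();
    }

    // Formats the same values through a PrintWriter (character-oriented).
    static String viaPrintWriter(int n, double d) {
        StringWriter chars = new StringWriter();
        PrintWriter pw = new PrintWriter(chars);
        pw.print(n);
        pw.print(' ');
        pw.print(d);
        pw.flush();
        return chars.toString();
    }

    public static void main(String[] args) {
        PrintStream out = System.out;  // System.out is itself a PrintStream
        out.println(viaPrintStream(42, 2.5).length);  // number of bytes written
        out.println(viaPrintWriter(42, 2.5));         // the characters written
    }
}
```

Both paths offer the same print methods; what differs is whether the destination underneath is an OutputStream collecting bytes or a Writer collecting chars.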