Designing a system for handling I/O is a daunting problem for language, library, and API developers. These operations are so ubiquitous in programs that getting the design wrong makes pretty much every program anyone ever writes more difficult, and getting it right makes pretty much every program anyone ever writes easier. So, let's consider some desirable properties, some design goals, regarding reading, i.e. the "input" half of I/O. In particular, we'll look at stream-oriented input and output.
class InputStream | class Reader | class Scanner |
InputStream is for byte-based input. | Reader is for character-based input. | Scanner is for token-based input. |
int read(); // a byte, but returned as int so that -1 can signal EOF | int read(); // a char, but returned as int so that -1 can signal EOF | String next(); |
int read(byte[] b, int off, int len); | int read(char[] b, int off, int len); | int nextInt(); |
void close(); | void close(); | double nextDouble(); ... |
Specific sources of bytes give rise to classes that extend InputStream. For example, in the API we have:

            InputStream
             /       \
            /         \
  FileInputStream   ByteArrayInputStream

So if, for example, you want to write code to search for the bytes 0x7F 0x45 0x4C 0x46, which indicate the beginning of a Unix (ELF) executable, you would write your method to take an InputStream argument. That way the same method works for both files and byte arrays.
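As a rough illustration of that idea, here is a minimal sketch of such a search. The class name ElfSearch and the exact matching loop are assumptions made for this example, not code from the notes; the point is only that findELF is written once against InputStream and then called with different byte sources.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class ElfSearch {
    // The four-byte signature 0x7F 'E' 'L' 'F'.
    private static final int[] MAGIC = {0x7F, 0x45, 0x4C, 0x46};

    // Returns true if the four magic bytes occur consecutively in the stream.
    static boolean findELF(InputStream in) throws IOException {
        int matched = 0; // how many magic bytes have matched so far
        int b;
        while ((b = in.read()) != -1) {
            if (b == MAGIC[matched]) {
                matched++;
                if (matched == MAGIC.length) return true;
            } else {
                // 0x7F occurs only at the start of the pattern, so after a
                // mismatch the only possible partial match is this byte itself.
                matched = (b == MAGIC[0]) ? 1 : 0;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Source 1: a byte array.
        byte[] data = {0x00, 0x7F, 0x7F, 0x45, 0x4C, 0x46, 0x01};
        System.out.println(findELF(new ByteArrayInputStream(data)));

        // Source 2: a file, if a file name is given on the command line.
        if (args.length > 0) {
            try (InputStream in = new FileInputStream(args[0])) {
                System.out.println(findELF(in));
            }
        }
    }
}
```

The method never learns whether its bytes came from memory or from disk, which is exactly what taking the base class as the parameter type buys you.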
Similarly, specific sources of characters give rise to classes that extend Reader. For example, in the API we have:
                   Reader
                 /    |    \
                /     |     \
  StringReader  CharArrayReader  InputStreamReader
So, if you wanted to write code to count the number of non-alphabetical characters in text, you would write that method to take a Reader argument. That way the same method works for Strings, arrays of chars and (and this is interesting!) any InputStream — because we can make an InputStream the source of characters for a Reader via the InputStreamReader class! If you look at the API documentation for InputStreamReader, the InputStreamReader constructors take an InputStream as a parameter.
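A minimal sketch of that counting method, under the assumption that "alphabetical" means whatever Character.isLetter accepts; the class name NonAlphaCount and the sample text are made up for the example. The same method is handed a StringReader, a CharArrayReader, and an InputStreamReader wrapped around a byte stream.

```java
import java.io.ByteArrayInputStream;
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

class NonAlphaCount {
    // Counts the characters in the stream that are not letters.
    static int countNonAlpha(Reader in) throws IOException {
        int count = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (!Character.isLetter((char) c)) count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String text = "Hi, Bob!"; // non-letters: ',' ' ' '!'

        // Characters from a String.
        System.out.println(countNonAlpha(new StringReader(text)));

        // Characters from a char array.
        System.out.println(countNonAlpha(new CharArrayReader(text.toCharArray())));

        // Characters decoded from an InputStream of UTF-8 bytes.
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(countNonAlpha(
            new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)));
    }
}
```

All three calls print 3 for this sample text.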
1. Strictly speaking, the Scanner constructor that accepts a Reader is declared to take any object that implements the Readable interface. However, Reader implements Readable, so this constructor works with Readers, but is in fact a bit more general than that.
2. The Scanner constructor that takes an InputStream as an argument is actually just a convenience thing. You only really need the constructor that takes a Reader as an argument. Why is that enough?
Scanner

Class Scanner has constructors that take InputStreams or Readers as arguments. So if you wanted to write code to do something like add all the integers in some text, you would write that method to take a Scanner as an argument. That way it would work with files, byte arrays, char arrays or Strings.

Putting this together, if you have a file whose name is "data.txt" and you want to read in tokens from it (e.g. ints, doubles, booleans and strings), you would create a Scanner for it like this:
Scanner sc = new Scanner(new InputStreamReader(new FileInputStream("data.txt")));
                                               \_____________________________/
                              an InputStream whose bytes come from file data.txt
                         \____________________________________________________/
                         a Reader whose chars come from the bytes in data.txt
             \_________________________________________________________________/
             a Scanner whose tokens are made up of chars whose bytes come from data.txt

There are some shortcuts to all of this: so-called "convenience" classes and constructors that make, for example, a Reader directly from a file name (FileReader is one).
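To make the token-level case concrete, here is a small sketch of an integer-summing method that takes a Scanner. The class name IntSum, the choice to skip non-integer tokens, and the sample text are assumptions for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.CharArrayReader;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class IntSum {
    // Adds up every token that parses as an int; other tokens are skipped.
    static int sumInts(Scanner sc) {
        int sum = 0;
        while (sc.hasNext()) {
            if (sc.hasNextInt()) {
                sum += sc.nextInt();
            } else {
                sc.next(); // not an integer: consume and ignore it
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        String text = "3 apples and 4 pears cost 12"; // 3 + 4 + 12

        // Tokens from a String.
        System.out.println(sumInts(new Scanner(new StringReader(text))));

        // Tokens from a char array.
        System.out.println(sumInts(new Scanner(new CharArrayReader(text.toCharArray()))));

        // Tokens from a byte array, decoded through an InputStreamReader.
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(sumInts(new Scanner(
            new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))));
    }
}
```

Each call prints 19. Swapping in new FileInputStream("data.txt") for the ByteArrayInputStream would read the same kind of tokens from a file without touching sumInts.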
The following program illustrates what this flexibility buys you. It defines three methods, findELF, countNonAlpha and sumInts, that process inputs as streams of bytes, chars, and tokens, respectively. What the program shows is how flexibly each method can be called on a variety of different input sources. For example, sumInts can be called with the ultimate source of data being a file, a byte array, a string or a character array ... and, of course, we could have called it on stdin as well!

The countNonAlpha method provides an interesting example. To highlight the difference between bytes and chars in Java, try the input file in1 (save this, don't view it in the browser). It is interesting because it contains a non-ASCII Unicode character (a heart). The result is that it is a seven-byte file that contains only four characters.
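The byte/char mismatch can be reproduced without the file. The sketch below uses a stand-in string (the contents of in1 are not reproduced here, so the string and the class name BytesVsChars are assumptions) and counts the same data once as bytes through an InputStream and once as chars through an InputStreamReader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class BytesVsChars {
    // Counts how many bytes the stream delivers.
    static int countBytes(InputStream in) throws IOException {
        int n = 0;
        while (in.read() != -1) n++;
        return n;
    }

    // Counts how many chars the reader delivers.
    static int countChars(Reader in) throws IOException {
        int n = 0;
        while (in.read() != -1) n++;
        return n;
    }

    public static void main(String[] args) throws IOException {
        // 'a', a heart (U+2665, three bytes in UTF-8), then 'b'.
        byte[] data = "a\u2665b".getBytes(StandardCharsets.UTF_8);

        System.out.println(countBytes(new ByteArrayInputStream(data))); // 5
        System.out.println(countChars(new InputStreamReader(
            new ByteArrayInputStream(data), StandardCharsets.UTF_8)));  // 3
    }
}
```

The InputStreamReader is where the decoding happens: it consumes several bytes from the underlying stream and hands back a single char.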
The first is the class BufferedReader. The issue BufferedReader addresses is this: when a call to read() is made for a Reader that has, for example, a file as its ultimate source for data, that call results at some lower level in a system call to fetch that byte. At this low level, however, fetching a byte at a time is tremendously inefficient. It typically takes as much time to fetch something like 1024 or 2048 bytes as it does a single byte. Therefore, it would be nice to have a variant of Reader that would fetch, say, 1024 bytes into a buffer the first time read() is called, then dole those out one at a time for each read() call until the buffer is emptied. Only then would it go back to fetch more bytes from the lower level, another chunk of 1024.

That's what the class BufferedReader does. What's kind of funny is that it does it as a wrapper around another Reader. In other words, BufferedReader is a Reader that takes a Reader and wraps it in this buffering scheme. So for example, if you had a file "data.txt" to read tokens (e.g. integers) from, and you were worried about performance, you might create your Scanner like this:
Scanner sc1 = new Scanner(new BufferedReader(new InputStreamReader(new FileInputStream("data.txt"))));

The BufferedReader will make calls like read(buff,0,1024) to its underlying InputStreamReader, which will make a call like read(buff,0,1024) to its underlying FileInputStream, which will result in a lower-level system call to fetch the next 1024 bytes from the file. The object-oriented design of Java's I/O package makes this possible. By deriving BufferedReader from Reader, the Java authors provide modified functionality that can be used anywhere a regular Reader can be used.
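One way to see the effect of the wrapper is to count how often the underlying Reader is actually asked for data. The sketch below does that with a hypothetical CountingReader class written for this example (it is not part of java.io); the sizes are arbitrary, and the real buffer size used by BufferedReader is an implementation detail.

```java
import java.io.BufferedReader;
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Arrays;

// Hypothetical helper: a Reader that forwards to another Reader and
// counts how many bulk read requests it receives.
class CountingReader extends Reader {
    private final Reader inner;
    int calls = 0;

    CountingReader(Reader inner) { this.inner = inner; }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        calls++;
        return inner.read(buf, off, len);
    }

    @Override
    public void close() throws IOException { inner.close(); }
}

class BufferingDemo {
    // Reads the whole stream one char at a time; returns the number of chars.
    static int drain(Reader r) throws IOException {
        int n = 0;
        while (r.read() != -1) n++;
        return n;
    }

    public static void main(String[] args) throws IOException {
        char[] text = new char[10000];
        Arrays.fill(text, 'x');

        // Unbuffered: every single-char read() reaches the underlying Reader.
        CountingReader direct = new CountingReader(new CharArrayReader(text));
        drain(direct);
        System.out.println("requests without buffering: " + direct.calls);

        // Buffered: BufferedReader asks for large chunks and serves
        // the single-char read() calls from its own buffer.
        CountingReader wrapped = new CountingReader(new CharArrayReader(text));
        drain(new BufferedReader(wrapped));
        System.out.println("requests with buffering:    " + wrapped.calls);
    }
}
```

The calling code in drain is identical in both cases; only the construction of the Reader changed.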
The second example to look at is the class LineNumberReader, which is much easier to explain. Sometimes you want to be able to ask what line you're on as you read input. That's an extra piece of functionality you might wish that a Reader had. The class LineNumberReader extends BufferedReader to provide just that one extra piece of functionality. So now we could redo our Scanner definition like this:
LineNumberReader r;
Scanner sc2 = new Scanner(r = new LineNumberReader(new InputStreamReader(new FileInputStream("data.txt"))));

... and whenever you want to know what line number you're on you can call r.getLineNumber().

Once again, the object-oriented design of Java's I/O package makes this possible. By deriving LineNumberReader from BufferedReader, which is derived from Reader, the Java authors provide new functionality that can be used anywhere a regular Reader can be used.
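Here is a small sketch of getLineNumber() in use. It reads lines directly with readLine() rather than going through a Scanner, because a Scanner may read ahead in its source, which can make the reported line number run ahead of the token you are looking at. The class name LineNumberDemo, the method lineOf, and the sample text are assumptions for the example.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;

class LineNumberDemo {
    // Returns the 1-based number of the first line containing word,
    // or -1 if no line contains it.
    static int lineOf(Reader source, String word) throws IOException {
        LineNumberReader r = new LineNumberReader(source);
        String line;
        while ((line = r.readLine()) != null) {
            // After readLine() returns a line, getLineNumber() is the
            // count of lines read so far, i.e. this line's number.
            if (line.contains(word)) return r.getLineNumber();
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        String text = "alpha\nbeta\ngamma\n";
        System.out.println(lineOf(new StringReader(text), "gamma")); // 3
        System.out.println(lineOf(new StringReader(text), "delta")); // -1
    }
}
```

Because lineOf takes a plain Reader, it works just as well with an InputStreamReader over a FileInputStream as with the StringReader used here.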
Similar to the input case, we have two separate hierarchies for output: the hierarchy rooted at OutputStream, which is for byte-oriented output, and the hierarchy rooted at Writer, which is for character-oriented output.

The distinction is a bit blurrier than for the input case, because the class PrintStream, which is derived from OutputStream, provides methods for writing ints, doubles, Strings, etc., as does PrintWriter, which is derived from Writer. The distinction has to do with how characters are encoded as bytes: PrintStream by default uses the JVM's default encoding, while PrintWriter allows the programmer to specify that encoding independently. These are distinctions we won't go into here.

Note, however, that System.out and System.err are both PrintStream objects.
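To illustrate how similar the two look from the caller's side, here is a minimal sketch that prints the same values through a PrintStream (collecting bytes in a ByteArrayOutputStream) and through a PrintWriter (collecting chars in a StringWriter). The class name OutputDemo and the helper methods are assumptions for the example; the text is plain ASCII, so the encoding question does not come up.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.io.StringWriter;

class OutputDemo {
    // Byte-oriented: PrintStream sits on top of an OutputStream.
    static String viaStream() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(bytes);
        out.print("n = ");
        out.print(42);
        out.print(", x = ");
        out.print(2.5);
        out.flush();
        return bytes.toString(); // decode the collected bytes back to text
    }

    // Character-oriented: PrintWriter sits on top of a Writer.
    static String viaWriter() {
        StringWriter chars = new StringWriter();
        PrintWriter out = new PrintWriter(chars);
        out.print("n = ");
        out.print(42);
        out.print(", x = ");
        out.print(2.5);
        out.flush();
        return chars.toString();
    }

    public static void main(String[] args) {
        System.out.println(viaStream()); // n = 42, x = 2.5
        System.out.println(viaWriter()); // n = 42, x = 2.5
    }
}
```

The print calls are the same in both methods; what differs is whether the object underneath is collecting bytes or chars.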