>>> s = "y÷z"  # this defines s as a string of characters
>>> len(s)
3
>>> t = s.encode("utf-8")  # now we convert to a string of bytes
>>> t
b'y\xc3\xb7z'
>>> len(t)
4

So you need to make a decision when you open a file for reading or for writing: do you want to deal with this as text (which means character-based) or as binary (which means byte-based)? When you open a file with the
open function, the
second argument is the "mode", which is "r" or "w" for reading
or writing as text, and "rb" or "wb" for reading or writing as binary:
fh = open("foo","r") # open text stream for reading
fh = open("foo","w") # open text stream for writing
fh = open("foo","rb") # open binary stream for reading
fh = open("foo","wb") # open binary stream for writing
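To make the difference between the modes concrete, here is a minimal sketch that writes the three-character string from above to a file and reads it back both ways. The scratch path and the explicit `encoding="utf-8"` argument are assumptions made for this illustration; they do not come from the lesson.

```python
import os
import tempfile

# Hypothetical scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Write three characters as text, naming the encoding explicitly.
with open(path, "w", encoding="utf-8") as fh:
    fh.write("y÷z")

# Text mode hands back characters (a str).
with open(path, "r", encoding="utf-8") as fh:
    as_text = fh.read()

# Binary mode hands back the raw bytes (a bytes object).
with open(path, "rb") as fh:
    as_bytes = fh.read()

print(len(as_text))             # number of characters
print(len(as_bytes), as_bytes)  # number of bytes, then the bytes themselves
```

The same file holds three characters when opened as text and four bytes when opened as binary, matching the `len(s)` and `len(t)` results in the session above.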
Note: we're sticking with text-based I/O for this lesson!
\n.
If s is a string, you can
call s.strip() to strip the whitespace from either
end of s (producing a new string, since strings are
immutable!).
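A small sketch of what `strip` does; the sample string is made up for illustration. It also shows `rstrip("\n")`, a related string method that removes only trailing newlines, which can be useful when leading spaces should be kept.

```python
s = "  hello world \n"

t = s.strip()        # new string with whitespace removed from both ends
u = s.rstrip("\n")   # new string with only the trailing newline removed

print(repr(s))  # the original string is unchanged
print(repr(t))
print(repr(u))
```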
For our first examples, we'll just look at the basic Python pattern for reading through a file a line at a time, and we'll apply it to print the contents of foo.txt, but indented by five spaces. Exciting, right?
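One way this pattern might look is sketched below. It is not necessarily the lesson's own script: the helper name `indented_lines` is hypothetical, and the sketch writes a small stand-in for foo.txt into a temporary directory so that it runs on its own.

```python
import os
import tempfile

def indented_lines(path, width=5):
    """Return the lines of the file at path, each prefixed by width spaces."""
    pad = " " * width
    result = []
    with open(path, "r") as fh:       # the with block closes fh when it ends
        for line in fh:               # a text stream yields one line per iteration
            result.append(pad + line.rstrip("\n"))  # drop the trailing newline
    return result

# Stand-in for foo.txt (an assumption, so the sketch is self-contained).
path = os.path.join(tempfile.mkdtemp(), "foo.txt")
with open(path, "w") as fh:
    fh.write("first line\nsecond line\n")

for line in indented_lines(path):
    print(line)
```

The newline is removed before printing because `print` adds one of its own; without that step every line would be followed by a blank line.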
Finally, if a text stream is iterable, we can do a list comprehension over it, right? So let's take the entire contents of the file, and store it into a list of strings, where each list element is a line of the file (with newlines stripped out). Putting the lines into a list means we can do fun things, like print the lines out in reverse order.
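A minimal sketch of the list-comprehension idea, again using a small stand-in file in a temporary directory rather than the lesson's actual foo.txt:

```python
import os
import tempfile

# Stand-in for foo.txt (an assumption, so the sketch is self-contained).
path = os.path.join(tempfile.mkdtemp(), "foo.txt")
with open(path, "w") as fh:
    fh.write("one\ntwo\nthree\n")

# One list element per line of the file, with the newline stripped off.
with open(path, "r") as fh:
    lines = [line.rstrip("\n") for line in fh]

# With the whole file in a list, printing it backwards is easy.
for line in reversed(lines):
    print(line)
```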
ACTIVITY
write, which takes a string and writes it to the
stream. The return value is the number of characters written.
>>> import io
>>> hout = open("tmp.txt","w")
>>> hout.write("This is\na test!\n")
16
>>> hout.close()
>>> hin = open("tmp.txt","r")
>>> hin.readline()
'This is\n'
>>> hin.readline()
'a test!\n'

If you import the sys module,
sys.stdin is an input
text stream tied to standard in, and sys.stdout is an output
text stream tied to standard out.
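Because `sys.stdin` and `sys.stdout` are ordinary text streams, code written against a stream works with them unchanged. The sketch below assumes a hypothetical helper named `shout`; an in-memory `io.StringIO` stream stands in for standard in so the example runs without anyone typing input.

```python
import io
import sys

def shout(fin, fout):
    """Copy every line from text stream fin to text stream fout in upper case."""
    for line in fin:
        fout.write(line.upper())

# io.StringIO is an in-memory text stream, used here in place of sys.stdin.
fake_in = io.StringIO("hello\nworld\n")
shout(fake_in, sys.stdout)

# In a real script the call would be: shout(sys.stdin, sys.stdout)
```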
readline and read.
If fh is a text input
stream, fh.readline() ... well, it reads the next
line! That's up to and including the \n, by the
way. In the script below, we use readline to skip
the first line of foo.txt.
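A sketch of how skipping the first line with `readline` might look. The file contents are invented for the example, and a temporary stand-in is used in place of the lesson's foo.txt.

```python
import os
import tempfile

# Stand-in for foo.txt (an assumption, so the sketch is self-contained).
path = os.path.join(tempfile.mkdtemp(), "foo.txt")
with open(path, "w") as fh:
    fh.write("header line\nalpha\nbeta\n")

with open(path, "r") as fh:
    header = fh.readline()   # consumes everything up to and including the first \n
    rest = [line.rstrip("\n") for line in fh]  # iteration resumes at line two

print(repr(header))
print(rest)
```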
If fh is a text input
stream, fh.read(n) reads the next n
characters of input (or as many as remain, if fewer
than n characters are left) and returns them as a
string. Important! If you are already at
the end of the file, an empty string is returned.
The script below uses fh.read(1) to go through
the input stream character-by-character.
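Separately from the lesson's script, here is a much simpler sketch of the same `read` behavior: reading a fixed number of characters, looping one character at a time until the empty string signals end of file, and reading again once nothing is left. The helper name `count_chars` and the sample text are assumptions for illustration only.

```python
import io

def count_chars(fh):
    """Count the characters left in text stream fh, reading one at a time."""
    count = 0
    c = fh.read(1)
    while c != "":        # read returns "" only when no input remains
        count += 1
        c = fh.read(1)
    return count

# io.StringIO is an in-memory text stream standing in for an open file.
stream = io.StringIO("ab\ncd\n")

first = stream.read(2)           # the next two characters
remaining = count_chars(stream)  # everything after them, newlines included
at_eof = stream.read(5)          # already at end of file

print(repr(first), remaining, repr(at_eof))
```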
The code, by the way, is simply an implementation of the following transducer:
redactor.py without modification (this means
your script will be a new file that just does
an import to get access to my redactor code) to write a
program that takes two command line arguments, the first is the input
filename, the second the output file name. If - is used for the
first argument, use stdin. If - is used for the second
argument, use stdout.
Going further: add a -s option that shows the
redacted text, but without the <...>s that define
redacted blocks or the escape characters. This will require
making some changes to redactor.py.