Readers and Writers

The Rd.T and Wr.T object types are the basis of a hierarchy of objects for reading from and writing to terminals, files, sockets, text strings, and some devices. You can use an Rd.T or Wr.T through several interfaces. The simplest is IO.i3, which lets you pass an optional Rd.T or Wr.T argument to most of its procedures. If you need finer control over your I/O, you can use the Rd.i3 and Wr.i3 interfaces.

An Rd.T or Wr.T object is usually locked and unlocked on every procedure call in those interfaces. If you are willing to manage the locking yourself, you can achieve higher I/O speeds by calling the procedures in the UnsafeRd.i3 and UnsafeWr.i3 interfaces.
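
As a rough sketch of what managing the lock yourself might look like, the program below takes the writer's lock once and then writes many characters without re-locking. It assumes that the UnsafeWr.i3 available here matches the SRC original, which reveals Wr.T as a subtype of MUTEX and exports FastPutChar, and that it can be imported from a safe module. The module name FastStars and the procedure PutStars are invented for illustration, and the standalone "EXPORTS Main" wrapper would differ inside an extension.

```modula3
MODULE FastStars EXPORTS Main;

IMPORT IO, Wr, UnsafeWr, TextWr, Thread;

(* Hypothetical helper: writes n stars and a newline while holding
   the writer's lock once, instead of once per character. *)
PROCEDURE PutStars(wr: Wr.T; n: CARDINAL)
  RAISES {Wr.Failure, Thread.Alerted} =
  BEGIN
    LOCK wr DO
      FOR i := 1 TO n DO
        UnsafeWr.FastPutChar(wr, '*');
      END;
      UnsafeWr.FastPutChar(wr, '\n');
    END;
  END PutStars;

VAR wr := TextWr.New();
BEGIN
  TRY
    PutStars(wr, 40);
    IO.Put(TextWr.ToText(wr));
  EXCEPT
  | Wr.Failure, Thread.Alerted => IO.Put("write failed\n");
  END;
END FastStars.
```

The Fast procedures are only safe while the caller holds the lock, so every call sits inside the LOCK statement.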

The major derived classes of Rd.T and Wr.T are:

TextRd.T and TextWr.T
These classes read from and write to TEXT objects. For example, you could use a TextWr.T object to collect output from a thread and decide later whether to print all of it or discard it.

FileRd.T and FileWr.T
Defined in the user/fs/fscore extension, these classes read from and write to files. They ought to work with any of our filesystems.

ConnRW.RdT and ConnRW.WrT
These classes read from and write to sockets. In order to create such a reader or writer, you must first create a socket connection. Then call ConnRW.NewRd() or ConnRW.NewWr() to get the reader or writer object. These classes are defined in the socketRW extension.

CharDevRd.T and CharDevWr.T
These classes are defined in the user/dev/devcore extension for reading from and writing to character devices.
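
The TextWr.T use case mentioned in the list above (collect output first, decide later whether to show it) might be sketched as follows. This assumes the standard SRC procedures TextWr.New, TextWr.ToText, Wr.PutText and IO.Put are available unchanged. The module name CollectDemo, the procedure Work and the flag verbose are hypothetical, and the standalone "EXPORTS Main" wrapper would differ inside an extension.

```modula3
MODULE CollectDemo EXPORTS Main;

IMPORT IO, Wr, TextWr, Thread;

(* Hypothetical worker: writes its progress to whatever writer it is given. *)
PROCEDURE Work(wr: Wr.T) RAISES {Wr.Failure, Thread.Alerted} =
  BEGIN
    Wr.PutText(wr, "step 1 done\n");
    Wr.PutText(wr, "step 2 done\n");
  END Work;

VAR
  log := TextWr.New();   (* output accumulates in memory *)
  verbose := TRUE;       (* illustrative switch *)
BEGIN
  TRY
    Work(log);
    IF verbose THEN
      (* Print everything that was collected; otherwise it is dropped. *)
      IO.Put(TextWr.ToText(log));
    END;
  EXCEPT
  | Wr.Failure, Thread.Alerted => IO.Put("output failed\n");
  END;
END CollectDemo.
```

Because Work only sees a Wr.T, the same procedure could be handed a file or socket writer instead without changes.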

The SRC programmers have included exhaustive documentation in their Rd.i3 and Wr.i3 interfaces, which is repeated below.

The Rd interface

An "Rd.T" (or ``reader'') is a character input stream. The basic operation on a reader is "GetChar", which returns the source character at the ``current position'' and advances the current position by one. Some readers are ``seekable'', which means that they also allow setting the current position anywhere in the source. For example, readers from random access files are seekable; readers from terminals and sequential files are not.

Some readers are ``intermittent'', which means that the source of the reader trickles in rather than being available to the implementation all at once. For example, the input stream from an interactive terminal is intermittent. An intermittent reader is never seekable.

Abstractly, a reader "rd" consists of

| len(rd)           `the number of source characters`
| src(rd)           `a sequence of length "len(rd)+1"`
| cur(rd)           `an integer in the range "[0..len(rd)]"`
| avail(rd)         `an integer in the range "[cur(rd)..len(rd)+1]"`
| closed(rd)        `a boolean`
| seekable(rd)      `a boolean`
| intermittent(rd)  `a boolean`

These values are not necessarily directly represented in the data fields of a reader object. In particular, for an intermittent reader, "len(rd)" may be unknown to the implementation. But in principle the values determine the state of the reader.

The sequence "src(rd)" is zero-based: "src(rd)[i]" is valid for "i" from 0 to "len(rd)". The first "len(rd)" elements of "src" are the characters that are the source of the reader. The final element is a special value "eof" used to represent end-of-file. The value "eof" is not a character.

The value of "cur(rd)" is the index in "src(rd)" of the next character to be returned by "GetChar", unless "cur(rd) = len(rd)", in which case a call to "GetChar" will raise the exception "EndOfFile".

The value of "avail(rd)" is important for intermittent readers: the elements whose indexes in "src(rd)" are in the range "[cur(rd)..avail(rd)-1]" are available to the implementation and can be read by clients without blocking. If the client tries to read further, the implementation will block waiting for the other characters. If "rd" is not intermittent, then "avail(rd)" is equal to "len(rd)+1". If "rd" is intermittent, then "avail(rd)" can increase asynchronously, although the procedures in this interface are atomic with respect to such increases.

The definitions above encompass readers with infinite sources. If "rd" is such a reader, then "len(rd)" and "len(rd)+1" are both infinity, and there is no final "eof" value.
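
To make the abstract state concrete, here is a small sketch that watches "cur(rd)" move on a three-character text reader. It assumes the standard SRC procedures "TextRd.New", "Rd.GetChar", "Rd.Index" (which reports "cur(rd)") and "IO.PutInt"; the module name "RdState" is invented, and the standalone "EXPORTS Main" wrapper would differ inside an extension.

```modula3
MODULE RdState EXPORTS Main;

IMPORT IO, Rd, TextRd, Thread;

VAR rd: Rd.T := TextRd.New("abc");   (* len(rd) = 3, cur(rd) = 0 *)
BEGIN
  TRY
    EVAL Rd.GetChar(rd);              (* returns 'a'; cur(rd) is now 1 *)
    IO.Put("cur = ");
    IO.PutInt(Rd.Index(rd));
    IO.Put("\n");
    LOOP
      EVAL Rd.GetChar(rd);            (* 'b', then 'c', then EndOfFile *)
    END;
  EXCEPT
  | Rd.EndOfFile =>
      (* Raised by GetChar once cur(rd) = len(rd). *)
      IO.Put("EndOfFile at cur = ");
      IO.PutInt(Rd.Index(rd));
      IO.Put("\n");
  | Rd.Failure, Thread.Alerted =>
      IO.Put("read failed\n");
  END;
END RdState.
```

By the definitions above, the two reported values of "cur" should be 1 and 3, the second being equal to "len(rd)".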

Every reader is a monitor; that is, it contains an internal lock that is acquired and held for each operation in this interface, so that concurrent operations will appear atomic. For faster, unmonitored access, see the "UnsafeRd" interface.

If you are implementing a long-lived reader class, such as a pipe or TCP stream, the index of the reader may eventually overflow, causing the program to crash with a bounds fault. We recommend that you provide an operation to reset the reader index, which the client can call periodically.

Since there are many classes of readers, there are many ways that a reader can break---for example, the connection to a terminal can be broken, the disk can signal a read error, etc. All problems of this sort are reported by raising the exception "Failure". The documentation of a reader class should specify what failures the class can raise and how they are encoded in the argument to "Failure".

Illegal operations cause a checked runtime error.

Many operations on a reader can wait indefinitely. For example, "GetChar" can wait if the user is not typing. In general these waits are alertable, so each procedure that might wait includes "Thread.Alerted" in its "RAISES" clause.

Notice that on an intermittent reader, "EOF" can block. For example, if there are no characters buffered in a terminal reader, "EOF" must wait until the user types one before it can determine whether he typed the special key signalling end-of-file. If you are using "EOF" in an interactive input loop, the right sequence of operations is:

  1. prompt the user;
  2. call "EOF", which probably waits on user input;
  3. presuming that "EOF" returned "FALSE", read the user's input.
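
The three steps above can be sketched as a loop. The example assumes the standard SRC procedures "Rd.EOF", "Rd.GetLine", "Wr.PutText" and "Wr.Flush"; the module name "Prompt" and the procedure "Converse" are hypothetical. A text reader stands in for the terminal so that the program is self-contained, which means "EOF" never actually blocks here. The standalone "EXPORTS Main" wrapper would differ inside an extension.

```modula3
MODULE Prompt EXPORTS Main;

IMPORT IO, Rd, Wr, TextRd, TextWr, Thread;

(* Hypothetical interactive loop following the prompt / EOF / read order. *)
PROCEDURE Converse(rd: Rd.T; wr: Wr.T)
  RAISES {Rd.Failure, Wr.Failure, Thread.Alerted} =
  VAR line: TEXT;
  BEGIN
    LOOP
      Wr.PutText(wr, "> ");          (* 1. prompt the user *)
      Wr.Flush(wr);                  (*    push the prompt out before waiting *)
      IF Rd.EOF(rd) THEN EXIT END;   (* 2. may block on an intermittent reader *)
      TRY
        line := Rd.GetLine(rd);      (* 3. read the user's input *)
      EXCEPT
      | Rd.EndOfFile => EXIT;        (* not expected after EOF returned FALSE *)
      END;
      Wr.PutText(wr, "got: " & line & "\n");
    END;
  END Converse;

VAR
  input := TextRd.New("first\nsecond\n");   (* stand-in for a terminal *)
  output := TextWr.New();
BEGIN
  TRY
    Converse(input, output);
    IO.Put(TextWr.ToText(output));
  EXCEPT
  | Rd.Failure, Wr.Failure, Thread.Alerted => IO.Put("I/O failed\n");
  END;
END Prompt.
```

Prompting before the call to "EOF" matters on a real terminal: if the order were reversed, the user would be asked to type before seeing any prompt.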

The Wr interface

A "Wr.T" (or ``writer'') is a character output stream. The basic operation on a writer is "PutChar", which extends a writer's character sequence by one character. Some writers (called ``seekable writers'') also allow overwriting in the middle of the sequence. For example, writers to random access files are seekable, but writers to terminals and sequential files are not.

Writers can be (and usually are) buffered. This means that operations on the writer don't immediately affect the underlying target of the writer, but are saved up and performed later. For example, a writer to a disk file is not likely to update the disk after each character.

Abstractly, a writer "wr" consists of:

| len(wr)       `a non-negative integer`
| c(wr)         `a character sequence of length "len(wr)"`
| cur(wr)       `an integer in the range "[0..len(wr)]"`
| target(wr)    `a character sequence`
| closed(wr)    `a boolean`
| seekable(wr)  `a boolean`
| buffered(wr)  `a boolean`

These values are generally not directly represented in the data fields of a writer object, but in principle they determine the state of the writer.

The sequence "c(wr)" is zero-based: "c(wr)[i]" is valid for "i" from 0 through "len(wr)-1". The value of "cur(wr)" is the index of the character in "c(wr)" that will be replaced or appended by the next call to "PutChar". If "wr" is not seekable, then "cur(wr)" is always equal to "len(wr)", since in this case all writing happens at the end.

The difference between "c(wr)" and "target(wr)" reflects the buffering: if "wr" is not buffered, then "target(wr)" is updated to equal "c(wr)" after every operation; if "wr" is buffered, then updates to "target(wr)" can be delayed. For example, in a writer to a file, "target(wr)" is the actual sequence of characters on the disk; in a writer to a terminal, "target(wr)" is the sequence of characters that have actually been transmitted. (This sequence may not exist in any data structure, but it still exists abstractly.)

If "wr" is buffered, then the assignment "target(wr) := c(wr)" can happen asynchronously at any time, although the procedures in this interface are atomic with respect to such assignments.

Every writer is a monitor; that is, it contains an internal lock that is acquired and held for each operation in this interface, so that concurrent operations will appear atomic. For faster, unmonitored access, see the "UnsafeWr" interface.

If you are implementing a long-lived writer class, such as a pipe or TCP stream, the index of the writer may eventually overflow, causing the program to crash with a bounds fault. We recommend that you provide an operation to reset the writer index, which the client can call periodically.

It is useful to specify the effect of several of the procedures in this interface in terms of the action "PutC(wr, ch)", which outputs the character "ch" to the writer "wr":

| PutC(wr, ch) =
|   IF closed(wr) THEN `Cause checked runtime error` END;
|   IF cur(wr) = len(wr) THEN
|     `Extend "c(wr)" by one character, incrementing "len(wr)"`
|   END;
|   c(wr)[cur(wr)] := ch;
|   INC(cur(wr));

"PutC" is used only in specifying the interface; it is not a real procedure.

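The two branches of "PutC" (extend when "cur(wr) = len(wr)", overwrite otherwise) can be observed through the real procedure "PutChar". The sketch below assumes the standard SRC procedures "Wr.PutChar", "Wr.Seek", "Wr.Seekable", "Wr.Index" and "Wr.Length"; the module name "WrState" is invented. The overwrite step is guarded because it only applies if the text writer turns out to be seekable. The standalone "EXPORTS Main" wrapper would differ inside an extension.

```modula3
MODULE WrState EXPORTS Main;

IMPORT IO, Wr, TextWr, Thread;

VAR wr := TextWr.New();          (* len(wr) = 0, cur(wr) = 0 *)
BEGIN
  TRY
    Wr.PutChar(wr, 'h');         (* cur = len, so c(wr) is extended *)
    Wr.PutChar(wr, 'i');         (* c(wr) = "hi", cur(wr) = len(wr) = 2 *)
    IF Wr.Seekable(wr) THEN
      Wr.Seek(wr, 0);            (* cur(wr) = 0 *)
      Wr.PutChar(wr, 'H');       (* cur < len, so c(wr)[0] is replaced *)
    END;
    IO.Put("cur = ");
    IO.PutInt(Wr.Index(wr));
    IO.Put(", len = ");
    IO.PutInt(Wr.Length(wr));
    IO.Put("\n");
    IO.Put(TextWr.ToText(wr) & "\n");
  EXCEPT
  | Wr.Failure, Thread.Alerted => IO.Put("write failed\n");
  END;
END WrState.
```

If the writer is seekable, the overwrite leaves "len(wr)" at 2 and moves "cur(wr)" to 1; if not, both remain 2 and the text is unchanged.
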
Since there are many classes of writers, there are many ways that a writer can break---for example, the network can go down, the disk can fill up, etc. All problems of this sort are reported by raising the exception "Failure". The documentation of each writer class should specify what failures the class can raise and how they are encoded in the argument to "Failure".

Illegal operations (for example, writing to a closed writer) cause checked runtime errors.

Many operations on a writer can wait indefinitely. For example, "PutChar" can wait if the user has suspended output to his terminal. These waits can be alertable, so each procedure that might wait includes "Thread.Alerted" in its raises clause.

"Close(wr)" flushes "wr", releases any resources associated with "wr", and sets "closed(wr) := TRUE". The documentation for a procedure that creates a writer should specify what resources are released when the writer is closed. "Close" leaves "closed(wr)" equal to "TRUE" even if it raises an exception, and is a no-op if "wr" is already closed.
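
A common way to rely on this behaviour is to close the writer in a "FINALLY" clause. The sketch below assumes the paragraph above documents the standard SRC procedure "Wr.Close" and that "Wr.Closed" is available; the module name "CloseDemo" and the procedure "Emit" are hypothetical, and the standalone "EXPORTS Main" wrapper would differ inside an extension.

```modula3
MODULE CloseDemo EXPORTS Main;

IMPORT IO, Wr, TextWr, Thread;

(* Hypothetical producer that always closes the writer it was given. *)
PROCEDURE Emit(wr: Wr.T) RAISES {Wr.Failure, Thread.Alerted} =
  BEGIN
    TRY
      Wr.PutText(wr, "report body\n");
    FINALLY
      Wr.Close(wr);   (* runs whether or not PutText raised *)
    END;
  END Emit;

VAR wr := TextWr.New();
BEGIN
  TRY
    Emit(wr);
    Wr.Close(wr);     (* already closed: specified to be a no-op *)
    IF Wr.Closed(wr) THEN
      IO.Put("writer is closed\n");
    END;
  EXCEPT
  | Wr.Failure, Thread.Alerted => IO.Put("close failed\n");
  END;
END CloseDemo.
```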


garrett@cs.washington.edu