.NA "SPELL - a program to check and correct spelling" "R.D. Eager"
.HD July 1982 DOC/EMAS/K2.5/29
.KE SPELL
.S1 Introduction
.PG
This document describes a program which will check the spelling of words
in  a  file, and (if required) generate a corrected version of the file.
There are also several utility commands which are used
for general proofreading and
for maintaining
lists of words (these lists are called "lexicons").
.PG
The program uses a public list of valid words (the
.BD "system lexicon),"
but will also create and maintain a list of words peculiar to the user's
own requirements (the
.BD "private lexicon)."
.S1 "Access"
.PG
Before  SPELL (or any of its utility commands) can be used for the first
time, the following command must be typed:
.sp
.CP
INSERT(PUBLIC.SPELL)
.EC
.sp
This command is needed only once; its effect is remembered for subsequent
logons.
.S1 "Overview"
.PG
SPELL operates by scanning the input file, building  a  sorted  list  of
words.  Each word is recorded together with the line number of its first
occurrence and a count of its total number of occurrences.  This list is
then scanned, and the words compared with the contents of the system and
private lexicons.
.PG
If  an  unknown  word  is  found  during this scan, then it is reported,
together with its frequency count and place of  first  occurrence.   The
user then has several options; these are described in Section 5.
.PG
If  a  corrected  version of the input file is required, any corrections
are recorded, and a second pass is made over the input file in order  to
perform the corrections.
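.PG
The first pass described above can be illustrated with a minimal sketch in
Python. This is not the program's actual implementation: the names
build_word_list, unknown_words and Entry are hypothetical, a word is assumed
to be a run of letters, and the lexicon comparison is simplified to an exact
match.

```python
import re
from dataclasses import dataclass

# Assumption: a "word" is a run of ASCII letters.
WORD_RE = re.compile(r"[A-Za-z]+")


@dataclass
class Entry:
    first_line: int  # line number of the first occurrence (1-based)
    count: int       # total number of occurrences


def build_word_list(lines):
    """Scan the input and return a dict of word -> Entry in sorted order."""
    entries = {}
    for line_no, line in enumerate(lines, start=1):
        for word in WORD_RE.findall(line):
            if word in entries:
                entries[word].count += 1
            else:
                entries[word] = Entry(first_line=line_no, count=1)
    return dict(sorted(entries.items()))


def unknown_words(entries, system_lexicon, private_lexicon):
    """Yield (word, entry) for each word found in neither lexicon.

    Simplification: exact matching only, ignoring the case rules.
    """
    known = set(system_lexicon) | set(private_lexicon)
    for word, entry in entries.items():
        if word not in known:
            yield word, entry
```

Each unknown word is yielded together with its frequency count and place of
first occurrence, which is the information reported to the user.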
.S1 "Using SPELL"
.PG
The program is entered by the command
.sp
.CP
SPELL
.EC
.PG
Possible parameters are as follows.
.sp
.PS
.PA INPUT 1 "Input file" none
The file whose contents are to be checked for correct spelling.
.PA OUTPUT 2 "Output file" "none"
The  file  to which a corrected version of the input file is to be sent.
If this parameter is omitted, no corrected file is produced.
.PA ERRORS 3 "Logging file" "T#LOG"
A list of errors (and any corrections) is sent to this file.
.PA MYLEX 4 "Private lexicon" "LEXICON"
The user's own list of valid words.  It is created if it does not exist.
.PA SYSLEX 5 "System lexicon" "PUBTXT.LEXICON"
The system word list.
.PA UPDATELEX 6 "Update private lexicon" YES
This may take the value YES or NO (or any abbreviation). If the value is
YES, the private lexicon is updated with any new words which the user
has indicated are valid; otherwise no update is done.
.PA WORKSIZE 7 "Workspace size" 16340
The  number of words set aside for workspace.  This will only need to be
increased for large input files.  Note that an indication  of  workspace
usage is given at the end of every run.
.PE
.PG
If the command
.sp
.CP
SPELL(?)
.EC
.sp
is typed, a summary of parameters and their defaults is given.
.S1 "Interaction with SPELL"
.PG
During the interactive phase of a SPELL run, the user is informed of any
dubious  words,  and  asked if they are in fact valid. There are several
possible replies:
.sp
.LB 10
.LP "H"
Prints a short 'help' text, summarising the possible replies.
.sp
.LP "Y"
The  word  is  considered  valid, and is added (if UPDATELEX=YES) to the
private lexicon.
.sp
.LP "N"
The word is erroneous.  This fact is written to the logging file, and no
correction is attempted.
.sp
.LP "E"
The word is 'eccentric'; this means that it is valid, but does not occur
in enough documents to be worth adding to the private lexicon.
.sp
.LP "W"
Wind through to the end, marking all other dubious words as erroneous.
.sp
.LP "Q"
Quit.  The run is terminated immediately; no corrections are  performed,
and the private lexicon remains unchanged.
.sp
.LP "L"
This has a similar effect to Y, but the word is converted to lower  case
before  being  added to the private lexicon.  This is useful if the word
occurred (say) at the beginning of a sentence.  See Section 7 for a
description of the treatment of upper and lower case words.
.sp
.LP "word"
The word 'word' is the correct spelling.  This fact is logged,  and  the
word  corrected wherever it occurs in the input file (if corrections are
wanted).
.sp
.LP "=word"
This has a similar effect to just typing 'word', but the  correction  is
recorded in the private lexicon (if UPDATELEX=YES) so that it applies to
all  subsequent runs of SPELL.  This is useful if a word is consistently
misspelled.
.LE
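.PG
When corrections have been supplied, the second pass replaces each misspelled
word wherever it occurs. A minimal sketch of that idea in Python follows. The
function name apply_corrections is hypothetical, and matching on whole words
by means of a regular expression is an assumption, not a description of the
actual program.

```python
import re


def apply_corrections(text, corrections):
    """Replace each misspelled whole word in text with its correction.

    corrections maps a misspelled word to its correct spelling.
    """
    if not corrections:
        return text
    # \b restricts matches to whole words, so that a correction for
    # 'teh' leaves a longer word such as 'tehran' untouched.
    alternatives = "|".join(re.escape(word) for word in corrections)
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda m: corrections[m.group(1)], text)
```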
.S1 "Storage of lexicons"
.PG
Private lexicons are stored as ordinary text files; they can be edited
with any text editor.
.PG
System lexicons are stored in a compressed format.  The commands SPELLC
and SPELLE (see Section 11) are available to compress and expand
ordinary text files.
.S1 "Upper and lower case letters"
.PG
The policy regarding upper and lower case letters is simple but
effective.  If a word in a lexicon contains upper case letters, then
words in the input file are expected to have at least those letters in
capitals.  Lower case letters in a lexicon word match either case in the
input file.  Thus 'Ada' in the lexicon will match an occurrence of 'ADA'
in the input file, but 'Apl' in the input file will not match 'APL' in a
lexicon.
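.PG
The rule can be expressed as a short Python sketch. The function name matches
is hypothetical; the code illustrates the policy as stated above and is not
taken from the program itself.

```python
def matches(lexicon_word, input_word):
    """Return True if input_word is acceptable for lexicon_word.

    The letters must agree when case is ignored, and every letter that
    is upper case in the lexicon must also be upper case in the input.
    """
    if lexicon_word.lower() != input_word.lower():
        return False
    return all(
        inp.isupper()
        for lex, inp in zip(lexicon_word, input_word)
        if lex.isupper()
    )
```

With this sketch, matches('Ada', 'ADA') is true and matches('APL', 'Apl') is
false, as in the examples above.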
.S1 "Consistency"
.PG
It  is  desirable that documents should spell any given word in only one
way.  This is achieved by holding only one spelling of each word in  the
system  lexicon, even if there is more than one possible way in which it
is commonly spelt.  In general, the spelling  used  is  that  considered
most  'English'  (e.g. 'disc'  rather  than  'disk').   Users  may  well
disagree with this, and are free to add alternative spellings  to  their
private lexicons.
.S1 "Maintenance of the system lexicon"
.PG
Requests  for words to be added to the system lexicon may be made to the
author of this document.  They  will  be  considered  on  their  overall
usefulness to the user community.
.PG
Whenever a private lexicon is updated, all words that appear also in the
system  lexicon  (because  the latter has been updated) are removed from
that private lexicon in order to save space.
.S1 "The DOUBLE command"
.PG
A common error in writing documents is the unintentional repetition of a
word,  particularly  where  the first occurrence is the last word on one
line and the second occurrence is the first word on the  next  line. The
DOUBLE  command  checks that no two adjacent words in a file are in fact
the same. It takes the following parameters:
.sp
.PS
.PA INPUT 1 "Input file" none
The file to be checked.
.PA OUTPUT 2 "Output file" .OUT
The file to which error messages are to be written.
.PE
.PG
If the command
.sp
.CP
DOUBLE(?)
.EC
.sp
is typed, a summary of parameters and their defaults is given.
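.PG
The check made by DOUBLE can be sketched in Python as follows. The name
find_doubles is hypothetical, and two details are assumptions rather than
documented behaviour: words are compared without regard to case, and
punctuation between the two words is ignored.

```python
import re

# Assumption: a "word" is a run of ASCII letters.
WORD_RE = re.compile(r"[A-Za-z]+")


def find_doubles(lines):
    """Return (line_no, word) for each word that repeats its predecessor.

    The previous word is carried across line boundaries, so a repeat
    split over the end of one line and the start of the next is found.
    """
    doubles = []
    previous = None
    for line_no, line in enumerate(lines, start=1):
        for word in WORD_RE.findall(line):
            if previous is not None and word.lower() == previous:
                doubles.append((line_no, word))
            previous = word.lower()
    return doubles
```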
.S1 "Utility commands"
.PG
As mentioned above, there are several utility commands  available  which
are  useful for the maintenance of lexicon files.  They are described in
this Section.  In all cases, the command may be  issued  with  a  single
question mark as a parameter, in order to obtain a summary of parameters
and their defaults.
.S2 "SPELLC"
.PG
The SPELLC command compresses a lexicon, either for long term storage or
for use as a system lexicon.  It takes the following parameters:
.sp
.PS
.PA INPUT 1 "Input file" none
The lexicon file to be compressed.
.PA OUTPUT 2 "Output file" none
The file to which the compressed lexicon is to be written.
.PE
.S2 "SPELLE"
.PG
The SPELLE command expands a lexicon compressed by SPELLC.  It takes the
following parameters:
.sp
.PS
.PA INPUT 1 "Input file" none
The lexicon file to be expanded.
.PA OUTPUT 2 "Output file" none
The file to which the expanded lexicon is to be written.
.PE
.S2 "CHECKLEX"
.PG
The CHECKLEX command checks an expanded lexicon for duplicate words, and
words  which  are not in the correct place (dictionary order).  It takes
the following parameters:
.sp
.PS
.PA INPUT 1 "Input file" LEXICON
The lexicon file to be checked.
.PA OUTPUT 2 "Output" .OUT
The destination for any error messages.
.PE
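.PG
A minimal Python sketch of such a check is given below. The name
check_lexicon is hypothetical, and 'dictionary order' is assumed here to mean
comparison without regard to case; the ordering actually used by CHECKLEX may
differ.

```python
def check_lexicon(words):
    """Return (position, word, problem) tuples for an expanded lexicon.

    position is 1-based.  Each word is compared with the word before it.
    """
    problems = []
    for i in range(1, len(words)):
        previous, current = words[i - 1], words[i]
        if current == previous:
            problems.append((i + 1, current, "duplicate"))
        elif current.lower() < previous.lower():
            problems.append((i + 1, current, "out of order"))
    return problems
```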
.S2 "STRIPLEX"
.PG
The STRIPLEX command removes all  words  from  a  lexicon  except  those
containing only lower case letters. It takes the following parameters:
.sp
.PS
.PA INPUT 1 "Input file" none
The lexicon file to be 'stripped'.
.PA OUTPUT 2 "Output file" none
The file to which the new lexicon is to be written.
.PE
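.PG
The effect of STRIPLEX can be sketched in Python as a simple filter. The name
strip_lexicon is hypothetical, and the sketch assumes that a word is kept
only if every one of its characters is a lower case letter.

```python
def strip_lexicon(words):
    """Keep only the words made up entirely of lower case letters."""
    return [
        word
        for word in words
        if word and all(ch.isalpha() and ch.islower() for ch in word)
    ]
```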
.S2 "LEXMERGE"
.PG
The LEXMERGE command merges two lexicons; these are assumed to be in the
correct  order.   It  is  particularly  useful  when  several people are
writing a document; they can ensure, at intervals, that they pick up all
the words used by other members of the team.
The command takes the following parameters:
.sp
.PS
.PA INPUT1 1 "First input file" none
One of the lexicon files to be merged.
.PA INPUT2 2 "Second input file" none
The other lexicon file to be merged.
.PA OUTPUT 3 "Output file" none
The file to which the new (merged) lexicon is to be written.
.PE
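.PG
Merging two ordered lexicons can be sketched in Python as follows. The names
merge_lexicons and sort_key are hypothetical. The ordering used (case ignored
first, then exact spelling) is an assumption, and both inputs must already be
in that order, just as LEXMERGE assumes its inputs are correctly ordered.

```python
import heapq


def sort_key(word):
    """Assumed ordering: ignore case first, then compare exact spelling."""
    return (word.lower(), word)


def merge_lexicons(first, second):
    """Merge two ordered word lists, dropping repeated words."""
    merged = []
    for word in heapq.merge(first, second, key=sort_key):
        # Identical words are adjacent in the merged stream.
        if not merged or merged[-1] != word:
            merged.append(word)
    return merged
```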
.S1 "Acknowledgement"
.PG
The  program  and utilities described in this document are based heavily
on two versions of the SPELL program developed  by  Peter  Robinson  and
Dave Singer of the University of Cambridge.  For more details, see CACM,
Vol. 24, No. 5, pp. 296-297 (May 1981).
