Text


What Is "Text"?

"Text" — also known as "Plaintext" or "PureText" or "Text With Line Breaks" — is a block of one or more ASCII characters, possibly including some ASCII end-of-line characters. Each character is represented by a single 8-bit byte.

The defining characteristic of text is that it contains only the characters themselves. Text carries no "meta" information whatsoever. Examples of meta information are the font, the size, the spacing, the wrapping, and the style. All of this is omitted. Text consists simply of the characters, nothing else. There is no room for any other information in the text's ASCII representation.

To show how simple text is, I will show the mapping from text characters to bytes in the ASCII text representation. A byte can take any of 256 values, numbered 0 to 255. The following table shows the relationship between byte values (shown here in decimal) and ASCII characters.

0..31  Control characters
   32  space
   33  !
   34  "
   35  #
   36  $
   37  %
   38  &
   39  '
   40  (
   41  )
   42  *
   43  +
   44  ,
   45  -
   46  .
   47  /
   48  0
   49  1
   50  2
   51  3
   52  4
   53  5
   54  6
   55  7
   56  8
   57  9
   58  :
   59  ;
   60  <
   61  =
   62  >
   63  ?
   64  @
   65  A
   66  B
   67  C
   68  D
   69  E
   70  F
   71  G
   72  H
   73  I
   74  J
   75  K
   76  L
   77  M
   78  N
   79  O
   80  P
   81  Q
   82  R
   83  S
   84  T
   85  U
   86  V
   87  W
   88  X
   89  Y
   90  Z
   91  [
   92  \
   93  ]
   94  ^
   95  _
   96  `
   97  a
   98  b
   99  c
  100  d
  101  e
  102  f
  103  g
  104  h
  105  i
  106  j
  107  k
  108  l
  109  m
  110  n
  111  o
  112  p
  113  q
  114  r
  115  s
  116  t
  117  u
  118  v
  119  w
  120  x
  121  y
  122  z
  123  {
  124  |
  125  }
  126  ~
  127  DELETE
128..255  Other characters. ASCII itself does not define these values; their meaning depends on the extended character set in use.

Thus, the following block of text:

   This is
   a test

will turn into the following bytes:

   84 104 105 115 32 105 115 10 97 32 116 101 115 116 10

The bytes with the value 10 are end-of-line bytes that mark the end of each line. The end-of-line marker differs between systems: on Unix it is 10 (line feed), on Macintosh (classic Mac OS) it is 13 (carriage return), and on Windows it is 13 followed by 10.
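The mapping above is easy to check for yourself. The following is a minimal sketch in Python (the choice of language and the variable names are mine, not part of the original example) that encodes the two-line block using Unix end-of-line bytes and lists the resulting byte values.

```python
# Encode the two-line example as ASCII and list the byte values.
text = "This is\na test\n"      # "\n" is the Unix end-of-line character (10)
data = text.encode("ascii")     # one byte per character
byte_values = list(data)

print(" ".join(str(b) for b in byte_values))
# prints: 84 104 105 115 32 105 115 10 97 32 116 101 115 116 10

# Decoding the bytes gives back exactly the original characters.
assert data.decode("ascii") == text
```

The number of bytes equals the number of characters, which is another way of saying that nothing but the characters is stored.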

It should be clear from this example that there is no room in the format for anything but the characters themselves. Each byte represents a single character. A single special character (or two on Windows) represents the end of a line.
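Because the three end-of-line conventions differ only in these one or two bytes, converting between them is straightforward. Here is a small sketch in Python, assuming the text is held as a byte string; the function name normalize_line_endings is hypothetical and chosen only for illustration.

```python
def normalize_line_endings(data: bytes) -> bytes:
    """Convert Windows (13 10) and Macintosh (13) line endings to Unix (10)."""
    # Handle the two-byte Windows sequence first. Otherwise its 13 would be
    # converted on its own and each line would end with two 10 bytes.
    data = data.replace(b"\r\n", b"\n")
    return data.replace(b"\r", b"\n")


unix = b"This is\na test\n"
windows = b"This is\r\na test\r\n"
macintosh = b"This is\ra test\r"

print(normalize_line_endings(windows) == unix)    # prints: True
print(normalize_line_endings(macintosh) == unix)  # prints: True
```

Most text editors perform this kind of conversion for you, so it is rarely necessary to do it by hand.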

For the purpose of understanding text, the details don't matter. What matters is that text consists of the characters and nothing else.

A text file contains text.

Why Is Text Important?

Text is important because it is the Esperanto of the computing world. Here are some examples of where text is used:

Computer Programs: The source code of nearly all computer programs consists of text and is stored in text files. The source of a program is typically a collection of text files.

Web Forms: When you fill in fields in a web form, you are entering text.

Configuration Files: Most configuration files in computer systems are text files. For example, many of the routers and servers that make up the internet are configured through text files.

HTML and XML: HTML and XML source code consists of text.

Email Messages: Most email messages consist of text. (However, a lot of email now uses a more complex textual form that carries font and other formatting information, and so is not pure text.)

Text completely dominates the technical world.

Why Technical Types Like Text Files

Technical people like using text and text files because:

Simplicity: The format is very simple.

Openness: The format is well defined and open. It is not controlled by a single corporation.

Programmability: Because the format is simple and open, it is easy to read into computer programs.

Stability: Whereas word processor documents may be unreadable in just a few years, text files created in the 1960s are still readable by modern text editors. Text is the simplest, most stable format for representing information.

Unix is based around text: Text is the universal data interchange format used by Unix and Unix shell commands. It is far easier for programmers to manipulate text than any other form of data.

Text is easy to email: To email text, just cut and paste it into the email message. There's no need for attachments.
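To illustrate the programmability point in the list above, here is a minimal sketch in Python that reads a text file and counts its lines and words. The file name example.txt and the function name count_lines_and_words are hypothetical. The sketch first writes a small file to a temporary directory so that it is self-contained.

```python
import os
import tempfile


def count_lines_and_words(path):
    """Return (line count, word count) for the text file at path."""
    lines = 0
    words = 0
    with open(path, "r", encoding="ascii") as f:
        for line in f:
            lines += 1
            words += len(line.split())  # words are separated by whitespace
    return lines, words


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    # Write the two-line example with Unix end-of-line bytes.
    with open(path, "w", encoding="ascii", newline="\n") as f:
        f.write("This is\na test\n")
    result = count_lines_and_words(path)

print(result)  # prints: (2, 4)
```

No special library is needed to understand the file format, because there is no format beyond the characters and the end-of-line bytes.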

Text "Illiteracy"

Despite the total dominance of text in the technical world, many computer users do not understand the importance of text, and are unable to manipulate it effectively. Yet, without a basic understanding of text files, and the ability to read, edit, and write text files, it is difficult to operate at even the most basic technical level.

This situation has arisen because most non-technical computer users encounter text only through complex word processors such as Microsoft Word. These products cause users to think of text as very complex, containing font, size and other meta information. Users often believe that Microsoft Word is the only way to edit documents, and do not realize that there is a whole world of far simpler text documents in use by the technical world.

Working With Text

It's easy to become proficient at working with text. Simply obtain a text editor program and try to edit some text files.

Microsoft Word Is NOT A Text Editor!

While it is possible to edit text with Microsoft Word, it is extremely awkward to do so. First, it may display the file using a variable-pitch (proportional) font. This means that any tables in the text that have been aligned by counting characters will immediately become misaligned. Second, it's too easy to add meta information without realizing that you're doing so. For example, if you adjust the margins, the change will not be reflected in the plain text output. Third, every time you save the file, you will have to remember to explicitly save it as "text with line breaks". Fourth, Microsoft Word lacks a lot of the operations that one typically wishes to perform on text. In summary, do not attempt to use Microsoft Word to edit text. Purchase a dedicated text editor instead.
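The first point, about tables aligned by counting characters, can be made concrete with a short sketch in Python. The table contents and variable names here are only illustrative. Each name is padded with spaces to the same width, so the second column starts at the same character position on every line. This lines up only when every character is displayed at the same width, which is what a fixed-pitch font provides.

```python
rows = [
    ("Editor", "Platform"),
    ("BBEdit", "Macintosh"),
    ("TextPad", "Windows"),
]

# Width of the first column: the length of the longest name.
width = max(len(name) for name, _ in rows)

# Pad each name with spaces to that width, then add a two-space gap.
lines = [name.ljust(width) + "  " + platform for name, platform in rows]

for line in lines:
    print(line)
```

In every output line the second column begins at character position 9 (counting from 0). A variable-pitch font draws those nine leading characters at different total widths on each line, so the column no longer lines up.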

The Text Editor As Text Purifier

Even if you are in the habit of using Microsoft Word, a text editor can be very useful as a text purifier!

Consider the case when you wish to copy a slab of text from one Microsoft Word document to another, where the first document is in one font/size/style and the second is in a completely different font/size/style. If you copy the text from the first document to the second, the font/size/style is copied too, and then you have to figure out how to convert the text so that it blends in with the text into which it has just been pasted. This can be time-consuming.

A text editor can help by purifying the text. In the first document, select and COPY the text. Then PASTE it into a text document in your text editor. Then COPY it out of the text editor and PASTE it into the target document. You will find that the text takes on the font/size/style of the target document. The reason is that the text editor was incapable of storing all the font/size/style information, so this information was discarded when the text was pasted into the text editor document. What was then copied out was unstyled text.

This style-stripping technique is useful in Microsoft Word, Eudora, FileMaker Pro and many other applications that copy and paste the styling of text.
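Why the purifying step works can be shown with a toy model in Python. This is only an illustration under my own assumptions: real applications store styled text in their own formats, and the data structure and the purify function below are hypothetical. Styled text is modelled as a list of runs, each pairing some characters with style information. Pasting into a text editor keeps only the characters.

```python
# A toy model of styled text: each run pairs characters with style data.
styled_runs = [
    ("This is ", {"font": "Times", "size": 12}),
    ("a test", {"font": "Times", "size": 12, "style": "bold"}),
]


def purify(runs):
    """Keep only the characters, as a plain text editor would."""
    return "".join(chars for chars, _style in runs)


plain = purify(styled_runs)
print(plain)  # prints: This is a test
```

The result is an ordinary string with no style attached, so the target document is free to apply its own font/size/style.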

Where To Obtain A Text Editor

The best text editor for Macintosh is BBEdit, which can be downloaded from the BBEdit website. In February 2002, the price was approximately US$120.

The best text editor for Windows is TextPad, which can be downloaded from the TextPad website. In February 2002, the price was approximately US$24.

Ross Williams (ross@ross.net)
1 February 2002

Revised 2 February 2002.


Copyright © Ross Williams 2001-2002. All rights reserved.