RTF (Rich Text Format) is an ancient and horrible format for "word processing" documents, a quasi-standard from Microsoft. It can prove useful when you need to generate documents that people will open with Word. Taking a modern Word document and producing comprehensive, reversible RTF is much harder, but creating simple things in RTF that display on any version of Word should be possible.
In case the point is missed: when you are required to create some unholy output that some wretch needs to see with Microsoft Word because they haven’t the wherewithal to handle it any other way, RTF may let you synthesize something workable using only wholesome tools. For example, a Python program, an XSLT transformation, or just typing by hand in Vim.
If you need more of the ghastly details, here is the full RTF specification.
Viewing
Install catdoc (apt-get install catdoc) for a simple conversion of RTF to plain text.
Conventions
Lengths are in twips (1/20th of a point, aka 1/1440th of an inch). 1 pt = 20 twips. 1 inch = 1440 twips. Plenty of precision there.
-
\fs28
-
Font size is specified in half points (14 point here).
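When generating RTF from a program, the unit conversions are worth wrapping up once. A minimal Python sketch of the arithmetic above; the function names are my own invention, not part of any RTF library:

```python
TWIPS_PER_POINT = 20
TWIPS_PER_INCH = 1440


def pt_to_twips(points):
    """Convert points to twips (1 pt = 20 twips)."""
    return round(points * TWIPS_PER_POINT)


def inch_to_twips(inches):
    """Convert inches to twips (1 inch = 1440 twips)."""
    return round(inches * TWIPS_PER_INCH)


def font_size(points):
    """Build an \\fsN command; N is the size in half points."""
    return "\\fs%d" % round(points * 2)


print(inch_to_twips(0.5))  # 720
print(font_size(14))       # \fs28
```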
Minimal RTF Code
Here is a very small sample document.
{\rtf1\ansi\deff0 {\fonttbl {\f0 Times New Roman;}} \f0\fs24 Sample\line Minimal RTF Document}
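If the generator is going to be a Python program, the sample above can be assembled like this. It is only a sketch: minimal_rtf is a made-up helper name, and it assumes the text is plain ASCII with no backslashes or braces in it.

```python
def minimal_rtf(lines, font="Times New Roman", points=12):
    """Return a tiny RTF document with one font and \\line between lines."""
    # Font size is given in half points, so 12 pt becomes \fs24.
    header = r"{\rtf1\ansi\deff0 {\fonttbl {\f0 %s;}} \f0\fs%d " % (
        font,
        round(points * 2),
    )
    # The space after \line ends the command and is not part of the text.
    return header + r"\line ".join(lines) + "}"


print(minimal_rtf(["Sample", "Minimal RTF Document"]))
```

Write the returned string to a file ending in .rtf and Word should open it.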
Here’s some stuff to include for polite operation of a general document. Put this in before the first real content:
\deflang1033 \plain \fs26 \widowctrl \hyphauto \ftnbj
{\header \pard\qr\plain\f0\fs16 \chpgn \par}
This resets everything and sets a default language (1033 is en-US). It also sets a default header. The \chpgn inserts the current page number.
Structure
Note that spacing is lax where it is lax and important where it is important (usually when literal in your text). New lines don’t usually cause trouble.
Braces are nested properly and the entire document must be enclosed in curly braces.
Major RTF component classes:
-
commands
-
Do stuff. They match
/\\[a-z]+(-?[0-9]+)? ?/
-
escapes
-
Specify the hard-to-specify stuff. See below.
-
groups
-
Use braces to save state ({), do something, and then revert state (}). Think gsave and grestore in PostScript.
-
plaintext
-
Literal text. The actual content. What a concept!
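To make the four classes concrete, here is a rough Python tokenizer that sorts a chunk of RTF into them. Only the command pattern comes from the regex above; the escape and plaintext patterns are my own approximations, and this is nowhere near a full RTF parser.

```python
import re

TOKEN = re.compile(r"""
    (?P<command>\\[a-z]+(?:-?[0-9]+)?\ ?)     # control word, optional number, one optional space
  | (?P<escape>\\'[0-9a-fA-F]{2}|\\[^a-z])    # hex escape, or backslash plus one non-letter
  | (?P<group>[{}])                           # group open or close
  | (?P<text>[^\\{}\r\n]+)                    # literal text; source newlines are skipped
""", re.VERBOSE)


def tokenize(rtf):
    """Return a list of (class, source text) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(rtf)]


for kind, source in tokenize(r"{\b Bold} text"):
    print(kind, repr(source))
```

Note that the space after \b shows up as part of the command token, not as text, which is the "first space ends the command" rule in action.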
Commands
Commands end in a space or a newline. Otherwise newlines don’t do anything in the text at all. Many spaces after a command will be inserted literally except for the first one which ends the command. That first space could have been a newline with the same effect. You can use newlines pretty safely for formatting the source.
-
\rtf1
-
RTF version 1 document follows. Required first thing. Always use 1.
-
\ansi
-
Windows Code Page 1252 character set.
-
\deff0
-
Default font is font #0 in the font table. See next item.
-
{\fonttbl {\f0 Times New Roman;}}
-
This is the font table defining font #0 as TNR.
-
\fs24
-
Set text to 12 point.
-
\i
-
Italic.
-
\b
-
Bold.
-
\ul
-
Underline.
-
\super
-
Superscript. I’m super, thanks for asking!
-
\sub
-
Subscript.
-
\scaps
-
Small capitals. Sort of an oxymoron.
-
\strike
-
Strike through. Maybe good for generating fake <hr> sorts of things. Or try
{\pard \brdrb \brdrs \brdrw10 \brsp20 \par} {\pard\par}
-
\plain
-
Turns off any formatting stuff and resets stuff in general.
-
\line
-
Start a new line within the paragraph.
-
\qc
-
"Quadding" (alignment) center.
-
\ql
-
Left justify.
-
\qr
-
Right justify.
-
\qj
-
Full justify.
-
\pagebb
-
Page break before this paragraph. (A bare \page breaks the page right where it appears.)
-
\keep
-
Don’t split this paragraph across pages.
-
\keepn
-
Keep this paragraph with the next one.
-
\widctlpar
-
Turns on a reluctance to strand the first or last line of the paragraph alone on a page. Widow/orphan control it’s called.
-
\nowidctlpar
-
Turns off widctlpar if it’s on.
-
\hyphpar
-
Turns on automatic hyphenation for the paragraph. With parameter 0 (\hyphpar0) it is canceled for this paragraph.
-
\hyphauto
-
Turns on automatic hyphenation for the whole doc if used near the beginning.
-
\sl480\slmult1
-
Double line spacing. 1.5 spacing is 360. Single spacing (240) is the default.
-
\pvpg\phpg\posxX\posyY\abswW
-
Exact paragraph positioning X,Y from top left in twips and W wide.
Those Microsoft people love their backslashes.
Escapes
RTF only uses ASCII values 32-126. To get something else use this kind of escape:
\'a9
Where a9 is a hex number (representing the copyright symbol).
Escapes of interest:
-
\~
-
Non-breaking space, like &nbsp; in HTML.
-
\-
-
Optional hyphenation point.
-
\_
-
Non-breaking hyphen: a real hyphen that a line will not be broken at.
-
\*
-
Marks a group that a reader should skip entirely if it doesn’t recognize the command that follows.
There are no optional spaces after escape sequences.
Unicode works and is done like this:
\uc1\u2620*
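The star does not end it: it is the fallback character that readers without Unicode support display instead, and \uc1 says that one such character follows each \u. The number is decimal, not hex: \u2620 is code point 2620 (U+0A3C), while U+2620 would be written \u9760. Values above 32767 are written as negative numbers (subtract 65536), and there are some other funky rules for very high Unicode values.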
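Escaping by hand gets old fast, so here is a hedged Python sketch of a text escaper. Assumptions of mine, not from the notes above: backslash and braces are escaped with a leading backslash, \ansi is taken to mean code page 1252, "?" is used as the one fallback character after each \u, and anything above U+FFFF is simply refused instead of attempting the funky rules. The name rtf_escape is made up.

```python
def rtf_escape(text):
    """Escape plain text for use as RTF content (a sketch, not a full encoder)."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "\\{}":
            out.append("\\" + ch)              # structural characters
        elif ch == "\n":
            out.append("\\line ")              # new line within the paragraph
        elif 32 <= code <= 126:
            out.append(ch)                     # printable ASCII passes through
        elif code > 0xFFFF:
            raise ValueError("code points above U+FFFF are not handled here")
        else:
            try:
                byte = ch.encode("cp1252")[0]
                out.append("\\'%02x" % byte)   # two-digit hex escape
            except UnicodeEncodeError:
                # \u takes a signed 16-bit decimal number.
                signed = code - 65536 if code > 32767 else code
                out.append("\\uc1\\u%d?" % signed)
    return "".join(out)


print(rtf_escape("\u00a9 2012"))  # \'a9 2012
print(rtf_escape("\u2620"))       # \uc1\u9760?
```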
Fonts
In theory a document should have a font table.
Something like this will do:
{\fonttbl {\f0\froman Times;} {\f1\fswiss Helvetica;} {\f2\fmodern Courier;} }
And then you can do {\f1 A Swiss typeface.}
Paragraphs
A basic paragraph syntax looks like this:
{\rtf1\ansi\deff0 {\fonttbl {\f0 Times New Roman;}}\f0\fs24
{\pard Some lengthy explanation of how paragraphs in RTF are
structured is probably occurring right now somewhere.\par}
}
Note that there needs to be a literal space after "are" in the source because the newline isn’t converted to one. It is simply dropped from the source.
The "d" in pard means "default": it resets the paragraph formatting to its defaults at the start of a paragraph. YMMV.
Here is a 20pt centered title line using the paragraph delimiters:
{\pard \qc \fs40 Chris X Edwards\par}
I don’t know if it makes a difference which kind of formatting command comes first in cases like this.
Padding
To get vertical space around paragraphs, use:
-
\sbN
-
Add N twips of space before the paragraph.
-
\saN
-
Add N twips of space after the paragraph.
Best to add space after for best general effect, but this can be done at the beginning of the paragraph definition:
{\pard\sa720 \fs24 Some text is followed by 1/2".\par}
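Pulling the paragraph and padding notes together, a small Python helper can wrap text in the \pard ... \par delimiters and take care of the dropped-newline problem by joining wrapped source lines with real spaces. This is a sketch: paragraph is a hypothetical name, commands are passed without their backslash, and the text is assumed to be already safe for RTF.

```python
def paragraph(text, *commands):
    r"""Wrap text as {\pard<commands> text\par}."""
    prefix = "".join("\\" + c for c in commands)
    # RTF drops source newlines, so collapse all whitespace runs
    # (including line breaks) into single literal spaces first.
    body = " ".join(text.split())
    # The one space after the last command ends it and is not printed.
    return "{\\pard" + prefix + " " + body + "\\par}"


print(paragraph("Chris X Edwards", "qc", "fs40"))
print(paragraph("Some text is\nfollowed by 1/2\".", "sa720", "fs24"))
```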
Indenting
-
\liN
-
Indent lines N twips of space from the left margin.
-
\fiN
-
Indent the first line N twips relative to the paragraph’s left indent (set with \liN). Use a negative number to have the first line be less indented than the rest (a hanging indent).
-
\riN
-
Indent lines N twips of space from the right margin.
Margins
-
\margtN
-
Where N is the distance in twips from the top.
-
\margbN
-
Where N is the distance in twips from the bottom.
-
\marglN
-
Where N is the distance in twips from the left.
-
\margrN
-
Where N is the distance in twips from the right.
US Letter:
\paperw12240 \paperh15840
Landscape US Letter:
\paperw15840 \paperh12240 \margl1440 \margr1440 \margt1800 \margb1800 \landscape
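Since it is easy to fumble twips by hand, the page setup line can be computed from inches. A rough Python sketch; page_setup and its parameter names are hypothetical, and the width and height are given for the portrait orientation:

```python
TWIPS_PER_INCH = 1440


def page_setup(width_in, height_in, margin_lr_in=1.0, margin_tb_in=1.0,
               landscape=False):
    """Return paper size and margin commands for a page given in inches."""
    w = round(width_in * TWIPS_PER_INCH)
    h = round(height_in * TWIPS_PER_INCH)
    if landscape:
        w, h = h, w  # the long side becomes the width
    lr = round(margin_lr_in * TWIPS_PER_INCH)
    tb = round(margin_tb_in * TWIPS_PER_INCH)
    commands = [
        "\\paperw%d" % w, "\\paperh%d" % h,
        "\\margl%d" % lr, "\\margr%d" % lr,
        "\\margt%d" % tb, "\\margb%d" % tb,
    ]
    if landscape:
        commands.append("\\landscape")
    return " ".join(commands)


# Landscape US Letter with 1" side margins and 1.25" top and bottom:
print(page_setup(8.5, 11, 1.0, 1.25, landscape=True))
```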
Also:
\twoonone
God help us.
Columns
-
\colsN
-
Where N is the number of columns you want.
-
\colsxN
-
Where N is the spacing between columns in twips.
-
\linebetcol
-
Draw a line between columns.
Tables
Tables can be done. Not fun but can be done. Here’s an example:
\trowd \trgaph180
\cellx1440\cellx2880
\pard\intbl Content of upper left box.\cell
\pard\intbl Content of upper right box.\cell
\row
\trowd \trgaph180
\cellx1440\cellx2880
\pard\intbl Content of lower left box.\cell
\pard\intbl Content of lower right box.\cell
\row
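Tables are exactly the kind of thing worth generating instead of typing. Here is a minimal Python sketch that emits rows in the same layout as the example above, with equal-width cells; each \cellx value is the position of a cell’s right edge in twips. The names table_row and table are mine, and cell text is assumed to be already safe for RTF.

```python
def table_row(cells, cell_width=1440, gap=180):
    """Return one table row with equal-width cells (widths in twips)."""
    # \cellxN gives the right edge of each cell, measured from the left.
    edges = "".join("\\cellx%d" % (cell_width * (i + 1))
                    for i in range(len(cells)))
    body = "\n".join("\\pard\\intbl %s\\cell" % c for c in cells)
    return "\\trowd \\trgaph%d\n%s\n%s\n\\row" % (gap, edges, body)


def table(rows, cell_width=1440, gap=180):
    """Return several rows, one after another."""
    return "\n".join(table_row(r, cell_width, gap) for r in rows)


print(table([["upper left", "upper right"],
             ["lower left", "lower right"]]))
```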
Info Section
You can have an optional info section of metadata. This looks like:
{\info
{\title THE DOCUMENT TITLE}
{\author CHRIS X EDWARDS}
{\company PRIVACY LEAK DETECTIVES, LLC}
{\creatim\yr2012\mo8\dy20\hr12\min06}
{\doccomm Any random comment, URL, version number, etc.}
}
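The creation time is the fiddly part of the info section, so here is a hedged Python sketch that builds the block from a datetime. The function name info_block is made up, only some of the fields shown above are covered, and the strings are assumed to be already safe for RTF.

```python
from datetime import datetime


def info_block(title, author, created, comment=""):
    """Return an {\\info ...} group with title, author, and creation time."""
    stamp = "\\creatim\\yr%d\\mo%d\\dy%d\\hr%d\\min%d" % (
        created.year, created.month, created.day,
        created.hour, created.minute,
    )
    parts = ["{\\title %s}" % title, "{\\author %s}" % author, "{%s}" % stamp]
    if comment:
        parts.append("{\\doccomm %s}" % comment)
    return "{\\info\n" + "\n".join(parts) + "\n}"


print(info_block("THE DOCUMENT TITLE", "CHRIS X EDWARDS",
                 datetime(2012, 8, 20, 12, 6)))
```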
Speaking of comments, it seems that RTF does not really support them (since they never expected people like me to compose human readable documents directly in RTF from an editor). But there is some weird syntax designed to allow backwards compatibility. The idea is newfangled commands can get parsed and attended to by newfangled parsers. But old parsers were instructed to be ready for newfangled commands and to skip over them if they didn’t understand them. So basically if you just make up a new command with this syntax, you can have what amounts to a comment. Here’s how:
{\*\xednote{Chris X Edwards}}
{\*\xednote{As long as "xednote" isn't a real function in a parser,
everything in this block should be ignored.}}