Nerdtext


Nerdtext is an AsciiDoc-inspired document processor written in C++. It is not a comprehensive implementation of all possible AsciiDoc features. It deviates from orthodox AsciiDoc when sensible. For example, I can mention "C++" without causing a massive jumbled mess!

Obtaining And Compiling

The entire source code is contained in one file. You're welcome. Get it here.

nerdtext.cxx

Create an executable with this compiler command.

g++ -o nerdtext nerdtext.cxx

Simple Quine Example

If you open up a text editor (Vim, Kakoune, Emacs, gedit, pluma, TextEdit, Atom, etc.) and type up a file that looks something like this:

== Simple Quine Example
If you open up a text editor (https://www.vim.org/[Vim],
http://kakoune.org/[Kakoune],
https://www.gnu.org/software/emacs/[Emacs], gedit, pluma, TextEdit,
Atom, etc.) and type up a file that looks something like _this_:

--------------------------------------------------------
== Simple Quine Example
If you open up a text editor (https://www.vim.org/[Vim],
http://kakoune.org/[Kakoune],
Did you know that "stack overflow" means a computer
is trapped in a Christopher Nolan film?
--------------------------------------------------------

You'll get something like *this section* when you process it with
`nerdtext`.

You'll get something like this section when you process it with nerdtext.

A More Complex Example

Click here to see what I typed into my editor to produce this document. Seriously, do it. It should allow you to instantly understand everything that anyone would need to know about this project.

Running

If you saved your file as x.txt, you can now process it in a normal, sensible way. Both of these work.

./nerdtext -o /tmp/x.html x.txt 
./nerdtext <x.txt >/tmp/x.html
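For anyone curious how one program can accept both forms, here is a minimal C++ sketch. It is not Nerdtext's actual option handling; the IoChoice struct and the parse_io_args function are hypothetical names. It assumes -o names the output file, any other argument names the source file, and an empty value means standard input or output.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch, not Nerdtext's real option parser.
// An empty string means "use the standard stream instead of a file".
struct IoChoice {
    std::string input;   // source file; empty = read stdin
    std::string output;  // HTML file; empty = write stdout
};

IoChoice parse_io_args(const std::vector<std::string>& args) {
    IoChoice choice;
    // args[0] is the program name, so start at index 1.
    for (std::size_t i = 1; i < args.size(); ++i) {
        if (args[i] == "-o" && i + 1 < args.size()) {
            choice.output = args[++i];  // consume the path that follows -o
        } else {
            choice.input = args[i];     // a bare argument names the source
        }
    }
    return choice;
}
```

A caller could then bind a std::istream& to either an std::ifstream or std::cin depending on whether input is empty, and do the same for output, so the rest of the program never cares which form was used.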

Open the HTML it produces in your editor to see exactly what it contains. To see how it looks in your normal web garbage truck browser, put something like file:///tmp/x.html into the garbage bar.

Nerdtext Features

Performance

I feel like it could be a lot faster, but it's not bad. How not bad? If speed is what you're after, let's get some perspective by looking at your other options. The first words after the title over at https://asciidoctor.org are "A fast text processor...", and yes, the "fast" is emphasized, so let's go ahead and believe that.

I have been testing with 554 text files from my website plus a few test-specific files. That is almost 800k words of blog posts and complex technical documentation collected over the years: real working text.

Asciidoctor
Processes all of that in 89.570s.
Nerdtext
Finishes it all in 12.907s.

Perhaps Asciidoctor spends all that time syntax highlighting the code blocks, something I haven't yet hooked Nerdtext up to (via the external GNU source-highlight program). To test this, I excluded the help files, which contain most of the code examples.

Asciidoctor
60.420s
Nerdtext
6.875s

So no. That's not it.

Another example is an ancient Gentoo system I fired up for testing. Since that system lacks modern updates, it is easier for me to write an entire AsciiDoc processing system from scratch than to install Ruby on it. Of course, Nerdtext does not compile on this machine because of ancient C++ deficiencies. However, the machine runs a nerdtext executable statically compiled elsewhere just fine. See how that might be handy? The ancient (2010) Rackham asciidoc.py2 on that system, running on equally ancient hardware, takes about 34.5 seconds just to process one particular document from my test set. Nerdtext, on that same hardware, processes the same document in 1.947s.
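The ratios behind these comparisons are easy to check. The small C++ sketch below divides each old time by the new one; speedup and print_speedups are just illustrative helper names. By this arithmetic, Nerdtext is roughly 6.9x faster than Asciidoctor on the full set, roughly 8.8x faster with the help files excluded, and roughly 17.7x faster than the old asciidoc.py on the Gentoo machine (a single document, so a much smaller sample).

```cpp
#include <cassert>
#include <cstdio>

// Speedup = old time / new time, using the wall-clock timings reported above.
double speedup(double before_s, double after_s) {
    assert(after_s > 0.0);  // a zero or negative time would be meaningless
    return before_s / after_s;
}

// Prints each ratio to two decimal places.
void print_speedups() {
    std::printf("full set vs Asciidoctor:     %.2fx\n", speedup(89.570, 12.907));
    std::printf("no help files vs Asciidoctor: %.2fx\n", speedup(60.420, 6.875));
    std::printf("Gentoo doc vs asciidoc.py:    %.2fx\n", speedup(34.5, 1.947));
}
```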

As you can see, Asciidoctor may have features Nerdtext lacks, but if you only need the features Nerdtext has (e.g. the ones that created this page), it might save you a few seconds. Nerdtext was created for my documents, so you'll obviously want to do your own benchmarks on your own material.

Independent

It is compiled and can be deployed as a single binary executable that does not need the correct version of a supporting scripting language. It runs using native op codes, just as Dennis Ritchie (and I) thought it should. Hopefully it will not have to be completely rewritten, like the original AsciiDoc, when the language it is written in inevitably changes. It has no external library dependencies beyond the standard libraries that ship with GCC. Because I'm a masochist, it does not even use a regular expression library. I was able to compile a full version (without symbols) that weighed in at 188 kB total.
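To give a flavor of what parsing without a regular expression library involves, here is a minimal C++ sketch that recognizes a section title such as "== Simple Quine Example" by plain character scanning. The title_marks function is hypothetical, not taken from nerdtext.cxx, and it only handles the simplest form: a run of '=' characters, a space, then some text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns how many '=' characters open a section title
// ("== Title" gives 2), or 0 if the line is not a title. No <regex> needed.
int title_marks(const std::string& line) {
    std::size_t n = 0;
    while (n < line.size() && line[n] == '=') ++n;  // count leading '='
    // A title needs at least one '=', then a space, then at least one character.
    if (n == 0 || n + 1 >= line.size() || line[n] != ' ') return 0;
    return static_cast<int>(n);
}
```

Under this sketch, ordinary prose gives 0; an AsciiDoc-style section level would be the returned count minus one.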

Nerdy

Its intended target is technical documentation. It can be used for reports, blog posts, poetry, email, love letters, history books, conspiracy theories, fan fiction, ransom notes, etc., but its design goal is to handle complex documentation for computer nerds better. Obviously, if you're a mathematician or physicist with some really obnoxious conception of what "complex" means, you'll want to move along to Donald Knuth's circus, since he conclusively solved your documentation problems as well as they can be solved. But I've found that ordinary computer nerd text lacks a lightweight approach somewhere between troff and Markdown.

Currently that's only a goal, with nothing too exciting actually implemented. But consider that I, someone with 15 years of AsciiDoc experience, was halfway through writing an AsciiDoc parser before I realized that I did not know the difference between a listing block and a literal block. After researching it as much as I could stand, I still do not know. (Do send me a note if you have some hints!) My plan is to use some subtle distinction like that for something useful, such as separating code (listing) from the nerdy gibberish your computer dumps on your screen (literal). I think indented literals are the best way to document one-off commands, but do we really need to preserve the possibly problematic, unnatural indentation? No. So Nerdtext removes indentation down to the level of the first line of the indented block.

cat <<"Simply_Indented" # No literal delimiter needed.
Looks decent in the source. But why would we ever have a
situation where the indentation was _also_ important?
Simply_Indented
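The dedent rule above can be sketched in C++ as follows. This is an illustration under assumptions, not the code in nerdtext.cxx: dedent_to_first is a hypothetical name, only spaces (not tabs) are counted, and a line indented less than the first line simply loses whatever leading spaces it has.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: measure the leading spaces on the first line of an
// indented block, then strip up to that many leading spaces from every line.
std::vector<std::string> dedent_to_first(const std::vector<std::string>& lines) {
    if (lines.empty()) return {};
    // npos (an all-space first line) just means "strip all leading spaces".
    const std::size_t indent = lines.front().find_first_not_of(' ');
    std::vector<std::string> out;
    out.reserve(lines.size());
    for (const std::string& line : lines) {
        std::size_t cut = 0;
        // Stop at the first non-space so shallower lines never lose text.
        while (cut < indent && cut < line.size() && line[cut] == ' ') ++cut;
        out.push_back(line.substr(cut));
    }
    return out;
}
```

Deeper lines keep their extra indentation relative to the first line, which preserves nesting inside the block while dropping the indentation that only marked it as a literal.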

I have noticed that some of the problems I wanted to fix are not just my own annoyances; Asciidoctor is also doing good work on making computer nerd stuff a bit better. One need I have that Asciidoctor is also trying to address is describing exactly which keyboard keys or mouse buttons to press. Another one that Asciidoctor seems to be getting around to is how to talk about GUI elements, which are irritating to document (they say these features are "experimental"). I'm happy to just pass through weird, annoying web garbage, but like Asciidoctor, I also came to the conclusion that "video" and "SVG" embedding features were reasonable. I haven't added those yet, but they're probably coming.

Simplified

Many bad AsciiDoc "features" are removed to simplify implementation and usage. For example, original AsciiDoc has astonishingly complex table syntax, so baroque, in fact, that it would be difficult to claim it is simpler than ordinary HTML/CSS. While the idea of sending AsciiDoc off to dozens of back ends is nice, in practice 99.99999% of AsciiDoc is rendered as HTML. Nerdtext follows the philosophy that if the markup is more complex than the back end, you should pass through and work with the back end's syntax. Traditional AsciiDoc required all kinds of unmemorable syntax to get simple, small tables of a sensible width. Point is, the table syntax has always aggravated me and the tables looked ridiculous. I'll take plain, untouched tables, thanks. If I need something fancier, I'll pass through some fanciness or adjust my CSS accordingly.

Another example is images that are links. AsciiDoc traditionally has one syntax for images, another for links, and a third for the two combined. With Nerdtext you simply put normal image syntax inside normal link syntax, like this.

link:./index.txt[A link] plus an image image:ntlogo80.png[a bunny image] equals

A link plus an image a bunny image equals

link:./index.txt[image:ntlogo80.png[a linked bunny image]] a linked image

a linked bunny image a linked image

It's not the easiest jumble to parse, but it seems easier to remember and use.
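Parsing the nested form mostly comes down to matching brackets by depth, so the inner ']' that closes the image does not also close the link. The following C++ sketch shows that idea; render_inline and matching_bracket are hypothetical names, the HTML it emits is only one plausible rendering, and it does no escaping or validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Index of the ']' matching the '[' at position open_pos, or npos if unbalanced.
std::size_t matching_bracket(const std::string& s, std::size_t open_pos) {
    int depth = 0;
    for (std::size_t i = open_pos; i < s.size(); ++i) {
        if (s[i] == '[') ++depth;
        else if (s[i] == ']' && --depth == 0) return i;
    }
    return std::string::npos;
}

// Hypothetical sketch: expand link:TARGET[TEXT] and image:TARGET[ALT] macros.
// A link's TEXT may itself contain a complete image macro. Unbalanced macros
// are copied through unchanged as plain text.
std::string render_inline(const std::string& s) {
    std::string out;
    std::size_t i = 0;
    while (i < s.size()) {
        const bool is_link = s.compare(i, 5, "link:") == 0;
        const bool is_image = s.compare(i, 6, "image:") == 0;
        if (is_link || is_image) {
            const std::size_t start = i + (is_link ? 5 : 6);
            const std::size_t open_pos = s.find('[', start);
            const std::size_t close_pos = open_pos == std::string::npos
                                              ? open_pos
                                              : matching_bracket(s, open_pos);
            if (close_pos != std::string::npos) {
                const std::string target = s.substr(start, open_pos - start);
                const std::string inner = s.substr(open_pos + 1, close_pos - open_pos - 1);
                if (is_link) {
                    // Recurse so an image macro inside the link text is expanded too.
                    out += "<a href=\"" + target + "\">" + render_inline(inner) + "</a>";
                } else {
                    out += "<img src=\"" + target + "\" alt=\"" + inner + "\">";
                }
                i = close_pos + 1;
                continue;
            }
        }
        out += s[i++];  // ordinary character: copy it through
    }
    return out;
}
```

With this sketch, link:./index.txt[image:ntlogo80.png[a linked bunny image]] becomes an a element wrapping an img element, because the link text is rendered recursively before it is wrapped.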

Back End

The front end of Nerdtext is the source text that the author types and Nerdtext processes. The back end is what Nerdtext produces, usually HTML, and it is what the target audience reads. This can be a little confusing, since in web work "front end" usually means the part that readers look at.

Twenty years ago, the creator of AsciiDoc was pretty gung ho about DocBook. I barely know what it is beyond being XML that I've been able to live without comfortably. So Nerdtext will probably not pay much attention to that format, but it is certainly not incompatible if there is a need.

Currently, I have only limited motivation to worry about other back ends. Nerdtext is, however, fully decoupled from the back end so that arbitrary new ones can be added. There are already stubs for Markdown (for example, to provide a handy way to convert your AsciiDoc documentation for use on GitHub), plain text (basically just lightly stripping the more grating AsciiDoc syntax), and TeX. Now don't go getting too excited: that was just roughed in badly as a proof of concept and would take a lot of work to get running smoothly. Another quite plausible addition is RTF. In theory, this could let people bang out large chunks of prose in a sensible composition editor and later import it into the required typesetting software, such as LibreOffice. If you're a serious nerd, you might want groff or something similarly esoteric. Go for it!
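One common way to get that kind of decoupling is an abstract interface that the parser calls and each output format implements. The C++ sketch below illustrates the idea; Backend, HtmlBackend, MarkdownBackend, render_title, and their methods are hypothetical names, not the classes in nerdtext.cxx.

```cpp
#include <cassert>
#include <string>

// Hypothetical back end interface: the parser only ever talks to this.
class Backend {
public:
    virtual ~Backend() = default;
    // level 1 corresponds to an AsciiDoc "==" section title.
    virtual std::string section_title(int level, const std::string& text) const = 0;
    virtual std::string paragraph(const std::string& text) const = 0;
};

class HtmlBackend : public Backend {
public:
    std::string section_title(int level, const std::string& text) const override {
        const std::string tag = "h" + std::to_string(level + 1);  // "==" -> h2
        return "<" + tag + ">" + text + "</" + tag + ">\n";
    }
    std::string paragraph(const std::string& text) const override {
        return "<p>" + text + "</p>\n";
    }
};

class MarkdownBackend : public Backend {
public:
    std::string section_title(int level, const std::string& text) const override {
        return std::string(level + 1, '#') + " " + text + "\n\n";  // "==" -> "##"
    }
    std::string paragraph(const std::string& text) const override {
        return text + "\n\n";
    }
};

// Parser-side code depends only on the abstract interface.
std::string render_title(const Backend& backend, int level, const std::string& text) {
    return backend.section_title(level, text);
}
```

Under this design, an RTF or groff back end would be one more subclass, with no changes to the parser.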

The main HTML back end produces HTML, not HTML+CSS. One aspect of classic AsciiDoc tools that I've found very frustrating is that their output looks bad. From terrible tables to bad fonts (which look ridiculous saying things like 32GB; why are the numbers so weirdly misaligned in fonts like Garamond?) to bad transitions in and out of code text to the tepid color scheme, the normal output of normal AsciiDoc processors is not going to impress anyone. The correct solution: do not try. Leave the styling blank!

Nerdtext either references a single external CSS file or none at all. It does not embed a giant bolus of mostly unused CSS (and mysterious JavaScript!) into the back end output. There are many advantages. If you don't want any styling (or don't want to disable NoScript), great, you're done. If you do, great, you have a clean slate to work with. This also encourages people to share sensible CSS files so that users can choose how their text looks. Let's move away from the days when it was easy to spot an AsciiDoc user a mile away by idiosyncratic styling that was neither very good nor easy to change.

Testing Style Sheets

Look, I'm no fancy artiste when it comes to decorating web pages. For example, my default color scheme is based mostly on my initials and other easy-to-type hex colors. But different people do have different ideas about what looks good, and I want to do what I can to help them achieve that. It has frustrated me that for 15 years I didn't really know how to change the look of my AsciiDoc documents. Yes, I could have read the manual more times and studied the code, etc., but if I was going to go to that trouble... well, look where we are.

I want to make it quite a bit easier to get started on this. Switch between several style sheets right now using the buttons below.

The best CSS is often no CSS! 😀
My personal default style. Best not to emulate! source
Basically random  —  for illustration purposes. source
Demonstrating the concept with terrible execution. source
Pulled from the ancient AsciiDoc I use. source
Pulled from Asciidoctor's output. source
Pulled from AsciiDoc3.py's output. source
Traditional and mysterious. source
Traditional and even more mysterious. source

The middle ones are essentially random, just to show the kinds of things you might be able to do. The rest come from other AsciiDoc setups, and I actually understand very little about these stylesheets. I mostly just ran the software I had and pulled them out of the generated HTML. The Volnitsky and Flask styles are ones I'd vaguely heard of but don't know much about; they came from AsciiDoc3's tarball. These are almost certainly interacting "incorrectly" with what Nerdtext produces. YMMV. Hopefully this demonstration is a helpful starting point for getting exactly what you want.

Basically, the HTML back end's default behavior is to request a style sheet named style.css in the same directory. If you've got one, great. If not, you get clean HTML. You can change the name of that file with the -s option. If you don't want Nerdtext to even think about making a CSS request, use -s . to force plain HTML only. Or pass through that stuff if you like.
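Here is a minimal C++ sketch of that -s behavior, assuming the style sheet request is a single link element in the page head. stylesheet_link is a hypothetical helper name; only the default name style.css and the special value "." come from the description above.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: default to "style.css"; the special value "."
// suppresses the style sheet request entirely, leaving plain HTML.
std::string stylesheet_link(const std::string& css_name = "style.css") {
    if (css_name == ".") return "";  // plain HTML: no CSS request at all
    return "<link rel=\"stylesheet\" href=\"" + css_name + "\">\n";
}
```

Because the output only references the file, a missing style.css costs nothing beyond the browser falling back to its default styling.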

You can look over my messy CSS used here in the ./style.css file. But really, I left it terrible as an incentive for people to make their own and get what they really want.

Features Of Orthodox AsciiDoc I'm Not Keen On

License

Software freedom is very important to me. I wish to support the free software community, which has supported me. I will also honor the ghost of Stuart Rackham, who released the original AsciiDoc (the only implementation I've ever used) under the GNU GPL.

Copyright 2021 Chris X Edwards

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Yada YADA... You KNOW the DRILL.

The license should be available at https://www.gnu.org/licenses/gpl-3.0.txt. If for some reason it is not, a copy is distributed with this project at ./LICENSE.

Contact

Want to report a bug in Nerdtext? Want to tell me about some cool thing you made it do? Or wish you could make it do? You can send me an email! Feel free to jazz it up with some Nerdtext or AsciiDoc! I use a text mail client, so that will work perfectly!

My email address is: nerd at xed.ch

Frivolous Answered Questions

What is AsciiDoc?

Wikipedia says:

AsciiDoc is a human-readable document format, semantically equivalent to DocBook XML, but using plain-text mark-up conventions. AsciiDoc documents can be created using any text editor and read "as-is", or rendered to HTML or any other format supported by a DocBook tool-chain, i.e. PDF, TeX, Unix manpages, e-books, slide presentations, etc.

For me personally, AsciiDoc, or something like it, is a way to compose text in a good text editor without worrying about anything but the content. It lets me keep my notes and writing scrupulously consistent. This consistency reduces the mental energy required to express complex ideas and also helps cue my recall when I revisit my writing. As with using whitespace to organize Python code semantically, good practices are good, and being able to harness them in very powerful, explicit ways is a bonus that proves it.

For me as a practical matter, I can write my web site content and my professional technical notes with complete confidence that my system will be consistent, sensible, and ultimately useful. By using free software that is "owned" by us all collectively and run privately on computers I control, I am assured that my work will not be interrupted by trolls shaking me down for what they think they can extract from me. I am assured that my thoughts committed to written text will not be used against me by unscrupulous companies that spuriously promise "ease" for the chance to eavesdrop at all times. I can be completely confident that I will have a satisfying way to record my thoughts for the rest of my life, even if that turns out to be a very long time.

Why not use Markdown?

I've got nothing against Markdown. GitHub has probably done more to raise the profile of lightweight markup conventions than anything else ever has. (If you're not aware of it, Markdown is the convention traditionally used by the GitHub README documents that commonly describe what a repository is all about.) If Markdown is sufficient for your needs, go for it. It is largely compatible with AsciiDoc, and Asciidoctor specifically seems to support much of it. Nerdtext will probably move toward greater support too. A Markdown back end is already roughed in, which I hope will let Nerdtext convert to Markdown. I actually like Markdown's link format better, but in general Markdown falls short of many of my requirements.

Why is it called Nerdtext?

First of all, it's easy to confuse software with names like "AsciiDocJr" or whatever, so it is probably best to make a clean break with the namespace. I think Discount is wisely named and generally an inspirational piece of software. Asciidoctor is a decent name too, since it properly separates itself from the AsciiDoc concept and the classic asciidoc executable that I've been using all these years.

Specifically, Nerdtext is so named because I believe that the necessary and sufficient hallmark of being a nerd is applied literacy. Got a lot of text to write? You are a nerd. There's really no way around that. Got a lot of technical text to write about computer topics? Well, it should be pretty obvious that you're not a normal person.

Is Nerdtext capitalized? How exactly does one refer to it?

The project and concept are capitalized (just the N). The executable is not, so you can write nerdtext -v when talking about commands as they are typed.

Why should I not use Nerdtext?

In the words of the actually great Apple co-founder:

If you are doing something for a grade or salary or a reward, it doesn't have as much meaning as creating something for yourself and your own life.

Why a rabbit?