Nerdtext

Nerdtext is an AsciiDoc-inspired document processor written in C++. It is not a comprehensive implementation of all possible AsciiDoc features. It deviates from orthodox AsciiDoc when sensible. For example, I can mention "C++" without causing a massive jumbled mess!

Obtaining And Compiling

The entire source code is contained in one file. You're welcome. Get it here.

nerdtext.cxx

Create an executable with this compiler command.

g++ -o nerdtext nerdtext.cxx

Simple Quine Example

If you open up a text editor (Vim, Kakoune, Emacs, gedit, pluma, TextEdit, Atom, etc.) and type up a file that looks something like this:

== Simple Quine Example
If you open up a text editor (https://www.vim.org/[Vim],
http://kakoune.org/[Kakoune],
https://www.gnu.org/software/emacs/[Emacs], gedit, pluma, TextEdit,
Atom, etc.) and type up a file that looks something like _this_:

--------------------------------------------------------
== Simple Quine Example
If you open up a text editor (https://www.vim.org/[Vim],
http://kakoune.org/[Kakoune],
Did you know that "stack overflow" means a computer
is trapped in a Christopher Nolan film?
--------------------------------------------------------

You'll get something like *this section* when you process it with
`nerdtext`.

You'll get something like this section when you process it with nerdtext.

A More Complex Example

Click here to see what I typed into my editor to produce this document. Seriously, do it. It should allow you to instantly understand everything that anyone would need to know about this project.

Running

If you saved your file as x.txt you can now process it in a normal sensible way — both of these work.

./nerdtext -o /tmp/x.html x.txt 
./nerdtext <x.txt >/tmp/x.html

Open the HTML it produces in your editor to understand what exactly is produced. See how it looks in your normal web garbage truck browser by putting something like file:///tmp/x.html into the garbage bar.

Nerdtext Features

Performance

I feel like it could be a lot faster but it's not bad. How not bad? If speed is what you're after, let's get some perspective by looking at your other options. The first words after the title over at https://asciidoctor.org are "A fast text processor..." And, yes, the "fast" is emphasized, so let's go ahead and believe that.

I have been testing with 554 text files from my website plus a few test-specific files. This is almost 800k words of blog posts and complex technical documentation collected over the years: real working text.

Asciidoctor: Processes all of that in 89.570s.
Nerdtext: Finishes it all in 12.907s.

Perhaps all that time is being spent doing syntax highlighting on the code blocks — something I haven't hooked up yet to the external GNU source-highlight program. To test this, I excluded the help files which have most of the code examples.

Asciidoctor: 60.420s
Nerdtext: 6.875s

So no. That's not it.

Another example is an ancient Gentoo system I fired up for testing. With that system lacking modern updates, it is easier for me to write an entire AsciiDoc processing system from scratch than to install Ruby. Of course Nerdtext does not compile on this machine because of ancient C++ deficiencies. However, it does run a nerdtext executable statically compiled elsewhere just fine. See how that might be handy? The ancient (2010) Rackham asciidoc.py2 on that ancient system running on some equally ancient hardware takes about 34.5 seconds just to compile one particular document from my test set. Nerdtext running on that same ancient hardware processes the same document in 1.947s.

As you can see Asciidoctor may have features Nerdtext lacks, but if you only need the features Nerdtext has (e.g. the ones that created this page) then it might save you a few seconds. Nerdtext was created for my documents and you'll obviously want to do your own benchmarks on your own material.

Independent

It is compiled and can be deployed as a single binary executable that does not need the correct version of a supporting scripting language. It runs using native op codes just like Dennis Ritchie (and I) thought it should. Hopefully it will not have to be completely rewritten, like the original AsciiDoc, when the language it is written in inevitably changes. It has no external library dependencies, only the standard libraries that come with GCC. Because I'm a masochist, it does not even use a regular expression library. I was able to compile a full version (without symbols) that weighed in at 188kB total.

Nerdy

Its intended target usage is with technical documentation. It can be used for reports, blog posts, poetry, email, love letters, history books, conspiracy theories, fan fiction, ransom notes, etc., but its design goal is to be better with complex documentation for computer nerds. Obviously if you're a mathematician or physicist with some really obnoxious conception of what "complex" means, you'll want to move along to Donald Knuth's circus since he conclusively solved your documentation problems as well as they can be solved. But I've found that normal computer nerd text is missing a lightweight approach somewhere between troff and Markdown.

Currently that's only a goal with nothing too exciting actually implemented. But consider that I, someone with 15 years of AsciiDoc experience, was halfway through writing an AsciiDoc parser before I realized that I did not know the difference between a listing block and a literal block. After researching it as much as I could stand, I still do not know. (Do send me a note if you have some hints!) My plan is to use some subtle thing like that for something useful like separating code (listing) from nerdy gibberish your computer dumps on your screen (literal). I think that indented literals are the best way to document one-off commands, but do we really need to preserve the possibly problematic unnatural indentation? No. Nerdtext removes indentation up to the level of the first line of the indented block.

cat <<"Simply_Indented" # No literal delimiter needed.
Looks decent in the source. But why would we ever have a
situation where the indentation was _also_ important?
Simply_Indented
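
The de-indenting rule described above can be sketched in a few lines of C++. This is a minimal illustration and not Nerdtext's actual code: the names `leading_whitespace` and `strip_block_indent` are made up here, and the sketch assumes that indentation means leading spaces or tabs, that the first line sets the level, and that a line indented less than the first line simply loses whatever leading whitespace it has.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from nerdtext.cxx): count leading spaces/tabs.
static std::size_t leading_whitespace(const std::string& line) {
    std::size_t n = 0;
    while (n < line.size() && (line[n] == ' ' || line[n] == '\t')) ++n;
    return n;
}

// Remove the first line's indentation from every line of an indented block.
// Indentation deeper than the first line's is preserved.
std::vector<std::string> strip_block_indent(const std::vector<std::string>& block) {
    std::vector<std::string> out;
    if (block.empty()) return out;
    const std::size_t base = leading_whitespace(block.front());
    out.reserve(block.size());
    for (const std::string& line : block) {
        // Never cut into real text on lines indented less than the first.
        const std::size_t cut = std::min(base, leading_whitespace(line));
        out.push_back(line.substr(cut));
    }
    return out;
}
```

Feeding it the lines of an indented block returns the same lines shifted left, so the source can stay nicely indented while the output starts at column zero.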

I have noticed that some of the improvements I thought of are not just annoyances I'm dealing with. Asciidoctor is also doing good work on making computer nerd stuff a bit better. One need I have that I see Asciidoctor is also trying to address is talking about what exact keyboard/mouse buttons are to be pressed. Another one that Asciidoctor looks like it's getting around to is how to talk about irritating-to-document GUI elements (they say these features are "experimental"). I'm happy to just pass through weird annoying web garbage but like Asciidoctor, I also came to the conclusion that "video" and "SVG" embedding features were reasonable. I haven't added those yet but they're probably coming.

Simplified

Many bad AsciiDoc "features" are removed for simplicity in implementation and usage. For example, original AsciiDoc has astonishingly complex table syntax, so baroque, in fact, that it would be difficult to claim it is simpler than ordinary HTML/CSS. While the idea of sending AsciiDoc off to dozens of back ends is nice, in practice 99.99999% of AsciiDoc is rendered as HTML. Nerdtext goes with the philosophy that if the markup is more complex than the back end, you should pass through and work with the back end's syntax. Traditional AsciiDoc required all kinds of unmemorable syntax to get simple small tables that were a sensible width. Point is, the table syntax has always been aggravating to me and the tables looked ridiculous. I'll take plain untouched tables, thanks. If I have the need for something fancier, I'll pass through some fanciness or adjust my CSS accordingly.

Another example is images that are links. AsciiDoc traditionally has a syntax for images and a syntax for links and a third syntax when both of those are combined. With Nerdtext you simply put a normal image syntax in a normal link's syntax like this.

link:./index.txt[A link] plus an image image:ntlogo80.png[a bunny image] equals

A link plus an image a bunny image equals

link:./index.txt[image:ntlogo80.png[a linked bunny image]] a linked image

a linked image

It's not the easiest jumble to parse, but it seems easier to remember and use.
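
What makes this a jumble to parse is that the link's bracketed text can itself contain brackets, so a parser cannot just stop at the first `]` it sees. A minimal sketch of the usual depth-counting approach follows. It is an illustration only and not necessarily how nerdtext.cxx does it; the function name `find_matching_bracket` is hypothetical and escaped brackets are ignored.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: given the position of an opening '[', return the
// position of the ']' that closes it, counting nested pairs along the way.
// Returns std::string::npos if `open` is not a '[' or no match exists.
std::size_t find_matching_bracket(const std::string& text, std::size_t open) {
    if (open >= text.size() || text[open] != '[') return std::string::npos;
    int depth = 0;
    for (std::size_t i = open; i < text.size(); ++i) {
        if (text[i] == '[') {
            ++depth;
        } else if (text[i] == ']') {
            if (--depth == 0) return i;  // back at the outer level
        }
    }
    return std::string::npos;  // unbalanced brackets
}
```

With the outer pair located, everything between the brackets (here the whole image macro) can be handed back to the same inline parser, which is what lets one image syntax serve inside one link syntax.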

Back End

The "front end" for Nerdtext is the source text the author types and Nerdtext processes. The "back end" is what Nerdtext produces for the target audience to read, usually HTML. This can be a little confusing since web people normally use "front end" for the part that readers look at.

Twenty years ago, the creator of AsciiDoc was pretty gung ho on DocBook. I barely even know what it is beyond: XML that I've been able to comfortably live without. So Nerdtext will probably not be too attentive to that format. But it is certainly not incompatible if there is a need.

Currently, I have only limited motivation to worry about other back ends. Nerdtext is, however, fully decoupled from the back end so that arbitrary new ones can be added. Currently there are stubs for Markdown (for example, to provide a handy way to convert your AsciiDoc documentation for use on GitHub), plain text (basically just slightly removing the more grating AsciiDoc syntax), and TeX. Now don't go getting too excited; that was just roughed in badly as a proof of concept and would take a lot to get working smoothly. Another one that would be possible to add that is actually quite plausible is RTF. This could, in theory, allow people to bang out large chunks of prose in a sensible composition editor and later import it to the required typesetting software such as LibreOffice. If you're a serious nerd, you might want to see groff or some esoteric thing like that. If you're a serious nerd, go for it!

Something to note about the main HTML back end is that it is HTML. What it is not is HTML+CSS. An important aspect of classic AsciiDoc tools that I've found very frustrating is that its output looks bad. From terrible tables to bad fonts (that look ridiculous saying things like 32GB — why are the numbers so weirdly misaligned in fonts like Garamond?) to bad transitions in and out of code text to the tepid color scheme, the normal output of normal AsciiDoc processors is not going to impress anyone. The correct solution: do not try. Leave the styling blank!

Nerdtext supports either calling out to a single CSS file or not. It does not embed a giant bolus of mostly unused CSS (and mysterious JavaScript!) into the back end output. There are many advantages. If you don't want any styling (or don't want to disable NoScript), great you're done. If you do, great, you have a clean slate to work with. This also encourages people to share sensible CSS files so that users might have a choice about how their text looks. Let's try to move away from the times where it was easy to spot an AsciiDoc user a mile away due to the idiosyncratic styling that was not all that good or easy to change.

Testing Style Sheets

Look, I'm no fancy artiste when it comes to decorating web pages. For example, my default color scheme is based mostly on my initials and other easy to type hex colors. But different people do have different ideas about what looks good and I want to do what I can to help them achieve that. I have been frustrated that for 15 years I really didn't know how to change the look of my AsciiDoc documents. Yes, I could have read the manual more times and studied the code, etc, but if I was going to go through that trouble... well, look where we are.

I want to make things quite a bit easier to get started with this topic. Switch between multiple style sheets right now simply by using the buttons below.

The best CSS is often no CSS! 😀
My personal default style. Best not to emulate!
Basically random, for illustration purposes.
Demonstrating the concept with terrible execution.
Pulled from the ancient AsciiDoc I use.
Pulled from Asciidoctor's output.
Pulled from AsciiDoc3.py's output.
Traditional and mysterious.
Traditional and even more mysterious.

The middle ones are essentially random just to show the kinds of things you might be able to do. The rest are from other AsciiDoc situations and I actually have very little understanding of these stylesheets. I mostly just ran the software I had and pulled these out of the generated HTML. The Volnitsky and Flask are ones I'd vaguely heard of but I don't know much about - these came from AsciiDoc3's tarball. These are almost certain to be interacting "incorrectly" with what Nerdtext produces. YMMV. Hopefully this demonstration is helpful as a starting point for getting exactly what you want.

Basically the default HTML back end behavior of Nerdtext is to request a style sheet (in the same directory) named style.css. If you've got one, great. If not you get clean HTML. You can also change what the name of that file is with the -s option. If you don't want Nerdtext to even think about making a CSS request, use -s . to force only plain HTML. Or pass through that stuff if you like.
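
The style sheet rule above boils down to one small decision when writing the HTML head. The sketch below is a guess at how that could look, not a copy of Nerdtext's code: the function name `stylesheet_link` is invented, the default name style.css and the `.` convention come from the paragraph above, and the exact tag Nerdtext emits may differ.

```cpp
#include <string>

// Hypothetical sketch: build the <link> element for the HTML head.
// `name` is the value given to -s; "." means "no CSS request at all".
// An empty name falls back to the default, style.css.
// (Escaping of odd characters in the file name is left out of this sketch.)
std::string stylesheet_link(const std::string& name) {
    if (name == ".") return "";  // plain HTML only
    const std::string file = name.empty() ? "style.css" : name;
    return "<link rel=\"stylesheet\" href=\"" + file + "\">";
}
```

The point of the design is that the output carries at most one short reference to an external file, so swapping the look of a whole site means swapping one CSS file rather than regenerating every page.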

You can look over my messy CSS used here in the ./style.css file. But really, I left it terrible as an incentive for people to make their own and get what they really want.

Features Of Orthodox AsciiDoc I'm Not Keen On

The nightmarish ++ for <tt>. Let me repeat: Nerdtext is written in C++! Let us speak no more of this anti-feature.
In theory 'this apostrophe foolishness' is a synonym for emphasized; but I'm not sure that's smart. Also ' is an XML special syntax character. Some would say it should be escaped like < and > but no, it will be left alone. One goal is cutting and pasting from notes should not cause stupid typesetting characters to clutter things up.
Fancy ``quoted text''. This is something I never use and am kind of squeamish about. Avoid.
Nerdtext can detect bare links but chooses not to. There is a specific reason for this. I have been frustrated in the past at not being able to talk about http://a-rather-bad-website.com without making an aggrandizing link to it. If you want that to be a link, just add [] to the end of it and it will be marked up like a normal link.
Nerdtext also ignores mailto:name@gmail.com. If you really must have such a crufty thing, run a pass through.
Links in Nerdtext default to opening in a different tab. I know, Tim Berners-Lee will be disappointed, but he is already very disappointed. This is not 1980s HyperCard or GNU Info. Nerdtext is ideally suited for longish prose text where checking a reference is very likely to be a temporary diversion. Sure, the back button works (sometimes), but then you have to find your place again, etc.
Speaking of the previous millennium, I do not like subscripts and superscripts messing with things like Bash's use of tilde expansion etc. That's just pandering to bad 1990s HTML thinking. If you need this kind of thing, use pass through or go big and set up the necessary MathML or MathJax or whatever the cool kids are using these days. Or stick with a chalkboard.
Character replacement substitutions are not done capriciously. Stuff like (TM) (C) (R) => ... can all stay just like they are. Go on, ask me how much I care about the punctilious details of "intellectual property" symbology.
I'm a bit confused about how classic AsciiDoc decides to escape this kind of thing: ٦. Probably should be sent through some pass-through.
Some AsciiDoc processing systems make a distinction between inline and block type for images and they use image:: for the latter. I think that's what that's for, but I'm not really sure. If I can't remember/learn the meaning of that cryptic inconsistent syntax, it'd be good to drop it. If you want a block image, put blank lines around the one consistent way to specify images which is image:url/path/file.jpg[hover text].
Out of order sub headings, e.g. starting with === Level Three, are begrudgingly accepted in other AsciiDoc engines but will fail comically in Nerdtext giving you the chance to spot the mistake and not spend years of embarrassment with messed up heading levels.
The subheading methods with different underline characters are stupid. Who could possibly remember the details of that awkward syntax? I make an exception for the main title which uses = Main Title or can be underlined with a row of = of the same length as the title. But for subheadings I only endorse: == Title. I will probably add support for this syntax too: == Title ==.
Bibliography? Footnotes? Ugh. A table of contents is bad enough. I haven't bothered with any of that stuff. Sometimes I do find a table of contents helpful (which I manually create because in the ancient days, that was how you had to do it). But I get the sense that the smarter approach here would be to embrace the unix philosophy and have separate competent programs reconstruct your content for you if that's what you need. I consider such things similar to code syntax highlighting and outside of the scope of Nerdtext. As a quick example, from Vim you can do something like this to get a pretty good start on a table of contents. :r! sed -n '/^==/s/^==* *\(.*\)$/* <<?,\1>>/p' [C]-r% (Here [C]-r% means pressing Ctrl-R and then % on the Vim command line, which inserts the current file name.) Note that I'm currently still working on implementing ID attributes and cross references but you get the idea.
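
In the same "separate competent program" spirit, the sed one-liner in the last item can be sketched as a small C++ function. This is an assumption-laden illustration, not part of Nerdtext: the name `toc_entries` is made up, and it copies the sed behavior as I read it, meaning every line starting with `==` has its run of `=` and following spaces dropped and the rest wrapped as a list item holding a cross reference with a placeholder ID of `?`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch mirroring the sed one-liner: turn each "== Title" style line into
// "* <<?,Title>>". Like the sed version, a line made only of '=' characters
// yields an entry with an empty title.
std::vector<std::string> toc_entries(const std::vector<std::string>& lines) {
    std::vector<std::string> toc;
    for (const std::string& line : lines) {
        if (line.compare(0, 2, "==") != 0) continue;  // not a subheading
        std::size_t i = line.find_first_not_of('=');   // skip the '=' run
        if (i == std::string::npos) i = line.size();
        i = line.find_first_not_of(' ', i);            // skip spaces
        const std::string title = (i == std::string::npos) ? "" : line.substr(i);
        toc.push_back("* <<?," + title + ">>");
    }
    return toc;
}
```

The main title (a single `=`) is skipped on purpose, just as the `/^==/` address in the sed command skips it.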

License

Software Freedom is very important to me. I wish to support the free software community which has supported me. I will also honor the ghost of Stuart Rackham who released the original AsciiDoc (the only implementation I've ever used) under a GNU GPL.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Yada YADA... You KNOW the DRILL.

The license should be available at https://www.gnu.org/licenses/gpl-3.0.txt. If for some reason it is not, a copy is distributed with this project at ./LICENSE.

Contact

Want to report a bug in Nerdtext? Want to tell me about some cool thing you made it do? Or wish you could make it do? You can send me an email! Feel free to jazz it up with some Nerdtext or AsciiDoc! I use a text mail client so that will work perfectly!

My email address is: nerd at xed.ch

Frivolous Answered Questions

What is AsciiDoc?

Wikipedia says:

AsciiDoc is a human-readable document format, semantically equivalent to DocBook XML, but using plain-text mark-up conventions. AsciiDoc documents can be created using any text editor and read "as-is", or rendered to HTML or any other format supported by a DocBook tool-chain, i.e. PDF, TeX, Unix manpages, e-books, slide presentations, etc.

For me personally, AsciiDoc, or something like it, is a way that I can compose text in a good text editor without worrying about anything but the content. It is a way I can keep my notes and writing scrupulously consistent. This consistency helps reduce mental energy required to express complex ideas, and also helps to cue my recall when revisiting my writing. Like using whitespace to semantically organize Python code, good practices are good; the fact that you can harness them in very powerful explicit ways as a bonus is proof of that.

For me as a practical matter, I can write my web site content and my professional technical notes with complete confidence that my system will be consistent, sensible, and ultimately useful. By using free software that is "owned" by us all collectively and run privately on computers I control, I am assured that my work will not be interrupted by trolls shaking me down for what they think they can extract from me. I am assured that my thoughts committed to written text will not be used against me by unscrupulous companies that spuriously promise "ease" for the chance to eavesdrop at all times. I can be completely confident that I will have a satisfying way to record my thoughts for the rest of my life, even if that turns out to be a very long time.

Why not use Markdown?

I've got nothing against Markdown. GitHub has probably done more for raising the profile of some kind of lightweight markup convention than anything else ever has. (If you're not aware of it, Markdown is the convention traditionally used by GitHub README documents that commonly describe what a repository is all about.) If Markdown is sufficient for your needs, go for it. It is largely compatible with AsciiDoc; Asciidoctor specifically seems to support much of it. Nerdtext will probably move toward greater support too. There is already a back end for Markdown roughed in which I hope will allow Nerdtext to convert to Markdown. I actually like the link format of Markdown better, but generally Markdown is inadequate for many of my requirements.

Why is it called Nerdtext?

First of all it's easy to get confused by other software with names like "AsciiDocJr" or whatever. It is probably best to make a clean break with the name space. I think Discount is wisely named and generally an inspirational piece of software. Asciidoctor is a decent name too since it properly separates itself from the AsciiDoc concept and classic asciidoc executable that I've been using all these years.

Specifically, Nerdtext is so named because I believe that the necessary and sufficient hallmark of being a nerd is applied literacy. Got a lot of text to write? You are a nerd. There's really no way around that. Got a lot of technical text to write about computer topics? Well, it should be pretty obvious that you're not a normal person.

Is Nerdtext capitalized? How exactly does one refer to it?

The project and concept are capitalized — just the N. The executable is not so you can use nerdtext -v when talking about commands as they are typed.

Why should I not use Nerdtext?

The chances are very good that I did not give any thought to how Nerdtext works on your OS. Sorry. Good luck though! Let me know how it goes. Oh hey, just had a report that it compiles just fine with WSL. Enjoy!
It is brand new. It might have some terrible uncorrectable problem. We'll see.
The list of known bugs is admirably enumerated using Nerdtext syntax, but it is already pretty long. The darkness of unknown bugs is vast like the cosmos.
If you need support for some other text processing system's more exotic syntax, that will probably not be as comprehensive as you'd like.
Some reasonable features are simply not done. For example, nested lists, ID attributes, cross references, block titles, other kinds of attributes. They could be added or I could learn to live without them depending on what turns out to be easier.
Syntax highlighting in Vim works pretty well using normal AsciiDoc highlighting, but sometimes for complex documents (like this one that starts off with the dreaded ++), the discrepancies add up. I'll probably have to make my own syntax file soon.
Some of the code is terrible! My hat is off to Stu and everybody who has ever made an AsciiDoc processor because it is a real pain in the ass. As with most software, one learns a lot about the best way to code something by writing it some other way. I'm pretty sure I could do a really good job of it now if I started over — which I'm totally not going to do! There is however a decent chance I will clean up some of the stuff that turned out to be not so well organized. I did lavishly comment the code so it's not just a confusing rabbit hole of madness.
Perhaps you are not an ordinary nerd. Perhaps you are several levels beyond that. One inspiration for Nerdtext was a former colleague, a molecular biophysics professor, who, before AsciiDoc and Markdown existed, wrote his own text to HTML system in C using his own preferred markup syntax. He got exactly what he wanted and had complete control over every detail. For the highest caliber of nerd, there is no substitute: that is how you do it. I actually think a project like this would make an excellent curriculum at a school that was earnestly trying to produce competent programmers. It would be like the tools machinists trust and treasure for their entire lives because they made them themselves in trade school. If you are a true alpha nerd, you must honor tradition and use the Force to build your light saber yourself.

In the words of the actually great Apple co-founder:

If you are doing something for a grade or salary or a reward, it doesn't have as much meaning as creating something for yourself and your own life.

Why a rabbit?

Rabbits use the lowest level materials in the (terrestrial) food chain.
Rabbits are on the small end of bigger animals (HTML, XML, TeX) or the large end of smaller ones (Markdown, MediaWiki, in-app convenience mark up).
Rabbits are a bit silly looking and not really taken as seriously as they probably should be. Rabbits are hardcore, grinding away 24/7 until they get eaten.
Rabbits (and C++) poop constantly.
They are not the fastest animal in the world — but they are damn fast and probably way faster than other creatures chosen at random. When you need racing dogs to run very fast, have them chase a rabbit. In sports like marathon running, the "rabbit" is an athlete paid just to keep the pace high regardless of the outcome.
Rabbits are not some delicate tenuous "species of concern"; they thrive nearly everywhere! Ask New Zealand how easy it is to get rid of them.