Nerdtext

I hope everyone is having a decent holiday. At least a lot of places are closed, giving us the day off from absurd moral quandaries emerging from weird banal questions like "Should I avoid the gym today to stay healthy?"

Speaking of odd quandaries, did you know that holiday gift giving has a deadweight loss between a tenth and a third of the value of the gift? Well, try not to dwell on it. I mention it because I believe that (with certain exceptions) homemade gifts are even worse. But what if there was something you could make for yourself that you really appreciated (at full value by definition) that you could also give away to others with little extra cost or effort? From the perspective of the other people, this would probably be a pretty lame "gift", but I believe (to economists anyway) it technically does beat the normal auto-cannibalism of holiday gifting.

In that spirit I’m putting some effort into a public release of some software I wrote for myself that I’m quite pleased with. The program is called Nerdtext and it is a tool to help people communicate efficiently — using the web (mostly) — and without the spectre of surveillance capitalism abusing them.

If you don’t feel like reading about the backstory, you can jump right to the Nerdtext Project page. If you’re still with me, cool! Let’s take a deeper look at the making of my website.

Ever wonder how I produce this blog? Of course you haven’t! But consider the question rhetorical and let’s pretend you were interested. Ever wonder why my whole website looks like shit? Sure, that one my regular readers may have briefly entertained. It’s a long story.

In high school, the number of times I took notes in class was precisely zero. In engineering school, notes were much more critical. Still zero. And yet today I personally know nobody who is as serious a note taker as I am. Good notes are the cornerstone of my professional confidence. So why the radical change?

While most tech nerds today have seen the "profound" transition from last year’s fad framework to this year’s, I experienced the tail end of information technology methods used since Latin was the native language of London. Writing stuff down with a stick is a classic but it suffers very badly in several ways. First, it is difficult to duplicate: the monasteries of Europe were serious operations that were barely able to pass along to us a tiny slice of survivorship bias we call Western Civilization. Second, handwritten notes and printed books alike are notoriously hard to organize and recall effectively. If you only have a few super important rules you need to communicate, you can carve them onto a chunk of fancy stone and they’re as available to everyone as that post-it with your password on your monitor. But once everyone realizes how super handy literacy can be, suddenly there are post-it notes all over the place. As with the concept of life itself: how do you organize that messy victim of its own success?

The answer of course is computers. Computers have in my lifetime convincingly solved all traditional problems related to literacy, perhaps to the point of obviating it. Of course… Computers… They are their own universe of new and improved problems. Fortunately for me (maybe) I seem to be preternaturally suited to computerized literacy, which at least goes far to compensate for my horrendous penmanship.

So how do I practice computerized literacy? Well, if you ask an F1 driver to talk about "cars", you’ll get some weird esoteric answers. Mine are in the same spirit. But try to keep in mind F1 tech does find its way into passenger cars. For example: have you ever used Slack or Discord or some chat program like that? Did you know that you can make words you type italicized by surrounding them with underscores? This is precisely how everything (←yes) on my website that has ever been italicized has been made so since about 2006.

And that convention goes back much farther. A lot of text embellishment convention was established and popularized on Usenet. Some of it goes back to conventions people used with manual typewriters, but old typewritten documents are incredibly staid by modern standards. They also used things like red ink ribbons and double printing tricks which are not so transferable to simple computer encoding. That’s why I feel it was Usenet, maybe some email, that really tried to forge new typographic conventions that could be expressed well in efficient text encodings, the only kind at the time.

These conventions were not especially complicated. For example…

* If you put *asterisks* around something, it seems more serious.
* As mentioned _underscores_ can, well, underscore a point. The dictionary literally defines underscore’s purpose "…for emphasis or to indicate intent to italicize…".
* Another similar convention is to put a row of dashes or underscores on the line below a title following the thought process of double printing underlined titles on a typewriter.
* You might have a list of bullet points that each start with an asterisk. This list is an example of such a thing.
* When you see a blank line (feed) between blobs of text, you can naturally infer a paragraph break is implied.

If any of that was terrifyingly complex, you probably have not been a literate human for long. These are quite deliberately the simplest text organizing conventions possible. And I mean simple for humans! At the other end of the spectrum is something like this:

The volume of a cone is: $${1\over3} \pi \rho^2 h$$

I had to actually double check what the formula really was because I’m not especially good at cone geometry or Donald Knuth’s TeX syntax for using text characters to describe complex mathematical typesetting (despite having read the entire fascinating and brilliant TeXbook).

Somewhere in between the simplest text conventions you can imagine and TeX is something like HTML. Most people today are familiar with it. You can right click in your browser and ask to "View Page Source" and see the <insane> <jumbled> <mess> of <tags> and <more tags="and all kinds of weird stuff"/>. HTML is actually not hard or complicated, but it sure looks terrible. If you’re reading it directly, someone is probably paying you.

By the late 1990s it was quickly becoming apparent that the web was going to be a lot more powerful and important than even its pioneers could imagine. In a gold rush atmosphere, everybody was clamoring to figure out how to get a web site that would make them rich. Some people, like me, had personal websites just because they seemed very sensible. I immediately realized that tagging up websites was tedious drudgery that I wanted no part of. Little did I suspect that 99% of computer jobs for the next 25 years would be doing exactly that! So much for the robots and space travel I’d been promised as a youth!

In the early 2000s some other people were getting pretty sick of HTML work too. Programs that generated HTML that humans never had to look at were becoming popular. This gave us things like wikis and web mail and shopping carts, etc. The famous Aaron Swartz was doing stuff like helping to create RSS, Creative Commons, and, importantly, Markdown.

An important protagonist in this story is Stuart Rackham. He seems to be a Kiwi but his existence is shrouded in mystery now — perhaps he has "retired".

In 2002 he seems to have written the program called AsciiDoc. The purpose of AsciiDoc is to translate a disciplined use of those simple obvious conventions I mentioned earlier that were created for and by humans, directly into something more obnoxious like HTML. I learned about this program in 2005 or early 2006 and immediately started using it for my web pages.
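To make that idea of translation concrete, here is a minimal sketch in Python of turning a few of those plain text conventions into HTML. It is only an illustration under simplifying assumptions. It is not how AsciiDoc or Nerdtext actually work, and the function names `inline` and `to_html` are made up for this example. It handles only asterisk emphasis, underscore emphasis, asterisk bullet lists, and blank-line paragraph breaks.

```python
import html
import re


def inline(text):
    """Escape HTML special characters, then mark up *strong* and _emphasis_."""
    text = html.escape(text, quote=False)
    # *word or phrase* becomes <strong>; the phrase may not start or end with a space.
    text = re.sub(r"\*(\S(?:.*?\S)?)\*", r"<strong>\1</strong>", text)
    # _word or phrase_ becomes <em>; underscores inside snake_case words are ignored.
    text = re.sub(r"(?<!\w)_(\S(?:.*?\S)?)_(?!\w)", r"<em>\1</em>", text)
    return text


def to_html(source):
    """Convert blank-line-separated blocks into paragraphs and bullet lists."""
    out = []
    for block in re.split(r"\n\s*\n", source.strip()):
        lines = block.splitlines()
        if not lines:
            continue  # nothing to emit for empty input
        if all(line.startswith("* ") for line in lines):
            # Every line starts with "* ", so treat the block as a bullet list.
            items = "".join(f"<li>{inline(line[2:])}</li>" for line in lines)
            out.append(f"<ul>{items}</ul>")
        else:
            # Anything else is a paragraph; join its lines with spaces.
            out.append(f"<p>{inline(' '.join(lines))}</p>")
    return "\n".join(out)


if __name__ == "__main__":
    print(to_html("Some *serious* and _subtle_ text.\n\n* one\n* two"))
```

Even this toy version hints at why a real converter is a bigger job than it looks: titles, nested lists, code listings, and links each need their own rules.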

I have been using it continuously since then. For the first 5 years or so, it seemed like any other high quality free software program. It released new features and bug fixes and the documentation improved. But while I was busy worrying about other things, I started to get the feeling that the development of AsciiDoc was faltering. Eventually it was clear that it was not being actively developed.

At the same time Markdown was enjoying a surge in popularity thanks to GitHub promoting it as an aid to documenting software. The simplistic Markdown had been completely unknown to me and when I discovered it, I was disappointed that compared to AsciiDoc, it lacked many features I’d come to depend on.

At some point Stu’s website seems to have disappeared. I feel like there has been some confusion about AsciiDoc’s status. Some helpful internet people have done some work to rescue the code from oblivion. Here, here, here. Or maybe they have rewritten it — it’s difficult to say. The old version was in Python 2 and it looks like it wasn’t Stuart Rackham who brought it into the era of Python 3. It’s all rather confusing.

And some people have definitely rewritten it completely, most notably the Asciidoctor project. We know it is completely rewritten because it is no longer in Python at all — their project is in Ruby. That was a fantastic blessing for the Ruby community! Unfortunately I am not interested in becoming a member thereof.

Like Austin Powers, I’m kind of waking up from a deep sleep while my production server had a very long life of flawless production. I never fiddled with my system because it worked. For almost 10 years. The AsciiDoc that will compile this blog post is by Stuart Rackham and has a copyright notice from 2002-2010.

$ head -n7 /usr/bin/asciidoc
#!/usr/bin/env python2.7
"""
asciidoc - converts an AsciiDoc text file to HTML or DocBook

Copyright (C) 2002-2010 Stuart Rackham. Free use of this software is granted
under the terms of the GNU General Public License (GPL).
"""
$ asciidoc --version
asciidoc 8.6.9

But it’s time to replace it. Well, it was time about 7 years ago! I actually have had a replacement server running for almost a year but have not switched over yet. One big question that remained was how would I process my very large body of AsciiDoc documents on a new modern system? With the Rackham AsciiDoc retired to a remote bach in the New Zealand wilderness with no internet connection, what could replace it?

I am really not keen on Ruby, Java, or JavaScript (Asciidoctor’s three options). I’m a little bit confused and unnerved by the state of the Python projects. I have huge respect for those keeping the code working, but it doesn’t seem like anyone has replaced Rackham to lead a coherent Python AsciiDoc solution.

And finally, ever since I started using AsciiDoc, something has nagged me. I dislike the way it renders. Sure it’s ok enough that I’ve lived with it for 15 years. But I was really hoping that instead of dying, that the AsciiDoc community would start to make it easier to do what needed to be done to get my own look. This is why if you go to http://xed.ch it will look very different than if you go to http://xed.ch/help/asciidoc. The latter has AsciiDoc’s default 2011 styling which practically screams "this was produced by AsciiDoc!" I always found it a bit weird and problematic that I could not figure out how to fix this.

Obviously I could have figured it out, but if the project seems dead and it’s so much work to dig into how to change the back end to be the way I want, well… We’re starting to enter the territory where I may just want to redo it. My way.

And that is what I did. Over the last few weeks, I wrote a new text processing engine from scratch in C++. This is Nerdtext. I’m not calling it something with AsciiDoc in the name just to avoid confusion. I’m going to be simplifying and modernizing the mark up syntax. It will be focused on my applications, at least at first. I have tested it on all my notes and blog posts (some 800,000 words) and it does pretty well. Very few serious errors or problems, so quite compatible with classic AsciiDoc.

This has been a huge project for me. I feel a small bit stupid having known full well going in that it was a quintessential classic Laundry Problem. But it had to be done. The fact that people will look upon this and very wrongly think "Well, that’s easy!" is just a cross I’ll have to carry.

I definitely improved my C++ skills, especially modern flavors which really are a lot better behaved than they were in the 1990s. I still prefer C and Python, but C++ did well and impressed me.

My plan now is to overhaul my entire website. This will involve transitioning to the new hardware, changing my hosting arrangement a bit, and compiling everything with Nerdtext. That itself will be a pretty serious job. Hopefully this will be one of the last posts that uses archaeological AsciiDoc. (By the way, you can see source code for pretty much any of my pages by replacing the .html with .txt in the URL. For example.)
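If you would rather script that URL trick than edit the address bar, it can be sketched in a few lines of Python. The helper name `source_url` and the page path in the demo are hypothetical, made up purely for illustration. The only thing taken from the text above is the rule of swapping a trailing .html for .txt.

```python
def source_url(page_url):
    """Swap a trailing .html for .txt; leave any other URL unchanged."""
    suffix = ".html"
    if page_url.endswith(suffix):
        return page_url[: -len(suffix)] + ".txt"
    return page_url


if __name__ == "__main__":
    # Hypothetical page path, used only to show the substitution.
    print(source_url("http://xed.ch/some/page.html"))
```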

To check out Nerdtext, have a look at its new project page which is the first web page on my site to be produced entirely by Nerdtext.

http://xed.ch/project/nerdtext