Best Programming Language For Text

:date: 2025-06-14 21:11 :tags:

There are a lot of standard challenges that face all serious programmers at some point. My favorite is the problem of encoding text strings with text strings. It is my favorite inevitable problem because, in the programming language I created, I have cured it.

It's nearing 20 years since I came up with my system and during that time I have overwhelmingly preferred it over all other systems I have ever encountered. As one of my better innovations, I felt it deserved to be highlighted on its own.

The Problem

Let's start with an overview of what I'm talking about and the notable forms this problem takes. Simple text is always simple. Consider a generic definition of a string of text in some hypothetical computer language.

the_text = "a simple text value"

That is a very normal way to define text strings. Cool. Easy!

Let's now jump right to the thought experiment that I think best illustrates the worst kinds of problems: text for a user manual. What if the real text string I want to define is part of a programming manual describing the bit about how to define text strings?

text_definition_example = "the_text = "a simple text value""

Serious programmers see the problem immediately and even normal people are likely to sense it  —  the quotes are ambiguous. Are they part of the defining of the string, or part of the string itself?

https://xed.ch/p/gg/Images/seperator_Bizarro.gif

(I love this cartoon! And the magnificent Dan Piraro, the creator of the Bizarro comic, kindly gave me permission to use it to educate about text entry problems. Go check out his website and send him some $.)

The normal way this is handled  —  and universally hated by programmers  —  is to "escape" certain characters from the program syntax so they can remain inert as part of the text string itself. Here's the typical C way.

char text_definition_example[] = "the_text = \"a simple text value\"";

The \" pair gets converted to just a simple double quote character in the resulting text string. But we're writing a manual, remember? How do we then explain how to keep a double quote from being interpreted as program syntax? Well, here's how that would traditionally be done in C, and why this kind of thing makes programmers queasy.

char complex_definition_example[] = "To include a literal quote, use \\\".";

Don't worry if your eyes are glazing over looking at that; it happens to even the sharpest programmers when faced with enough of this kind of syntax gunk. I mention C because it inspired many style elements and ways to solve syntax problems for many, perhaps most, programming languages (Python, JavaScript, Java, R, obviously other nominal C derivatives, etc.).
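Python inherited these backslash rules for its ordinary string literals, so the two C examples above can be checked directly. This is only a small sketch; the variable names are borrowed from the C lines for convenience.

```python
# Python's ordinary string literals follow the same backslash rules as C here.

# Level 1: the text contains quotes, so each one is escaped.
text_definition_example = "the_text = \"a simple text value\""

# Level 2: the text describes the escape itself, so the backslash must be
# escaped too: \\ yields one backslash and \" yields one double quote.
complex_definition_example = "To include a literal quote, use \\\"."

print(text_definition_example)     # the_text = "a simple text value"
print(complex_definition_example)  # To include a literal quote, use \".
```

Every further level of "text that explains the text" doubles the backslashes again, which is where the queasiness comes from.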

However it is interesting to note that this style of character escaping wasn't new to C. Here is what the solution looked like in Lisp a decade before C.

(setq myString "A quoted \"quote\".")

This "solution" to the problem is by far the most common one in all of software creation.

A newer partial remedy which is found in languages like Python, JavaScript, PHP, Perl, R, Lua, some SQL dialects, and Bash (sort of) is to let the user choose what kind of quote character they'd like to use. These both define strings in, say, Python.

t1 = 'text'
t2 = "also text"

This means you can do this.

fact = 'The first "Windows" keys had Apple logos.'

It is amazing how seldom this type of quote substituting is a good solution. It mostly just leads to annoying capricious style discrepancies.
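A short Python sketch of why quote substitution runs out of road quickly. The second sentence is an illustrative string made up for this example (it is not from any real manual); it contains both kinds of quote, so whichever delimiter is chosen, something still has to be escaped.

```python
# Text containing only double quotes: single-quote delimiters are enough.
fact = 'The first "Windows" keys had Apple logos.'

# Illustrative text containing both an apostrophe and double quotes.
# Either delimiter choice forces at least one backslash escape.
with_single = 'It\'s called a "string".'
with_double = "It's called a \"string\"."

# Both spellings produce exactly the same text.
assert with_single == with_double
print(with_single)  # It's called a "string".
```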

I won't go into detail but Python's text formatting has bloated into all kinds of exuberant features for string input: raw strings, formatted strings (including an optional format() function), variable substitution in strings (two completely different syntaxes), triple quote (single and double) blocks, Unicode strings, and template class string voodoo. I'm not saying it's bad  —  it's typical for modern languages  —  but it is hard to look at that and say those text syntax rules are clear and simple.

I definitely researched a lot of programming languages when I designed mine and interestingly one of the best ones for string quoting is Fortran. Some Fortran compilers do accept the usual mess of backslash escape sequences, but as a compiler extension rather than as part of the standard language.

Fortran also allows a rather idiosyncratic way to escape quotes in strings. To include an apostrophe in an apostrophe-delimited string, you can repeat it. The same works for a double quote in a double quote-delimited string.

character(len=20) :: myString = 'A quoted ''quote''.'

This stores A quoted 'quote'. into myString. Ultimately Fortran is still a bit clumsy but I did take some slight inspiration from this syntax.
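The doubling rule is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not how any Fortran compiler implements it; the function names encode_doubled and decode_doubled are made up for this example, and the decoder does not check that interior quotes really are doubled.

```python
def encode_doubled(text: str, quote: str = "'") -> str:
    """Wrap text in `quote`, doubling any quote characters inside it."""
    return quote + text.replace(quote, quote * 2) + quote


def decode_doubled(literal: str) -> str:
    """Undo encode_doubled: strip the delimiters, collapse doubled quotes."""
    if len(literal) < 2 or literal[0] != literal[-1]:
        raise ValueError("literal must start and end with the same quote")
    quote = literal[0]
    return literal[1:-1].replace(quote * 2, quote)


literal = encode_doubled("A quoted 'quote'.")
print(literal)                  # 'A quoted ''quote''.'
print(decode_doubled(literal))  # A quoted 'quote'.
```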

Why Me?

It may seem ridiculous that I, Chris Edwards, thought up a better solution but it is helpful to understand why this problem was so much more oppressive to me than to normal programmers. When I was first getting serious about writing software I was working as the plant engineer of a machine shop and every day I would see shop prints filled with text like this.

(3) 1' 2-1/4" C'BORE 2" DP.

The reason I thought to write about this today is I was just looking at a land survey document that included tons of directional bearings like this.

ddln-bearings.png

Those are angular minutes and seconds. Every day I also record text noting temporal minutes and seconds for my workouts.

Maybe most programmers can dodge these annoying cases, but I run right into them all the time! When I started designing my own programming language in the early 2000s, the main application I envisioned was to support a technical geometry system (modeling, GIS, solving complex geometry problems, etc.). Solving the problem of how to specify extremely inconvenient text typical of technical geometry was one of my main goals.

My Solution

My solution is that text delimiters are defined  —  always  —  as a pair of single quotes (aka apostrophes): ''. I chose this character because it is a home row key and no shift is needed. Pressing it twice should be slightly more ergonomic than one double quote.

This solves all problems except one. If you've been following along, you may be wondering: ok, how then are a pair of single quotes specified as content text? The answer is what I call the magic character. Normally it is ^, aka the circumflex (or caret). Because this glyph usually modifies other characters (e.g. hôpital, hôtel), this character in isolation is very seldom found in natural text, at least in my experience. (Though not impossible! For example, I do use it here.) By putting this magic character directly between two single quotes, the result is that the magic character is dropped, leaving just the two single quotes. Problem solved!

Input:

''A '^' B''

Result:

A '' B

Whoa, hold up! Doesn't that now introduce a new problem? Yes it does. What if you need your text to actually include a '^'? If the magic character gets stripped out what do you do? First, let's understand that we are now firmly into very weird territory. The only natural case I've ever contrived for this to occur is perhaps describing how to construct a regular expression. For example, if I wanted the following text, this way to specify it would not work.

''Expand your term by adding anchors: regexp='^'+term+'$'''

The first solution I have provided for is that you can set your magic character to anything you want with a kind of pragma variable. But that could get messy. Let's stay within the orthodox system and instead solve this very small problem by creating a much smaller one. This will work to create the string shown above.

''Expand your term by adding anchors: regexp='^^'+term+'$'''

When two of the magic characters are found directly between single quotes, one is removed. And yes, to get two magic characters between single quotes, use three and one will be removed. Need 3, use 4. Ad infinitum. Use as many as you need. And that's it. I do not think there are any more rules that are needed to cover all cases. Yes, that is somewhat of an infinite regress but no worse than obnoxious backslash escaping. And every level down you go, the need for another level becomes orders of magnitude smaller.
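As a rough illustration, the rule can be sketched in Python. This is not the actual implementation from the language, just a minimal sketch that assumes the whole literal has already been isolated as a string. The names decode_text and MAGIC are hypothetical, and how runs of three or more adjacent quotes should behave is a guess on the sketch's part.

```python
import re

MAGIC = "^"  # the default magic character described above


def decode_text(literal: str, magic: str = MAGIC) -> str:
    """Turn a ''...'' literal into the text it stands for (sketch only)."""
    if len(literal) < 4 or not (literal.startswith("''") and literal.endswith("''")):
        raise ValueError("literal must be wrapped in a pair of single quotes")
    body = literal[2:-2]
    m = re.escape(magic)
    # A quote, one or more magic characters, then another quote:
    # drop exactly one magic character. The lookahead leaves the closing
    # quote in place so it can also begin the next match.
    pattern = re.compile("'" + m + "((?:" + m + ")*)(?=')")
    return pattern.sub(lambda match: "'" + match.group(1), body)


print(decode_text("''A '^' B''"))
# A '' B
print(decode_text("''Expand your term by adding anchors: regexp='^^'+term+'$'''"))
# Expand your term by adding anchors: regexp='^'+term+'$'
```

Note that the sketch simply strips the outer two quotes on each side, which is what lets the regexp example end in three quotes in a row.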

As I mentioned, the best demonstration of the problem is encoding text for a user manual, i.e. something that uses how it works to describe how it works. The following shows the exact characters I use in the definition syntax and then I will show the resulting string. The content of the example input/result pairs is explanatory, and the result text is factually correct.

Input:

''Text is specified with two apostrophe (') characters.''

Result:

Text is specified with two apostrophe (') characters.

Input:

''Typing '^'foo'^' would make a text object containing "foo".''

Result:

Typing ''foo'' would make a text object containing "foo".

Input:

''Note that to get that text '^^' was typed where a '^' was desired.''

Result:

Note that to get that text '^' was typed where a '' was desired.

Input:

''This can be repeated indefinitely to get '^^^' and so on.''

Result:

This can be repeated indefinitely to get '^^' and so on.

Here's a more complicated example. Let's say I want this text.

"He said, 'It's text with "doubles" and 'singles'.'", she replied.
"But what about ""doubled doubles"" and ''doubled singles''?" I asked.

I simply enter this, both lines as is.

''"He said, 'It's text with "doubles" and 'singles'.'", she replied.
"But what about ""doubled doubles"" and '^'doubled singles'^'?" I asked.''

The only weirdness there is inserting the magic character into the doubled single quotes that belong to the content. I would show you how gruesome that is with backslash escaping, but I would probably get it wrong. I don't feel bad about that because even modern LLMs often struggle with this kind of thing.
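Going the other direction, from the wanted text to what must be typed, can also be sketched in Python. Again this is an assumption-laden illustration rather than the language's real code: encode_text is a hypothetical name, and the sketch does nothing special when the content itself begins or ends with a single quote.

```python
import re


def encode_text(text: str, magic: str = "^") -> str:
    """Wrap text in '' delimiters, protecting quote pairs in it (sketch)."""
    # A quote, zero or more magic characters, then another quote:
    # insert one extra magic character right after the first quote.
    # The lookahead lets the second quote begin the next match.
    pattern = re.compile("'((?:" + re.escape(magic) + ")*)(?=')")
    body = pattern.sub(lambda m: "'" + magic + m.group(1), text)
    return "''" + body + "''"


wanted = '''"He said, 'It's text with "doubles" and 'singles'.'", she replied.
"But what about ""doubled doubles"" and ''doubled singles''?" I asked.'''

encoded = encode_text(wanted)
print(encoded)
```

Only the two doubled single quotes in the second line pick up a magic character; everything else passes through untouched.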

Problems

In practice, I've never run into problems or limitations with my system but here are some potential arguments against it.

Advantages

That is basically my text quoting system. I think it is better than any other I've ever come across. By a lot. The rest of my programming language is generally superior to all other programming languages because the guys who made the 1990s HP programmable calculators were geniuses and I lifted their stuff pretty much directly. But the text input system is all mine and it is something I'm quite proud of.