Do Not Fire Your Good Programmers Just Yet

:date: 2026-05-23 10:04 :tags:

Being a programmer these days is like being a portrait painter in 1860: clearly the profession is changing a bit.

painting.jpg

A common trope in mainstream media is wondering whether AI will replace programmers. The answer is: it's complicated. Modern LLMs have impressed just about everybody with some amazing tricks, at least some of the time. Just to show that I'm fully aware of how insanely competent LLMs can sometimes be, I thought I'd try to get a very flashy demo of them doing a very good job. I came up with the idea of a C program that simulates a bouncing ball. Less than a couple of minutes after I thought of the request, my usual LLM had produced code that compiled with no problems and generated this.

bounce.gif

That's just astonishingly competent. Capturing the GIF took way more time than creating this program ex nihilo! So good programmers definitely understand that there is a lot of potential here, and AI coding cannot be dismissed as incompetent.

But those of us who take more than a casual interest in software usually have some other, less flattering, data points. Today's adventure is not a typical AI-fail story where I ask for a thing and get a simple mess; that's easy to produce. What I came across was a programming problem that I would consider easy.

My programming request was so easy that I was actually able to open my editor, type in an extemporaneous C program in one shot, and have it compile, run without errors, and function effectively. Since it was so easy for me, I thought I'd expand the scope a bit and see what possibly better strategy my robot friends would come up with. Wow. I was quite unprepared for the massive incompetence!

I know you're not interested in esoteric C programming details, but I think a superficial understanding of this problem can help one understand the kinds of trouble AI can get into. Robotic readers may recall that I take a special interest in complicated parsing conundrums. This excellent Dan Piraro cartoon best explains what that means.

https://xed.ch/p/gg/Images/seperator_Bizarro.gif

Another good example is the "roger, Roger" scene from the movie Airplane!

I am currently writing a specialized C parser and I have to think about some similar issues. The specifics are unimportant so feel free to ignore them. In fact, it's safe to ignore all the code in this post!

Here is the original prompt I gave.

I'd like a C function that when called clearly demonstrates which gets parsed first: comments or preprocessor directives.

This led to so much nonsense that I had to clarify a bit. Here is the less demanding prompt I used in my tests.

I have two custom research C compilers. One of them parses preprocessor directives first and then handles comments. The other handles the comments and then deploys the preprocessor directives. Can you write a short program that can tell which is which?

Xed

Here is the "natural" intelligence solution that I came up with in less than 5 minutes. It seems to work.

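#include <stdio.h>
/* A comments-first (standard) compiler strips the // comment below,
 * leaving AFTER empty, so the message reads "...after comments!". */
#define AFTER //", no wait, before"
int main(int argc, char* argv[]) {
    printf("Pre-processor processed after" AFTER " comments!\n");
    return 0;
}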

Mistral

https://chat.mistral.ai/chat

This is my main robot chat friend, which I use for a variety of quality-of-life reasons unrelated to getting the most insightful output possible. It is the one that instantly produced the bouncing ball demo. It recently received an update that shows its "internal dialog" as it passes ideas back and forth to itself. An interesting approach. Normally there are between zero and a couple of such exchanges. For this program, there were 19.

It then took 44 seconds to produce this (which normal people can safely ignore).

#include <stdio.h>

/*
 * Preprocessor directives are processed first.
 * A '#' inside a comment is not recognized as a directive.
 */

// #define IGNORED 42
#define USED 100

int main() {
    printf("%d\n", USED);
    return 0;
}

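This is an amazingly hard fail. You can tell it's trying to use a similar strategy to my code, but it has totally bungled it. It bungled the main logic: IGNORED is never even used, so both compilers would simply print 100. It also bungled the fact that preprocessor directives can only be preceded by whitespace, so // #define IGNORED 42 is not a directive under either processing order.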

After much prompting and pages and pages of muddle, the best I got it to do was this code, which fails to compile on a comments-first compiler (x is never defined) and returns 1 on a preprocessor-first one. I'm unimpressed with the formatting choice too.

/*
#define x 1
*/int main(){return x;}

Perplexity

https://www.perplexity.ai

This one was absolutely bonkers!

It decided the best way to solve the problem was to...

Run a lightweight “simulation” of each stage and compare results... It is not a full C parser—just a minimal differentiator you can integrate with your actual compilers as a test harness.

What? That is just insane. I'm not sure if this approach is merely a terrible over-complication or actually fully brain dead, but it clearly missed the target and then doubled down.

What's even crazier is that it clearly senses I want a minimal solution.

I can help with a small program that distinguishes the two compiler orders... Note: Since you’re asking for a short program, this example focuses on illustrating the concept with simple, deterministic behavior...

And then it proceeds to produce this insane (terribly formatted) program. No need to read it; I'm just showing an overview image of the whole thing to give you an idea of what it thought was "simple".

perplex.png

Euria

https://euria.infomaniak.com/

This is a Swiss bot from my domain registrar, which is scrambling to provide Euro folks with a non-American AI alternative. I find that it is usually pretty good in many ways, and I often turn to it when I want a second, and better, opinion.

This one didn't strain to think of a solution; this popped out immediately.

/*
 * The following line looks like a preprocessor directive but is
 * inside a multi-line comment.
 *
 * #define FAKE_DIRECTIVE 1
 */

int main(void) {
    #ifdef FAKE_DIRECTIVE
        return 0; /* Compiler A: Preprocessor runs first -> sees #define -> defines FAKE_DIRECTIVE */
    #else
        return 1; /* Compiler B: Comments removed first -> #define ignored -> FAKE_DIRECTIVE undefined */
    #endif
}

Not bad. It seems close and on the right path, but I spot an immediate error...

The problem is that even a preprocessor-first compiler will never treat the commented line as a directive, because only whitespace (or nothing) may come between the start of the line and the # that signals the directive. Here that # is preceded by a decorative *.

It then started rambling for pages and pages, coming up with five different versions of essentially this:

int main(void) {
    /* #define TEST 1 */
    #ifdef TEST
        return 1; /* Compiler A: Preprocessor runs first -> sees #define inside comment -> TEST defined -> returns 1 */
    #else
        return 0; /* Compiler B: Comments removed first -> #define ignored -> TEST undefined -> returns 0 */
    #endif
}

Note that my objection still stands! What's really crazy is that it was so close the first time. All it would have needed to do was pull those decorative * characters from the comment! Indeed, for a problem like this it could be considered helpful even if I had to make minor adjustments myself.

Claude

https://claude.ai/login

It starts out with this which I think is quite close. But check out that comment styling!

//* This line is a comment in C++ style, but what about the slash before it?
#define MSG "Preprocessor ran FIRST (comments second)"
//*/
#define MSG "Comments ran FIRST (preprocessor second)"

#include <stdio.h>

int main(void) {
    puts(MSG);
    return 0;
}

WTF is that? If it had just used normal C comments (/* like this */), I think it would have been closer to working. But as is, the double // style comments just leave the first define always present. And it doesn't even matter, because MSG always gets redefined by the second #define anyway.

It walks through this code (still without me prompting a follow-up) and then thinks better of it.

Wait — let me think through this more carefully and give you a cleaner, more reliable trick...

This time it uses regular C comments but still defines the commented version before the uncommented one, meaning it will never work either.

It rambles a bit more about that code without addressing the aforementioned problem (again without me putting in any new prompting) and again thinks better of it!

Actually, a #define redefining without #undef is typically a warning/error. Here's the cleanest version:

Finally, on the fourth attempt, it creates code that I think looks reasonably sensible.

#include <stdio.h>

/*
#define WHICH 1
*/

#ifndef WHICH
#define WHICH 2
#endif

int main(void) {
    if (WHICH == 1)
        puts("Preprocessor ran first (comments were still present during macro pass)");
    else
        puts("Comments ran first (comment block was gone before macro pass)");
    return 0;
}

If it had just spit out that final program, without the earlier muddled versions and pages of explanation, we'd have a clear winner. But no, it really struggled to come up with something. And that is what I find so fascinating.

ChatGPT

https://chatgpt.com/

Early on in LLM history I used to pay for ChatGPT, but I'm poor and don't see a good ROI for that now. Even so, their free service was by far the most competent of the bots I tried. The OG LLM immediately generated this sensible code with a sensible strategy and no glaring errors. There were no muddled or even extraneous explanations.

#include <stdio.h>

/*
#define MSG "preprocessor-runs-before-comments"
*/

#ifndef MSG
#define MSG "comments-run-before-preprocessor"
#endif

int main(void) {
    puts(MSG);
    return 0;
}

If I were only using ChatGPT, I might not have noticed what a challenge this request seems to be for bots. Good job, ChatGPT.

Concluding Lessons

What can we learn from this? Clearly LLM code generation, and even problem solving and reasoning, are very strong. They are better than humans a lot of the time. Why did this particular problem vex the bots more than it did me? I have a few suspicions.

First, I think this is a very weird request. The fact is that it is difficult to conceive of software that is not mostly like a lot of software that has already existed for a long time. But who asks for some weird thing about the compiler itself? The people writing the compilers are obviously experts who don't need this kind of check. I would push back here, though, and defend the program as more than just a thorn for robots to step on. It is actually a useful program for teaching C, and I could imagine using it in a class demonstration. So the program has a valid purpose, but it is rarely represented in online examples.

The second aspect of this problem that might have been more challenging to a bot than a human is its meta quality. Just like with the Piraro cartoon, we humans can quickly use language to understand problems like this, but LLMs cannot. Formal theorem provers and weird structured software can go to arbitrarily high dimensions, but I think LLMs suffer when the words are not really the words in a straightforward way. An LLM sees words and wants to act, not think. What is extremely interesting as a reflection of our species is that this is usually enough.