Exposing ChatGPT's Subtle Limitations

:date: 2023-05-03 07:51 :tags:

I got an interesting ChatGPT response that I felt does a good job of calibrating expectations about its quality. Make no mistake - its quality is good! But learning to spot where it makes subtle, hard-to-catch mistakes is a very valuable skill.

First we need to understand the setup a bit, and hopefully this digression will be slightly valuable on its own.

In the winter I often leave computers on and let them help heat my house; in the summer, the opposite strategy applies. When I power up my normal workstation, I usually log into a "cloud" server — which is in my basement — with something like this command.

ssh-add /home/xed/.ssh/mykey_rsa && ssh -l${USER} ${SERVER}

That command is wrapped up in a two-letter alias (that I chose), so I then simply type in the passphrase and I'm logged in. About as easy as can be imagined once the setup is in place. And unlike the default way of using SSH, by using keys I can continue to log in (or use sshfs, or move things with rsync, etc.) and I never need to type the passphrase again until I reboot. (Some fancy computers have a "keyring" or something like that, but my way is basically just the explicit way to do that without the extraneous middleware.)
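For the curious, here is a minimal sketch of what that setup could look like in a Bash shell. The alias name "cs" and the server hostname are stand-ins I made up for illustration; only the key path comes from the command above, and the details will differ on other machines.

    # Hypothetical ~/.bashrc entries; "cs" and "basement-box" are stand-ins.
    SERVER=basement-box
    alias cs='ssh-add /home/xed/.ssh/mykey_rsa && ssh -l${USER} ${SERVER}'

    # After the one passphrase prompt, the agent holds the decrypted key,
    # so later connections and transfers ask for nothing:
    ssh ${SERVER}                          # no prompt
    rsync -av notes/ ${SERVER}:backup/     # no prompt
    sshfs ${SERVER}:/data ~/mnt/data       # no prompt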

This morning I was doing this and, after all these years (decades!) of this process, I had an explicit insight into another profound advantage of this strategy. So excited was I that I had to drop a little self-hosted "tweet" about it to remind my future self.

But then I wondered, is this entirely opaque to people who are not exactly me? I realized ChatGPT would be a reasonable sanity check on a question like that. This involves analyzing complex language and a complex technical component. And in this interesting (and rare) situation, I am the world's leading expert on the correct answer.

So I asked ChatGPT 3.5 this.

I just read a cryptic tweet and I wonder if you can explain it to me: "Yet another reason to use SSH keys: when you mistype the passphrase loading your agent you don't go to jail for N seconds like a criminal."

And here's what it said. (I understand the topic is gobbledygook to normal people; read as much as you like and skip the rest and I'll summarize below.)

This tweet is suggesting that when using Secure Shell (SSH) to remotely access a server or computer, it is recommended to use SSH keys as a means of authentication rather than a passphrase. A passphrase is a sequence of characters used to authenticate a user and unlock access to a private key.

When using SSH with passphrase authentication, a user needs to enter the passphrase every time they want to connect to the remote server. If the user mistypes the passphrase, the SSH agent will lock for a period of time, commonly a few seconds, to prevent brute-force attacks. During this lock period, the user will not be able to connect to the server, and it could be frustrating.

The tweet implies that mistyping a passphrase when trying to authenticate with an SSH agent can be seen as a criminal activity by some, potentially leading to some negative consequences. Therefore, using SSH keys instead of a passphrase can be beneficial, as it eliminates the need to enter a passphrase, reducing the possibility of mistyping it, and preventing the SSH agent from locking the user out of the server.

In summary, using SSH keys is a more secure and efficient way of authentication in comparison to using a passphrase.

Paragraphs one and two reasonably summarize the situation, at least partly. But paragraph three, which sounds lovely if you don't think about it, is not quite right. It is not even wrong. It's just very subtly muddled. And that is the important lesson I'm trying to convey here. (How is it muddled? Skip this if you don't care. For one thing, keys do not "eliminate the need to enter a passphrase" entirely, and since our context is precisely those times when you do need to enter the passphrase while using keys, that claim is problematically off topic. And an SSH agent does not itself "lock the user out of a server".)

Ok, so that was enough to have a conversation about the topic, but it was not insightfully correct. How about GPT-4?

This tweet is referring to the use of Secure Shell (SSH) keys as a more user-friendly and secure method of authentication compared to traditional passwords. SSH is a cryptographic network protocol used to securely access and manage network devices and servers over an unsecured network.

When the person who posted the tweet refers to "mistyping the passphrase loading your agent," they are talking about entering an incorrect password or passphrase when connecting to a remote server. In many systems, when an incorrect password is entered, there is a delay (N seconds) before the user can try again. This delay is implemented as a security measure to prevent brute-force attacks, in which an attacker attempts multiple password combinations in quick succession.

The tweet implies that using SSH keys, which are a more secure and convenient method of authentication, can help users avoid the inconvenience of waiting for N seconds after a failed login attempt. With SSH keys, a public and private key pair is generated, and the public key is added to the remote server. The private key remains on the user's local machine, and when they try to connect, the server checks if the public and private keys match. This method does not require users to remember and type in a password, reducing the risk of mistyping and being locked out for a period of time.

In summary, the tweet is highlighting the advantages of using SSH keys over traditional passwords, emphasizing that SSH keys can save users from the inconvenience of temporary lockouts due to mistyped passphrases.

That is better — one of the points of this post is to highlight what these incremental improvements in AI models look like when exposed in a practical example. However, it is still not great. It blusters about how SSH works, which is nice and probably helpful for someone truly trying to learn about this, but it has settled on the idea that we're avoiding the timeout penalty by circumventing the times when typing passwords is necessary. That's kind of true, but it's not quite right. In the tweet I'm comparing a single login with a password to a single loading of the passphrase into an agent. I'm highlighting that even if you only type the passphrase once in your day, there is still an ergonomic advantage: you save yourself the timeout duration multiplied by the probability of making a typo. About a second a day for me!
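To make that arithmetic and the comparison concrete, here is a hedged sketch of the two failure modes. The exact prompts and the length of the penalty vary by system, the hostname is a stand-in, and the numbers at the end are illustrative guesses rather than measurements.

    # Mistyped password at an interactive SSH login: the server-side
    # authentication stack typically imposes a deliberate pause before
    # you may retry -- the "N seconds in jail".
    ssh xed@basement-box                  # hostname is a stand-in
    #   xed@basement-box's password:      <typo>
    #   ...a pause of a few seconds...
    #   Permission denied, please try again.

    # Mistyped passphrase at ssh-add: the prompt is purely local and it
    # simply asks again immediately, with no penalty delay.
    ssh-add /home/xed/.ssh/mykey_rsa
    #   Enter passphrase for /home/xed/.ssh/mykey_rsa:   <typo>
    #   Bad passphrase, try again for /home/xed/.ssh/mykey_rsa:

    # Back-of-the-envelope (illustrative numbers): a ~3 second penalty and
    # roughly one typo every three attempts works out to
    #   3 s * (1/3) = ~1 s saved per day (with one login per day).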

These both highlight a quirky tendency of these models. ChatGPT is pulled, as if by gravity, toward the "normal" discussion that usually goes with the words you are using. This does not make these systems useless. I use ChatGPT to see what "normal" members of my target audience think about my writing. Even on this topic it is — usefully — revealing that casual normal readers will make misinterpretations like the ones it made. As long as you carefully calibrate for this, it is a feature, not a bug.

That is an example of what is "wrong" with ChatGPT. I hope an even bigger point you take away from this is how utterly astonishing it is that it gets this close to a perfect answer. The answers are slightly wrong for subtle contextual reasons, but they're packed with useful correct information and are probably more readable than my quirky prose. These models are the tip of an iceberg that will be changing life for our species in unimaginably profound ways. ChatGPT rather conclusively settles Turing's question, "Are there imaginable digital computers which would do well in the imitation game [now called the Turing Test]?" That question has been open for over 70 years, and I've lived with it my whole life. It is quite an intellectual thrill to have the answer suddenly arrive: ChatGPT shows us clearly that the answer is yes.