It used to be that if you were an exclusive Linux user (guilty!), gaming was pretty much not something you did. There were, relatively speaking, very few games for Linux. That list has been growing extremely quickly in recent years, however, thanks to Valve’s SteamOS, which is really a euphemism for "Linux".

With this in mind, some time ago (a couple of years?) I purchased an OK graphics card for my son’s gaming computer. Now, I’m pretty thrifty about such things and I basically wanted the cheapest hardware that would work and play typical games reasonably well. As a builder of custom workstations for molecular physicists, I’ve had a lot of experience with Nvidia and hardware-accelerated graphics. But it turns out that rendering thousands of spherical atoms in the most complex molecules is pretty trivial compared to modern games. So much so that for the workstations I build, I like to use this silent fanless card (GeForce 8400), which is less than $40 at the moment. It works fine for many applications and lasts forever. Here’s an example of the crazy pentameric symmetry found in an HPV capsid, captured from my 3 monitors (reduced from 3600x1920) driven by this humble $40 card.

hpv_from3600x1920.png

But for games, it doesn’t even come close to being sufficient.

How do you choose a modern graphics card? I have to confess, I have no idea. I only recently learned that Nvidia’s model numbers follow a rough scheme, despite seeming completely random to me.

Eventually I purchased an Nvidia GeForce GTX 760. I thought it worked fine. Recently, my son had somehow managed to acquire a new graphics card. A better graphics card. This was the Nvidia GeForce GTX 1050 Ti. Obviously it’s better because that model number is bigger, right? My son believed it was better, but we really knew very little about the bewildering (intentionally?) quagmire of gaming hardware marketing.

Take for example this benchmark.

passmark.png

Sadly they don’t show the GTX 1050, but based on the 1060 and 1070, you’d expect the 1050 Ti to be way better than our old 760, right?

But then check out this benchmark which does include both. It’s better but not such a slam dunk. (Ours is the Ti version, whatever that means.)

futuremark.png

People often come to me with breathless hype for some marketing angle they’ve been pitched for computer performance, and I always caution that the only way you can be sure it will have the hoped-for value is to benchmark it on your own application. You can’t blindly trust generic benchmarks, which at best might only coincidentally resemble your requirements and at worst might be completely gamed. Since I had these cards and I was curious to find out what the difference between GPUs really looked like, I did some tests.
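By "benchmark it on your own application" I mean nothing fancier than timing the thing you actually care about. A minimal sketch (the workload function here is a made-up stand-in, obviously):

# The only benchmark worth trusting: wall-clock time of your real workload.
import time

def my_real_workload():
    # Stand-in for whatever you actually run (a training job, a render, an export...).
    return sum(i * i for i in range(10_000_000))

t0 = time.perf_counter()
my_real_workload()
print(f"wall time: {time.perf_counter() - t0:.2f}s")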

Before we return to the point of the exercise, playing awesome games awesomely, let’s take a little hardcore nerd detour into another aspect of gaming graphics cards: the zygote of our AI overlords. Yes, all that scary stuff you hear about super-intelligent AI burying you in paperclips is getting real credibility because of the miracles of machine learning that have been, strangely, enabled by the parallel linear algebra awesomeness of gaming graphics hardware.

Last year I did a lot of work with machine learning and one thing that I learned was that GPUs make the whole process go a lot faster. I was curious how valuable each of these cards was in that context. I dug out an old project I had worked on for classifying German traffic signs (which is totally a thing). I first wanted to run my classifier on a CPU to get a sense of how valuable the graphics card (i.e. the GPU) was in general.
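(For the CPU baseline I just used the plain CPU-only TensorFlow build, but if you only have the GPU build handy, hiding the card gets you the same effect. A minimal sketch for the 1.x-era TensorFlow these logs come from:)

# One way to force a CPU-only baseline even with a GPU build of TensorFlow 1.x:
# hide the card from CUDA before tensorflow is imported.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = ''   # empty string = no GPUs visible

import tensorflow as tf
# ...build the classifier graph and train as usual; everything runs on the CPU.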

Here is the CPU-based run using a 4-core (8 with hyperthreading) 2.93GHz Intel® Core™ i7-870.

Loaded - ./xedtrainset/syncombotrain.p Training Set:   69598 samples
Loaded - ./xedtrainset/synvalid.p Training Set:   4410 samples
Loaded - ./xedtrainset/syntest.p Training Set:   12630 samples

2018-01-16 19:02:54.260119: W tensorflow/core/platform/cpu_feature_guard.cc:45]
The TensorFlow library wasn't compiled to use SSE4.1 instructions, but
these are available on your machine and could speed up CPU
computations.
2018-01-16 19:02:54.260143: W tensorflow/core/platform/cpu_feature_guard.cc:45]
The TensorFlow library wasn't compiled to use SSE4.2 instructions, but
these are available on your machine and could speed up CPU
computations.

Training...
EPOCH 1 ... Validation Accuracy= 0.927
EPOCH 2 ... Validation Accuracy= 0.951
EPOCH 3 ... Validation Accuracy= 0.973
EPOCH 4 ... Validation Accuracy= 0.968
EPOCH 5 ... Validation Accuracy= 0.958
EPOCH 6 ... Validation Accuracy= 0.980
Model saved

Test Accuracy= 0.978

real    4m42.903s
user    17m31.120s
sys     2m28.476s

So just under 5 minutes to run. I could see that all the cores were churning away and the GPU wasn’t being used. You can see some (irritating) warnings from TensorFlow (the machine learning library); apparently I have foolishly failed to compile support for some of the CPU tricks that could be used. Maybe some more performance could be squeezed out of this setup but compiling TensorFlow from source code doesn’t quite make the list of things I’ll do simply to amuse myself.
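(If the warnings themselves are the main irritation, they can at least be muted without recompiling anything, since TensorFlow’s C++ logging honors the TF_CPP_MIN_LOG_LEVEL environment variable.)

# Mute TensorFlow's C++ INFO/WARNING chatter (set before importing tensorflow).
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'   # '1' hides INFO, '2' also hides WARNING
import tensorflow as tf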

Hmm, 98% seems suspiciously high. Oh well, it doesn’t matter for benchmarking. Last year I was getting around 93%. Still, that’s not bad when random guessing among the 43 sign classes would pick only about 2.3% correctly.

Next I installed the version of TensorFlow that uses the GPU.

conda install -n testenv tensorflow-gpu
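A quick sanity check that the GPU build can actually see a card looks something like this (TensorFlow 1.x API):

# Quick check that the tensorflow-gpu build sees the card.
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.test.gpu_device_name())                          # e.g. /device:GPU:0
print([d.name for d in device_lib.list_local_devices()])  # CPU and GPU entries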

Now I was running it on the card that Linux reports as: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1).

Loaded - ./xedtrainset/syncombotrain.p Training Set:   69598 samples
Loaded - ./xedtrainset/synvalid.p Training Set:   4410 samples
Loaded - ./xedtrainset/syntest.p Training Set:   12630 samples

2018-01-16 20:12:40.294673: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955]
Found device 0 with properties:
name: GeForce GTX 1050 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.392
pciBusID 0000:01:00.0
Total memory: 3.94GiB
Free memory: 3.76GiB
2018-01-16 20:12:40.294699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2018-01-16 20:12:40.294713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y
2018-01-16 20:12:40.294726: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045]
Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0)

Training...
EPOCH 1 ... Validation Accuracy= 0.920
EPOCH 2 ... Validation Accuracy= 0.937
EPOCH 3 ... Validation Accuracy= 0.975
EPOCH 4 ... Validation Accuracy= 0.983
EPOCH 5 ... Validation Accuracy= 0.971
EPOCH 6 ... Validation Accuracy= 0.983
Model saved

2018-01-16 20:13:18.767520: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045]
Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0)
Test Accuracy= 0.984

real    1m7.441s
user    1m5.344s
sys     0m5.452s

You can see that it found and used the GPU. This took less than a quarter of the time the CPU needed (about 67 seconds versus nearly 283). Clearly GPUs make training neural networks go much faster. But how does it compare to the other card?

One caveat is that I didn’t feel like swapping the cards again, so I ran this on a different computer, this time with a six-core AMD FX™-6300. But this shouldn’t really matter much, right? The heavy lifting is done by the card. That card identifies as: NVIDIA Corporation GK104 [GeForce GTX 760] (rev a1). Here’s what that looked like.

Loaded - ./xedtrainset/syncombotrain.p Training Set:   69598 samples
Loaded - ./xedtrainset/synvalid.p Training Set:   4410 samples
Loaded - ./xedtrainset/syntest.p Training Set:   12630 samples

2018-01-16 20:13:57.953655: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940]
Found device 0 with properties:
name: GeForce GTX 760
major: 3 minor: 0 memoryClockRate (GHz) 1.0715
pciBusID 0000:01:00.0
Total memory: 1.95GiB
Free memory: 1.88GiB
2018-01-16 20:13:57.953694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2018-01-16 20:13:57.953703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y
2018-01-16 20:13:57.953715: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030]
Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 760, pci bus id: 0000:01:00.0)

Training...
EPOCH 1 ... Validation Accuracy= 0.935
EPOCH 2 ... Validation Accuracy= 0.953
EPOCH 3 ... Validation Accuracy= 0.956
EPOCH 4 ... Validation Accuracy= 0.976
EPOCH 5 ... Validation Accuracy= 0.971
EPOCH 6 ... Validation Accuracy= 0.979
Model saved

2018-01-16 20:14:43.861117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030]
Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 760, pci bus id: 0000:01:00.0)
Test Accuracy= 0.977

real    1m0.685s
user    1m10.636s
sys     0m7.164s

As you can see, this is pretty close. I certainly wouldn’t want to spend a bunch of extra money on one of these cards over the other for machine learning purposes. So that was interesting, but what about where it really matters? What about game performance?

This is really tricky to quantify. Some people may have different thresholds of perception about some graphical effects. Frame rate is an important consideration in many cases, but I’m going to assume that 30 frames per second is sufficient since I’m not worrying about VR (which apparently requires 90fps). My goal was to create the setup most likely to highlight any differences in quality. I created two videos, one using each card on the same computer, and then spliced the left side of one to the right side of the other.
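(I did the actual compositing in Blender, but for the record the same left-half/right-half stitch can be done with ffmpeg’s crop and hstack filters; a hypothetical sketch with stand-in filenames:)

# Stitch the left half of one capture to the right half of the other.
# card_a.mp4 and card_b.mp4 are stand-in names, not my actual files.
import subprocess

subprocess.run([
    'ffmpeg', '-i', 'card_a.mp4', '-i', 'card_b.mp4',
    '-filter_complex',
    '[0:v]crop=iw/2:ih:0:0[left];'       # left half of the first video
    '[1:v]crop=iw/2:ih:iw/2:0[right];'   # right half of the second video
    '[left][right]hstack=inputs=2[v]',   # glue the halves side by side
    '-map', '[v]', '-an', 'sidebyside.mp4',
], check=True)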

This video is pretty cool. In theory, it is best appreciated at 1920x1080 (full-screen it, maybe). Locally it looks really good, but who knows what YouTube has done to it. Even the compositing in Blender could have mutated something. Even the original encoding process on my standalone HDMI pass-through capture box could have distorted things. (This standalone capture box does produce some annoying intermittent artifacts, like the left of the screen at 0:15 and the right at 0:21; this is the capture box and has nothing to do with the cards.) And of course if you’re using Linux and Firefox you probably can’t see this in high quality anyway (ahem, thanks YouTube).

So that’s video cards for you. What look like hardware models with an obvious difference may not really differ much at all. Or they might. In practice, you need to test them to be sure. If you noticed any clear difference between the two video sources, let me know, because I didn’t see it. Frame rates for both were locked solidly at 30fps.

Speaking of incredibly small differences, how about those two laps around the Monaco Grand Prix circuit? I drove those separately (in heavy rain, with manual shifting) and the driving is so consistent that they almost splice together. I’ve enjoyed playing F1 2015, the first time Linux people could play this franchise. The physics are as amazing as the graphics. What is completely lame, however, are the AI opponents (too annoying to include in my video). Wow, they are stupid! Computer-controlled cars… a very hard problem.