I just finished a pretty interesting project for the Self-Driving Car Engineering course I’m taking. The project’s goal was to explore something called "behavioral cloning". The idea is that you create a neural network that you hope can do the thinking you need it to do, and then you train it using inputs derived from humans actually doing what comes naturally in similar situations.
My first thought on understanding that much was: OMG! The whole point of autonomous vehicles, for me, is to eradicate the scourge of idiot humans driving cars. Keeping their idiot essence around in the logic of computers seems extremely unappealing. But that consideration aside, the technology is pretty dang cool.
Perhaps the thing that does the most to make it cool is how the story ends. Is this even possible? It turns out, yes, it is. But first, a little digression…
Did you hear about Intel paying $15.3 billion for Mobileye’s self-driving car technology? Intel. Interesting, isn’t it? Why did they do that? It seems Intel felt they were falling behind their rivals in this area. Intel’s rivals make computer chips, and their main rival is Nvidia. It turns out that Nvidia is already quite active in the self-driving car market (such as it is). It is also helpful to remember that OpenCV, a project that is critical to almost any kind of autonomous vehicle, was originally funded by Intel.
This brings us back to behavioral cloning. Here is a paper by Nvidia researchers where they describe a system they created that could drive a car without anyone explicitly programming it to do so.
Here’s how it works. You need a car with cameras and a way to know (and eventually set) the steering angle for every frame the cameras capture. You start by driving around recording all of this: a video, which is a sequence of images over time, and a corresponding steering angle for each of those images. Say the human driver is negotiating a road curving to the right. To a human, the road will obviously look like it is curving to the right, and the driver will be doing the right thing by turning the wheel to the right, a positive steering angle.
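To make that concrete, here’s a toy Python sketch of what the recorded data boils down to. I’m assuming a CSV log with one row per frame, an image path in the first column and the steering angle in the fourth, which is roughly how the simulator I mention below writes it; treat the exact filename and column layout as illustrative.

```python
import csv

# One sample per camera frame: the image the car saw, and the steering
# angle the human driver was applying at that instant.
samples = []
with open("driving_log.csv") as f:  # hypothetical log file
    for row in csv.reader(f):
        image_path = row[0]       # path to the camera frame
        steering = float(row[3])  # positive = steering to the right
        samples.append((image_path, steering))
```

A right-hand curve shows up in this data as a run of right-curving images paired with positive steering angles. That pairing is the entire training signal.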
The idea is to make a convolutional neural network that accepts the values of all the pixels as input variables (yes, that may be millions of variables) and returns a single output value: a steering angle. Madness, right? I would have thought so. But amazingly, it can work.
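For the curious, here’s a minimal Keras sketch of a network in the spirit of the one the Nvidia paper describes: normalize the pixels, stack convolutions, then funnel fully connected layers down to one number. The layer sizes follow the paper; my own model differed in details.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Lambda, Conv2D, Flatten, Dense

model = Sequential([
    # Normalize pixel values to roughly [-0.5, 0.5]
    Lambda(lambda x: x / 255.0 - 0.5, input_shape=(160, 320, 3)),
    # Convolutional feature extractor (sizes follow the Nvidia paper)
    Conv2D(24, (5, 5), strides=(2, 2), activation="relu"),
    Conv2D(36, (5, 5), strides=(2, 2), activation="relu"),
    Conv2D(48, (5, 5), strides=(2, 2), activation="relu"),
    Conv2D(64, (3, 3), activation="relu"),
    Conv2D(64, (3, 3), activation="relu"),
    Flatten(),
    # Fully connected layers funnel everything down to one number
    Dense(100, activation="relu"),
    Dense(50, activation="relu"),
    Dense(10, activation="relu"),
    Dense(1),  # the predicted steering angle
])
```

Notice there is no notion of "road" or "lane" anywhere in there. The architecture is just a generic image-to-number regressor.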
My project was to roughly replicate the Nvidia system using a simulator. The simulator lets you drive around a track collecting data in the form of images (there were three cameras, but let’s keep it simple) and corresponding steering angles. I wrote a system in Python that used TensorFlow and Keras to create a convolutional neural network with an architecture similar to the Nvidia team’s design. Then I burned many dollars’ worth of electricity running my son’s gaming computer flat out for about a solid week training the neural network. The end result was a system that could be fed an image in real time and decide what sort of steering that image implied. This allowed the car to drive the track autonomously.
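Training itself is conceptually just regression on the steering angle. Something like the sketch below, reusing the model defined above; the dataset files and hyperparameters here are placeholders, not my exact settings.

```python
import numpy as np

# X: (N, 160, 320, 3) array of frames, y: (N,) steering angles,
# built from the (image_path, steering) pairs recorded earlier.
# In practice I streamed batches from disk rather than holding
# everything in memory.
X = np.load("frames.npy")    # hypothetical preprocessed dataset
y = np.load("steering.npy")

# Plain regression: minimize mean squared error on the steering angle.
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, validation_split=0.2, epochs=10, batch_size=64)
model.save("model.h5")
```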
To really get a feel for what this means, I think it is most helpful to see it fail. Here is a short video compilation of a few typical failure modes when the car was driving autonomously. It’s quite sad and pathetic in a way.
But my car was a fighter! Eventually I got a system that was reliable enough to drive smoothly around this track, anti-clockwise and clockwise, indefinitely. Here’s a quick look at that.
The image quality is terrible because these were 320x160 (RGB) images from the simulated car’s simulated cameras. That’s 320 × 160 × 3 = 153,600 input values, call it 150k, boiled down into one resulting steering angle. And that’s it. Never did I program the idea of a road and that it might be a good idea to stay on it. The program knows nothing and yet enough. It just recognizes that the scene should be handled a certain way. It’s astonishing really.
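In code terms, the whole "thinking" step at drive time is one forward pass. A hedged sketch, assuming the trained model was saved to model.h5 as above:

```python
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("model.h5")

def steer(frame):
    """frame: a single 160x320x3 RGB image from the car's camera."""
    # ~150k pixel values in, one steering angle out.
    prediction = model.predict(frame[np.newaxis], verbose=0)
    return float(prediction[0, 0])
```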
I have a much better understanding of why computer science PhD types think that this will prove to be sufficient in our lifetimes. I have a lot more respect for that position, but I am still not on board. First of all, this is a pretty simple driving task in the big picture of operating a car in the real world. Second of all, a lot can still go wrong, and that’s exactly where this kind of system fails worst: on things it has never seen before, like being off the road. And if you’re still overly optimistic, just consider the notion of adversarial neural net training. If this were ever in a live system that used data received from actual human drivers (in their Teslas, let’s say), you know some mischievous CS researchers would design a driving regime that would demonstrably crash the system (literally) in some weird case.
Still, it is a shockingly powerful demonstration of what this kind of neural network can do these days. The fact that the training can be accomplished about 30 times faster on a linear-algebra-loving GPU explains Nvidia’s enthusiasm. While I don’t think we’ll ride around in cars driven by neural networks that were trained the same way humans were, I do think this technology will prove to be pretty valuable to autonomous car projects. It opens up a lot of possibilities for other projects as well.