Chris X Edwards

The insane phrase "evidence-based medicine" did not exist before 1990. And even today it still needs to exist.
2017-12-15 13:09
I wonder how many people suffer awful user interfaces and wonder, "Is it just me?" Or is it just me?
2017-12-15 09:58
Almost literally walked into a Waschbär family this morning. Raccoon or rabbit should be UCSD's mascot.
2017-12-14 06:20
Which will come first, a VR car driving game supporting Linux or the end of driving?
2017-12-10 15:07
AI taking over the world soon? Look at all your Amazon recommendations. You can start worrying when nothing is lol wrong.
2017-12-10 11:43

Blender The Beast

2017-12-08 13:00

The first computer program I ever saw run was a 3d graphical virtual reality simulation which was as immersive as any I’ve ever experienced. What is really astonishing is that this took place in 1979 and the program was loaded into less than 48 kilobytes of RAM from a cassette tape. Yes, a cassette tape.

That program, called FS1, was written by a genius visionary named Bruce Artwick. Very soon after my dad and I saw that demonstration, we were among the first families to have a computer in our home. Of course we loved Flight Simulator, as it is better known. But there’s some even more obscure ancient history hiding in there.

Not long after that, Artwick’s company Sublogic released a program called A23D1. You can find an ancient reference to it in the March 1980 edition of Byte magazine. It simply says, "A23D1 animation package for the Apple II ($45 on cassette, $55 for disk)." That is all I can find to remind myself that I wasn’t just dreaming it.

Although Flight Simulator was jaw-droppingly spectacular, I almost think that A23D1 was even more historically premature. It was nothing less than a general-purpose 3d modeling program and rendering engine. Remember, this was for the 8-bit 6502 processor with 48kB of RAM.

Of course we’re not talking about Pixar level of polish, but in 1980 seeing any 3d computer graphics was nearly a religious experience. I think it would be hard to understand the impact today. It was like looking through a knothole in the fence between our reality and the magical land of the fantastic. Remember at this time the only 3d graphics anybody had ever seen were on Luke’s targeting computer and, as dorky as those graphics look today, at the time we walked out of the theaters no less stunned than if we’d just returned from an actual visit to a galaxy far, far away.

I remember my dad getting out the graph paper and straining his way through the severe A23D1 manual until, many hexadecimal conversions later, he had created a little sailboat in a reality that had not existed before in our lives. To see a window to another universe in our house, tantalizingly under our control, was mind-blowing. These were the first rays of light in the dawn of virtual reality.

I think A23D1 overreached a bit. It was not truly time for 3d. I spent my high school years absorbed by the miraculous new 2d "paint" programs. When I landed my first gig as an engineering intern for a metrology robot company, they had a copy of AutoCAD. I don’t know exactly why because nobody used it or even knew how. I was drawn to it immediately. There was no mouse (yes, the AutoCAD of 1988 had a keyboard-only mode which was pretty commonly used) and the monitor was monochrome. I started systematically building expertise. I eventually learned how to model things in 3d and how to write software in AutoLisp (apparently a direct contemporary of EmacsLisp).

AutoCAD formed the basis of a pretty good engineering career for me. The problem was that I was pushing the limits of what AutoCAD was designed for. I constantly struggled with the fact that (1990s) AutoCAD’s 3d features were roughly bolted on to an earlier 2d product. The expense of AutoCAD was tolerable for a business but not for me personally. As AutoCAD moved away from any kind of cross-platform support, the thought of using it on a stupid OS filled me with dread. As a result of the dark curse of proprietary formats I found myself cut off from a large body of my own intellectual work.

That’s the background story that helps explain why I thought it might be best if I recreated AutoCAD myself from scratch. I was kind of hoping the free software world would handily beat me to it, but no, my reasons are still as good as ever to press on with my own incomplete geometric modeler.

But it is incomplete. And that has been a real impediment for someone like me who is so experienced with 3d modeling. A few years ago, I was making some videos and having trouble finding free software that was stable enough to do the job. I eventually was directed to Blender and I was impressed. I have done a lot of video editing now with Blender (email me for a link to my YouTube empire if you’re interested) and it has never let me down. Blender has a very quirky interface (to me) but it is not stupid nor designed for stupid people. After getting a feel for it I started to realize that this was a serious tool for serious people. I believe it is one of the greatest works of free software ever written.

My backlog of 3d modeling projects has grown so large that I decided to try to get skilled at using Blender at the end of this year. I have envisioned a lot of engineering projects that just need something more heavy duty than what my modeling system is currently ready for. I also think that my system can be quite complementary to something like Blender.

The problem with Blender for me is that it is a first class tool for artists. But for engineering geometry, I find it to be more of a challenge. My system on the other hand is by its fundamental design the opposite. One of the things that would always frustrate me with bad AutoCAD users (which is almost all of the ones I ever encountered, and if you’re an exception, you’ll know exactly what I mean) is that they often would make things look just fine. This is maddening because looking right is not the same thing as being right. Blender specializes in making things look great. Which is fine but when I start a project I usually have a long list of hard numerical constraints that make looks irrelevant. I’m not saying Blender is incapable; the fact that there’s a Python console mode suggests that all serious things are more than possible with Blender.

But I get a bit dispirited when I go looking for documentation for such things and turn up nothing. Even for relatively simple things this is all too common.


Since I’ve just had such a great experience with on-line education I thought maybe there was some such way to learn Blender thoroughly. And there is! I’ve been going through this very comprehensive course from Udemy. I’m about halfway through it and it basically provides a structured way to go through most of the important functionality of Blender while getting good explanations and plenty of practice.

Here’s an example of a stylish low-poly chess set I created.


Not that exciting but a good project to get solid practice with.

With AutoCAD I remember writing all my own software to animate architectural walk-throughs and machine articulation simulations. Obviously Blender comes with all of that refined for a professional level of modern 3d animation craftsmanship. Here’s a quick little animation I did which was not so quick to create, but very educational.


Rendering this tiny thing I learned that Blender is the ultimate CPU and GPU punisher. Simultaneously! If you want to melt your overclocked gaming rig, I recommend Blender.

The reason I think it’s wise and safe to invest so heavily in Blender is that this rug will never be pulled out from under me. I can’t afford AutoCAD so that door is slammed in my face. Blender, on the other hand, is free software released under the GPL. I even have access to the source code if there’s something I don’t like. No excuses.

I hope I can integrate it with the more engineering oriented geometry tools I have written. I am confident that I can use it to start design work on my own autonomous vehicles and to generate assets for vehicle simulations in game engines.

Blender is a fun program. It is heroically cross-platform. You can just download it and go. If you can’t get inspired by the awesome artwork people have created, you’re probably pretty dull. While there is a lot to it, the rewards are commensurate. If you have ever used A23D1, Blender is well within your capabilities. The same is true if you have ever run a virtual fashion empire designing and selling virtual skirts to virtual people. In fact, if that describes you, I would highly recommend you pay the $10 for this Udemy course and get to it!

Patently Ridiculous

2017-12-06 13:33

Years ago I tried to talk some sense about what I feel are overblown fears of scary AI enslaving humanity. In that post, I noted The Economist pointing out that we’ve been here before. They mention that "government bureaucracies, markets and armies" have supernatural power over ordinary humans and must be handled with care. A new article expands on that theme nicely; the short version is entirely captured by the title, "AI Has Already Taken Over, It’s Called the Corporation".

In my aforementioned post I proposed my own idea that AI wouldn’t be much of a concern because if it was truly intelligent, it wouldn’t care about humans one bit. Sort of like we don’t go around worrying about diatoms even though they’re pretty awesome and vastly outnumber us.

If escaping from the scary menace of SkyNet AI involves, essentially, obscurity, maybe the same is true with the ominous spectre of corporations. For example, 35 U.S.C. §271(a) says pretty clearly that, "…whoever … makes, [or] uses … any patented invention… infringes the patent."

Let’s say I’m pursuing a research agenda to accelerate autonomous car technology. If I work for a big company, patents provide a guide to what must be treated as forbidden. If I avoid such entanglements and work by myself, patents can be ignored as expedient with complete disregard to the law. Probably. So I got that goin’ for me, which is nice.

And with all that in mind, let’s turn now to autonomous car news of the weird. This article in Wired talks about some random engineer dude with an interest in autonomous car company lawsuits. Sort of like me but, apparently, with a bit more disposable cash. If you’ll recall, I wrote about the extremely bizarre testimony in the Waymo v. Uber lawsuit here and here.

This random engineer guy, Eric Swildens, was watching the circus too and he started to get the feeling that the whole case Waymo was presenting was kind of weak for no other reason than the putatively infringed-upon patent was kind of stupid. Sure enough, he does some minor digging and finds out that there’s prior art and yadayada Waymo’s case is embarrassing and Uber’s defense oversight maybe more so. If any of that sounds interesting, do check out the whole article, which is surreal.

But here’s the thing… In those depositions, Waymo seemed pretty pissed off at their man switching teams and taking some tech (and enough bonus pay to start a cult). I thought the technology involved was trade secret stuff. There was all this talk about what was checked out of the version control and who had what hard drive where, etc. But am I to understand that all of this was really about a specific patent which can be accessed by anybody with a web browser (made easy by Google no less)? Something doesn’t make sense.

Whatever. Thanks to the magic of the Streisand effect, I am cheerfully reading through all Waymo’s patents.

Part II - An Example Simplified Until Comprehensible

2017-11-24 09:47

In my last post about machine learning "neural" networks I tried to frame a very rough way to think about that topic. This isn’t because my physical analogy is technically exactly what is going on with machine learning but because it is close enough that it will hopefully help make things clearer when the details are studied in more depth. Well, clearer than neurophysiology!

In this post I will try to simplify and explore some of the math involved in the actual optimization (learning) strategy used in normal neural network approaches. The goal here is to do this with a minimal illustrative example. This means that I’m going to snip away almost all of the complexity of a real neural network system so that some intuition about some core ideas can be a little clearer than when they are later awash in a flood of data and complexity in a real practical system. Although this example is just a "simple" optimization problem, I think it conveys some of the important themes found in machine learning neural network techniques and is helpful for getting acclimated to its important concepts.

Recall from the last article, I proposed a thought experiment featuring a big jumble of hardware arranged in layers with a bunch of adjustment screws. In that example, there was a huge question left unanswered — how much exactly are the adjustment screws adjusted? Since the actual classifier (dog or muffin) is just a complex but essentially similar case to the log example I presented, I’ll focus on the simple log example. In that example, I imagined driving some screws into the logs to do the adjusting. Screws are really just helical wedges so let’s think about that problem visualizing wedges.


Recall that the goal is to adjust all the wedges until the actual value is where you want it. This value that the system actually produces for a given input is often marked as a Y with a hat on it. People even say "why-hat". Plain Y sans chapeau is used to designate what the target should be and thus what we are aiming for. In machine learning the plain Y is often the "label" part of a labeled training set. We want to adjust the system (weights) so that it at least hits these known targets pretty well before trying it on data we don’t have the correct answers for.

Looking over the diagram with the wedges, it’s almost simple enough now to actually do explicit geometric calculations. But I’m lazy so I need to simplify this yet more. We could imagine a big simplification by removing every other wedge and replacing it with a hinge.


Now we’re down to just 3 knobs to adjust. I’ve made them omegas because that seems like a traditional angle sort of measurement and they still look like "w" which will remind us that these angle settings are now the "weights" in the system.

This is definitely doable, but I am even lazier than that. If we simplify this system even further we get something like this.


What’s cool about this format is that although it is structurally very similar to my previous conceptual model, it seems to have taken on a different form. This problem could be a robot arm with 3 servo motors. How would you set the servo motors to put the robot’s gripper on the target? In case you feel we’ve wandered too far away from machine learning, consider that this problem is just an optimization problem and so is machine learning. This highly stripped down version allows us to study it without tons of other complex considerations required by the scale of machine learning’s typical complexity. In other words, machine learning is basically solving problems like this; it’s just usually doing thousands at a time in parallel to be properly useful. We can just focus on one strand of the network that serendipitously has a different practical application.

This particular problem format is in fact an important problem on its own. It is called inverse kinematics and is critical to many fields from robotics to molecular physics. Now that I’ve evolved my tower of logs example into a simpler inverse kinematics problem, how can we solve it using the rough ideas also used at the heart of machine learning?

First let’s consider how we would figure out the structure’s current position given certain settings. If you recall very basic trigonometry and we assume that each segment of the linkage is one unit long, the positions of the joints are very easy to calculate. The lateral position is just the sine of that joint’s angle. We can keep an account of these as we go, each joint’s position added to the previous. Here is some simple code that takes a starting position where the base is located (Y0) and angle settings for each of three joints (w1,w2,w3), and returns the lateral position at each end point (Y1,Y2, and the end, Y3).

from math import sin,cos,radians # This example involves trigonometry.

def calculate_pose(Y0,w1,w2,w3): # Base position and linkage angles.
    Y1= Y0 + sin(w1)             # First arm's end position.
    Y2= Y1 + sin(w2)             # Second arm's end position.
    Y3= Y2 + sin(w3)             # End of entire 3 bar linkage.
    return Y1,Y2,Y3              # Output lateral positions of linkage.

Pretty simple, right? This is the forward pass. We take the system and see how it is with no meddling. Seeing what you’ve got and how the system works out is the first step before messing with things to try and improve the system.
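A quick sanity check of the forward pass (restating calculate_pose so this snippet runs on its own): with every joint at zero the arm lies flat at the base height, and lifting only the first joint raises every subsequent point by the same amount.

```python
from math import sin, radians

def calculate_pose(Y0, w1, w2, w3):  # Same function as above.
    Y1 = Y0 + sin(w1)
    Y2 = Y1 + sin(w2)
    Y3 = Y2 + sin(w3)
    return Y1, Y2, Y3

print(calculate_pose(0, 0, 0, 0))            # -> (0.0, 0.0, 0.0)
print(calculate_pose(0, radians(30), 0, 0))  # all three points lifted ~0.5
```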

I’m trying to show a radically simplified example here so that the core ideas used in machine learning are less likely to be lost in the bustle of all of the other things necessary for useful deep learning neural networks in practice (a large network, more complex and less visual functions, a lot of data to apply statistics to, framework conventions, etc). So don’t fixate too much on the deficiencies. In most neural network lessons, you will start with a different kind of gross simplification. I feel having two different simple perspectives is helpful.

Once we know how well the system is working, i.e. how far Y3 is from being the same as Y0, we want to adjust the system (weights) so if we try again, we can hopefully do better. The huge difference between neural network techniques and the way humans usually solve these kinds of hard problems is that humans don’t explicitly calculate algorithmic guesses for how to adjust each of the weights. For a computer to attack such problems, this is exactly what must be done.

Since we have 3 weights (the joint angles) that can be adjusted which affect the desired goal, we need to figure out optimal amounts to tweak each of these angles. One might wonder why we can’t just solve for the final answer. In some simple cases maybe that’s possible, but even in this one there are many (infinite) settings of the weights that will line up the end of the arm with the base. Perhaps with more constraints you could just solve it but in practice the complexity will make that notion prohibitive. We just want to converge effectively on something that works with a simple algorithm because in neural networks we’ll be applying it a gazillion times.
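To see that non-uniqueness concretely, here is a tiny sketch (same unit-length links) of two quite different joint settings that both bring the end exactly back to the base:

```python
from math import sin, radians

def end_position(Y0, w1, w2, w3):   # Just the Y3 part of the pose.
    return Y0 + sin(w1) + sin(w2) + sin(w3)

# Flat arm: trivially on target.
print(end_position(0, 0, 0, 0))                       # -> 0.0
# Up 30 degrees then back down 30 degrees: also on target.
print(end_position(0, radians(30), radians(-30), 0))  # essentially 0.0
```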

The main gist of how this works is we consider in turn how each weight affects the overall error. In other words, if I turn w1, how does the Y3 end position change? Or similarly but more importantly how does the distance to the target change? I’ll call that distance E for error and unlike the position, Y3, it will always be positive. We ask the same about w2 and w3. For people who can remember calculus, these values are the derivatives of E (the error) with respect to each weight. If I turn w1 quickly, does the error E change slowly or quickly? Does it go up or down? That’s what we’re looking for. Math people write this quantity as a "dE" over "dw1" like a fraction (maybe even using Greek deltas). As a programmer I’ll write it like dE_dw1.

The trick is that real machine learning usually involves very elaborate networks of calculations, each step as simple as the ones I’ve contrived. It is generally necessary to calculate the change in error, dE, with respect to an intermediate thing changing and then calculate how that intermediate thing changes with respect to your important weight adjustment. There can be many layers of this. This is what back propagation really is.
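If you want to convince yourself this layered bookkeeping computes the right thing, here is a little check of mine (not part of the original program) comparing the chain-rule derivative dE_dw1 against a brute-force finite difference:

```python
from math import sin, cos, radians

def end_position(Y0, w1, w2, w3):          # Y3, the end of the linkage.
    return Y0 + sin(w1) + sin(w2) + sin(w3)

def error(Y0, w1, w2, w3):                 # Same .5*(Y0-Y3)**2 as in the post.
    return .5 * (Y0 - end_position(Y0, w1, w2, w3))**2

Y0 = 0.0
w1, w2, w3 = radians(22), radians(-20), radians(14)

# Chain rule: dE_dw1 = dE_dY3 * dY3_dw1.
Y3 = end_position(Y0, w1, w2, w3)
dE_dY3 = -(Y0 - Y3)
dE_dw1 = dE_dY3 * cos(w1)

# Brute force: nudge w1 a tiny amount and see how much E actually moves.
h = 1e-7
numeric = (error(Y0, w1 + h, w2, w3) - error(Y0, w1, w2, w3)) / h
print(dE_dw1, numeric)   # the two values agree to many decimal places
```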

With all that explained let’s continue with the program and see how we can figure out how to adjust the weights to lower the error.

def update_weights(Y0,w1,w2,w3):
    Y1,Y2,Y3= calculate_pose(Y0,w1,w2,w3)

Here’s a new function and the first thing to do is figure out where we’re at with the weights as they are. You could think of this first step as the forward pass or forward propagation.

    E= .5*(Y0-Y3)**2 # Half the squared error.
    dE_dY3= -(Y0-Y3) # Change in error as Y3 changes (just Y3 for Y0=0).

This next bit looks ugly but is really not too bad. The E is the error we want to minimize. We’re trying to make Y3 line up with the base at Y0, so their difference needs to be close to zero. The first line calculates the squared error (halved) so that large negative errors can’t seem better (smaller) than small positive errors.

Next the derivative dE_dY3 is calculated. This is the change in error E with respect to the change in Y3 (the position of the end of the linkage). Obviously this is a very simplistic thing to worry about but it is illustrative of the bulk of the work that is done in real neural networks at deeper layers. This also shows why it’s often traditional to multiply by 1/2 when calculating E (because the derivative of .5*x*x simplifies to just x).

One thing I do remember from my many misspent years studying calculus is that the derivative of the sine function is, interestingly, the cosine function. This means that the rate of change in each arm’s position is related to the joint angle’s rate of change by cosine. That gives us this.

    dY3_dw1= cos(w1) # Rate of change of Y3 as w1 is adjusted.
    dY3_dw2= cos(w2) # Rate of change of Y3 as w2 is adjusted.
    dY3_dw3= cos(w3) # Rate of change of Y3 as w3 is adjusted.

But this isn’t exactly what we’re after. We need to link the adjustment of the joint with the final error and currently we have joint angle to position, and position to error. To chain these two steps together, we use a trick of calculus called the chain rule. When I learned the chain rule long ago, I was confident that it could be safely forgotten. But no! It’s actually quite useful and really at the heart of allowing neural network machine learning to be possible. If you want to brush up on your calculus, look carefully at the chain rule.

If getting your head around how exactly the chain rule works and why it is important seems hard, thankfully, just deploying it is refreshingly easy. Here it is in action.

    dE_dw1= dE_dY3 * dY3_dw1 # Chain rule.
    dE_dw2= dE_dY3 * dY3_dw2
    dE_dw3= dE_dY3 * dY3_dw3

Again, that’s a super simple example by design for educational purposes. In practice this will get ugly enough that you will definitely want a computer to keep track of things but conceptually, this is all there is to it.

After that step, we know how the error, E, is linked to each weight. Now comes the part where we actually adjust the weights. This introduces something called the "learning rate". Imagine I’m leveling my log tower by turning screws. I may feel like a full turn of screw J will bring down the error twice as much as a full turn of screw K. That’s super helpful (and basically what we have with dE_dw1, etc) but that still leaves an important practical question — how much should I actually turn those screws? I could turn K one turn and J two turns. Or I could turn K half a turn and J one turn. Or K 6 turns and J 12. We know which screws most effectively solve our problem relatively speaking but we don’t know how much of that solution to apply. The answer to this question is specified by the "learning rate". This is often shown with a Greek letter eta (though other conventions are annoyingly common).

In neural network training, this is a hyperparameter which must be selected by the designer. You can imagine that 100 turns with K and 200 with J might overshoot your goals while 0.1 degrees of K turning and 0.2 degrees of J might not accomplish enough to be useful in a reasonable number of adjustment iterations. You just have to choose based on intuition and make revisions if it is not improving at a sensible pace.
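To get a feel for how much this choice matters, here is a small experiment of mine applying this post’s gradient update to the arm with a timid, a reasonable, and a reckless learning rate; the remaining error after 20 epochs tells the story:

```python
from math import sin, cos, radians

def run(eta, epochs=20):
    """Train the 3-joint arm exactly as in this post, with learning rate eta."""
    Y0 = 0.0
    w1, w2, w3 = radians(22), radians(-20), radians(14)
    for _ in range(epochs):
        Y3 = Y0 + sin(w1) + sin(w2) + sin(w3)
        dE_dY3 = -(Y0 - Y3)
        w1 -= eta * dE_dY3 * cos(w1)   # Adjust each weight by its own
        w2 -= eta * dE_dY3 * cos(w2)   # chain-rule derivative scaled
        w3 -= eta * dE_dY3 * cos(w3)   # by the learning rate.
    Y3 = Y0 + sin(w1) + sin(w2) + sin(w3)
    return abs(Y0 - Y3)                # Remaining error after training.

for eta in (.005, .075, .9):
    print(eta, run(eta))               # too timid, about right, too wild
```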

Now the weights can be corrected using the original weights, the learning rate, and the connection factor between the error and each weight. This is known as the delta rule, though memorizing that fact doesn’t seem critical.

    eta= .075           # Learning rate. Chosen by trial and error.
    w1= w1 - eta*dE_dw1 # Delta rule.
    w2= w2 - eta*dE_dw2
    w3= w3 - eta*dE_dw3
    return w1,w2,w3     # New improved weights ready for another try!

And that is basically it. Now we just need to do this operation a decent number of times. Each time the metaphorical tower is disassembled, adjusted, and reassembled is called an "epoch".

Another surprisingly important technicality is choosing where the system starts from. This example is so simplified that if all the joints are set to zero, no further work is needed! But in real neural networks, the opposite is often true. By setting all the weights to zero initially, you often have a terrible time training it. It is common that performance is greatly enhanced with starting weights set randomly. Often subtle changes in this can have a huge impact on overall learning success. For example, maybe setting them with a Gaussian distribution versus uniform random noise. But in our little example, I’ll just pick some nice looking arbitrary starting angles.
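For what it’s worth, here is what those two initialization flavors might look like in code; the ranges are arbitrary choices of mine, just for illustration:

```python
import random

random.seed(42)   # Reproducible runs while experimenting.

# Uniform random starting angles in an arbitrary +/-0.52 radian (~30 degree) range:
uniform_init = [random.uniform(-0.52, 0.52) for _ in range(3)]

# Gaussian starting angles: most land near zero, a few stray further out:
gauss_init = [random.gauss(0, 0.26) for _ in range(3)]

print(uniform_init)
print(gauss_init)
```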

Here then is the main program that actually iterates towards a solution.

Y0= 0                                          # Initial input.
w1,w2,w3= radians(22),radians(-20),radians(14) # Initial arbitrary weights.
print(calculate_pose(Y0,w1,w2,w3))             # Show initial pose.
for epoch in range(20):                        # Iterate through epochs.
    w1,w2,w3= update_weights(Y0,w1,w2,w3)      # Keep improving weights.
print(calculate_pose(Y0,w1,w2,w3))             # Show final pose.

When I run this I get the following output.

(0.374606593415912, 0.0325864500902433, 0.274508345689911)
(0.2852955778665301, -0.14244640527322044, 0.0029191436234602963)

These are the lateral displacements of the end of each arm segment. Since the overall objective was to get the end of my robot arm to line up with the base (which was zero), we were hoping that the final number would come down close to zero and it did!
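For anyone who wants to run it, here are all the pieces from this post assembled into one self-contained script (same logic; the unused E bookkeeping is dropped):

```python
from math import sin, cos, radians

def calculate_pose(Y0, w1, w2, w3):  # Base position and linkage angles.
    Y1 = Y0 + sin(w1)                # First arm's end position.
    Y2 = Y1 + sin(w2)                # Second arm's end position.
    Y3 = Y2 + sin(w3)                # End of entire 3 bar linkage.
    return Y1, Y2, Y3

def update_weights(Y0, w1, w2, w3):
    Y1, Y2, Y3 = calculate_pose(Y0, w1, w2, w3)
    dE_dY3 = -(Y0 - Y3)                # Change in error as Y3 changes.
    eta = .075                         # Learning rate.
    w1 = w1 - eta * dE_dY3 * cos(w1)   # Delta rule via the chain rule.
    w2 = w2 - eta * dE_dY3 * cos(w2)
    w3 = w3 - eta * dE_dY3 * cos(w3)
    return w1, w2, w3

Y0 = 0                                               # Initial input.
w1, w2, w3 = radians(22), radians(-20), radians(14)  # Arbitrary weights.
print(calculate_pose(Y0, w1, w2, w3))                # Show initial pose.
for epoch in range(20):                              # Iterate through epochs.
    w1, w2, w3 = update_weights(Y0, w1, w2, w3)
print(calculate_pose(Y0, w1, w2, w3))                # Final pose: end near zero.
```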

I ran this with a bunch more different starting weights so we can see how and how well the algorithm finds the desired solution. These diagrams show the starting pose as a red line and the final solution pose in green. This one shows an arm with the first joint at 10 degrees, the second set to 15, and the third set to 10 (these angles are all with respect to absolute horizontal, not the previous segment).



As you can see the initial pose quickly converges on the correct pose. The learning rate will influence how jumpy the transition is. The number of epochs controls how persistent it is and how many intermediate poses are attempted before returning a best guess final answer.



This one shows that even when the error is negative, this strategy still tries to minimize it back to a horizontal zero.

Here are some diverse examples showing that it can pretty reliably and sensibly find a solution.







These next two show that the algorithm isn’t perfect. By prioritizing adjustments based on the derivatives, you can see that this cosine strategy penalizes valid improvements where the angles are close to 90. When the angle is close to 90, the cosine (derivative of the position’s function, sine) is close to zero so not much gets improved at that location even though it could theoretically be doing more to help.





This one seems even worse even if it did manage to find a solution.



This next one did struggle to find a satisfactory solution in the number of epochs I allowed.



For this next one, I changed the Y0 value to be 0.7 which merely shifts the whole thing up.

Starting angles -75, -55, 0 with Y0=0.7:


We could easily set up this system so that the target (Y3) and the input (Y0) could be different and this would allow us to move a robot arm to arbitrary elevations. Traditionally the input to the system (not the weights which are the system) is called X but in a graphical geometric example, that is a bit confusing.

The big leap from this simple example to proper machine learning is to systems where the input vector X (Y0 here) can be novel, previously unseen circumstances and, because the weights are set (trained) so cleverly, the output reflects some useful insight. For example, you could imagine putting the input (Y0) at the number of legs a creature has and training the system with a lot of examples until the system’s weights can position the end of the arm below zero for mammals and above zero for insects. We know from general experience that the math is pretty simple there (4 or fewer legs, probably not an insect) but that is something the system can start to figure out on its own if you keep giving it known examples (number of legs and correct invertebrate status). The functions of how the joint angles are set by the weights (purely geometric and in the simplest way possible in my example) may need to be upgraded to allow more complexity and quirky outcomes but that’s exactly what you’ll find in proper neural network architectures.

Machine learning involves going through lots of examples just like this one and finding the best ways to adjust the weights so that the entire collection of these training examples produce results as close to what you want as possible. Then, and this is the entire point, you can give it a new input and its best guess about it will hopefully be pretty useful.

Lessons From Heavy Metal

2017-11-20 10:02

When I was thinking about neural networks and trying to come up with a better metaphor than brain physiology, I reflected on my machine shop experience. A long time ago, I had a very good job where I was the engineer for a 150k sqft heavy manufacturing facility. It was basically a subcontract machine shop for workpieces on the enormous end of the scale.

Although I was a very young guy at the time, the owner entrusted me personally with a prodigious amount of responsibility. At the time, I didn’t think much about it but when I look back on the things I managed, wow! The owner would buy old machine tools from failing manufacturing plants and my job was to reverse engineer their installations and create the same in our plant. Although my 3d modeling skills were not too challenging to me, they were very rare at the time and profoundly helpful for these projects.

What reminded me of the machine shop was after hearing the machine learning word "weight" be called "adjustment" enough times, I thought of these.


These are leveling screws. I actually invented these. Commercial ones were something like $150 and I realized that we could just make our own from electrical boxes and tube stock (it was a machine shop after all). We were able to get these down to about $20 each. As you can see we had many projects which ultimately used thousands of these cans. That’s the kind of thing I did to add value.

The way these anchors work is that you embed them in concrete so that you can hold a machine tool tightly to the floor. Between the machine tool and the floor is a leveling jack which can push the machine away from the floor. In this way you could "adjust" the machine tool’s pose.


The next trick was getting all the cans in the right place. This was something else I was responsible for. I managed the layout, the excavation, the setup, and the concrete logistics. Communicating to a concrete contractor how a hundred of these cans needed to hang in space before the pour was step one and step two was helping to design the scaffolding that could accomplish that. Then I was responsible for final inspections before the pour.


Once the concrete was poured, the leveling jacks were set in place and themselves leveled with their own leveling system. Small forms were used to pour grout to lock them in place.


Once the leveling jacks were fixed to the floor, the machine would be brought in. This photo shows a large Schiess CNC horizontal boring machine’s bed being placed in an installation I designed. Because of such projects with German equipment, I was the plant’s German translator too — before I learned German! I can’t quite remember what that cutout on the back wall was for but it was for something tricky like clearance to remove one of the motor shafts if it ever needed replacing. Stuff like that has to be thought out well in advance.

Here’s a picture of Howard leveling a machine.


He is adjusting the "weights" using non-artificial intelligence. The cost function is how straight and flat this table will move and the straightedge spanning the way surfaces is how he is intuitively not using the chain rule of calculus to link his adjustment actions to the final reduced error. Howard and all the mechanics were extremely good at this kind of thing. Their mechanical aptitude was humbling.

Here is a nice sequence of an installation I designed that shows how the leveling process could be rather complex. This vertical boring machine starts with a ridiculously complex geometry as set by the machine’s clearance requirements.


All of those jacks have to be precisely angled to get correct access to them. You can see that the straightedge is being used to make sure the levelers are all level (horizontal and coplanar). Note that each of these jacks has 4 leveling screws.


Once the bases of the jacks are set in place, it looks like this.


After the bed and the columns are set on the jacks and assembled, the machine needs to be leveled. I don’t know if they’re bolting the column to the base or if they are already doing some leveling on the jacks. Either way, they need to reach through holes in the casting to do that and that’s why getting the angles perfect is not an optional feature.


If you want to think about this (very, very roughly) as a metaphor for machine learning, consider that the cost function, the "error", arises from an assembly of many moving parts. Not only do the bed and table need to be level, but that affects the columns, which affect the rail, which affects the heads, which affect the cutting tools, etc. Each of these pieces has its own adjustments, though, just like most neural network architectures, there are fewer of them the closer you get to the measured error.

And finally, here is the finished installation actually turning a profit.


You can see the kinds of parts this machine works on. It can cut the mating surfaces of this power generation casing or the bore of the press base (I think that’s what that is) spinning on the machine. (There is an unrelated press base in front of the forklift.)

The Australian guy running this machine is one of the smartest people I’ve ever worked with and I work with a lot of molecular biotech people with PhDs. (That’s why he’s getting first shot running this thing.) The owner of the whole place is also a brilliant machinist (best in the world in my opinion) with a level of genius you hope to find in the halls of academia but too often don’t.

It’s interesting to contemplate why this installation is lowered into a pit. Basically when the machine was purchased the owner had the correct intuition (and I numerically verified) that the thing would not clear our overhead cranes. He also correctly guessed that his engineer could design an installation that would make it work out. And there it is!

Machine Learning - A New Metaphor

2017-11-19 22:00

When I studied the latest machine learning best practices earlier this year, the experience was like having Sherpas guide me up Mt. Everest. Though that rarefied atmosphere was pretty exhausting, I’m no script-kiddie tourist. I wanted to revisit this mountain unguided and tackle it in my own way.

As you can see, I like metaphors. The first thing I felt needed to happen was to critically scrutinize the primary metaphor of machine learning, neural networks. Every lesson on neural networks starts with a half-hearted neurophysiology lesson which is accompanied by enough hand waving to generate a breeze. The problem, as the instructor makes clear eventually, is that the neural networks of machine learning don’t really have as much to do with the meat in your head as the course name might suggest.

…originally motivated by the goal of having machines that can mimic the brain. …[the reason for learning is] they work really well… and not, certainly not, just because they’re "biologically" [air quotes!] motivated.

— Andrew Ng

Due to all these and many other simplifications, be prepared to hear groaning sounds from anyone with some neuroscience background if you draw analogies between Neural Networks and real brains.

— Andrej Karpathy

It is often worth understanding models that are known to be wrong (but we must not forget that they are wrong!)

— Geoffrey Hinton

As best I can figure, back when Isaac Asimov started publishing robot stories (about 1950) people got the idea that synthesizing a human-like machine was a credible aspiration. Turing did nothing to dampen such enthusiasm with his famous imitation game test around the same time. The target was drawn and the goal was clear — build a machine that passed for human, cognition included.

As ever more complicated physics continued to prove useful, no phenomenon was considered fundamentally unknowable. Why not the human mind? The famous McCulloch and Pitts paper of 1943 is an uncanny predecessor of modern machine learning texts insofar as it starts with vague neuroscience mumbo jumbo and devolves into mathematical glossolalia by the end.

I think it is now simply a sacred tradition to casually mention some meaningless trivia about the human brain before talking about the kinds of thinking that machines might be able to do. This is kind of like a Sherpa Puja blessing before a climb. No harm done, right?

This book cover features an image of the neurons in your head that you will use to understand machine learning. Nothing more.

But does this digression help anything? I say no. I believe pointless nonsense about neurons wastes the mental space that a metaphor truly useful to beginners could occupy. For example, every computer science professor and every student just filing out of their first machine learning lecture knows what axons and dendrites are. To me that is completely wasted educational effort.

How should machine learning be introduced? That’s a good question and I don’t pretend to have the optimal answer. All I know is that when I was learning this stuff, I felt cheated by the neuro preliminaries and I struggled to make sense of what was really going on in that context. Later, after getting some experience with various neural network architectures, I tried to come up with a better way for beginners to understand what modern "neural" network machine learning is all about.

I have come up with a physical analogy that I think is illustrative and educational even if it is not perfect. It is certainly more helpful than axons and dendrites! By making the system physical I feel like some intuition can be applied. My analogy can take many forms but let’s start with this simple one. It’s a bit silly but bear with me.

Imagine a telephone pole needs to be replaced. You don’t have a spare straight pole but there is a natural tree-shaped tree nearby that needs to be cleared anyway. A crew cuts down the tree with a chainsaw and then cross cuts the trunk into logs.

You stack the logs vertically end to end at the site of the old pole and when you’re done the new pole is very precarious of course. Let’s not worry about that; we’ll assume it eventually gets wrapped in fiberglass or something. The real problem here is that the top of the pole is not lined up with the bottom of the pole. You disassemble the stack of logs and drive three 40mm screws 20mm deep into the end of each log forming a triangle.


If you’re very accurate with that, you should be able to stack the logs again, resting on the screws now, and a plumb line from the top will not have moved any closer to the base.

If this mechanical system is a metaphor for machine learning, then the thing we’re trying to "learn" is how much adjusting do we need to apply to each of the leveling screws in each log to make the top line up with the bottom.


This is obviously a hard problem. If there are 6 logs then there are 18 screws that can be adjusted. The way any particular screw at any particular level gets adjusted has different effects depending on what level it’s at and what the others are doing. To adjust these screws you really have to disassemble the system from the top.

If you don’t know much about machine learning, this is a good start. The goal is to bring the top of the log tower to a target position (over the base) by changing the settings of some adjustment screws. In machine learning each log would be a "layer" of the network and the more logs, the "deeper" the network. Logs between the top and bottom log are called "hidden". In theory, there are settings where the pole could snake up in a wild curve as long as the top made it back to the same horizontal position as the base. As with machine learning, the pole is built from the bottom up. You know the location you put the first log. That is the "input" and when you’re done stacking, you can check where the top ends up. You then have to disassemble the log tower from the top down to make adjustments to the leveling screws.
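The log-tower metaphor maps neatly onto the skeleton of a layered model. Here is a minimal Python sketch of just the "stacking" (forward) part — the per-log tilt arithmetic and the numbers are my own invented illustration, not anything from the actual post:

```python
# Toy version of the log tower: each "log" (layer) has three adjustable
# "screws" (weights). Stacking the logs is the forward pass; the top's
# horizontal offset from the base is the error we would like to zero out.

def stack_logs(screw_settings):
    """Forward pass: accumulate the sideways tilt each log contributes."""
    x_offset, y_offset = 0.0, 0.0
    for sx, sy, _ in screw_settings:   # third screw kept only for symmetry here
        x_offset += sx                 # each log nudges the top a little
        y_offset += sy
    return x_offset, y_offset

def error(screw_settings):
    """Cost function: squared distance of the top from the base."""
    x, y = stack_logs(screw_settings)
    return x * x + y * y

# Six logs, three screws each, all misadjusted the same way.
logs = [(0.5, -0.2, 0.0)] * 6
print(error(logs))   # nonzero cost: the top is not over the base
```

"Learning" would then mean finding screw settings that drive `error` to zero — which is exactly the adjustment problem the next paragraph describes.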

This disassembly is the "back propagation" operation in machine learning where it is a bit more complicated. The reason is that in my log example, I don’t know exactly how you would know how to adjust the screws. You’d pretty much have to guess and use intuition, something the mathematical version cleverly avoids. The screws are like the "weights" in machine learning, just adjustable parameters that make the structure do what it’s going to do. In proper machine learning back propagation, the whole point is that you can calculate from the top’s error down each level and figure out how much you should be turning each screw in the system. You do those calculations, make the adjustments, rebuild the pole, and check the new error. Repeat until it’s lined up.
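The calculation the log-stackers can only guess at can be sketched concretely. In this toy (entirely my own simplification — each "layer" is a single multiplicative weight), the chain rule of backpropagation is short enough to write out by hand:

```python
# A toy "screw adjustment" loop: gradient descent on a stack of layers.
# Each layer is one multiplicative weight, so the chain rule is explicit.

def train(weights, x, target, lr=0.01, steps=2000):
    for _ in range(steps):
        # Forward pass: push the input up through every layer,
        # remembering intermediate values (like rebuilding the pole).
        activations = [x]
        for w in weights:
            activations.append(activations[-1] * w)
        err = activations[-1] - target        # how far the "top" is off

        # Backward pass: walk down from the top, using the chain rule
        # to work out how much each "screw" (weight) moved the error.
        grad = 2 * err                        # d(err^2)/d(output)
        for i in reversed(range(len(weights))):
            dw = grad * activations[i]        # gradient w.r.t. this weight
            grad *= weights[i]                # pass blame to the layer below
            weights[i] -= lr * dw             # turn the screw a little
    return weights

weights = train([0.5, 0.5, 0.5], x=1.0, target=1.0)
out = 1.0
for w in weights:
    out *= w
print(round(out, 3))   # the "top" now lines up with the target
```

The repeat-until-lined-up loop in the text is exactly the `steps` loop here: forward pass, compute error, back-propagate, adjust, rebuild.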

OK, now that you’ve pictured that, let’s expand that thought experiment. Now imagine that you have a similar system but instead of logs, you have a big sheet of 10mm thick rubber, maybe 1m square. Every 5cm on a 20 by 20 grid you have a leveling screw set into it, just like the 3 in the logs. That’s 400 leveling screws that can be adjusted. On top of that sheet of rubber, you put another one just like it, but this one supported by screws set on a grid every 10cm, so 100 screws on this level. Now imagine carrying on like this for another 6 layers or so, each time reducing the number of screws and even the area of rubber needed. Finally you get to the top layer and it has a single screw sticking out of it. That final screw is near the ceiling and there is a line painted on the ceiling.


Got all that? Here’s the interesting part. Let’s say you have a bunch of low relief sculptures of either dogs or muffins (or whatever). If you can somehow slide these sculptures one at a time under your big pile of rubber and screws, what you would like to happen is that when a dog carving is under the pile, the top screw points to the right of the line on the ceiling, and when a muffin carving is under the pile the top screw points to the left of a line. That way, if you didn’t know if a relief sculpture was a dog or a muffin, by putting it under this giant mess, you could read the top screw and learn that system’s interpretation. The key trick to machine learning is how the hell are we going to correctly set all those damn screws (thousands, maybe millions!) to pull off this kind of ridiculous miracle. As unlikely as my physical analogy makes it sound, it turns out that when the adjusting screws are mathematically based, the layers are designed just so, and massive GPU power is pumped into calculating the number of turns each screw needs, this kind of crazy contraption works!
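For the mathematically inclined, the dog-vs-muffin contraption can be miniaturized into a few lines of Python. This is a drastic simplification of the rubber pile — one layer of "screws" stands in for the whole stack, and the sculptures, features, and sizes are all made up for illustration — but the screw-setting procedure is the real one:

```python
# A miniature dog-vs-muffin detector: gradient descent sets the "screws"
# (weights) so the single output lands on one side of the line for dogs
# and the other side for muffins.
import math
import random

random.seed(0)

def predict(weights, bias, features):
    """One 'screw reading': weighted sum squashed to the range (0, 1)."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 / (1 + math.exp(-z))

# Fake "relief sculptures": 4 features each; dogs lean positive,
# muffins lean negative (purely invented data).
dogs    = [[random.gauss( 1, 0.5) for _ in range(4)] for _ in range(20)]
muffins = [[random.gauss(-1, 0.5) for _ in range(4)] for _ in range(20)]
data = [(x, 1.0) for x in dogs] + [(x, 0.0) for x in muffins]

weights, bias = [0.0] * 4, 0.0
for _ in range(500):                          # "turn the screws" repeatedly
    for features, label in data:
        p = predict(weights, bias, features)
        err = p - label                       # gradient of the loss w.r.t. z
        weights = [w - 0.1 * err * f for w, f in zip(weights, features)]
        bias -= 0.1 * err

correct = sum((predict(weights, bias, x) > 0.5) == (y == 1.0)
              for x, y in data)
print(correct, "of", len(data), "classified correctly")
```

A real deep network stacks many such layers, exactly as the rubber sheets stack, but the miracle is the same: adjustable parameters tuned until the output reliably falls on the correct side of the line.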

I’m not suggesting anyone rush out and try to construct a dog detector out of sheets of rubber! This is just to give you a very rough physical intuition of what the hell is going on with machine learning methods before you tackle the mathematical magic which makes it all possible. You may find this conception of what machine learning is like to be implausible, pointless, and completely erroneous. Well, that’s par for the course here! Once I started thinking of machine learning like this, I was finally able to get a solid understanding of what the math was doing.

For people with no actual interest in applying machine learning themselves, this analogy might be interesting just to see what the AI hype is really all about and the implausible-seeming model it’s based on. It may be implausible but it does often make uncannily accurate predictions. I’m not sure this miraculous lack of complete failure has anything to do with human cognition, but I’ll allow that it may be giving us hints about that too. Let’s just not start with that!


For older posts and RSS feed see the blog archives.
Chris X Edwards © 1999-2017