External References

https://homes.cs.washington.edu/~todorov/courses/cseP590/readings/tutorialEKF.pdf

Unscented Kalman Filter paper.

Excellent tutorial for nonexperts (Which BTW is the classy way to say "for dummies".)

Paper about EKFs in robots using IMUs.

An online book on Kalman Filters in Python. ("Some books offer Matlab code, but I do not have a license to that expensive package." Bravo.)

Python module to simplify KF use.
Kalman Filters As I Understand Them
The Kalman filter is interesting because it is an attempt, a good damn one, to use fancy math hand waving to get around some fundamental philosophical problems, most notably the problem of induction. A lot of science people believe that they believe in "facts" and that they understand The Truth about some things. This is, sadly, never the case.
I’ll use examples involving tracking which are as good as any and represent one of the major uses of the Kalman filter. You could be wondering if an asteroid is on a collision course for Earth. You could be a cheetah chasing a gazelle. You could be building a robot which juggles chainsaws. You could be flying a spacecraft to repair a satellite. All of these things involve time and motion through space and a keen interest in what some other object is doing with its relative space and time.
In these examples tracking is accomplished by receiving input about the world from sensors. The problem with induction is clear the instant you start tracking something with more than one sensor. The philosophical reality is that if you look closely enough, any two sensors will never agree. Often you can just take an average and not worry about it so much. But what if you know one of the sensors is quite good and one is quite bad? And what if in different operating conditions the good and bad sometimes reverse?
Imagine tracking a car using two sensors. The first would be a GPS and the second would be an odometer that knows how many wheel revolutions have occurred. The GPS knows pretty well where on the planet you are, but maybe its readings jump around a 5m radius. If you’re sitting in traffic it might say you’re superimposed with the car in front of you, then the next reading say you’re enmeshed with the car behind you. Not terribly useful to plan a driving strategy in that environment. But the wheel sensor would know that you’re stopped or your moving very slowly. How can you take two sources of information like that and make the best of them? A good answer is the Kalman filter.
Prediction a priori
The way it works is that information is arriving in discrete packets which inform you about something you are trying to keep track of. This happens in iterative steps. In the previous step you knew a bunch of stuff. Now it’s a new step. Based only on that previous stuff, not the new readings that have just arrived, where should things stand now? What should those new readings turn out to be if the system were perfect? What do you know at this point without consulting the new readings?
Although it scales well for "sensor fusion", consider only one of the sensors. If you knew for instance that the previous measurement, x[t1], was 10 and x was increasing by 5 every step, then the current sensor measurement, x[t], would be about 15. In this step, the 5/step is accounted in F, the state transition matrix which describes how the state seems to be changing. The Bu term shows up here too since this represents the known internal motive agents effecting the changing target. If you know for sure that the tracked object sped up or slowed down through internal knowledge (you see brake lights, for example), that would be included here. If, on the other hand, the other object is opaque and its internal aspirations are unknown, then this is set to 0 (assume a steady burn or at least as many accelerations as decelerations) and any true effects from this will have to be accounted for as errors as best as possible.
That is the state. There is also P, the predicted estimate covariance. Knowing only how your best model of the system behaves from last round, what is the historical uncertainty of that approach? Step after step, have these estimates been all over the place? Or are they pretty reliable? The nu is used to produce the Q covariance matrix is the process noise, which is assumed to have a zero mean.
So now we know based on last step’s model what this step should be and how much we should trust that.
Update the prediction
In an open loop system, that could be that. However we now receive a measurement update which may not exactly align with our model. This measurement tells us where we are to an extent, but if the measurement is spurious we can, to some degree, rely on the prior trends to keep us from being misled. The key is that the model that we’re using to make predictions must now take this new measurement into account. The model must be judiciously updated to reflect this new data. At the same time it can’t completely trust that the new measurement is correct.
The measurement comes in the form of z which is the current step’s literal sensor input. These direct readings are adjusted by our history with the system. H, the measurement matrix, projects belief about the system’s current state. What degree of the old model should be applied to counterbalance direct, but possibly spurious, measurement readings? y is the measurement residual and is the discrepancy between what the model says should be happening now and what the sensors tell us is happening. If this is small, that’s great, the sensors and model must be doing things well. If it is large, then maybe a single spurious reading came in or the model is just not reliable.
And just as the uninformed model had a value and a covariance to track confidence in that value, so does the measurement residual. y is the distance between the actual raw measurement set, z, and the previous model’s prediction of what z would turn out to be. S is the residual covariance which tracks confidence in this model by referring to the model’s predicted estimate covariance, P, and the noise covariance matrix, R. This R can reflect inherent fundamental lidar or sensor hardware inaccuracy determined through a controlled calibration process before deployment. If S is large, then the sensors are noisy and don’t track the model’s beliefs very well. If S is small, even if the model and the measurement disagree, they consistently disagree which is good enough for an implicit compensation.
Update the prediction model
Remember that if the measurement is spurious we can, to some degree, rely on the prior trends to keep us on track. That "some degree" is an important issue to resolve. Just how much should we trust the historical model and how much should we trust what the sensors are telling us right now? This is done with K, the Kalman gain. It is comparing the covariance of the historical prediction model and the covariance of the measurement discrepancy history to assign the best mix of which to use.
Once the Kalman gain, K, is known, it can be used to update the model so it can be a little wiser next round. This is just done by taking the Kalman gain’s proportion of the measurement residual and adding it to the uninformed prediction based on last round’s model. This new state is the best guess for the current location of the target’s true Platonic location, known only to an omniscient deity. Not only should you aim your gun at this position to best hit the target, this will also provide, theoretically, a better basis for the next step to consider new measurement information. P is also be updated with the Kalman gain to reflect any changes in confidence in the model for the next round.
Kalman Filter Flow Chart
DATA > Initialize EKF matricies

Process Measurement
Check if First Measurement  yes  Initialize state,x,covariance matrices
 no
Predict
compute elapsed time, dt
use dt to compute new EKF, F, Q
predict x,P

Laser or Radar \
 
Update Laser Update Radar
Setup laser matrices Linearize measurement function, h(x')
 Setup radar matrices
 
State update with new measurements, x
Kalman Filter Equations
The actual update equations for a Kalman filter are "involved". … Every time I use them, I just look them up.
— Sebastian Thrun
Definitions

B  "control input matrix" or "Control input model"

u  "control vector" "Warp factor
u
, Mr. Sulu!" How that plays out depends on B. 
Bu  "motion control" and zero here, used to express active effector, like using the motor, sort of open loop. Since we do not know the internal forces a cyclist, for example, is engaging, this matrix is set to 0. The unknowableinadvance movements (accelerations) will just be treated as random noise.

w  "process noise" which is a normal distribution with a mean of zero and a covariance of Q.

Q_nu  covariance matrix of the individual noise processes matrix

Q  complete process noise covariance matrix (depends on noise nu)

Q= E[nu*nu.transpose] where E is expectation

Q= [ [sig^2_ax,0],[0,sig^2_ay] ]
 Q= [ [(.25)dt^4sig^2_ax, 0, (.5)dt^3sig^2_ax, 0] [0, (.25)dt^4sig^2_ay, 0, (.5)dt^3sig^2_ay] [(.5)dt^3sig^2_ax, 0, dt^2sig^2_ax, 0] [0, (.5)dt^3sig^2_ay, 0, dt^2sig^2_ay]


nu  "process noise", zero mean and Q covariance matrix,

nu= [nu_px,nu_py,nu_vx,nu_vy]


omega  "measurement noise", zero mean and R covariance matrix

x  "estimate" or "state vector" [position velocity]  This is the mean of a Gaussian distribution. Could also be stated as a four element vector [px py vx vy].

x'  predicted state matrix, x'x is x' given x. Also written x_{kk1}.

F  "state transition matrix" (Possibly the F stands for "filter"), integrates time
kf_.F_= [ [1,0,dt, 0],[0,1,0,dt],[0,0,1,0],[0,0,0,1] ]
May commonly imply the laws of motion. 
F*x  predicts where the object will be at time dt.

K  Kalman gain K=P*H^T & Sinverse ????

Z is the measurement vector

lidar= positionx and positiony (px,py)

radar= z= [rho,psi,rhodot] rho is range, psi is bearing, rhodot is range rate


H  "measurement matrix" or "observation model"  encodes belief about the object’s current behavior

lidar= drop the velocity
[ [1,0,0,0],[0,1,0,0] ]
(just keeps x and y)


Hj  Jacobian measurement matrix, partial derivatives of everything to convert polar stuff to expected state vector x. float ssqs= px*px + py*py; // Sum of squares of px and py. float d= sqrt(ssqs); // Distance effectively. float d3= d*d*d; // Cube of distance. if (ssqs == 0) return Hj; // Check division by zero
Hj= [ [ px/d, py/d, 0, 0,], [py/ssqs, px/ssqs, 0, 0,], [py*(vx*pyvy*px)/(d3), px*(vy*pxvx*py)/(d3), px/d, py/d] ]

v  "observation noise" which is a Gaussian distribution with mean 0 and covariance R.

R  observation measurement noise covariance matrix or "measurement covariance matrix"  this is the uncertainty of the measurement sensor. It can be supplied by the sensor (e.g. lidar) manufacturer based on an offline calibration regime.

P  Predicted state estimate error covariance or "state covariance" or "uncertainty covariance". When calculated before the measurements have been incorporated into the model, it is an "a priori". When recalculated after the update step, it is the "a posteriori" estimate covariance.

I  Identity matrix

y  measurement residual  the discrepancy between what the model says should be happening now and what the sensors tell us is happening. Also called "innovation" residual because the model thought the amp goes to 10 but really it goes to 11, an innovation of 1.

S  measurement residual covariance matrix, measures the variability of the quality of the system. Also innovation covariance.

lambda in UKF  spreading factor. How far should the sigma points be from the mean? This is just a scaling component in that calculation.
Udacity
Sebbie calls the "prediction" step "motion update" which goes better with "measurement update" though I’m not sure that’s entirely accurate since there’s not real update with external info. Still, it gets the point across.

x' = Fx + u

P' = FPF^{T} + Q

y = z  Hx'

S = HP’H^{T} + R

K = P’H^{T}S^{1}

x = x' + Ky

P = (IKH)P'
"What does the prime notation in the x vector represent? The prime notation like p_{x}' means you have already done the update step but have not done the measurement step yet. In other words, the object was at p_{x}. After time Δt, you calculate where you believe the object will be based on the motion model and get p_{x}'."
Simple Python Format
Z= matrix([[measurements[n]]]) y= Z  (H * x) S= H * P * H.transpose() + R K= P * H.transpose() * S.inverse() x= x + (K * y) P= (I  (K*H)) * P # prediction x= (F*x) + u P= F * P * F.transpose()
According to Octave
Function File: [est, g, x] = kalman (sys, q, r) Function File: [est, g, x] = kalman (sys, q, r, s) Function File: [est, g, x] = kalman (sys, q, r, [], sensors, known) Function File: [est, g, x] = kalman (sys, q, r, s, sensors, known)
Design Kalman estimator for LTI systems.
Inputs

sys:: Nominal plant model.

q:: Covariance of white process noise.

r:: Covariance of white measurement noise.

s:: Optional cross term covariance. Default value is 0.

sensors:: Indices of measured output signals y from sys. If omitted, all outputs are measured.

known:: Indices of known input signals u (deterministic) to sys. All other inputs to sys are assumed stochastic. If argument known is omitted, no inputs u are known.
Outputs

est:: Statespace model of the Kalman estimator.

g:: Estimator gain.

x:: Solution of the Riccati equation.
Block Diagram
u ++ ^
+> > y
 ++ + y  est  ^
u +> >(+)> > x
 sys  ^ + ++
w >  
++ v
Q = cov (w, w') R = cov (v, v') S = cov (w, v')
???
If x=[p,nu] and F=[ [1,dt],[0,1] ]
linear motion
p'=p+v*dt v'=v i.e. constant velocity
state transition function
p'x 1 0 dt 0 px nupx
p'y = 0 1 0 dt * py + nupy
v'x 0 0 1 0 vx nuvx
v'y 0 0 0 1 vy nuyv
x'=f(x)+nu x'=F*x+Bu+nu x'= F*x+nu // since mean noise is zero… x'= F*x
measurement function
z=h(x')+omega z=H*x'+omega