At the same time I had a project going on Formula 1 advertisement detection (for fun!), and tracking cars seemed like a nice thing to try. As with every Kalman filter problem, modelling the system (states, transition and measurement) is the most crucial part. After a lot of thought, it turned out that the model cannot really involve the image or pixel values directly. The basic equations of linear motion (physics) are all that is needed.
The solution I arrived at was to search for the car's location in the next frame, using the previous frame as the reference. The change in position is the displacement, and it is treated as the observation. The KF trades off this observation (and its inherent errors) against the expected displacement computed from the previously estimated velocity.
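To make that trade-off concrete, here is a minimal sketch of one way it could be written down. It is not the code I actually used. It assumes a scalar filter per image axis whose only state is the velocity (displacement per frame), a constant-velocity motion model, and made-up noise values `q` and `r`. The names `kalman_step` and `track` are hypothetical, and a `None` displacement stands in for a frame where the search fails (for example under occlusion).

```python
def kalman_step(v, p, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    v, p : previous velocity estimate (pixels/frame) and its variance
    z    : displacement found by the frame-to-frame search (the observation)
    q, r : process and measurement noise variances (assumed tuning values)
    """
    # Predict: under a constant-velocity assumption the expected
    # displacement is just the previous velocity; uncertainty grows by q.
    v_pred = v
    p_pred = p + q
    # Update: the gain k (between 0 and 1) decides how far to move
    # from the prediction towards the observation.
    k = p_pred / (p_pred + r)
    v_new = v_pred + k * (z - v_pred)
    p_new = (1.0 - k) * p_pred
    return v_new, p_new, k


def track(x0, displacements, q=0.01, r=4.0, v0=0.0, p0=100.0):
    """Track one image axis; a None displacement means the search failed."""
    x, v, p = x0, v0, p0
    positions = []
    for z in displacements:
        if z is None:
            # No observation: keep the predicted velocity, grow the uncertainty.
            p = p + q
        else:
            v, p, _ = kalman_step(v, p, z, q, r)
        x += v  # move the box by the filtered displacement
        positions.append(x)
    return positions


if __name__ == "__main__":
    # A car moving about 5 px/frame, hidden for three frames in the middle.
    observed = [5.0] * 20 + [None] * 3 + [5.0] * 5
    print(track(0.0, observed)[-1])
```

With these assumed values a single wild observation (say 50 pixels when the car has been moving at 5) only nudges the estimate, and during the `None` frames the box keeps moving at the last estimated velocity.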
Since this displacement occurs in “unit” frame time, it can (sort of) be called velocity too. A better, but more complex, approach would be to change the size of the “bounding box” and to account for the fact that a car is actually 3D and doesn’t always show the same face to the camera. The first of these, the changing box size, was accounted for (to some extent) by modifying the search approach described above.
Below are two GIF animations (click to view them). The first shows correct tracking that holds up under occlusion, along with some change in bounding box size. The second shows that two similar-looking cars are not confused, which they surely would be if not for the KF.
What I strongly feel is that the Kalman filter is all about trust. The noise powers (process and measurement) set how much you trust your observations versus your state model, and that is what makes it such a powerful tool in so many applications.
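The "trust" idea can be seen directly in the Kalman gain. The sketch below is only an illustration for the simplest scalar case (one state, observed directly), with noise values I picked arbitrarily. The function name `steady_state_gain` is hypothetical. It iterates the variance recursion until the gain settles, for different measurement noise powers `r`.

```python
def steady_state_gain(q, r, iterations=200):
    """Gain that a scalar Kalman filter settles to for noise powers q and r."""
    p = 1.0  # arbitrary starting variance; the limit does not depend on it
    k = 0.0
    for _ in range(iterations):
        p_pred = p + q              # predict: uncertainty grows by q
        k = p_pred / (p_pred + r)   # gain: share of trust given to the observation
        p = (1.0 - k) * p_pred      # update: uncertainty shrinks
    return k


if __name__ == "__main__":
    for r in (0.01, 1.0, 100.0):
        print(f"r = {r:>6}: gain = {steady_state_gain(q=1.0, r=r):.3f}")
```

When the measurement noise is small compared to the process noise, the gain sits close to 1 and the filter mostly believes the observation. When it is large, the gain drops towards 0 and the filter mostly believes its own model.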