Jinja to Visualize Shot Threads and Scenes

Jinja2 (docs), a template rendering system for Python, is quite useful for visualizing results quickly and consistently. In this post we look at some images (frames of a video) that need to be presented in a repetitive structure. If you do your research work in Matlab this is still easy, since you can export your data as a mat file and load it in Python with loadmat from scipy.io (follow these instructions to make it easier).

The main idea is:
import jinja2
from scipy.io import loadmat  # reads a Matlab .mat file into a dict

with open("path_template") as f:
    template = jinja2.Template(f.read())
data = loadmat("path_to_matfile")

In my case of working with TV series and movies, one popular problem is shot threading. TV series and movies are typically shot from multiple cameras, and are heavily edited. Reverse engineering the editing can help put together shots from the same camera, which in turn helps tasks such as person identification (people seen in shots of the same thread might be the same person!).

Threaded Shots from different cameras capturing a scene

Visual analysis of the threading results can be quite hard due to the massive scale of the problem (imagine 700-1000 shots per 40-minute episode, and 20+ such episodes in a season). In the example above, we use Jinja to generate an HTML file with tables and links to the images. The shot numbers below the images are staggered, which reveals the editing pattern (the numbers are legible in the full-size image).
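A minimal sketch of such a page, assuming each thread is a list of shots and each shot carries a number and an image path. The template, the field names and the file paths are made up for illustration and are not the ones used for the figure above.

```python
import jinja2

# Hypothetical template: one table row per thread, one cell per shot,
# with the shot number printed below a linked thumbnail.
THREAD_TEMPLATE = jinja2.Template("""\
<html><body>
<table>
{% for thread in threads %}
  <tr>
  {% for shot in thread %}
    <td><a href="{{ shot.image }}"><img src="{{ shot.image }}" width="160"></a><br>{{ shot.number }}</td>
  {% endfor %}
  </tr>
{% endfor %}
</table>
</body></html>
""")

# Made-up data: two threads (cameras) whose shots alternate in the edit.
threads = [
    [{"number": 1, "image": "shots/0001.jpg"}, {"number": 3, "image": "shots/0003.jpg"}],
    [{"number": 2, "image": "shots/0002.jpg"}, {"number": 4, "image": "shots/0004.jpg"}],
]

html = THREAD_TEMPLATE.render(threads=threads)
# To inspect the result in a browser, write `html` to a file, e.g. threads.html.
```

Keeping the loop in the template means the Python side only has to build a list of lists, which is also roughly what a cell array loaded from a mat file can be converted into.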

Another popular problem is to divide the video into scenes, typically defined as a set of shots at the same location with the same set of people. Again, visual analysis of the scene break detection algorithm lends itself very well to a template rendering system.

Scene break detection (5 shots per row)

We see a correctly inserted scene break here (yaay!) as the characters in the story have changed.
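A layout with a fixed number of shots per row can be sketched with Jinja's built-in batch filter. The shot list, the set of detected breaks and the scene-start CSS class below are invented for the example.

```python
import jinja2

# batch(n) splits a sequence into rows of at most n items.
GRID_TEMPLATE = jinja2.Template("""\
<table>
{% for row in shots | batch(per_row) %}
  <tr>
  {% for shot in row %}
    <td class="{{ 'scene-start' if shot.number in breaks else '' }}">
      <img src="{{ shot.image }}" width="160"><br>{{ shot.number }}
    </td>
  {% endfor %}
  </tr>
{% endfor %}
</table>
""")

# Made-up data: 12 shots, with shots 1 and 8 flagged as starting a new scene.
shots = [{"number": n, "image": f"shots/{n:04d}.jpg"} for n in range(1, 13)]
breaks = {1, 8}

grid_html = GRID_TEMPLATE.render(shots=shots, breaks=breaks, per_row=5)
```

A stylesheet rule for the scene-start class (a coloured border, say) then makes the detected breaks stand out in the grid.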

Just as a final note, Jinja can do any form of template rendering. It is not restricted to HTML. Automatically generated LaTeX and then PDF files are certainly something to try 🙂
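For LaTeX, the default Jinja delimiters clash with braces and percent signs, so one option is to configure an Environment with different ones. The delimiter strings below are a commonly used convention rather than anything Jinja requires, and the document body is only a toy example.

```python
import jinja2

# Swap the default {{ }} and {% %} delimiters for LaTeX-friendly ones.
latex_env = jinja2.Environment(
    block_start_string=r"\BLOCK{",
    block_end_string="}",
    variable_start_string=r"\VAR{",
    variable_end_string="}",
    comment_start_string=r"\#{",
    comment_end_string="}",
    trim_blocks=True,   # drop the newline after a block tag
    autoescape=False,   # HTML escaping makes no sense for LaTeX
)

LATEX_TEMPLATE = latex_env.from_string(r"""\documentclass{article}
\usepackage{graphicx}
\begin{document}
\BLOCK{ for shot in shots }
\includegraphics[width=3cm]{\VAR{ shot.image }} Shot \VAR{ shot.number }\par
\BLOCK{ endfor }
\end{document}
""")

# Made-up shot list; write `tex` to a .tex file and compile it with pdflatex.
tex = LATEX_TEMPLATE.render(shots=[
    {"number": 1, "image": "shots/0001.jpg"},
    {"number": 2, "image": "shots/0002.jpg"},
])
```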


Formula1 Car Tracking

Note: you need quite a fast Internet connection to load the gif images properly!
In our Advanced Signal Processing course at UPC, we learnt about this awesome tool called the Kalman Filter (KF), typically used as a tracker, and there was this “urge” to use it in something really cool :P.

At the same time there was a project on Formula1 advertisement detection (for fun!), and tracking cars seemed like a nice thing to try. As in all problems with the Kalman filter, the modelling of the system (states, transition and measurement) is the most crucial part. After a long thought process, it turned out that the model cannot actually involve the image or pixel values directly. The basic equations of linear motion (physics) were all that was needed.

The solution that I arrived at was to search for the car location in the next frame, using the previous frame as a reference. The change in position is the displacement and is treated as an observation. The KF trades off this observation (and its inherent errors) against the expected displacement computed from the previously known velocity.
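This trade-off can be sketched with a scalar Kalman filter run once per image axis. The sketch assumes the only state is the displacement per frame and that it stays constant between frames; the actual state vector and noise values used for the animations are not given here, so q, r and the sample measurements are placeholders.

```python
def kalman_step(v_est, p_est, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    v_est, p_est: previous displacement-per-frame estimate and its variance
    z: displacement measured by the search in the current frame
    q, r: process and measurement noise variances (tuning assumptions)
    """
    # Predict: constant-velocity model, so the expected displacement is
    # unchanged, but its uncertainty grows by the process noise.
    v_pred = v_est
    p_pred = p_est + q
    # Update: the gain k (between 0 and 1) sets how far the estimate moves
    # from the prediction towards the measurement.
    k = p_pred / (p_pred + r)
    v_new = v_pred + k * (z - v_pred)
    p_new = (1.0 - k) * p_pred
    return v_new, p_new, k


# Made-up horizontal displacements in pixels; the 30.0 mimics a bad search
# result, for example during an occlusion.
measurements = [5.0, 5.2, 4.9, 30.0, 5.1]
v, p = 5.0, 1.0
estimates = []
for z in measurements:
    v, p, _ = kalman_step(v, p, z, q=0.01, r=4.0)
    estimates.append(v)
```

With a measurement variance much larger than the process variance, the outlier pulls the estimate only part of the way, which is the behaviour that gives resistance to occlusion.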

Since this displacement occurs in “unit” frame time, it can be called (sort of) velocity too. A better, but more complex, approach would be to change the size of the “bounding box”, and to account for the fact that a car is actually 3D and doesn’t always show the same face to the camera. The first of these, the changing box size, was accounted for (to some extent) by modifying the above search approach.
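For reference, the basic search step (before any modification for box size) can be sketched as an exhaustive match of the previous bounding box against nearby positions in the next frame. Sum of squared differences on grayscale frames is an assumption made here; the matching criterion actually used is not stated.

```python
import numpy as np


def search_displacement(prev_frame, next_frame, box, radius=8):
    """Find where the patch inside `box` moved to, by exhaustive SSD search.

    box: (row, col, height, width) of the car in prev_frame
    radius: maximum displacement tried in each direction, in pixels
    Returns (d_row, d_col), the displacement with the lowest SSD.
    """
    r0, c0, h, w = box
    patch = prev_frame[r0:r0 + h, c0:c0 + w].astype(np.float64)
    best, best_d = np.inf, (0, 0)
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = r0 + dr, c0 + dc
            # Skip candidate boxes that fall outside the frame.
            if r < 0 or c < 0 or r + h > next_frame.shape[0] or c + w > next_frame.shape[1]:
                continue
            cand = next_frame[r:r + h, c:c + w].astype(np.float64)
            ssd = np.sum((cand - patch) ** 2)
            if ssd < best:
                best, best_d = ssd, (dr, dc)
    return best_d


# Synthetic check: a random texture shifted by 3 rows down and 2 columns left.
rng = np.random.default_rng(0)
prev_frame = rng.random((60, 80))
next_frame = np.roll(prev_frame, shift=(3, -2), axis=(0, 1))
found = search_displacement(prev_frame, next_frame, box=(20, 30, 10, 12))
```

The returned displacement is what would be fed to the Kalman filter as the observation for that frame.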

Below are 2 gif animations (click to view them). The first shows correct tracking with resistance to occlusion, along with some change in bounding box size. The second shows that 2 similar cars are not confused, which would surely have happened without the KF.

What I strongly feel is that the Kalman Filter is all about trust. The noise powers set how much you trust your observations relative to your state model, and this is what makes it such a powerful tool in numerous applications.
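For a scalar state this is visible directly in the standard update equations. The notation is the usual textbook one and is not taken from the course or the project: the predicted estimate and its variance carry a minus superscript, z is the observation and R is the measurement noise power.

```latex
K = \frac{P^{-}}{P^{-} + R},
\qquad
\hat{x} = \hat{x}^{-} + K \left( z - \hat{x}^{-} \right)
```

When R is large compared to the predicted variance, K approaches 0 and the filter trusts the model. When R is small, K approaches 1 and the filter follows the observation.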


Tracking with Artificial Occlusion


Tracking the car behind without confusion