
GStreamer and OpenCV for image stabilisation

Thursday, November 10th, 2011

I am now back from Prague, where I gave a talk on image stabilisation (and my holiday pictures). Hopefully a video of the talk will soon be online. In the meantime, I would like to explain my efforts in written form, with some details slightly updated from the talk (the code has progressed a bit since then).

UPDATE: The talk is now online.

I got interested in the issues of image stabilisation through a helium balloon photography project in which I participated. I want to make a nice time lapse video from the pictures I have taken, but they were taken from a camera that was moving, which would make the result very shaky without some kind of postprocessing.

Thankfully, I work at Igalia, which means that on top of my personal time, I could spend some company time on this project (what we call internally our hackfest time, of up to 5 hours per week).

Original problem statement

I have around 4h30 of pictures taken from a balloon 100 metres high. The pictures were taken at a rate of one per minute, which makes around 270 pictures. I want to make a nice time lapse out of it. Simply using the frames as is to build a video does not work well. Partly because I would probably be legally required to include a warning to epileptic people at the beginning of the video, but mostly because people actually watching it would wish they were epileptic to have a good excuse not to watch it.

This is due to the huge differences occurring between two consecutive frames.

Here is an example of two consecutive frames in that series:

http://guij.emont.org/blog/wp-content/uploads/2011/11/IMG_7539-300x225.jpg
http://guij.emont.org/blog/wp-content/uploads/2011/11/IMG_7540-300x225.jpg

As you can see, from one frame to the next, a lot of pixels would change. And that does not look pretty. It is also pretty obvious that they are both pictures of the same thing, and could be made to be pretty similar, mainly by rotating one of them, and maybe reprojecting it a bit so that things align properly even though the point of view changed a bit from one frame to the next.

Standing on the shoulders of giants

There was no question in my mind that I wanted to use GStreamer for the task, by writing an element or set of elements to do the stabilisation. The two big advantages of this approach are:

  • I can benefit from all the other elements of GStreamer, and I can easily do things like decode my pictures, turn them into a video, stabilise it and encode it in a format of my choice, all in one command.
  • Others could easily reuse my work, potentially in ways I could not think of. One idea would be to integrate that in PiTiVi in the future.

Then, after some research, I realised that OpenCV provides a lot of the tools needed for the task as well.

Since I am still in a prototyping/research stage, and I hate to write loads of boilerplate, I am using python for that project, though a later rewrite in C or C++ is not impossible.

First things first

I will not present things exactly in the order I researched them, but rather in the order I should have researched them: starting with a simpler problem, then getting into the complications of my balloon problem.

The simpler problem at hand is presented to you by Joe the Hippo:



Joe the shaky hippo (video)

As you can see, Joe almost looks like he’s on a boat. He isn’t, but the cameraman is, and the video was taken with a lot of zoom. The movement in that video stream has a particularity that can make things simpler: the position of a feature on the screen does not change much from one frame to the next, because very little time elapses between them. We will see that some potentially very useful algorithms take advantage of that particularity.

The steps of image stabilisation

As I see it for the moment, there are two basic steps in image stabilisation:

  1. Find the optical flow (i.e. the movement) between two frames
  2. Apply a transformation that reverts that movement, on a global (frame) scale

Step 2. is made rather easy by OpenCV with the findHomography() and warpPerspective() functions, so we won’t talk much about it here.

Optical flow

For all that matters in this study, we can say that for each frame the optical flow is represented by two lists of point coordinates origins and destinations, such that the feature at the coordinate origins[i] in the previous frame is at the coordinate destinations[i] in the current frame.

Optical flow algorithms can be separated into two classes, depending on whether they provide the flow for all pixels (Dense optical flow algorithms) or only for selected pixels (Sparse optical flow algorithms). Both classes can theoretically provide us with the right data (origins and destinations point lists) to successfully compute the opposite transformation we want to apply using findHomography().
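As a rough illustration of how a dense result can be turned into the same two point lists as a sparse one, here is a small sketch in plain Python. The layout of the flow field (a grid of (dx, dy) displacements indexed as flow[y][x]) and the helper name dense_flow_to_points are assumptions made for the example, not the format of any particular OpenCV function.

```python
def dense_flow_to_points(flow, step=2):
    """Sample a dense flow field into (origins, destinations) point lists.

    `flow[y][x]` is assumed to hold the (dx, dy) displacement of the pixel
    at (x, y) between the previous frame and the current one.
    """
    origins, destinations = [], []
    for y in range(0, len(flow), step):
        for x in range(0, len(flow[y]), step):
            dx, dy = flow[y][x]
            origins.append((x, y))
            destinations.append((x + dx, y + dy))
    return origins, destinations


# A made-up 4x4 field in which every pixel moves 1.5 px right and 0.5 px up.
flow = [[(1.5, -0.5)] * 4 for _ in range(4)]
origins, destinations = dense_flow_to_points(flow)
```

Sampling on a grid keeps the lists small; a sparse algorithm skips this step because it only ever tracks a selection of points.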

I tried one algorithm of each class, choosing the ones that seemed popular to me after reading a bit of [Bradski2008]. Here is what I managed to do with them.

Dense optical flow

I tried to use OpenCV’s implementation of the Horn-Schunck algorithm [Horn81]. I don’t know if I used it incorrectly, or if the algorithm simply cannot be applied to that situation, but this is all I could do to Joe with that:


Now Joe is shaky and flickery

As you can see, this basically added flickering. I did not find the time to improve this case before I realised that this algorithm is considered obsolete in OpenCV, and that the new Python bindings do not include it.

Note that this does not mean that dense optical flow sucks: David Jordan, a Google Summer of Code student, does awesome things with a dense algorithm by Proesmans et al. [Proesmans94].

Sparse optical flow

I played with the Lucas-Kanade algorithm [Lucas81], with the implementation provided by OpenCV. Once I managed to find a good set of parameters (which are now the default in the opticalflowfinder element), I got pretty good results:


Joe enjoys the stability of the river bank, undisturbed by the movements of the water (video)

And it is quite fast too. On my laptop (with an i5 processor), I can stabilise Joe the hippo in real time [1] (it is only a 640×480 video, though).

[1] For those who attended my talk at the GStreamer Conference 2011: yes, now it is proper real time, I optimised the code a bit.

The balloon problem

As we have seen in the previous section, for a shaky hippo video, [Horn81] isn’t any help, but [Lucas81] is pretty efficient. But can they be of any use for my balloon problem?

Unsuccessful results

I won’t show any video here, because there is nothing much to see. Instead, here is an explanation in pictures that shows how the algorithms fare on the balloon time lapse.

This is what Horn-Schunck can do:

http://guij.emont.org/blog/wp-content/uploads/2011/11/balloon-hs-300x112.png

The picture shows two consecutive frames in the time lapse (the older is on the left). Each of the coloured lines goes from a point on the first image to the corresponding point on the second one, according to the algorithm (click on the image to see a larger version where the lines are more visible). Since Horn-Schunck is a dense algorithm, the coloured lines are only displayed for a random subset of points to avoid clutter.

Obviously, these lines show that the algorithm is completely wrong, and could not follow the big rotation happening between the two frames.

Does Lucas-Kanade rate better? Let’s see:

http://guij.emont.org/blog/wp-content/uploads/2011/11/balloon-lk-300x112.png

This is the same kind of visualisation, except that there is no need to choose a subset, since the algorithm already does that.

As for the result, it might be slightly less wrong than Horn-Schunck, but Lucas-Kanade does not seem to be of any help to us either.

The issue here, as said earlier, is that these two algorithms, like most optical flow algorithms, are making the assumption that a given feature will not move more than a few pixels from one frame to the next (for some value of “a few pixels”). This assumption is very clever for typical video streams taken at 25 or 30 frames per second. Unfortunately, it is obviously wrong in the case of our stream, where the camera has the time to move a lot between two frames (which are captured one minute apart).
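To put rough numbers on that assumption, here is a small worked example. Every figure in it is made up for illustration: a camera rotating at 1 degree per second, and a feature 300 pixels away from the centre of rotation. The distance a feature travels under a rotation of angle θ at radius r is the chord length 2·r·sin(θ/2).

```python
import math


def chord_length(radius_px, angle_deg):
    """Distance travelled by a point at `radius_px` from the rotation
    centre when the image rotates by `angle_deg` degrees."""
    return 2 * radius_px * math.sin(math.radians(angle_deg) / 2)


ROTATION_RATE = 1.0  # degrees per second (assumed, for illustration only)
RADIUS = 300         # pixels from the centre of rotation (assumed)

# 25 frames per second: 1/25 s between two frames.
video_step = chord_length(RADIUS, ROTATION_RATE / 25)
# One picture per minute: 60 s between two frames.
timelapse_step = chord_length(RADIUS, ROTATION_RATE * 60)

print("25 fps video: %.2f px per frame" % video_step)
print("time lapse:   %.2f px per frame" % timelapse_step)
```

With these made-up figures, the feature moves a fraction of a pixel per frame in the video and about 300 pixels per frame in the time lapse, which is far outside "a few pixels".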

Is all hope lost? Of course not!

Feature recognition

I found salvation in feature recognition. OpenCV provides a lot of feature recognition algorithms. I have tried only one of them so far, but I hope to find the time to compare it with others in the future.

The one I tried is SURF (for “Speeded Up Robust Features”, [Bay06]). It finds “interesting” features in an image and descriptors associated with them. The descriptors it provides are invariant to rotation and scaling, which means that it is in theory possible to find the same descriptors from frame to frame.

To be able to efficiently compare the sets of frame descriptors I get for two consecutive frames, I use FLANN, which is well integrated in OpenCV.
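To show what comparing two sets of descriptors means, here is a brute-force sketch in plain Python. It is not FLANN (which finds approximate nearest neighbours much faster), the three-component descriptors are made up (real SURF descriptors are much longer), and the ratio test used to reject ambiguous matches is a common technique that is assumed here rather than taken from the stabilisation code.

```python
import math


def match_descriptors(previous, current, ratio=0.7):
    """Return (i, j) pairs such that previous[i] is best matched by current[j].

    Brute force: every descriptor is compared with every other one.
    A match is kept only if the best candidate is clearly closer than the
    second best one (ratio test). `current` needs at least two descriptors.
    """
    matches = []
    for i, descriptor in enumerate(previous):
        ranked = sorted(range(len(current)),
                        key=lambda j: math.dist(descriptor, current[j]))
        best, second = ranked[0], ranked[1]
        if (math.dist(descriptor, current[best])
                < ratio * math.dist(descriptor, current[second])):
            matches.append((i, best))
    return matches


# Made-up descriptors for two consecutive frames.
previous = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
current = [(1.0, 0.1, 0.0), (0.1, 1.0, 0.0), (0.0, 0.0, 1.0)]
matches = match_descriptors(previous, current)
```

The third descriptor of the previous frame is about as far from every candidate, so it is dropped instead of producing a doubtful match. The origins and destinations lists are then simply the positions of the matched features in each frame.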

Here is a visualisation of how this method performs:

http://guij.emont.org/blog/wp-content/uploads/2011/11/balloon-surf-300x112.png

As you can see, this is obviously much better! There might be a few outliers, but OpenCV’s findHomography() can handle them perfectly, and here’s a proof video (I am not including it in the article since it is quite high resolution).
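findHomography() can be asked to use a robust method such as RANSAC or least-median (cv2.RANSAC and cv2.LMEDS) so that a few wrong matches do not spoil the result. The sketch below illustrates the idea behind RANSAC in plain Python, on a deliberately simplified model (a pure translation instead of a full homography) and with made-up points; it is not OpenCV's implementation.

```python
import random


def ransac_translation(origins, destinations, tolerance=2.0,
                       iterations=50, seed=0):
    """Estimate a translation (tx, ty) while ignoring outlier matches."""
    rng = random.Random(seed)
    pairs = list(zip(origins, destinations))
    best_inliers = []
    for _ in range(iterations):
        # Hypothesis: the translation given by one randomly chosen match.
        (ox, oy), (dx, dy) = rng.choice(pairs)
        tx, ty = dx - ox, dy - oy
        # Matches that agree with the hypothesis, within the tolerance.
        inliers = [(o, d) for o, d in pairs
                   if abs(d[0] - o[0] - tx) <= tolerance
                   and abs(d[1] - o[1] - ty) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine the estimate by averaging over the largest agreeing set.
    count = len(best_inliers)
    tx = sum(d[0] - o[0] for o, d in best_inliers) / count
    ty = sum(d[1] - o[1] for o, d in best_inliers) / count
    return (tx, ty), count


origins = [(10, 10), (50, 20), (30, 80), (90, 40), (70, 70), (20, 60)]
# Four correct matches shifted by (4, -2), followed by two wrong matches.
destinations = [(14, 8), (54, 18), (34, 78), (94, 38), (10, 200), (150, 5)]
translation, inlier_count = ransac_translation(origins, destinations)
```

The two wrong matches each propose a translation that nobody else agrees with, so they never win the vote and do not influence the final estimate.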

Obviously, the result is not perfect yet (especially in the end), but it is quite promising, and I hope to be able to fix the remaining glitches sooner rather than later.

Show me the code!

The code as well as a quick introduction on how to use it is available on github. Bugs and patches should be posted here.


[Bradski2008] G. Bradski and A. Kaehler, “Learning OpenCV”, ISBN 978-0-596-51613-0, 2008.
[Horn81] B. K. P. Horn and B. G. Schunck, “Determining optical flow”, Artificial Intelligence 17, 185–203, 1981.
[Proesmans94] M. Proesmans, L. Van Gool, E. Pauwels, A. Oosterlinck, “Determination of optical flow and its discontinuities using non-linear diffusion”, J.-O. Eklundh (Ed.), Computer vision — ECCV ’94, Lecture Notes in Comp. Science, Vol. 801, Springer, Berlin, 295–304, 1994.
[Lucas81] B. Lucas and T. Kanade, “An Iterative Image Registration Technique with an Application to Stereo Vision”, Proc. of 7th International Joint Conference on Artificial Intelligence (IJCAI), 674–679, 1981.
[Bay06] H. Bay, T. Tuytelaars and L. Van Gool, “SURF: Speeded Up Robust Features”, 9th European Conference on Computer Vision, 2006.

Prague, we meet again!

Sunday, October 23rd, 2011
The bridge, seen from the Petřín lookout tower

Karlův most (Charles bridge)

 

Tomorrow I will fly to Prague, going to the GStreamer Conference, then to LinuxCon Europe and ELCE.
I’m excited to go back there, after having first visited this beautiful city in my last Eurotrip.
I will give a talk at the GStreamer Conference about my work on image stabilisation, related to that balloon project we did with Ugo.
I will go there sponsored by Igalia, with my friends and colleagues Víctor, Philippe and Javi. Víctor will give a talk as well, about the integration of syslink and GStreamer.

Playing with balloons

Monday, May 9th, 2011

These days, I’m spending a big part of my free time on a nice project with a few friends. That’s a project involving a balloon. We have recently started a blog called Balloon Freaks to talk about this, you might want to check it out.

Blog back online

Wednesday, April 27th, 2011

I finally found the time to migrate my blog to the new server I have (the old server died a few weeks ago).

I hope I managed to do the transition correctly, and that people following me through a feed reader and/or a planet won’t be flooded with a truckload of old articles. If that still happens, I apologise. Also, I know that the alert readership that you are has noticed that I went back to the default wordpress theme: this is by design (and a bit by laziness). I got tired of the old design, which was ugly anyway (it was hacked together from various bits by a very amateurish designer: me). So, here’s something sober, just like I want it to be.

As for more generic news, I am still happy, enjoying life in Barcelona, loving my job at Igalia, hacking on cool Grilo stuff (and sometimes GStreamer stuff as well). Hopefully I will post some things regarding that in the not so distant future.

This might be difficult because most of my spare time these days is taken by what I call the “balloon project”, recently started with a few gifted friends of mine. More on that soon, I promise, as I have plenty of things to tell about this project that involves developments in a lot of domains in which I’d love to know more but am always discovering things, such as physics, electronics or computer science.

 

Fosdem

Saturday, January 22nd, 2011

Like a few friends have stated, I am now saying it loud and clear:
I'm going to FOSDEM, the Free and Open Source Software Developers' European Meeting

And for that I should thank the awesome company for which I work. I am sure a great time will be had, as well as great conversations about free software, multimedia, technology, life, the universe and everything.

See you all at FOSDEM!

Recent hacks: python with gdb to follow gstreamer

Tuesday, November 9th, 2010

I’ve recently discovered one too little-known feature of gdb: you can script it with python. It’s even in the documentation, and there are nice tutorials available which I invite you to read.

The two main cool things you can do in python are defining new commands and defining new convenience functions. Unfortunately, as you can read in this article, defining a new command requires a lot of boilerplate (and it’s the same for convenience functions).

Since I wanted to write a few commands, I couldn’t be bothered to copy-paste that boilerplate, so I ended up writing a nice magical class that makes registering a function easier. You can find that here.
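To give an idea of the boilerplate involved and of how it can be hidden, here is a minimal sketch written for Python 3. It is not the class linked above: the command decorator and the hello command are hypothetical names, and only gdb.Command, its invoke() method and gdb.COMMAND_USER come from gdb's Python API. The gdb module only exists inside a gdb process, so the sketch falls back to doing nothing when imported elsewhere.

```python
try:
    import gdb  # only available when the script is run inside gdb
except ImportError:
    gdb = None


def command(name):
    """Decorator registering func(arg, from_tty) as the gdb command `name`."""
    def decorate(func):
        if gdb is None:
            return func  # outside gdb: leave the function untouched

        class _Command(gdb.Command):
            def __init__(self):
                super().__init__(name, gdb.COMMAND_USER)

            def invoke(self, arg, from_tty):
                func(arg, from_tty)

        _Command()  # instantiating the class registers the command
        return func
    return decorate


@command("hello")
def hello(arg, from_tty):
    """Hypothetical example command: greet whatever was typed after it."""
    print("Hello, " + (arg or "world"))
```

Once the file has been loaded with gdb's source command, typing hello GStreamer at the gdb prompt would call the function with arg set to "GStreamer".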

With this new weapon in hand, I wrote new commands to be able to follow what happens in a GstAdapter. Basically, I wanted to be able to track the position in the original file of each byte that was going out of a given adapter. So I wrote some code to be called by gdb commands when some operations on a GstAdapter are done, doing the necessary calculations and storing, and voilà: I can print, with gdb, the file position of the data obtained with each call to gst_adapter_peek().

All the code and the gdb script can be found over there.

As you can see, a gdb script is still needed. I haven’t yet really tried to play with breakpoints in python, but I have a feeling this is not totally trivial, if at all possible. The other great limitation is speed. It feels like the calls to python stuff from gdb are very costly, and, in the case of GStreamer, don’t expect smooth playing if you use some gdb command implemented in python for every buffer transmitted.

Conclusion: it’s got limits, but overall, this ability to enhance gdb with python allows you to do things you couldn’t do easily otherwise, like store data in complex structures for your debugging, and prototype easily some debugging actions you want to do, where the only alternative is to write it in C and include it in the application/library you are debugging.

New job, new flat: new life!

Tuesday, October 5th, 2010

You might have noticed one way or another, that I have just started a new job at Igalia. I’m very excited about joining a company that seems to try and do everything the right way™, and with really great people.

At least for starters, and to flex my hacking muscles a bit, I’m going to hack on various gstreamer stuff, which is always fun. Also, I think this will be my first post to appear on planet Igalia, so: HI PLANET READERS!

Oh yeah, as the title says, I’ve just changed flat, am now living in a great flat in a brand new building with cool flatmates, so that’s good news as well! And yes, I’m still in Barcelona, now in Poble Sec, which proves a cool area so far.

Vista…

Thursday, November 13th, 2008

Here and now at work, I have to do stuff under windows (obviously, we want elisa to work under windows). On the machine I have at work, that means Vista Home in Spanish (that’s what was installed). To do that stuff, I need some tools, like the excellent bzr, written in python (this detail has its importance).

A while ago, for some reason, I wanted to use bzr-svn on that setup, so I installed it in C:\Program Files\bazaar\plugins. I don’t remember if I managed to make it work or not (did I say it was a while ago?), but I eventually uninstalled it, as well as any dependency that I might have installed for the occasion. Ever since that day, whenever I ran a bzr command, I had the following output preceding the expected output of said command:

No Python bindings for Subversion installed. See the bzr-svn README for details.
Unable to load plugin u'bzr_svn' from u'C:/Program Files/Bazaar/plugins'

Even though I don’t have any bzr_svn plugin in that directory! Yes, I triple-checked!

This had been driving me crazy for months now, and today, by pure chance, I stumbled across a directory called C:\Users\guijemont\AppData\Local\VirtualStore that contained a Program Files directory. Having a look at it, I discovered, after months of struggling with that awful and undeserved error message, that there was a:

C:\Users\guijemont\AppData\Local\VirtualStore\Program Files\bazaar\plugins\bzr_svn

Yeah, really. And it was full of .pyc files. What happened is that, whenever a user program tries to write somewhere where it shouldn’t (such as Program Files), UAC gives it the impression that it succeeded, and writes the stuff in that VirtualStore instead of the real place. That is a convenience that might save the day for some programs that don’t behave and write in places where they shouldn’t.

Enter python. Python writes .pyc files, which are byte-compiled versions of the original .py files (and therefore faster to load). It writes them in the directory where the .py file resides (I guess it makes a lot of things easier to manage).

Then you mix both. Install bzr-svn, use it as a normal user, remove it: it’s not removed! Because the .pyc files are still there, hidden in VirtualStore.
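For anyone hunting for similar leftovers, here is a small sketch that lists .pyc files that have no .py file next to them. The function name orphaned_pyc is made up for the example, and it assumes the layout described above, where the .pyc file sits beside its .py file (recent Python 3 versions put them in a __pycache__ directory instead).

```python
import os


def orphaned_pyc(root):
    """Return the paths of .pyc files under `root` that have no matching
    .py file in the same directory."""
    orphans = []
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            # "plugin.pyc"[:-1] is "plugin.py"
            if name.endswith(".pyc") and name[:-1] not in files:
                orphans.append(os.path.join(directory, name))
    return sorted(orphans)
```

Pointing it at the VirtualStore directory mentioned above would have listed the stale bzr_svn files straight away.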

And they say that OS is user-friendly? As a user, I find it friendly that a directory is deleted when I delete it…

Future of Pigment

Tuesday, March 11th, 2008

Pigment (especially Jerakeen) is great stuff. I have been involved in that for about five months now, and I am quite proud of being part of this. For instance, Elisa or my recent screencasts speak for themselves: Pigment rocks!
Yet, Pigment could rock even more; here is why, and how…

The issues

As I said earlier, we are thinking of what Pigment 0.5 should look like. First, here is a list of things that suck in pigment 0.3, and need (heavy) rearchitecturing to be fixed:

  • no real 3D support: we can’t do rotations or efficiently apply a transformation on a group of drawables, we can’t handle lighting, different cameras or camera movement.
  • there is no support (at least not in C) to ease the application of a transformation to a given set of drawables. There is a basic “group” support in pigment-python, but it is slower than it should be and limited.
  • writing a rendering plug-in is too complicated. A lot of the stuff that is done in plug-ins could be factorised in the core. This is partly why we only have an OpenGL rendering plug-in in Pigment 0.3
  • there is no programmatic way (there’s only logging) to asynchronously retrieve errors, even though most pigment functions trigger asynchronous events.
  • the animation system (which is done in python, but that is not an architectural issue, we could do it in C) is not synchronised with the rendering.
  • it is impossible to build pigment without GStreamer because we use GstObjects everywhere.

(more…)

Paf! Animation Framework!

Monday, March 10th, 2008

As mentioned earlier, these days, I’ve been thinking about animation issues. It seems the free world lacks an animation library that abstracts out the complexities of animations. A lot of libraries implement their own animation routines, which are generally incomplete and often unusable in other contexts than these libraries. I feel that we need (and I am ready to work on developing) a library that implements a good animation framework, and would provide the following functionalities:

  • a timeline, similar to what is implemented in QTimeline or GtkTimeline
  • an animation object, that handles the evolution of the animated variables from an evolution function, that could be constructed using…
  • an interpolator, that could interpolate in different modes (linear, bezier splines, cubic hermite splines, …) between key values/times/speeds/accelerations.
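To make the last two items more concrete, here is a minimal sketch of an animation object driven by a pluggable interpolator. All the names (lerp, hermite, ease, Animation, value_at) are made up for illustration and do not come from any existing library; the timeline that would feed it the current time is left out.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b, for t in [0, 1]."""
    return a + (b - a) * t


def hermite(p0, p1, m0, m1, t):
    """Cubic Hermite interpolation between p0 and p1, with tangents
    (speeds) m0 and m1 at the two ends, for t in [0, 1]."""
    t2, t3 = t * t, t * t * t
    return ((2 * t3 - 3 * t2 + 1) * p0 + (t3 - 2 * t2 + t) * m0
            + (-2 * t3 + 3 * t2) * p1 + (t3 - t2) * m1)


def ease(a, b, t):
    """Hermite interpolation starting and ending at zero speed."""
    return hermite(a, b, 0.0, 0.0, t)


class Animation:
    """Maps a time to a value by interpolating between (time, value) keys."""

    def __init__(self, keys, interpolate=lerp):
        self.keys = sorted(keys)
        self.interpolate = interpolate

    def value_at(self, time):
        keys = self.keys
        if time <= keys[0][0]:
            return keys[0][1]
        if time >= keys[-1][0]:
            return keys[-1][1]
        for (t0, v0), (t1, v1) in zip(keys, keys[1:]):
            if t0 <= time <= t1:
                return self.interpolate(v0, v1, (time - t0) / (t1 - t0))


# Hypothetical opacity animation: fade in over 2 s, then half fade out in 1 s.
fade = Animation([(0.0, 0.0), (2.0, 1.0), (3.0, 0.5)])
smooth = Animation([(0.0, 0.0), (1.0, 1.0)], interpolate=ease)
```

Keeping the interpolator as a plain function means new modes can be added without touching the animation object itself.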

(more…)