Video decoding in a sandbox

I would like to explain a bit of what I've been working on recently at Igalia. It is about combining GStreamer with a sandboxing system to try to make the playback of untrusted media more secure. Hopefully, writing this will be an occasion for me to take a step back and understand things better, and for others to give me feedback and ideas. In particular, even though this is a field that interests me, I do not claim to have any real expertise in security, so comments from people who know better are very welcome.

This story started when I decided to have a look at Chromium and its internals. It turns out that one very distinctive aspect of this application is its sandboxing system. In a nutshell, a sandbox is a virtual container in which untrusted programs can be safely run. In the real world, sandboxes are rarely perfect, but they are a significant security improvement over not using one. Chromium uses a sandbox to run its rendering engine (WebKit), which is basically the part that transforms the code of a web page into the graphical representation that you see on your screen. The rationale for running WebKit in a sandbox is not that it is untrusted code in itself, but rather that it is a big and complex project that is bound to have bugs, like all big and complex projects. On top of that, the input given to it often comes from untrusted sources, and could be forged to exploit security bugs and do bad things to your computer and your beloved files. With WebKit running in a sandbox, if a web page has been forged by an attacker to exploit a vulnerability in WebKit, the attacker only gets access to the sandbox environment, which means they cannot do things like access the data on your computer, install software or connect to remote hosts.

As you might know, I like to play with multimedia things, and have hacked quite a bit on or around GStreamer. So I quite naturally thought of something else that might be worth running in a sandbox: demuxers and decoders. They are relatively big and complex pieces of software to which we regularly pass a whole bunch of untrusted data, be it in a web context or a more traditional desktop or mobile context.

Fortunately, Julien Tinnes, a developer of the Chromium sandbox for GNU/Linux, made a standalone version of it called setuid-sandbox, which other projects can use to easily sandbox any process.

Architecture

The way setuid-sandbox works is rather straightforward: there is a sandboxme command that needs to be installed setuid root. You run sandboxme my_command, and from inside my_command you first set up the file descriptors that you will need (being careful not to include anything that could allow an escape from the sandbox, more on that later). Then you call the provided chrootme() function, which tells the sandboxme process to restrict the privileges of my_command (e.g. it can still read and write on the fds it already has open, but it cannot open new ones).
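The "set up your fds first, restrict yourself, then work" ordering can be sketched in a few lines of Python. This is only an illustration and not how setuid-sandbox is implemented: lowering RLIMIT_NOFILE to zero is used here as a stand-in for chrootme(), because it gives a similar observable effect (fds that are already open keep working, new ones cannot be opened) without needing a setuid helper. The function name and the use of fork() and pipes are assumptions made for the example.

```python
import os
import resource


def run_restricted_child(payload):
    """Fork a child that sets up its fds, restricts itself, then does the work."""
    data_r, data_w = os.pipe()      # parent -> child: the "untrusted" input
    result_r, result_w = os.pipe()  # child -> parent: the output

    pid = os.fork()
    if pid == 0:
        try:
            # Step 1: keep only the fds we need; they were opened before the drop.
            os.close(data_w)
            os.close(result_r)
            # Step 2: stand-in for chrootme() (an assumption, NOT the real
            # mechanism): with a limit of 0, no new fd can be allocated.
            resource.setrlimit(resource.RLIMIT_NOFILE, (0, 0))
            # Step 3: only now do we touch the input.
            try:
                os.close(os.open("/", os.O_RDONLY))
                could_open = b"1"
            except OSError:
                could_open = b"0"
            data = os.read(data_r, 4096)
            os.write(result_w, could_open + data.upper())
        finally:
            os._exit(0)  # never fall back into the parent's code path

    # Parent: plays the role of the process outside the sandbox.
    os.close(data_r)
    os.close(result_w)
    os.write(data_w, payload)
    os.close(data_w)
    result = os.read(result_r, 4096)
    os.close(result_r)
    os.waitpid(pid, 0)
    return result[:1] == b"1", result[1:]
```

Calling run_restricted_child(b"hello") returns a pair: whether the child managed to open a new fd after restricting itself, and the data it sent back over the pipe it already had.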

Here is how I organised my integration of setuid-sandbox into GStreamer. What I want to do for now is to put what I think are the "most dangerous" parts (demuxing and decoding) in the sandbox, while leaving the other components (mainly sources and sinks) outside of it. I decided to create a small program (called gst-decoder) that receives the original muxed and encoded stream and outputs the decoded video and audio buffers. gst-decoder needs 3 channels of communication with the "controlling" process outside the sandbox (which is called the broker):

one to pass the original stream from the source element in the broker to gst-decoder

one to pass the video buffers from gst-decoder to the video sink element in the broker

one to pass the audio buffers from gst-decoder to the audio sink element in the broker

In the future, more channels for subtitle support or other features could be desirable.

Since I am lazy, I wanted to use off the shelf GStreamer elements to handle these communication channels. For the cases explained above, that would be:

for the original stream: the fdsink element on the broker side, and the fdsrc element in the sandbox

for the video buffers: shmsink (in gst-decoder) and shmsrc (in the broker)

for the audio buffers: the same elements as for video

Since I expect other people to be equally lazy^W^W^Wwant their life to be made easier, my goal is to try and have this reasonably integrated in GStreamer, and easy to integrate in applications. For that, my best idea so far was to make a sandboxeddecodebin element that, from the outside, works like decodebin or decodebin2, at least for simple cases: it has a sink pad that can take any format you would throw at decodebin, and it has an audio and a video source pad that output the decoded result. In the future, it might or might not be a good idea to try to integrate the "sandboxed" functionality in decodebin directly.

I implemented sandboxeddecodebin as a subclass of GstBin, and it has the following flow inside it:

fdsink -> [gst-decoder] | -> shmsrc (video) -> gdpdepay
                        | -> shmsrc (audio) -> gdpdepay

Note that gst-decoder is an external (sandboxed) process, and not a GStreamer element like the other entities of this data flow graph. The sink pad of fdsink and the source pads of the two gdpdepay elements are exported by sandboxeddecodebin through ghost pads, which provides a decodebin-like interface.

The gst-decoder program basically runs a pipeline that looks like this:

fdsrc ! decodebin2 name=decoder
decoder. ! video/x-raw-yuv;video/x-raw-rgb ! gdppay ! shmsink (video)
decoder. ! audio/x-raw-int;audio/x-raw-float ! gdppay ! shmsink (audio)

and it also makes sure that privileges get dropped at the right time, which is discussed below.

When to drop privileges?

The ordering of operations needs to be thought through carefully to combine GStreamer, and these elements in particular, with setuid-sandbox. Each of them brings its own set of constraints.

For setuid-sandbox, inside the sandbox (in gst-decoder):

before we call chrootme(): we can open new fds and do a lot of nice initialisation, but we must not parse any untrusted data yet

after we call chrootme(): we can't open new fds any more, or do similar initialisation tasks, but we can work on the data we received.

GStreamer has several states in which an element can be, with some rules on what should be done in which state. From the design documentation, the states are defined as follows:

NULL: This is the initial state of an element.

READY: The element should be prepared to go to PAUSED.

PAUSED: The element should be ready to accept and process data. Sink elements however only accept one buffer and then block.

PLAYING: The same as PAUSED except for live sources and sinks. Sinks accept and render data. Live sources produce data.

In particular, the elements that interest us here behave in the following way:

shmsink is responsible for the creation and destruction of the shared memory object and the associated control socket: it creates them when going from NULL to READY and destroys them when going from READY to NULL. Since shmsink is used from inside the sandbox, this means that the NULL to READY state change needs to happen before chrootme(). It also means that shmsink won't be able to properly clean up the shared memory object and the control socket, since the READY to NULL change happens after privileges have been dropped.

fdsrc neither creates nor destroys the fd it uses, so that can be done separately. Moreover, in the case of stdin, we leave that responsibility to the system.

And quite obviously, we want gst-decoder to handle buffers only after it has called chrootme(), so that it is ready to run potentially unsafe operations.

This is relatively easy: all we have to do in gst-decoder is call chrootme() once we are in the READY state and before going to PAUSED.
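To make that ordering explicit, here is a toy model in Python. It does not use GStreamer at all: RecordingPipeline is a hypothetical stand-in that only records the state changes it is asked to perform, and drop_privileges is whatever callable plays the role of chrootme(). It only shows the sequence of steps, not the actual gst-decoder code.

```python
class RecordingPipeline:
    """Hypothetical stand-in for a GStreamer pipeline; it only records calls."""

    def __init__(self):
        self.state = "NULL"
        self.log = []

    def set_state(self, new_state):
        self.log.append(f"{self.state}->{new_state}")
        self.state = new_state


def start_sandboxed_decoder(pipeline, drop_privileges):
    # NULL -> READY: this is when shmsink creates its shared memory object
    # and control socket, so it has to happen while we can still open fds.
    pipeline.set_state("READY")
    # No buffer has been processed yet, so it is safe to drop privileges now.
    drop_privileges()
    # READY -> PAUSED: from here on, untrusted data starts flowing.
    pipeline.set_state("PAUSED")
    pipeline.set_state("PLAYING")
```

With a callback that appends "chrootme" to the pipeline's log, the recorded sequence shows the privilege drop sitting between the NULL to READY and READY to PAUSED transitions.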

Another issue with the privilege drop is that we use decodebin2 (things would be the same with decodebin), and it only loads the plugins it needs once it knows what kind of data it will have to decode. That is, it needs to load plugins after it has started to analyse potentially unsafe data. My solution to that is to preload all the installed plugins when gst-decoder starts, so that decodebin2 doesn't need any privilege to have access to the plugins it wants (they are already in memory).

This is obviously suboptimal in memory consumption. I can think of two ways to improve that:

use a white/black list of plugins to avoid loading plugins we are not likely to need (there are many things we're pretty sure not to need in gst-decoder, such as all sources and sinks or gnonlin)

use a separate typefinding sandboxed process that will determine what plugins are needed, then have gst-decoder take as argument the plugins that it needs to load before dropping privileges

Synchronising broker and sandbox

Another synchronisation issue is that the broker has to wait for the sandboxed process to be ready before interacting with it. As seen before, we have 3 channels through which they interact, and they are of two different types:

the pipe to which the broker writes, which points to stdin in the sandboxed process

the shared memory areas, and their associated control sockets created by the two shmsink

The first one is easy to synchronise: as long as the sandboxed process is not ready, it won't read on the pipe, and fdsink on the broker will just wait until it can write.

The second one is more complex: the shared memory areas are announced over the control socket when they are ready, so this part gets done correctly for free by shmsrc. But the control sockets need to exist when shmsrc tries to connect to them (this happens when going from READY to PAUSED). For now, my workaround is to sleep() for 2 seconds when sandboxeddecodebin goes from NULL to READY, after launching the subprocess. With this, the control sockets are very likely to have been created by the time shmsrc goes from READY to PAUSED.

This is obviously very hackish, and I think I would prefer to use GFileMonitors to check when the sockets are created. Also, I don't know if it's better to do that in sandboxeddecodebin (blocking the switch to READY but using file monitoring instead of a sleep(), or going to READY asynchronously if that's possible?) or in shmsrc (in which case I think it should be optional, and shmsrc should probably go to PAUSED asynchronously).
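As a point of comparison with the fixed sleep(), here is a minimal sketch of waiting for the socket itself. It uses simple polling with the Python standard library rather than GFileMonitor, so it is a stand-in for the idea and not the solution described above; the function name, timeout and polling interval are all assumptions.

```python
import os
import stat
import time


def wait_for_socket(path, timeout=2.0, interval=0.01):
    """Return True as soon as `path` exists and is a socket, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if stat.S_ISSOCK(os.stat(path).st_mode):
                return True
        except FileNotFoundError:
            pass  # not created yet, keep waiting
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

This returns as soon as the socket file shows up instead of always waiting for the full delay, and it reports failure explicitly. One caveat: the file appears when the socket is bound, so a connection attempt could in principle still arrive before the other side is listening, and a real implementation would probably want to retry the connection as well.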

Making Preroll work

On the broker side, we have another tricky situation. We typically run a pipeline that contains all of this (the parts between angle brackets are outside of sandboxeddecodebin and given as examples):

<filesrc> ! fdsink (passes data to gst-decoder)
shmsrc (gets data from gst-decoder) ! gdpdepay ! <autoaudiosink>
shmsrc (gets data from gst-decoder) ! gdpdepay ! <autovideosink>

This pipeline is atypical in that it has a sink that is not really at the downstream end of it (fdsink, which sandboxeddecodebin uses to pass data to gst-decoder). Data would go through it, then through gst-decoder and its own pipeline, and then emerge back in the broker's pipeline in the shmsrc elements.

This is a problem at the preroll phase. Preroll is what usually happens when going to PAUSED: the sinks wait until they have a buffer to render before committing the state to PAUSED. The issue with our pipeline is that the "real" sinks will only get the data they need to commit to the PAUSED state if fdsink lets the data through, but fdsink only passes data once it is in the PLAYING state (apart maybe from one initial buffer). On top of that, sandboxeddecodebin is a subclass of GstBin. By default, GstBin only changes to the next state (e.g. PLAYING) once all its elements have reached the previous one (e.g. PAUSED). This gives us a nice deadlock: the final (downstream) sinks are waiting for data to come to them to commit their change to PAUSED, GstBin is waiting for all its elements (including final sinks) to finish their transition to PAUSED before asking them to go to PLAYING, and fdsink is waiting to be asked to switch to PLAYING before it lets through the data that the final sinks are waiting on. My workaround to solve this deadlock is to manually request fdsink to go to PLAYING when sandboxeddecodebin is switching to PAUSED. That way, fdsink is "one state ahead" of the rest, and lets the data go through. I haven't decided yet if it's a very ugly way of solving that issue or if it's an awesome clever hack. If you have an idea of a cleaner solution, feel free to suggest it in the comments!

Analysis of open file descriptors

Once the privileges have been dropped, the sandboxed process is very limited in what it can do, but it can still use all the fds that it has open, which might be a way for it to escape the limitations we want to put on it. For instance, imagine that the sandboxed process has an open fd on the device that contains your home directory (say, /dev/sda). By reading it, it can access all your data, even though the sandbox is designed not to let it open more files.

This precise example is very unlikely to happen in our case, but some less obvious fds could lead to ways to escape the sandbox. That is why I think it is necessary to analyse the file descriptors that are open in the sandboxed process and to try to understand the risks they bring.

I took a "snapshot" of the open fds of gst-decoder while it was decoding a video, and here is what it looks like:

guijemont@thirtytwo:~$ ls -lv /proc/5860/fd
total 0
lr-x------ 1 guijemont guijemont 64 2012-04-18 18:17 0 -> pipe:[8348338]
lrwx------ 1 guijemont guijemont 64 2012-04-18 18:17 1 -> /dev/pts/5
lrwx------ 1 guijemont guijemont 64 2012-04-18 18:17 2 -> /dev/pts/5
lrwx------ 1 guijemont guijemont 64 2012-04-18 18:17 3 -> anon_inode:[eventfd]
lr-x------ 1 guijemont guijemont 64 2012-04-18 18:17 4 -> pipe:[8348342]
l-wx------ 1 guijemont guijemont 64 2012-04-18 18:17 5 -> pipe:[8348342]
lrwx------ 1 guijemont guijemont 64 2012-04-18 18:17 6 -> socket:[8358884]
lr-x------ 1 guijemont guijemont 64 2012-04-18 18:17 7 -> pipe:[8359036]
l-wx------ 1 guijemont guijemont 64 2012-04-18 18:17 8 -> pipe:[8359036]
lrwx------ 1 guijemont guijemont 64 2012-04-18 18:17 9 -> anon_inode:[timerfd]
lrwx------ 1 guijemont guijemont 64 2012-04-18 18:17 10 -> /run/shm/shmpipe. 5860.    0
lrwx------ 1 guijemont guijemont 64 2012-04-18 18:17 11 -> socket:[8358886]
lrwx------ 1 guijemont guijemont 64 2012-04-18 18:17 12 -> socket:[8358887]
lrwx------ 1 guijemont guijemont 64 2012-04-18 18:17 13 -> socket:[8358888]
lrwx------ 1 guijemont guijemont 64 2012-04-18 18:17 14 -> /run/shm/shmpipe. 5860.    1
lrwx------ 1 guijemont guijemont 64 2012-04-18 18:17 15 -> socket:[8358890]
lrwx------ 1 guijemont guijemont 64 2012-04-18 18:17 16 -> socket:[8358891]
lrwx------ 1 guijemont guijemont 64 2012-04-18 18:17 17 -> socket:[8358892]
lrwx------ 1 guijemont guijemont 64 2012-04-18 18:17 18 -> socket:[8358893]
lrwx------ 1 guijemont guijemont 64 2012-04-18 18:17 19 -> socket:[8358894]
lrwx------ 1 guijemont guijemont 64 2012-04-18 18:17 20 -> socket:[8358895]
lrwx------ 1 guijemont guijemont 64 2012-04-18 18:17 21 -> socket:[8348346]
lrwx------ 1 guijemont guijemont 64 2012-04-18 18:17 22 -> socket:[8348347]

I used the "usual suspects" (strace and gdb) to look further into this and understand where each fd comes from, and try to get an idea of how necessary it is and of how much of a risk it brings.
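For reference, a snapshot like the one above can also be taken programmatically, which could be handy to check the set of open fds automatically. The sketch below is Linux-specific (it reads /proc) and the function names and the four categories are my own choice for the example.

```python
import os


def list_open_fds(pid="self"):
    """Map each open fd number of a process to what /proc says it points to."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd vanished in the meantime (for instance the one that
            # listdir() itself used to read the directory).
            continue
    return targets


def classify(target):
    """Rough category of an fd, based on the link target shown in /proc."""
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    return "file"
```

Running classify() over the values returned by list_open_fds() gives a quick overview of which fds are real files or devices, which are usually the ones that deserve the closest look.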

0: that is stdin, and it is the pipe I create when starting sandboxme gst-decoder. Also, it is read only. I don't think an attacker could do much with this, and we need it anyway.
1 and 2: stdout and stderr, plugged to the pseudo tty where my test gst-launch command was running. This is clearly not necessary, and could be exploited for privilege escalation if there's a bad bug somewhere in devpts. I modified the code to make stdout and stderr point to /dev/null instead when gst-decoder is launched. There is an environment variable that can prevent that from happening when one wants to see the debug messages that are output by gst-decoder.
3: This is an event fd used by the GMainContext. I suspect that at least a few of the components we run to decode our stuff need a GMainLoop, and therefore a GMainContext. And I don't think this is very dangerous, though I don't know much about the complexity and safety of the event system.
4 and 5: this is a pipe used by the GLib unix signal code. Both ends of the pipe are inside the sandbox, so I don't think this would be much of a problem.
6 and 21: shm area control socket for audio. There is one fd created by socket() that is bound to the right temporary file, then another fd is created by accept() when the broker connects. We definitely need that if we want to use shm, which I think we do for performance reasons (I did not run benchmarks though).
7 and 8: pipe opened by some code in /usr/lib/frei0r-1/facedetect.so when it is g_module_open()'ed. I don't think we need that at all, and it might be a good motivation to try and not load all plugins. The risk is limited though, since both ends of the pipe are inside the sandbox.
9: a timer fd opened in the same conditions as the pipe of fds 7 and 8 (by frei0r's facedetect). This one definitely looks like an unnecessary risk, though I don't know how much of a risk it actually is.
10 and 14: these are the shared memory areas (one for audio, one for video), so I think we definitely want them. The alternative would be to use regular sockets instead to pass the buffers, but I fear it might cost us a lot in performance for little added security, though this issue could deserve more investigation.
11, 12, 15, 16, 17, 18, 19 and 20: these 8 fds are actually 4 socket pairs, with each time both ends inside the sandbox. They are all created by gst_poll_new(), by the following pieces of code:
- shmsink in gst_shm_sink_start(). It does that twice: once for audio, once for video.
- fdsrc in gst_fd_src_start().
- the system clock (in gst_system_clock_init(), via gst_poll_new_timer()).
13 and 22: shm area control socket for video. There is one fd created by socket() that is bound to the right temporary file, then another fd is created by accept() when the broker connects. We definitely need that if we want to use shm, which I think we do for performance reasons (I did not run benchmarks though).
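The change described for fds 1 and 2 boils down to a couple of dup2() calls. Here is a minimal Python sketch of the idea; it is not the code from the repository, and the name of the environment variable (GST_DECODER_KEEP_OUTPUT) is invented for the example, since the real one is not named above.

```python
import os


def redirect_to_devnull(fds=(1, 2), env=None, keep_var="GST_DECODER_KEEP_OUTPUT"):
    """Point the given fds at /dev/null unless the (hypothetical) keep_var is set.

    Returns True if the redirection was done, False if it was skipped.
    """
    env = os.environ if env is None else env
    if env.get(keep_var):
        return False  # debugging: leave stdout/stderr alone
    devnull = os.open(os.devnull, os.O_WRONLY)
    for fd in fds:
        if fd != devnull:
            os.dup2(devnull, fd)  # fd now refers to /dev/null
    if devnull not in fds:
        os.close(devnull)  # the temporary fd is no longer needed
    return True
```

This has to run before privileges are dropped, since opening /dev/null requires being able to open a new fd.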

Play with it!

You can check out the code from its GitHub repository; instructions are available here.

8 Comments

From: Nicolas Trangez
2012-05-08 13:16:34

Instead of relying on a setuid binary and chroot, you might want to look into seccomp filters as well, see http://lwn.net/Articles/494252/

From: Alex Elsayed
2012-05-08 18:59:58

One thing that might be interesting is if this went deeper - a sort of wrapper bin that took an arbitrary pipeline and sandboxed it. Perhaps even modularize the sandboxing method - you could have a sandboxbin that takes a pipeline, the name of the policy module (sandboxme, seccomp), and parameters for what to permit/deny.
Alternately, this could be very useful to build into gstreamer itself so as to protect arbitrary elements.

From: guijemont
2012-05-08 20:16:10

Yes, that is an idea. I think the ideal would be to have a system that can be easily adapted to one or another sandbox system.

From: guijemont
2012-05-08 20:19:54

We were indeed discussing that kind of idea on #gstreamer today (don't know if logs are available somewhere), and it does sound like an interesting direction, though there is still quite a lot of work to do...

From: Kevin
2013-02-11 17:47:58

It may be interesting to try and disable the new file open blocking code and instead chroot/bindmount the process into a directory that only has the gstreamer plugins/dependencies. It could then load plugins on demand and still would be relatively safe from escapes. Some further restrictions could be disabling of file creates and writing. Maybe simply bindmounting with a read only bindmount would work.

From: amluto
2013-02-11 19:40:47

One thing to watch out for: if you're using the setuid sandbox, there's a decent chance that the sandboxed code can DoS the caller by doing nasty things like ftruncating the shm areas. If you don't need to remap the fds, you could close them but leave them mapped, reducing the ability to mess with them.

From: Ryan Lortie
2013-02-11 20:21:40

Interesting post.
About fds 4 and 5: GMainContext no longer uses a pipe pair for UNIX signal handling, and stopped doing so around the time that eventfd was added (in the same unstable cycle).
Unless you're using an (unstable) version of GLib from between these two changes I'd be surprised if that pipe pair is from GMainContext.
It's possible in some cases that GLib will fall back to using pipes instead of an eventfd, but that really shouldn't be happening on Linux.
If you find that a modern GLib really is creating a pair of pipes for signal reporting on a modern Linux kernel then can you please file a bug about that?
Thanks!

From: guijemont
2013-02-12 11:59:27

@Ryan: not sure any more what version of GLib I was using. I did that back in April/May, and there's a possibility that I was using a git GLib from my irregularly updated jhbuild. I will try to find time this weekend to have a look at that.
