Create lightweight Docker containers with Buildroot

Highlights of this article (TL;DR): we’ll show how to use buildroot to create a basic but fully functional container using less than 4 MB of disk space (uncompressed). Then we will apply the same technique to obtain a PostgreSQL image which fits in less than 20 MB (not including your databases, of course).

You can play with those containers right away if you want. Just run “docker run jpetazzo/pglite”, and within seconds, you will have a PostgreSQL server running on your machine!

I like containers because they are lighter than virtual machines. This means that they will use less disk space and less memory, and ultimately be cheaper and faster than their heavier counterparts. They also boot much faster. Great.

But how “lightweight” is “lightweight”? I wanted to know. We all wanted to know. We already have a small image, docker-ut, using a statically linked busybox (it’s built using this script). It uses about 7 MB of disk space, and is only good to run simple shell scripts; but it is fully functional, and perfect for Docker unit tests.

How can we build something even smaller? And how can we build something more useful (e.g., a PostgreSQL server), but with a ridiculously low footprint?

To build really small systems, you have to look at embedded systems. That’s where you find the experts on everything small-footprint and space-efficient. In the world of embedded systems, sometimes you have to cram a complete system, including Linux kernel, drivers, startup scripts, essential libraries, web and SSH servers, WiFi access point management code, RADIUS server, OpenVPN client, BitTorrent downloader — all in 4 MB of flash. Sounds like what we need, right?

There are many tools out there to build images for embedded systems. We decided to use buildroot. Quoting buildroot’s project page: “Buildroot is a set of Makefiles and patches that makes it easy to generate a complete embedded Linux system.” Let’s put it to the test!

The first step is to download and unpack buildroot:
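
Something like this should do the trick (the release number and download URL are illustrative; grab whichever version is current from buildroot’s download page):

    $ # Version shown is just an example
    $ wget http://buildroot.uclibc.org/downloads/buildroot-2013.05.tar.bz2
    $ tar -jxf buildroot-2013.05.tar.bz2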

Buildroot itself is rather small, because it doesn’t include the source of all the things that it compiles. It will download those later. Now let’s dive in:
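
That is, entering the directory you just unpacked (adjust the name to your version):

    $ cd buildroot-2013.05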

The first thing is to tell buildroot what we want to build. If you have ever built your own kernel, this step will look familiar:
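
It is the same configuration interface used by the kernel:

    $ make menuconfig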

For now, we will change just one thing: tell buildroot that we want to compile for a 64-bit target. Go to the “target architecture” menu, and select x86_64. Then exit (saving along the way). Now brew a big pot of coffee, and fire up the build:
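
The build itself is a plain:

    $ make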

This will take a while (from 10 minutes to a couple of hours, depending on how beefy your local machine is). It takes so long because buildroot first compiles a toolchain: instead of using your default compiler and libraries, it will download and compile a preset version of gcc, download and compile uClibc (a small-footprint libc), and then use those to compile everything else. This sounds like a lot of extra work, but it brings two huge advantages:

  • if you want to build for a different architecture (e.g., the Raspberry Pi), it will work exactly the same way;
  • it abstracts your local compiler: your version of gcc/clang/other is irrelevant, since your image will be built by the versions fixed by buildroot anyway.

At the end of the build, our minimalist container is ready! Let’s have a look:
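
The build output lands in buildroot’s standard output directory:

    $ ls -lh output/images/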

You should see a small, lean rootfs.tar file, containing the image to be imported into Docker. But it’s not quite ready yet. We need to fix a few things.

  • Docker sets the DNS configuration by bind-mounting over /etc/resolv.conf. This means that /etc/resolv.conf has to be a regular file. By default, buildroot makes it a symlink. We have to replace that symlink with a file (an empty file will do).
  • Likewise, Docker “injects” itself within containers by bind-mounting over /sbin/init. This means that /sbin/init should be a regular file as well. By default, buildroot makes it a symlink to busybox. We will change that, too.
  • Docker injects itself within containers, and (as of this writing) it is dynamically linked. This means that it requires a couple of libraries to run correctly. We will need to add those libraries to the container.

(Note: Docker will eventually switch to static linkage, which means that the last step won’t be necessary anymore.)

We could unpack the tar file, do our changes, and repack; but that would be boring. So instead, we will be fancy and update the file on the fly.

Let’s create an extra directory, and populate it with those “additions”:
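
Here is a sketch of those additions (the library names and paths are examples taken from a 64-bit Debian/Ubuntu host; adjust them as explained right after):

    $ mkdir -p fixup/etc fixup/sbin fixup/lib/x86_64-linux-gnu fixup/lib64
    $ # Regular files, so that Docker can bind-mount over them
    $ touch fixup/etc/resolv.conf
    $ touch fixup/sbin/init
    $ # Libraries needed by the (dynamically linked) Docker binary
    $ cp /lib/x86_64-linux-gnu/libpthread.so.0 fixup/lib/x86_64-linux-gnu/
    $ cp /lib/x86_64-linux-gnu/libc.so.6 fixup/lib/x86_64-linux-gnu/
    $ cp /lib64/ld-linux-x86-64.so.2 fixup/lib64/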

The paths to the libraries might be different on your machine. If in doubt, you can run ldd $(which docker) to see which libraries are used by your local Docker install.

Then, create a new tarball including those extra files:
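
One way to do it, keeping the original rootfs.tar intact (when the tarball is later unpacked, entries appended at the end override earlier ones, which is what replaces the symlinks):

    $ cp output/images/rootfs.tar rootfs-fixed.tar
    $ tar -rf rootfs-fixed.tar -C fixup .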

Last but not least, the “import” command will bring this image into Docker. We will name it “dietfs”:
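
The import reads the tarball from standard input and names the resulting image:

    $ docker import - dietfs < rootfs-fixed.tar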

We’re done! Let’s make sure that everything worked properly, by creating a new container with this image:
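
For instance, by starting a shell in it (you should land in a busybox shell):

    $ docker run -i -t dietfs /bin/sh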

For what it’s worth, I put together a small fixup script on Gist, to automate those steps, so you can also execute it like this:
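
Along these lines (the gist URL is not reproduced here; point curl at the raw fixup.sh of the gist):

    $ curl -s https://gist.github.com/.../fixup.sh | sh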

The result is a rather small image; less than 3.5 MB:
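
You can check the size for yourself:

    $ docker images dietfs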

Not Bad!

Now, how do we build something more complex, like a PostgreSQL server?

Why PostgreSQL? Two reasons. One: it’s awesome. Two: I didn’t find a PostgreSQL package in buildroot, so it was an excellent opportunity to learn how to include something “from scratch”, as opposed to merely ticking a checkbox and recompiling away.

First, we want to create a directory for our new package. From buildroot’s top directory:
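
That is:

    $ mkdir package/postgres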

Then, we need to put a couple of files in that directory. For your convenience, I stored them on Gist:

Let’s have a look at those files now. First, Config.in: it is used by make menuconfig to display a checkbox for our new package (yay!), but also to define some build dependencies. In this case, we need IPV6 support.
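
The file isn’t reproduced inline here, but a minimal sketch looks like this (the package symbol follows buildroot naming conventions, and BR2_INET_IPV6 is assumed to be the toolchain IPV6 symbol):

    config BR2_PACKAGE_POSTGRES
        bool "postgres"
        depends on BR2_INET_IPV6
        help
          PostgreSQL database server.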

How does one know which dependencies to use? I confess that I first tried with no dependencies at all. The build failed, so I had a look at the error messages, saw that it complained about missing IPV6 headers, and fixed the issue by adding the required dependencies.

The other file, postgres.mk, contains the actual build instructions:
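
Reconstructed from the explanations below (the exact version number is illustrative), it looks roughly like this:

    POSTGRES_VERSION = 9.2.4
    POSTGRES_SOURCE = postgresql-$(POSTGRES_VERSION).tar.bz2
    POSTGRES_SITE = http://ftp.postgresql.org/pub/source/v$(POSTGRES_VERSION)
    # Libraries to build before this package (see below)
    POSTGRES_DEPENDENCIES = readline zlib
    # Extra flags passed to ./configure (see below)
    POSTGRES_CONF_OPT = --with-system-tzdata=/usr/share/zoneinfo
    # The actual build instructions are expanded from this line
    $(eval $(autotools-package))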

As you can see, it is pretty straightforward. The main thing is to define some variables to tell buildroot where it should fetch the PostgreSQL source code. We don’t have to provide actual build instructions, because PostgreSQL uses autotools. (“This project uses autotools” means that you typically compile it with “./configure && make && make install”; this probably rings a bell if you have ever compiled a significant project manually on any kind of UNIX system!)

The build instructions will actually be expanded from the last line. If you want more details about buildroot’s operation, have a look at buildroot’s autotools package tutorial.

We can see that postgres.mk also defines more dependencies: readline and zlib. So what’s the difference between the CONF_OPT, DEPENDENCIES, and the “depends” previously seen in Config.in?

  • CONF_OPT provides extra flags which will be passed to ./configure. In this case, the compilation was failing, telling me that I should specify the path to timezone data. I looked around and figured out the right flag.
  • DEPENDENCIES tells buildroot to compile extra libraries before taking care of our package. Guess what: when I tried to compile, it failed and complained about missing readline and zlib; so I added those dependencies and that’s it.
  • “depends” in Config.in is a toolchain dependency. It is not really a library; it merely tells buildroot “hey, when you compile uClibc, make sure to include IPV6 support, will you?”. It has a strong implication: when you change the configuration of the toolchain (C library or compiler), you have to recompile everything: the toolchain and everything that was compiled with it. This obviously takes longer than just recompiling a single package. It is done with the command make clean all.

Last but not least, we need to include our Config.in file in the top-level Config.in. The quick and dirty way is to do this (from buildroot top directory):
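
That is, appending a source statement to the top-level Config.in:

    $ echo 'source "package/postgres/Config.in"' >> Config.in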

Note: normally, we should do this in a neat submenu section within e.g. package/Config.in. But this way will save us some hassle navigating through the menus.

Alright, now run make menuconfig again; go to “Toolchain”, enable IPV6 support, go back to the main menu, and enable “postgres”. Now recompile everything with make clean all. This will take a while.

Just like before, we need to “fixup” the resulting image:
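
Assuming you kept the fixup script from earlier, that is just a matter of re-running it on the freshly built rootfs.tar (it imports the result under the name “dietfs” again):

    $ sh fixup.sh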

We now have a Docker image with PostgreSQL in it; but it is not enough. We still need to set up the image to start PostgreSQL automatically, and even before that, PostgreSQL will have to initialize its data directory (with initdb). We will use a Dockerfile and a custom script for that.

What’s a Dockerfile? A Dockerfile contains basic instructions telling Docker how to build an image. When you use Docker for the first time, you will probably use “docker run” and “docker commit” to create new images; but you should quickly move to Dockerfiles and “docker build” because it automates those operations and makes it easier to share “recipes” to build images.

Let’s start with the custom script. We want this script to run automatically within the container when it starts. Make a new empty directory, and create the following init file in it:
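
The original script is not reproduced verbatim here; the following sketch implements what the next paragraph describes (the “default” user is buildroot’s built-in non-root user, and initdb’s --pwfile flag sets the superuser password from a file; the rest is a reconstruction):

    #!/bin/sh
    # Data directory, owned by the non-privileged user
    mkdir -p /data
    chown default /data
    # Generate a random password, keep a copy in /pwfile, and display it
    head -c 16 /dev/urandom | md5sum | cut -d ' ' -f 1 > /pwfile
    echo "PostgreSQL password: $(cat /pwfile)"
    # Create the data files as the non-privileged user
    su default -c "initdb --pwfile=/pwfile -D /data"
    # Authorize password-authenticated connections from the network
    echo "host all all 0.0.0.0/0 md5" >> /data/pg_hba.conf
    # Start the server, listening on all interfaces
    exec su default -c "postgres -D /data -h 0.0.0.0"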

PostgreSQL will refuse to run as root, so we use the default user (conveniently provided by buildroot). We create /data to hold the PostgreSQL data files, and assign it to the non-privileged user. We also generate a random password, save it to /pwfile, and display it (to make it easier to retrieve later). We can then run initdb to actually create the data files. Then, we extend pg_hba.conf to authorize connections from the network (by default, only local connections are allowed). The last step is to actually start the server.

Make sure that the script is executable:
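
    $ chmod +x init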

Now, in the same directory, we will create the following Dockerfile, to actually inject the previous script in a new image:
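
A Dockerfile matching the description below (Docker instruction names are case-insensitive, hence the lowercase from):

    # Start from our buildroot-based image
    from dietfs
    # Copy everything in the build directory (including this Dockerfile) to /
    add . /
    # Declare the PostgreSQL port
    expose 5432
    # Run our init script when a container starts
    cmd ["/init"]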

The fixup.sh script has imported our image under the name “dietfs”, so our Dockerfile will start with from dietfs, to tell Docker that we want to use that image as a base. Then, we add all the files in the current directory to the root of our image. This will also inject the Dockerfile itself, but we don’t care. We expose TCP port 5432, and finally tell Docker that by default, when a container is created from this image, it should run our /init script. You can read more about the Dockerfile syntax in Docker’s documentation.

The next step is to build the new image using our Dockerfile:
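
From the directory containing the Dockerfile and the init script:

    $ docker build -t pglite .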

That’s it. You can now start a new PostgreSQL instance:
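
With the Docker version contemporary with this article, exposed ports were automatically published to random host ports, so a plain run is enough:

    $ docker run pglite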

The output will include the password, and then the first log messages from the server:

Weak Password Is Weak! Our password is random, but it only includes hexadecimal digits (i.e. [0-9a-f]). You can make it stronger by including base64 in the image, and using base64 instead of md5sum. Alternatively, you can use longer passwords.

Take note of the password. It’s OK to hit “Ctrl-C” now: the container will still run in the background. Let’s check which port was allocated for our container. docker ps will show us all the containers currently running; but to make things even simpler, we will use docker ps -l, which only shows the latest container.

Alright, that’s port 49168. Does it really work? Let’s check for ourselves! You can try locally if you have a PostgreSQL client installed on your Docker machine; or from anywhere else (just replace “localhost” with the hostname or IP address of your Docker machine).
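
For example, with psql (assuming the init script sketch above: initdb ran as “default”, so that is the superuser name; use the password printed at startup):

    $ psql -h localhost -p 49168 -U default postgres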

A small note about sizes: the image takes about 16 MB, but the data files take almost 24 MB. So the total footprint is really about 40 MB.

What if we want to automate the creation of our PostgreSQL container, to run our own PostgreSQL-as-a-Service platform? Easy, with just a tiny bit of shell trickery!
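
A possible sketch (it relies on the init script above printing the password as the last word of its first log line; docker run -d prints the new container ID, and docker port resolves the published host port):

    # Start a detached pglite container and recover its connection details
    CID=$(docker run -d pglite)
    sleep 2   # give the init script a moment to print the password
    PORT=$(docker port $CID 5432)
    PASS=$(docker logs $CID | head -n 1 | awk '{print $NF}')
    echo "New PostgreSQL instance on port $PORT, password $PASS"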

That’s it! If you name your image “yourname/pglite” instead of just “pglite”, you will be able to “docker push” it to the Docker Public Registry, and to “docker pull” it from any other Docker host anywhere in the world. You are one PHP script away from setting up your own PostgreSQL-as-a-Service provider :-)

Extra notes: if you run into weird issues with casing (e.g. the xt_CONNMARK.h issue mentioned in the comments), check whether you are building from a Vagrant/VirtualBox shared folder. If that is the case, try again from a “local” volume (i.e. not a shared folder) and see if it works better! Thanks to Bryan Murphy for reporting this.

About Jérôme Petazzoni

Jérôme is a senior engineer at dotCloud, where he rotates between Ops, Support and Evangelist duties and has earned the nickname of “master Yoda”. In a previous life he built and operated large scale Xen hosting back when EC2 was just the name of a plane, supervised the deployment of fiber interconnects through the French subway, built a specialized GIS to visualize fiber infrastructure, specialized in commando deployments of large-scale computer systems in bandwidth-constrained environments such as conference centers, and various other feats of technical wizardry. He cares for the servers powering dotCloud, helps our users feel at home on the platform, and documents the many ways to use dotCloud in articles, tutorials and sample applications. He’s also an avid dotCloud power user who has deployed just about anything on dotCloud – look for one of his many custom services on our GitHub repository.

Connect with Jérôme on Twitter! @jpetazzo

7 Responses to “Create lightweight Docker containers with Buildroot”

  1. Carle

    “A small note about sizes: the image takes about 16 MB, but the data files take almost 24 MB. So the total footprint is really about 40 MB.”

    What are the “data files” on a fresh install of Postgres?

    • Jerome Petazzoni

      The “data files” are essentially the empty databases created by Postgres, and the transaction log. Even though the databases are empty, they need some space (because they are not technically empty: they contain a skeleton for tables, indexes, etc.). Regarding the transaction log, it uses fixed-size segments, which are pre-allocated. It’s possible to change the size of the segments at compile time, but I thought it wasn’t worth the trouble. (Hey, it’s not as bad as MongoDB, which allocates 2 GB per database, “just in case” :D)

  2. Greg Edwards

    Awesome article. I’ve googled for an answer to this problem, but haven’t found one, so thought I would ask you. When I follow the instructions you gave, I get this error during the “make”:

    This is using a Vagrant box (VirtualBox) with Ubuntu 13.04 installed.

  3. Greg Edwards

    Problem solved. There was a case mismatch for many of the header files. Just had to edit the Kbuild file to change the expected case, and that solved it.

    • Greg Edwards

      Strange… after solving the case-mismatch, another error occurs. Not sure how to diagnose this one:

      • Jerome Petazzoni

        Very weird. Are you using the same version of buildroot?
        Did you enable anything special?

        • Greg Edwards

          Strange… rebuilding the VM made everything work, but then it still occurred intermittently after that.

          Dietfs, without the Postgres packages, seems useful, so I pushed that to https://index.docker.io/u/greglearns/dietfs/ in case others want to use it.

          Jerome, thank you so much for writing this up. It was great getting your guidance on this.
