
Why you should Dockerize your builds

October 15, 2020

Image credit: kees torn

Have you ever found yourself saying any of the following:

"That's weird, it works on my machine."

"I KNOW this was working yesterday before I updated my system!"

"Hello, <New Hire> - first thing to do is get your workstation set up. Just follow the instructions in these 6 different README files that were last updated 4 years ago. Feel free to ask our very friendly, very not-busy senior engineers if you run into any problems. We've budgeted 2-3 days."

"Gee, I really wish we could do continuous integration tests on real hardware but that sounds like a nightmare to set up."

The fundamental insight of Dockerized builds is a simple one - your source code alone does not fully determine the output of your build process. This seems like an obvious point, but if you've only ever done builds on your own workstation, it's easy to overlook the implicit dependencies involved - the specific versions of your compiler and linker and their command-line arguments; system libraries and headers; possibly even the date and time (for example, if you embed a timestamp in your built image).
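
As a concrete (and hypothetical) illustration, consider a one-line C build - the file names here are made up, but the point holds for any toolchain:

    # A hypothetical one-line build. The resulting binary depends on much
    # more than main.c: this machine's gcc version and default flags, the
    # system headers and libraries it links against, and - if the code uses
    # the __DATE__ / __TIME__ macros - the moment the build ran.
    $ gcc -O2 -o app main.c
    $ sha256sum app   # can differ across machines (or days) for identical source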

While this is not terrible when working as a lone developer, it can cause problems when working with other people - after all, as Russ Cox put it, "Software engineering is what happens to programming when you add time and other programmers." Developer toolchain configurations and versions can drift, old code can stop working properly with newer compiler versions, and build outputs can end up differing in subtle, hard-to-trace ways across machines.

Historically, people have come up with a variety of solutions to these problems, but each has its own drawbacks (see Appendix A). Luckily, we now have Docker, which addresses all of these issues.

Docker gives us:

  1. Images that are totally isolated from each other - no more problems from multiple versions of the same toolchain co-existing on the same machine.
  2. Images that are self-contained by default - there's no chance of accidentally depending on something on your host machine.
  3. Images that are immutable - if two developers are using the same Docker image, they're running the exact same code (see the sketch after this list).
  4. Images that are stored separately from the repo, avoiding repository bloat.
  5. First-class support for Linux, Windows, and macOS.
  6. Seamless integration with cloud-based build tools.
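
To make points 2 and 3 concrete, here's a minimal sketch of what a build image's Dockerfile might look like. The base image, digest placeholder, and package choices are illustrative, not a recommendation:

    # A minimal, self-contained build environment (versions are illustrative).
    # Pinning the base image by digest keeps the environment immutable:
    # everyone pulling this image gets a byte-for-byte identical toolchain.
    FROM ubuntu:20.04@sha256:<digest-of-the-exact-base-image>
    RUN apt-get update \
        && apt-get install -y --no-install-recommends gcc make \
        && rm -rf /var/lib/apt/lists/*
    WORKDIR /src

In practice you'd pin exact package versions too; the point is that the build environment is now described in one auditable file rather than scattered across developer workstations.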

In this series of blog posts we'll look at how to use a Docker image to build your code locally, how to use that same image to do cloud-based builds, and finally how to use Lager to run continuous integration tests on real hardware.
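
As a preview of the local workflow, the commands look roughly like this - the image tag (build-env:1.0) and build command (make) are placeholders for your own:

    # Build the toolchain image once, then run every build inside it.
    $ docker build -t build-env:1.0 .
    $ docker run --rm -v "$(pwd)":/src -w /src build-env:1.0 make

The only host-side requirement is Docker itself; the compiler, libraries, and flags all come from the image.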


Appendix A: Other solutions to the dependency problem.

  • Commit your entire build toolchain to source control - this can lead to repository bloat, especially if multiple platforms or toolchains need to be supported, and many source control systems don't handle large numbers of large binary files well.
  • Nix - no first-class Windows support, and writing a Nix package for your build toolchain can involve a substantial learning curve.
  • SaltStack / Puppet / Ansible / other configuration management software - requires extra care to ensure all versions are fully specified and that there are no external dependencies (e.g., apt repositories). Also requires a full rebuild each time (e.g., in a CI environment), since it's a specification of how to produce a system rather than the system itself.