Containerization

Containerization is a technique that allows you to launch and work with many different environments (known as “containers”) on a single computer, each of which might provide a different operating system environment (for example, a different Linux distribution), a different set of installed libraries, different environment variables, etc.  Each container operates in isolation from the others; effectively, you can think of each container as a separate computer that just happens to be sharing the same hardware as the other containers.  Although similar in some ways to virtual machines, containerization is based on fundamentally different technology: containers share the host's operating system kernel instead of each booting a complete guest operating system.  As a result, containers are generally much easier to set up and have a dramatically smaller performance cost.  For most purposes, the performance cost of using containers is negligible.

This opens up all sorts of possibilities.  Do you own a Windows machine, but want to test your code on Linux?  No problem – just launch a Linux container on your Windows machine, and test away!  Are you having trouble reproducing a bug someone else has encountered, and suspect the problem might depend on some detail of the runtime environment?  There’s no need to mess with (and potentially break) your own environment in pursuit of the bug – just run some tests in a few containers, leaving your own environment unchanged.  In fact, because of the clean isolation and reproducibility of environments that containerization provides, anything you can do in a container should probably be done in a container.
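As a rough illustration of the "launch a container and test away" workflow, the following Python sketch assembles a `docker run` command for a throwaway container.  It assumes Docker is installed on the host; the helper names (`docker_run_command`, `run_in_container`), the `/work` mount point, and the `ubuntu:22.04` image are illustrative choices, not something prescribed here.

```python
import shlex
import shutil
import subprocess
from pathlib import Path


def docker_run_command(image, command, project_dir=None):
    """Build the argument list for a throwaway `docker run` invocation."""
    args = ["docker", "run", "--rm"]  # --rm removes the container when it exits
    if project_dir is not None:
        host_path = Path(project_dir).resolve()
        # Bind-mount the project at /work (an arbitrary, hypothetical mount
        # point) and use it as the working directory inside the container.
        args += ["-v", f"{host_path}:/work", "-w", "/work"]
    return args + [image] + list(command)


def run_in_container(image, command, project_dir=None):
    """Execute the command in a container; requires Docker on the host."""
    if shutil.which("docker") is None:
        raise RuntimeError("docker executable not found on PATH")
    return subprocess.run(
        docker_run_command(image, command, project_dir), check=False
    )


if __name__ == "__main__":
    # Only print the command here; call run_in_container(...) to execute it.
    cmd = docker_run_command("ubuntu:22.04", ["cat", "/etc/os-release"])
    print(shlex.join(cmd))
```

Because the container is removed on exit and only the mounted directory is shared, whatever is installed or changed inside it leaves the host environment untouched.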

Moreover, you can easily build and deploy containers.  For example, you could build the code you are developing (along with all of its dependencies) within a container, and then deploy the container.  Because the container contains your compiled software and everything needed to run it, including the operating system's user-space libraries and tools, the end user doesn’t need to install anything on their system beyond a container runtime such as Docker.  All they need to do is launch your container and start running calculations.
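To make the build-and-deploy idea more concrete, here is a minimal Python sketch that renders a small Dockerfile and the matching `docker build` and `docker push` commands.  Everything project-specific is an assumption made for illustration: the code is taken to be a pip-installable Python package exposing a hypothetical `mycode` command, and `example/mycode:0.1` is a placeholder image tag.

```python
import json
import shlex


def render_dockerfile(base_image, entrypoint):
    """Return Dockerfile text for a pip-installable project (an assumption)."""
    lines = [
        f"FROM {base_image}",                # base image supplies the OS userland
        "WORKDIR /app",
        "COPY . /app",                       # copy the project into the image
        "RUN pip install --no-cache-dir .",  # install the code and its dependencies
        f"ENTRYPOINT {json.dumps(list(entrypoint))}",  # command run at launch
    ]
    return "\n".join(lines) + "\n"


def build_command(tag, context="."):
    """Argument list that builds an image from the Dockerfile in `context`."""
    return ["docker", "build", "-t", tag, context]


def push_command(tag):
    """Argument list that uploads the tagged image to a registry."""
    return ["docker", "push", tag]


if __name__ == "__main__":
    # "mycode" and "example/mycode:0.1" are hypothetical placeholders.
    print(render_dockerfile("python:3.12-slim", ["mycode"]))
    print(shlex.join(build_command("example/mycode:0.1")))
    print(shlex.join(push_command("example/mycode:0.1")))
```

An end user with Docker installed could then start the published image with `docker run`, without installing the package or its dependencies themselves.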

By far the most commonly used containerization tool is Docker, which is what we recommend getting started with.  It is important to note that Docker's default configuration relies on a daemon that runs with root privileges, so using Docker effectively requires root-level access.  HPC centers generally do not give users root access to their expensive machines, which typically precludes Docker from use in an HPC context.  Fortunately, there are several containerization alternatives that have been developed specifically for use on HPC machines, with the most prominent being Apptainer (formerly known as Singularity).  If you are interested in using containerization on an HPC machine, ask the organization that operates the machine which containerization solution(s) they recommend.
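For a sense of what this looks like in practice, Apptainer can pull an image from a Docker registry and run commands in it without root access.  The Python sketch below only assembles the relevant command lines; the helper names, the `ubuntu.sif` file name, and the `ubuntu:22.04` image are illustrative assumptions, and whether Apptainer is installed (or how it is loaded) depends on the HPC center.

```python
import shlex


def apptainer_pull_command(sif_path, docker_image):
    """Argument list that converts a Docker registry image into a local SIF file."""
    return ["apptainer", "pull", sif_path, f"docker://{docker_image}"]


def apptainer_exec_command(sif_path, command):
    """Argument list that runs a command inside an existing SIF image."""
    return ["apptainer", "exec", sif_path] + list(command)


if __name__ == "__main__":
    # File and image names are placeholders chosen for this example.
    print(shlex.join(apptainer_pull_command("ubuntu.sif", "ubuntu:22.04")))
    print(shlex.join(apptainer_exec_command("ubuntu.sif", ["cat", "/etc/os-release"])))
```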

Recommended Software (not usable with HPC):

Recommended Hosting Service:

Alternatives for HPC:

Container Orchestration: