You don’t have to back up everything about every container, but it’s important to back up the configurations for running and managing them in case of disaster.

Yes, your container infrastructure needs some type of backup. Kubernetes and Docker will not magically rebuild themselves after a disaster. As discussed in a separate article, you don’t need to back up the running state of each container, but you will need to back up the configuration used to run and manage your containers. Here’s a quick reminder of what you’ll need to back up.

Configuration and desired-state information
- The Dockerfiles used to build your images, and all versions of those files
- The images created from the Dockerfiles and used to run each container
- Kubernetes etcd and other K8s databases that hold information on cluster state
- Deployments: the YAML files describing each deployment

Persistent data created or changed by containers
- Persistent volumes
- Databases

Dockerfiles

Docker containers are run from images, and images are built from Dockerfiles. A proper Docker configuration starts with a version-control repository, such as one hosted on GitHub, for all Dockerfiles. Do not create ad hoc containers using ad hoc images built from ad hoc Dockerfiles. All Dockerfiles should be stored in a repository that lets you pull historical versions of a Dockerfile should there be a problem with the current build.

You should also have a repository for the YAML files associated with each K8s deployment. These are text files, so they benefit from version control in the same way.

These repositories then need to be backed up. One of the most popular repository hosts is GitHub, which offers a number of ways to back up your repositories. A variety of scripts use its APIs to download a current copy of a repository, and there are also third-party commercial tools you can use to back up GitHub or whatever repository host you are using.
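As one possible illustration of a scripted repository backup, the minimal Python sketch below keeps a local mirror of a Git repository up to date. It uses plain git commands (git clone --mirror and git remote update) rather than GitHub’s APIs, which is a substitution made here for simplicity. The function names, the repository URL, and the destination path are all hypothetical.

```python
import subprocess
from pathlib import Path


def mirror_commands(repo_url: str, dest: Path) -> list[list[str]]:
    """Return the git commands that create or refresh a mirror of repo_url at dest."""
    if (dest / "HEAD").exists():
        # A bare mirror already exists here: fetch all refs and prune deleted ones.
        return [["git", "-C", str(dest), "remote", "update", "--prune"]]
    # First run: a mirror clone copies every branch, tag, and other ref.
    return [["git", "clone", "--mirror", repo_url, str(dest)]]


def backup_repo(repo_url: str, dest: Path) -> None:
    """Run the mirror commands; raises CalledProcessError if git fails."""
    for cmd in mirror_commands(repo_url, dest):
        subprocess.run(cmd, check=True)


# Hypothetical usage (URL and path are placeholders):
# backup_repo("https://github.com/example-org/dockerfiles.git",
#             Path("/backups/dockerfiles.git"))
```

A mirror like this captures only what Git itself stores (commits, branches, tags), not hosted extras such as issues, and the mirror directory still has to be copied to external storage by your regular backup system.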
If you haven’t followed the advice above and have running containers based on images that you no longer have the Dockerfiles for, you can use the docker image history command or a tool such as dfimage to reconstruct an approximate Dockerfile from your current images. Put those Dockerfiles in a repository and start backing it up! But, honestly, don’t get into this situation. Always store and back up the Dockerfiles and YAML files used to create your environment.

Docker Images

The current images used to run your containers should also be stored in a registry. (Of course, if you’re running Docker images in Kubernetes, you’re already doing that.) You can use a private Docker registry or a public one like Docker Hub. Cloud providers also offer private registries to store your images. The contents of that registry should then be backed up. A simple Google search such as “Docker Hub backup” can yield a surprising number of options.

If you do not have the current image used to run your containers, you can create one from a running container using the docker commit command. You can then create a Dockerfile from that image using docker image history or the tool dfimage.

Kubernetes etcd

The Kubernetes etcd database is very important and should be backed up using the etcdctl snapshot save snapshot.db command. This will create the file snapshot.db in the current directory. That file should then be backed up to external storage.

If you are using commercial backup software, you can trigger the etcdctl snapshot save command before taking a backup of the directory where snapshot.db will be created. That’s one way you can integrate this backup into your commercial backup environment. For restores, take a look at the etcd recovery documentation.

Persistent volumes

There are a variety of ways that containers can be given access to persistent storage that can be used to store or create data. Traditional Docker volumes reside in a subdirectory of Docker’s data directory (by default /var/lib/docker/volumes on Linux).
Bind mounts are simply any directory on a Docker host that is mounted inside a container (using the --mount or -v flag of docker run). For a variety of reasons, traditional volumes are preferred by the Docker community, but for the purposes of backup, traditional volumes and bind mounts are essentially the same. You can also mount a network-file-system (NFS) directory or an object from an object-storage system as a volume inside a container.

The method you use to back up your persistent volumes will depend on which of the above options you use for the container. However, all of them have the same problem: If the data is changing, you will need to deal with that in order to get a consistent backup.

One way is to shut down any containers using that particular volume. This is a bit old-school, but it’s one of the challenges created by the container world, since the typical method of putting a backup agent in the container isn’t really an option. Once the containers are shut down, the volume can be backed up. If it is a traditional Docker volume, you can back it up by mounting it in another container that won’t change its data during the backup, and then creating a tar archive of the volume in a bind-mounted directory that you then back up with whatever backup system you use.

However, this is really hard to do in Kubernetes. This is one reason stateful information is best stored in a database, not a filesystem. Please consider this issue when designing your K8s infrastructure.

Also, if you’re using a bind-mounted directory, an NFS-mounted filesystem, or an object-storage system as your persistent storage, you can use whatever is the best way to back up that storage system. This could be a snapshot followed by replication, or simply running your commercial backup software on that system. These methods are likely to provide a much more consistent backup than a typical file-level backup of that same volume.
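The helper-container approach for a traditional Docker volume described above can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive procedure: the function names, the alpine helper image, the /data and /backup mount points, and the archive naming are all assumptions made for the example.

```python
import subprocess
from pathlib import Path


def volume_backup_command(volume: str, backup_dir: Path, image: str = "alpine") -> list[str]:
    """Build a docker run command that tars a named volume into backup_dir."""
    if not backup_dir.is_absolute():
        raise ValueError("backup_dir must be an absolute path for a bind mount")
    return [
        "docker", "run", "--rm",
        # Mount the named volume read-only so the helper cannot change its data.
        "--mount", f"type=volume,source={volume},target=/data,readonly",
        # Bind-mount a host directory to receive the archive.
        "--mount", f"type=bind,source={backup_dir},target=/backup",
        image,
        "tar", "czf", f"/backup/{volume}.tar.gz", "-C", "/data", ".",
    ]


def backup_volume(volume: str, backup_dir: Path, containers: tuple[str, ...] = ()) -> None:
    """Stop the containers using the volume, archive it, then start them again."""
    if containers:
        subprocess.run(["docker", "stop", *containers], check=True)
    try:
        subprocess.run(volume_backup_command(volume, backup_dir), check=True)
    finally:
        # Restart the containers even if the archive step failed.
        if containers:
            subprocess.run(["docker", "start", *containers], check=True)
```

A call such as backup_volume("pgdata", Path("/backups"), ("db",)) (volume, path, and container name are placeholders) would leave /backups/pgdata.tar.gz on the host, where your regular backup system can pick it up. The host directory has to exist before the command runs.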
Databases

The next backup challenge is when a container is using a database to store its data. These databases need to be backed up in a way that will guarantee their integrity. Depending on the database, the method mentioned above might work: Shut down the container accessing the database, then back up the directory where its files are stored. However, the downtime required by this method may not be acceptable.

Another method is to connect directly to the database engine itself and ask it to run a backup to a file you can then back up. If the database is running inside a container, you will first need to use a bind mount to attach a directory that it can write its backup to, so the backup exists outside the container. Then run the command that database uses (such as mysqldump) to create a backup. Finally, make sure to back up the file it creates using your backup system.

What if you don’t know which containers are using which storage or which databases? One solution might be to use the docker ps command to list the running containers, then use the docker inspect command to display each container’s configuration. There is a section called Mounts that will tell you what volumes are mounted where. Any volume mounts will also be specified in the YAML files that you submitted to Kubernetes.

Commercial backup solutions

There are a variety of commercial backup solutions that can protect some or all of the data mentioned above. The following is a very quick summary:
- Commvault’s virtual server agent can act as a proxy to back up containers and their images.
- Cohesity offers data protection for K8s namespaces.
- Heptio (now a VMware company) offers Velero, a backup tool designed for K8s.
- Contino, Datacore, and Portworx offer storage designed for K8s and containers, and also support backing up that information.

Given the variety of ways K8s and Docker can be configured, it’s very difficult to cover all of this in a single article.
But hopefully, this has given you something to think about, or maybe helped you realize something you haven’t been backing up but should be. Keep it safe out there!