Kubernetes simplified backup
Backups are important, unless you like spending a lot of time recreating what was lost, if that is even possible.
My Kubernetes cluster is mostly stateless, with any state stored outside the cluster on dedicated storage.
To back up the cluster I initially did full machine backups of all nodes, but that seemed silly since I can recreate a node in 5-10 minutes with my PXE setup, which boots and installs Rocky Linux plus all the prerequisites the machine needs to work as a Kubernetes node.
Once the machine is up and running, it's just a matter of following my own cookbook, and then the machine should be part of a cluster.
So “all” I need is a backup of the cluster configuration, which is stored inside the cluster itself, in etcd.
I found this page, copied most of the code from it, and that gave me most of my control node backup script:
#!/bin/sh
export MAILTO='[email protected]'
export PBS_PASSWORD='cc535e84-b425-4bb4-8575-d8cb886d0e2f'
DIR='/root/backup'
cd /root
if [ -d "$DIR" ]; then
    rm -rf $DIR
fi
mkdir $DIR
cp -rx /k8s/config $DIR
cp -rx /k8s/dockerconfig $DIR
cp -rx /k8s/kubeletconfig $DIR
sudo docker run --rm -v $(pwd)/backup:/backup \
    --network host \
    -v /etc/kubernetes/pki/etcd:/etc/kubernetes/pki/etcd \
    --env ETCDCTL_API=3 \
    k8s.gcr.io/etcd:3.4.3-0 \
    etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
    --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
    snapshot save /backup/etcd-snapshot-latest.db
BACKUP_ID=`hostname`
/usr/local/sbin/proxmox-backup-client backup root.pxar:/root/backup --backup-id $BACKUP_ID --repository '[email protected][email protected]:backup'
Explained simply: it creates /root/backup, copies the configuration files into that directory, and then launches a Docker container that snapshots the etcd database and saves it into the same directory.
This allows the proxmox-backup-client to back up the important parts of the node to my backup server, where I can get to it easily in case I lose both of my control plane nodes.
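Because the script carries on even when the docker step fails, a guard like the following could be placed just before the proxmox-backup-client call so that a missing or empty snapshot is not uploaded unnoticed. This is only a minimal sketch; the function name snapshot_ok is made up for illustration and is not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): succeed only if the
# given snapshot file exists and is not empty.
snapshot_ok() {
    [ -s "$1" ]
}

# Possible use in the backup script, right before proxmox-backup-client:
#   snapshot_ok /root/backup/etcd-snapshot-latest.db || {
#       echo "etcd snapshot missing or empty" >&2
#       exit 1
#   }
```

This only checks that a file was written. For a deeper check, etcdctl snapshot status can be run against the file inside the same etcd container.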
This script runs on my primary control-plane node as a cron job - so I always have a daily backup of the cluster.
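For reference, a daily schedule could be set up with a cron.d-style entry along these lines. The schedule (02:30), the file name and the script path are all placeholders chosen for this sketch, not values from my actual setup.

```shell
#!/bin/sh
# Sketch: write a cron.d-style entry that runs the backup script once a day
# at 02:30 as root. The schedule and both paths are assumptions.
write_backup_cron() {
    cron_file="$1"      # e.g. /etc/cron.d/k8s-backup (hypothetical)
    script_path="$2"    # e.g. /root/k8s-backup.sh (hypothetical)
    printf '30 2 * * * root %s\n' "$script_path" > "$cron_file"
}

# Possible use, as root:
#   write_backup_cron /etc/cron.d/k8s-backup /root/k8s-backup.sh
```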
To restore, I would need to re-initialize the machine with my PXE boot and install the configuration/settings as per my setup guide, then restore the Kubernetes configuration and the etcd data inside the cluster.
Something similar to:
#!/bin/sh
DIR='/root/backup'
export PBS_PASSWORD='cc535e84-b425-4bb4-8575-d8cb886d0e2f'
BACKUP_ID=`hostname`
/usr/local/sbin/proxmox-backup-client restore host/$BACKUP_ID/2022-04-27T15:37:39Z root.pxar $DIR --repository '[email protected][email protected]:backup'
That will give me my backed up data located in
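The restore command above uses a hard-coded snapshot timestamp. One way to pick the newest snapshot instead is sketched here, assuming a plain list of snapshot paths of the form host/<backup-id>/<timestamp> is available, one per line (for example trimmed from the output of proxmox-backup-client snapshot list, whose exact output format is not reproduced here). The helper name latest_snapshot is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read snapshot paths (one per line, in the form
# host/<backup-id>/<ISO-8601 UTC timestamp>) on stdin and print the newest
# one for the given backup ID. Timestamps in this format sort
# chronologically when sorted as plain text.
latest_snapshot() {
    awk -v prefix="host/$1/" 'index($0, prefix) == 1' | sort | tail -n 1
}

# Possible use, given a file with one snapshot path per line:
#   SNAPSHOT=$(latest_snapshot "$(hostname)" < snapshots.txt)
```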
Then I can simply reverse my backup from above:
Keep in mind that this is how my bind mounts are set up, so all my Docker/Kubernetes information is below
/k8s/docker /var/lib/docker none nofail,bind 0 0
/k8s/config /etc/kubernetes none nofail,bind 0 0
/k8s/dockerconfig /etc/docker none nofail,bind 0 0
/k8s/kubeletconfig /var/lib/kubelet none nofail,bind 0 0
sudo cp -rx $DIR/config /k8s
sudo cp -rx $DIR/dockerconfig /k8s
sudo cp -rx $DIR/kubeletconfig /k8s
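The three copy commands above can also be wrapped in a small loop that stops as soon as one of the expected directories is missing from the restored backup. This is a sketch only; restore_configs is a made-up name, and it would have to run as root (or via sudo) to write below /k8s.

```shell
#!/bin/sh
# Sketch: copy each backed-up config directory from the restore directory
# into the target root, stopping at the first one that is missing.
restore_configs() {
    src="$1"    # restore directory, e.g. /root/backup
    dest="$2"   # target root, e.g. /k8s
    for name in config dockerconfig kubeletconfig; do
        if [ ! -d "$src/$name" ]; then
            echo "missing $src/$name, aborting" >&2
            return 1
        fi
        cp -rx "$src/$name" "$dest" || return 1
    done
}

# Possible use:
#   restore_configs /root/backup /k8s
```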
Then we restore etcd:
sudo mkdir -p /k8s/etcd
sudo docker run --rm \
    -v $(pwd)/backup:/backup \
    -v /k8s/etcd:/var/lib/etcd \
    --env ETCDCTL_API=3 \
    k8s.gcr.io/etcd:3.4.3-0 \
    /bin/sh -c "etcdctl snapshot restore '/backup/etcd-snapshot-latest.db' ; mv /default.etcd/member/ /var/lib/etcd/"
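Since the restore and the mv are joined with a plain semicolon, the mv runs even if the restore failed, so a quick check that the data actually landed where expected may be worthwhile before running kubeadm. A minimal sketch, assuming the usual etcd v3 data layout (member/snap/db plus a member/wal directory); the function name etcd_data_present is made up for this example.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the given etcd data directory contains
# a non-empty member/snap/db file and a member/wal directory.
etcd_data_present() {
    [ -s "$1/member/snap/db" ] && [ -d "$1/member/wal" ]
}

# Possible use after the restore:
#   etcd_data_present /k8s/etcd || { echo "etcd restore looks incomplete" >&2; exit 1; }
```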
With etcd restored into the correct location in the filesystem, I can grab my cluster-init.yaml file from my Git repository and run the cluster init:
sudo kubeadm init --config ~/cluster-init.yml --upload-certs --ignore-preflight-errors=DirAvailable--var-lib-etcd
The --ignore-preflight-errors=DirAvailable--var-lib-etcd argument tells kubeadm not to complain that the etcd data directory already exists and is not empty, so the restored data is re-used instead of a fresh etcd being initialized.
If everything works as expected, the cluster should be up and running again and I should be able to re-add nodes etc.
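A quick way to see whether the nodes have come back is to filter the output of kubectl get nodes for anything that is not Ready. The sketch below assumes the default column layout (node name first, status second); not_ready_nodes is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print the names of nodes whose status does not start with "Ready".
# A status such as "Ready,SchedulingDisabled" still counts as ready here.
not_ready_nodes() {
    awk '$2 !~ /^Ready/ { print $1 }'
}

# Possible use:
#   kubectl get nodes --no-headers | not_ready_nodes
```

An empty result means every listed node reports Ready. Nodes that have not been re-added yet will not show up at all, so the node count is worth checking too.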
Leave comments below if you want to add something.