Backing up Proxmox Backup with rsync and rclone
Introduction
I recently started using Proxmox in my homelab instead of VMware ESXi. This led me to Proxmox Backup Server, since it allows delta backups like my previous backup solution for ESXi did.
With my old backup solution I backed up my "backup" to the cloud at my provider rsync.net, which, simplified, is just an SSH connection where they have enabled certain programs to run, and the underlying storage they use is ZFS.
Unfortunately I do not have the budget for their ZFS solution. That would most definitely have been the fastest way to back up my backups, because I am also running ZFS locally, which would have allowed me to simply replicate snapshots. Snapshot replication is very fast, since it sends streams of data rather than copying individual files.
With my old backup solution, Veeam, a couple of HUGE files were generated that contained all the delta updates of the backups. The new solution stores a lot of files in what I consider a not so ideal directory structure. So instead of having a couple of HUGE files I need to transfer to the "cloud", I have many thousands - at this moment my backup consists of 97k files - and all of these need to be synced to rsync.net.
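Counting the files is a simple way to see the scale of the problem - this is just an ordinary find over my local datastore mount point, so adjust the path to your own setup:
find /mnt/backup/proxmox_backup -type f | wc -l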
The solutions
A couple of methods spring to mind
- Simply copy the files via scp on a schedule
- rsync the files
- rclone the files
Simple is good, and for a first-time copy scp would probably be okay-ish - but as a solution it is not good, since scp is not a sync protocol. To turn it into one you would need to write scripts that compare destination with source and then only copy the delta.
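To illustrate, the naive scp approach would look something like the line below - a full recursive copy every run, with no knowledge of what already exists on the other side (user and host are placeholders):
scp -r /mnt/backup/proxmox_backup <user>@<host>.rsync.net:proxmox_backup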
Enter rsync
Rsync is a program that can synchronize directories. These directories can be on the same system or on different systems, so it is very flexible and works great. This is what I used previously to transfer my backups to rsync.net when I was using Veeam - it was not great, and transferring my weekly full backup took hours, even though I have a 1Gbit internet connection. But I had learned to live with the time it took, since it was simple and it worked.
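For reference, the kind of rsync invocation I mean is nothing fancy - an archive-mode sync over ssh with deletions propagated to the destination (user, host and remote path are just examples):
rsync -az --delete /mnt/backup/proxmox_backup/ <user>@<host>.rsync.net:proxmox_backup/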
When I converted my homelab to Proxmox, and also to their backup solution, the requirements were different - no longer was it a few massive files, but instead a lot of "smaller" files that needed to be transferred.
My initial thought was that rsync would be perfect for this, but it turned out that as time went by, the time taken just kept increasing - from less than an hour to more than 4 hours.
I think the reason for this slowdown is the way the files are stored, which adds overhead every single time a file system operation has to happen. The storage format is great for direct file access, since you can compute the exact location of any given file, but for operations like ls -l or du -hs . it is extremely slow.
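For context, a Proxmox Backup Server datastore keeps its data as content-addressed chunks under a .chunks directory, fanned out over 65536 subdirectories named after the first four hex digits of each chunk's checksum - roughly like this:
/mnt/backup/proxmox_backup/.chunks/0000/<chunk files named by digest>
/mnt/backup/proxmox_backup/.chunks/0001/<chunk files named by digest>
...
/mnt/backup/proxmox_backup/.chunks/ffff/<chunk files named by digest>
Great for looking up a single chunk directly, but walking all of those directories is exactly what tools like rsync, ls and du end up doing.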
I have had a discussion with them about it and they seem to be of the opinion that they have already chosen the best storage solution.
Enter rclone
Rclone is a program similar to rsync, but with integrations for different cloud providers, so you can sync a directory with e.g. Amazon.
Running rclone help backends shows the current list of backends:
root@<redacted>:~# rclone help backends
All rclone backends:
alias Alias for an existing remote
acd Amazon Drive
azureblob Microsoft Azure Blob Storage
b2 Backblaze B2
box Box
crypt Encrypt/Decrypt a remote
cache Cache a remote
chunker Transparently chunk/split large files
drive Google Drive
dropbox Dropbox
fichier 1Fichier
ftp FTP Connection
gcs Google Cloud Storage (this is not Google Drive)
gphotos Google Photos
http http Connection
swift OpenStack Swift (Rackspace Cloud Files, Memset Memstore, OVH)
hubic Hubic
jottacloud Jottacloud
koofr Koofr
local Local Disk
mailru Mail.ru Cloud
memory In memory object storage system.
onedrive Microsoft OneDrive
opendrive OpenDrive
pcloud Pcloud
premiumizeme premiumize.me
putio Put.io
s3 Amazon S3 Compliant Storage Provider (AWS, Alibaba, Ceph, Digital Ocean, Dreamhost, IBM COS, Minio, Tencent COS, etc)
seafile seafile
sftp SSH/SFTP Connection
sharefile Citrix Sharefile
sugarsync Sugarsync
union Union merges the contents of several upstream fs
webdav Webdav
yandex Yandex Disk
As you can see the list is big, and it probably keeps on growing. Rsync.net is not on the list, which is expected, since it is a small provider compared to the big players - but that is okay, since rclone also has generic backends that work with any server of a given type.
I will use the sftp backend, since all that requires is a generic ssh connection.
The advantage of rclone over rsync is that rclone has native support for running multiple transfer threads in parallel, which is ideal if you have many small files like I have now.
Rclone uses a config file to store options for a given destination, or "remote" as they are called in rclone.
So I started with running:
rclone config --config ./rsync.rclone.conf
This opens up the configuration editor, which writes to the configuration file I passed as a parameter. If no --config parameter is used, the configuration is stored in ~/.config/rclone/rclone.conf by default - which is fine for most cases - but I like to have my configuration files stored in a central place, not in a user's home directory.
When my configuration session was done, the configuration file ended up looking like this:
[rsync_net]
type = sftp
host = <redacted>.rsync.net
user = <redacted>
key_file = /root/.ssh/id_rsa
use_insecure_cipher = true
md5sum_command = md5 -r
sha1sum_command = sha1 -r
This basically just tells rclone that a remote named rsync_net uses sftp, along with the options related to that remote. If I was using another backend type, different options would be present.
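Before scripting anything it is worth checking that the remote actually works. rclone lsd lists the directories at the top level of a remote, so a quick sanity check with my config file looks like this:
rclone lsd --config ./rsync.rclone.conf rsync_net: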
With that configuration file in hand I can craft a simple script that I can run with cron:
#!/bin/sh
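# Sync a local directory to the rsync_net remote with rclone and log how long the run took.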
write_msg()
{
echo "$(date +"%Y-%m-%d %H:%M:%S")" "$1"
}
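# Pretty-print a duration given in seconds as hours, minutes and seconds.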
duration()
{
DURATION=$2
HOUR=$((DURATION/3600))
HOURINSEC=$((HOUR*3600))
DURATION=$((DURATION-HOURINSEC))
MINUTE=$(((DURATION/60)))
SECOND=$((DURATION%60))
write_msg "$1 finished took $HOUR hours, $MINUTE minutes, $SECOND seconds"
}
if [ "$#" -ne 2 ]
then
echo "Invalid number of arguments, specificy <source> <destination>"
echo "where <source> is an absolute path to a directory locally, i.e. /mnt/backup/veeam"
echo "where <destination> is relative, i.e. backup/mybackup"
exit 1
fi
BASEDIR=$(dirname "$0")
CONFIG="$BASEDIR/rsync.rclone.conf"
SOURCE=$1
DEST=$2
THREADS=24
START=$(date +%s)
write_msg "rclone script running from $BASEDIR"
write_msg "Starting rclone of $SOURCE to rsync_net:$DEST"
CMD="rclone sync --progress --stats-one-line --stats=30s --transfers $THREADS--checkers $THREADS --config $CONFIG $SOURCE rsync_net:$DEST"
write_msg "using command: $CMD"
$CMD
END=$(date +%s)
duration "rclone" $((END-START))
Most of the script is irrelevant - the actual juicy part is the line rclone sync --progress --stats-one-line --stats=30s --transfers 24 --checkers 24 --config $CONFIG $SOURCE rsync_net:$DEST
Which basically tells rclone to sync $SOURCE to $DEST using 24 threads. This will not give 24 times the speed of a single thread - but since a lot of the time is spent waiting for I/O, it makes sense to have more than one thread. 24 is twice the number of cores I have in my backup server - if I had fewer cores I would tweak accordingly.
Using these settings my backup time went from around an hour to less than 10 minutes - and I expect that even as my backup repository grows - rclone will keep that ratio between rsync performance and its own performance more or less the same.
I will probably tweak the --transfers and --checkers numbers to find a sweet spot - 24 might be too many, and would certainly be too many if my backup repository was on normal spinning hard drives - but since it resides on an SSD pool, it should easily sustain 24 concurrent reads.
A problem with using many threads is if rsync.net starts to throttle based on the number of connections and the threads end up being blocked - then I would need to tweak the number of threads down until I am below the limit. But let's hope that does not happen before I have found a good number of threads.
Cron
So with the script at hand I have simply added the following line to /etc/crontab:
30 10 * * * root /mnt/tank3/system/tasks/rclone.sh /mnt/backup/proxmox_backup proxmox_backup
Which states that at 10:30 each day, cron should run my script and sync /mnt/backup/proxmox_backup to rsync.net into the folder proxmox_backup.
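Since the script already prints timestamped messages, it is worth keeping that output somewhere. One option (the log path is just a suggestion) is to redirect it in the crontab entry itself:
30 10 * * * root /mnt/tank3/system/tasks/rclone.sh /mnt/backup/proxmox_backup proxmox_backup >> /var/log/rclone-backup.log 2>&1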
So now I have a backup of my backup - and much faster transfer speeds than I had with rsync. If you have many files that change often, I would suggest you take a look at rclone as well.