Backing up proxmox backup with rsync and rclone

Sat, May 28, 2022 7-minute read

Introduction

I recently started using Proxmox in my homelab instead of VMware ESXi - this led me to Proxmox Backup Server, since it allows delta backups like my previous backup solution for ESXi did.

With my old backup solution I backed up my “backup” to the cloud at my provider, rsync.net, which simplified is just an SSH connection where they have enabled certain programs to run; the underlying storage they use is ZFS.

Unfortunately I do not have the budget for their ZFS solution, which would most definitely be the fastest way to back up my backups, because I am also running my filesystem on ZFS. That would have allowed me to just replicate snapshots, which is very fast since it is streams of data and not individual files that get copied.
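
For context, snapshot replication with ZFS looks roughly like this - the dataset names and remote here are placeholders, and it would require the provider to allow zfs receive on their side:

# one-off: snapshot the local dataset and send the whole stream to the remote
zfs snapshot tank/backup@2022-05-28
zfs send tank/backup@2022-05-28 | ssh user@remote.rsync.net zfs receive backuppool/backup

# later runs only send the delta between two snapshots, which is why it is so fast
zfs snapshot tank/backup@2022-05-29
zfs send -i @2022-05-28 tank/backup@2022-05-29 | ssh user@remote.rsync.net zfs receive backuppool/backup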

With my old backup solution, VEEAM, a couple of HUGE files were generated that contained all the delta updates of the backups. The new solution stores a lot of files in a, in my opinion, not so ideal directory structure. So instead of having a couple of HUGE files to transfer to the “cloud”, I have many thousands - in fact at this moment there are 97k files that make up my backup - and all of these need to be synced to rsync.net.

The solutions

A couple of methods spring to mind:

  • Simply copy the files via scp on a schedule
  • rsync the files
  • rclone the files

Simple is good, and for a first-time copy scp will probably be okay-ish - but as a solution it is not good, since it is not a sync protocol. To make it into one you would need scripts that compare the destination with the source and then only copy the delta, as sketched below.
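
Just to illustrate how much you would have to build yourself, a naive sketch (hypothetical paths, and it ignores deletions, changed files and missing remote directories) could look like this:

# list files on both sides, then scp whatever the remote does not have yet
find /mnt/backup/proxmox_backup -type f -printf '%P\n' | sort > /tmp/local.list
ssh user@remote.rsync.net "cd proxmox_backup && find . -type f" | sed 's|^\./||' | sort > /tmp/remote.list
comm -23 /tmp/local.list /tmp/remote.list | while read -r f
do
  scp "/mnt/backup/proxmox_backup/$f" "user@remote.rsync.net:proxmox_backup/$f"
done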

Enter rsync

Rsync is a program that can synchronize directories. These directories can be on the same system or on different systems, so it is very flexible and works great. This is what I used previously to transfer my backups to rsync.net when I was using VEEAM - it was not great, and transferring my weekly full backup took hours, even though I have a 1Gbit internet connection. But I had learned to live with the time it took since it was simple and it worked.
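
The invocation was along these lines - the paths and remote are placeholders rather than my exact command:

# archive mode, compress in transit, and delete remote files that no longer exist locally
rsync -az --delete --progress /mnt/backup/veeam/ user@remote.rsync.net:veeam/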

When I converted my homelab to Proxmox and also to their backup solution, the requirements were different - it was no longer a few massive files, but instead a lot of “smaller” files that needed to be transferred.

My initial thought was that rsync would be perfect for this, but it turned out that as time went by, the time taken just kept increasing - from less than an hour to more than 4 hours.

I think the reason for this slowdown is the way the files are stored, which adds overhead every single time a file system operation has to happen. The storage format is great for direct file access, since you can compute the exact location of any given file, but for operations like ls -l or du -hs . it is extremely slow. I have had a discussion with them about it and they seem to be of the opinion that they have already chosen the best storage solution.
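
For illustration, the datastore is essentially a content-addressed chunk store - from memory it looks roughly like this, with chunks fanned out over subdirectories named after the first part of each chunk's digest (the exact layout may differ between versions):

/mnt/backup/proxmox_backup/.chunks/0000/0000a1b2...   # chunk files named by their digest
/mnt/backup/proxmox_backup/.chunks/0001/...
/mnt/backup/proxmox_backup/.chunks/ffff/...
/mnt/backup/proxmox_backup/vm/100/...                 # per-guest indexes referencing chunks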

Enter rclone

Rclone is a program similar to rsync, but with integrations for different cloud providers, so you can sync a directory with e.g. Amazon. Running rclone help backends shows the current list of backends:

root@pve3:~# rclone help backends
All rclone backends:

  alias        Alias for an existing remote
  acd          Amazon Drive
  azureblob    Microsoft Azure Blob Storage
  b2           Backblaze B2
  box          Box
  crypt        Encrypt/Decrypt a remote
  cache        Cache a remote
  chunker      Transparently chunk/split large files
  drive        Google Drive
  dropbox      Dropbox
  fichier      1Fichier
  ftp          FTP Connection
  gcs          Google Cloud Storage (this is not Google Drive)
  gphotos      Google Photos
  http         http Connection
  swift        OpenStack Swift (Rackspace Cloud Files, Memset Memstore, OVH)
  hubic        Hubic
  jottacloud   Jottacloud
  koofr        Koofr
  local        Local Disk
  mailru       Mail.ru Cloud
  memory       In memory object storage system.
  onedrive     Microsoft OneDrive
  opendrive    OpenDrive
  pcloud       Pcloud
  premiumizeme premiumize.me
  putio        Put.io
  s3           Amazon S3 Compliant Storage Provider (AWS, Alibaba, Ceph, Digital Ocean, Dreamhost, IBM COS, Minio, Tencent COS, etc)
  seafile      seafile
  sftp         SSH/SFTP Connection
  sharefile    Citrix Sharefile
  sugarsync    Sugarsync
  union        Union merges the contents of several upstream fs
  webdav       Webdav
  yandex       Yandex Disk

As you can see the list is big, and it probably keeps on growing. Rsync.net is not on the list, which is expected, since it is a small provider compared to the big players - but that is okay, since rclone also has generic backends that work with any server of a given type.

I will use the sftp backend, since all that requires is a generic SSH connection.

The advantage of rclone over rsync is that rclone has native support for running multiple transfer threads in parallel, which is ideal when you have many small files like I have now.

Rclone uses a config file to store the options for a given destination - or remote, as they are called in rclone.

So I started with running:

rclone config --config ./rsync.rclone.conf

Which opens the configuration editor, which will write to the configuration file I passed as a parameter. If no --config parameter is used, it stores the configuration in ~/.config/rclone/rclone.conf by default - which is fine for most cases - but I like to have my configuration files stored in a central place, not in a user's home directory.

When my configuration session was done, the configuration file ended up looking like this:

[rsync_net]
type = sftp
host = <redacted>.rsync.net
user = <redacted>
key_file = /root/.ssh/id_rsa
use_insecure_cipher = true
md5sum_command = md5 -r
sha1sum_command = sha1 -r

Which basically just tells rclone that the remote rsync_net uses sftp, along with the options related to that remote. If I was using another backend type, different options would be present.
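
Before scripting anything it is worth a quick sanity check that the remote actually works - listing the top level of the remote is a cheap way to do that:

# list the directories at the root of the remote to verify the connection and key work
rclone lsd rsync_net: --config ./rsync.rclone.conf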

With that configuration file in hand I can craft a simple script that I can run with cron:

#!/bin/sh

write_msg()
{
  echo "$(date '+%Y-%m-%d %H:%M:%S') $1"
}

duration()
{
  # break a duration in seconds ($2) into hours, minutes and seconds for logging
  DURATION=$2
  HOUR=$((DURATION/3600))
  HOURINSEC=$((HOUR*3600))
  DURATION=$((DURATION-HOURINSEC))
  MINUTE=$((DURATION/60))
  SECOND=$((DURATION%60))
  write_msg "$1 finished, took $HOUR hours, $MINUTE minutes, $SECOND seconds"
}



if [ "$#" -ne 2 ]
then
  echo "Invalid number of arguments, specificy <source> <destination>"
  echo "where <source> is an absolute path to a directory locally, i.e. /mnt/backup/veeam"
  echo "where <destination> is relative, i.e. backup/mybackup"
  exit
fi



BASEDIR=$(dirname "$0")
CONFIG="$BASEDIR/rsync.rclone.conf"
SOURCE=$1
DEST=$2
THREADS=24

START=$(date +%s)

write_msg "rclone script running from $BASEDIR"
write_msg "Starting rclone of $SOURCE to rsync_net:$DEST"
CMD="rclone sync --progress --stats-one-line --stats=30s --transfers $THREADS--checkers $THREADS --config $CONFIG $SOURCE rsync_net:$DEST"
write_msg "using command: $CMD"
$CMD
END=$(date +%s)
duration  "rclone" $((END-START))

Most of the script is irrelevant - the actual juicy part is the line rclone sync --progress --stats-one-line --stats=30s --transfers 24 --checkers 24 --config $CONFIG $SOURCE rsync_net:$DEST

Which basically tells rclone to sync $SOURCE to $DEST using 24 threads. This will not give 24 times the speed of a single thread - but since a lot of the time is spent waiting for I/O, it makes sense to have more than one thread. 24 is twice the number of cores I have in my backup server - if I had fewer cores I would tweak accordingly.
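
If you do not want to hardcode the number, the same “twice the core count” rule of thumb can be derived on the fly - a small tweak, assuming a Linux host where nproc is available:

# twice the number of CPU cores instead of a hardcoded 24
THREADS=$(( $(nproc) * 2 ))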

Using these settings my backup time went from around an hour to less than 10 minutes - and I expect that even as my backup repository grows - rclone will keep that ratio between rsync performance and its own performance more or less the same.

I will probably tweak the --transfers and --checkers numbers to find a sweet spot - 24 might be too many, and would certainly be too many if my backup repository was on normal spinning hard drives - but since it resides on an SSD pool, it should easily sustain 24 concurrent reads.
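
One way to look for that sweet spot without actually moving data is to time a dry run at a few different settings - a rough sketch only, since a dry run exercises the checkers and listings but not the uploads themselves:

# time a dry-run sync at different concurrency levels; nothing is transferred
for N in 8 16 24 32
do
  echo "trying $N transfers/checkers"
  time rclone sync --dry-run --transfers $N --checkers $N --config ./rsync.rclone.conf /mnt/backup/proxmox_backup rsync_net:proxmox_backup
done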

A problem with using many threads is if rsync.net starts to throttle based on the number of connections and the threads get blocked - then I would need to tweak the number of threads down until I no longer hit the limit. But let's hope that does not happen before I have found a good number of threads.

Cron

So with the script at hand I have simply added the following line to /etc/crontab

30 10   * * *   root    /mnt/tank3/system/tasks/rclone.sh /mnt/backup/proxmox_backup proxmox_backup

Which states that @ 10:30 each day cron should run my script and sync /mnt/backup/proxmox_backup to rsync.net into the folder proxmox_backup.

So now I have a backup of my backup - and a much faster transfer than I had with rsync. If you have many files that change often, I would suggest you take a look at rclone as well.