Performing push backups – Part 2: rsnapshot

Posted on 03 Apr at 05:04 PM by Janek Bevendorff | Comments (4) | Trackbacks (0)

After discussing a possible backup solution using rdiff-backup in the last part of this series, I want to show you the second tool: rsnapshot.

As I already pointed out, I'm not using rdiff-backup anymore, mainly because it is simply too slow. I'm using a Raspberry Pi as my NAS and it is absolutely not capable of handling larger backups with rdiff-backup. It works for smaller backup sizes, but not for my entire home directory. Even when I pushed the initial full backup directly to the backup disk (not using my Raspberry Pi), all future incremental backups were still unbearably slow. Even when no files had changed at all, it took hours upon hours just to compare all the files in my home directory to those on the NAS, whereas a full comparison using rsnapshot is done within five to ten minutes. Now keep this in mind and consider that incomplete backups made with rdiff-backup can't be resumed. You can imagine that in the end you wouldn't have any backup at all. Basically all rdiff-backup would do is compare and push your files throughout the day and abort in the evening when you shut down your workstation. The next day it would spend all its time reverting the incomplete backup and running another one which might not finish either.

So this is the main reason I stopped my experiments with rdiff-backup. It was a nice time, but I finally moved on. Therefore say hello to our new precious star: rsnapshot!

TL;DR: I have created a GitHub project page with a much more mature version of the scripts I discuss throughout this article. It is linked at the end.

Basically we want to do exactly the same with rsnapshot that we did with rdiff-backup. We want to push our backups from the client to the NAS and we want to keep the privileges separated. That means, there should be a general backup of all global system files run by root and then the backups of the individual home directories run by the users themselves (i.e. with their privileges and UIDs). The whole backup process should be as flexible as possible and each user should decide by himself which of his own files to include in a backup and which to explicitly exclude. Therefore we will again read the list of files and folders which are to be backed up from special files in a) the global system etc folder and b) the users' home directories.

One sticking point we have to solve is that rsnapshot by itself does not support push backups at all. Since it works with hardlinking, it needs to operate on a local file system. But fear not, there is a simple yet effective solution, which I (to be honest) borrowed from mad-hacking.net. rsnapshot is based on rsync, and rsync by itself is capable of pushing files to another host over SSH by starting a daemon process on the remote side. Furthermore, rsnapshot can keep the syncing and the rotation process separate, which is a good idea anyway when we push over the network (imagine all the bad things that might happen :-)). With the sync_first option enabled, rsnapshot uses a directory called .sync as the source for the rotation. This is the directory we will push our files to with rsync. That means we will use rsnapshot only for the rotation, not for the real push backup itself.

The server side

Again, let's start with the server side, i.e. the NAS. As already done in the last part I will call the server/NAS bellatrix and the client that pushes its files altair.

The storage which will hold all the backups is mounted under /bkp and that's where our backup users will have their home directories. For each user on each client we will have one user on the server. So first of all create a new group and two new users for backups from altair on the server:

groupadd backup
useradd -G backup -b "/bkp" -m -p '*' -s /bin/sh "altair-root"
useradd -G backup -b "/bkp" -m -p '*' -s /bin/sh "altair-johndoe"

The first account will be for backups run as root on altair, the second one for backups run as the user johndoe.

Next we need to prepare the home directories. I'll do it for altair-root, the process is the same for altair-johndoe and any other user of course.

The backup files should all go to the directory $HOME/files and as described above we will use a folder called .sync inside of it for the actual syncing. Thus we need to create them both:

mkdir -p "/bkp/altair-root/files/.sync"
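If you have more than a handful of users, the account and directory setup above gets repetitive. The following is only a sketch: the function name print_backup_user_setup is made up for illustration, and it merely prints the commands from this article instead of running them, so you can review them first. The group still has to be created once with groupadd backup.

```sh
#!/bin/sh
# Hypothetical helper (not part of rsnapshot or rsync): print the setup
# commands from this article for one client host and a list of users.
# Nothing is executed here; the output is meant to be reviewed first.
print_backup_user_setup() {
    host=$1
    shift
    for user in "$@"; do
        account="${host}-${user}"
        printf 'useradd -G backup -b "/bkp" -m -p "*" -s /bin/sh "%s"\n' "$account"
        printf 'mkdir -p "/bkp/%s/files/.sync"\n' "$account"
    done
}
```

Calling print_backup_user_setup altair root johndoe prints the four commands for the two accounts used here. Once they look right, you could pipe the output to sh as root.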

To enable root on altair to log into the NAS as altair-root we also need to set up an SSH key. But of course, we don't want to grant full shell access but only an rsync daemon instance. Therefore generate a new SSH key on the client and create a folder called .ssh inside /bkp/altair-root on the server. Inside that folder create a file called authorized_keys with the following content:

command="/usr/bin/rsync --server --daemon --config='/bkp/altair-root/rsync.conf' ." ssh-rsa AAAAB3NzaC1yc...

(replace the last part with the proper public key, of course)
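To avoid typos in the authorized_keys entry, you could generate it with a few lines of shell. This is a sketch under assumptions: authorized_keys_line is a hypothetical helper name, and the key file name in the comment is just an example.

```sh
#!/bin/sh
# Hypothetical helper: build the authorized_keys entry for one backup account.
# $1 = home directory of the backup user on the server
# $2 = the client's public key line (the contents of the .pub file)
authorized_keys_line() {
    printf "command=\"/usr/bin/rsync --server --daemon --config='%s/rsync.conf' .\" %s\n" "$1" "$2"
}

# On the client, a dedicated key without a passphrase could be created like
# this (the file name is an assumption, pick whatever suits you):
#   ssh-keygen -t ed25519 -N '' -f "$HOME/.ssh/backup_key"
```

On the server you would then redirect the output of authorized_keys_line /bkp/altair-root "$(cat backup_key.pub)" into /bkp/altair-root/.ssh/authorized_keys. If you want to lock the key down further, OpenSSH also accepts options such as no-pty and no-port-forwarding in front of the key.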

Finally protect the file by setting the file owner to root:

chown -R root:root /bkp/altair-root/.ssh

This will ensure that a user that logs into bellatrix with that SSH key can only start an rsync daemon with the specified configuration file /bkp/altair-root/rsync.conf. Now create that file with these contents:

[push]
uid = altair-root
gid = altair-root
path = /bkp/altair-root/files/.sync
use chroot = 0
read only = 0
write only = 1
fake super = 1
max connections = 1
lock file = /bkp/altair-root/rsyncd.lock
post-xfer exec = /usr/local/bin/rs-rotate "/bkp/altair-root/rsnapshot.conf"

[pull]
uid = altair-root
gid = altair-root
path = /bkp/altair-root/files
use chroot = 0
read only = 1
fake super = 1

Save the file and make root own it, too.

The configuration directives above define two basic rsync modules, one named push and one named pull (which is read-only and intended for restoring backups from the NAS). For both modules the config file sets the proper target directories, which is the files/.sync folder for the actual backup process and simply the files directory for restoring backups. Processes connecting via rsync to these modules won't be able to escape these directories easily. One very important option in the push module is the fake super option. Since we're running with basic user privileges, we can't properly set things like ownership. Luckily, rsync is able to save this information to the extended user file attributes (xattrs) when this option is enabled. In the past this required the file system to be mounted with the user_xattr option, but if you're using Ext4, it should also work without since it's enabled by default. If you get a permission denied error, though, you might want to remount the file system with that option.
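If you are curious whether fake super actually works on your file system, you can look at the extended attributes of a file that was pushed. As far as I know, rsync stores the original metadata in an attribute named user.rsync.%stat in the form "MODE MAJOR,MINOR UID:GID"; treat that format as an assumption and check it on your own system. The helper name below is made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: turn a fake-super stat value such as
# "100644 0,0 1000:1000" into a readable line.
# Assumed layout of the value: "MODE MAJOR,MINOR UID:GID".
describe_fake_super_stat() {
    value=$1
    mode=${value%% *}        # octal mode, the first field
    owner=${value##* }       # "UID:GID", the last field
    printf 'mode=%s uid=%s gid=%s\n' "$mode" "${owner%%:*}" "${owner##*:}"
}

# Possible use on the server (requires getfattr from the attr package):
#   describe_fake_super_stat "$(getfattr --only-values -n 'user.rsync.%stat' FILE)"
```

If getfattr reports that the attribute does not exist for a file owned by another user on the client, that is a hint that xattrs are not being written.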

All the other lines should be more or less self-explanatory, but another very important one is

post-xfer exec = /usr/local/bin/rs-rotate "/bkp/altair-root/rsnapshot.conf"

This tells rsync to run the specified command after it finished the syncing process. We will use this to trigger the rotation. The file /usr/local/bin/rs-rotate (could be any other name and location, too) is a very simple shell script:

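#!/bin/sh
# Run by rsync as post-xfer hook; rotates the snapshots after a successful push.

if [ -z "$1" ]; then
        echo "Usage: $(basename "$0") <rsnapshot config>" >&2
        exit 1
fi

# rsync sets RSYNC_EXIT_STATUS for post-xfer hooks
if [ -z "$RSYNC_EXIT_STATUS" ]; then
        echo "This script is intended to be run as rsync post-xfer hook." >&2
        exit 1
fi

# Only rotate when the transfer finished without errors
if [ "$RSYNC_EXIT_STATUS" -eq 0 ]; then
        rsnapshot -c "$1" push
fi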

When the rsync process exited cleanly (i.e. the syncing was entirely successful) it will ask rsnapshot to perform a rotation using the specified configuration file (which we also need to create):

config_version  1.2

cmd_cp          /usr/bin/cp
cmd_rm          /usr/bin/rm
cmd_rsync       /usr/bin/rsync
cmd_logger      /usr/bin/logger

retain          push            2
retain          daily           7
retain          weekly          4
retain          monthly         2

verbose         2
loglevel        3
one_fs          1
sync_first      1

snapshot_root   /bkp/altair-root/files
logfile         /bkp/altair-root/rsnapshot.log
lockfile        /bkp/altair-root/rsnapshot.pid
backup          /bkp/altair-root/files/.sync    ./

Save this to /bkp/altair-root/rsnapshot.conf.

The syntax is explained quickly. All entries are separated by tabs (not spaces!) and each line with text contains one configuration directive. The first few lines simply define where to find certain command line tools. More interesting are the retain lines. These specify the different backup levels and how many increments are kept. For the push backups we will keep 2 increments. The second increment will then be the basis for the first daily increment. Since we keep 7 dailies, the seventh daily increment will serve as the basis for the first weekly and so on.
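To make the chain of retain levels a bit more tangible, here is a small sketch that reads a config file and prints which increment feeds which level. The function name print_retain_chain is hypothetical and the script only relies on the tab-separated retain lines described above.

```sh
#!/bin/sh
# Hypothetical helper: list each retain level of an rsnapshot config
# together with the increment it is fed from. Fields are tab-separated.
print_retain_chain() {
    awk -F '\t+' '
        $1 == "retain" {
            if (prev == "")
                printf "%s (keeps %d): created from .sync\n", $2, $3
            else
                printf "%s (keeps %d): fed from %s.%d\n", $2, $3, prev, prevcount - 1
            prev = $2
            prevcount = $3
        }
    ' "$1"
}
```

For the configuration above, print_retain_chain /bkp/altair-root/rsnapshot.conf should report that daily is fed from push.1, weekly from daily.6 and monthly from weekly.3.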

Important to note is the sync_first option which needs to be set. Otherwise our whole system would not work. With this option set, though, the syncing would be invoked by running rsnapshot with the sync command which we'll never do. Any other invocation of rsnapshot will simply rotate our files.

Finally the last four lines set the directories rsnapshot should work on. The very last line tells rsnapshot to use the .sync directory as a source and back it up to the current working dir, which is set by snapshot_root. So whichever files are in /bkp/altair-root/files/.sync will be used for rotation.

Side note: you might also want to split the config file and put everything above snapshot_root in a separate file and reference it via an include_conf directive. This would enable you to write only a very minimal config file for each backup user and simply reuse the options which are the same for everyone.
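A per-user file for such a split setup could be generated like this. It is a sketch only: print_user_rsnapshot_conf is a made-up name and the location of the shared file is an assumption. printf is used so that the fields are really separated by tabs.

```sh
#!/bin/sh
# Hypothetical helper: print a minimal per-user rsnapshot config that pulls
# the shared options from a common file via include_conf.
# $1 = path of the shared config (an assumed location)
# $2 = home directory of the backup user
print_user_rsnapshot_conf() {
    printf 'include_conf\t%s\n' "$1"
    printf 'snapshot_root\t%s/files\n' "$2"
    printf 'logfile\t%s/rsnapshot.log\n' "$2"
    printf 'lockfile\t%s/rsnapshot.pid\n' "$2"
    printf 'backup\t%s/files/.sync\t./\n' "$2"
}
```

You would redirect the output to /bkp/altair-root/rsnapshot.conf and can then check the result with rsnapshot -c /bkp/altair-root/rsnapshot.conf configtest.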

To invoke a rotation for a specific backup level, use rsnapshot <level> which is exactly what we did in the rsync post-xfer hook script above for the push level.

All the other levels can be rotated by a cron script that is run every day. But be careful: since we rely on push backups, you should only perform a rotation via cron when there are enough increments of the preceding level. Otherwise you'd successively delete your older backups. For instance, if we take the configuration above and keep two push increments and seven daily increments, the rotation cron would delete the seventh daily increment, rotate all preceding dailies by one number and then make the second push increment the new daily.0. No problem so far. But if you ran the rotation again without creating a new push increment first, rsnapshot would think that you had deleted all your files. The result would be that the last daily would be deleted again and all the earlier dailies would be rotated by one increment. But there would be no new daily.0 since there is no push.1, only a push.0. So after five more rotations you'd be left with only your push.0 and no dailies at all.

Therefore you should carefully check the number of existing increments of the preceding level first before doing a rotation. To do this you could put something like this in your cron script:

config=$(cat "path/to/the/rsnapshot/config.conf")

# Get number of preceding increments
config=$(echo "${config}" | grep -P '^retain\t')
config=$(echo "${config}" | grep -oPz "retain\t+(\w+)\t+(\d+)\nretain\s+${1}\t+" | sed -n 1p)
preceding_name=$(echo "${config}" | awk '{ print $2 }')
preceding_number=$(($(echo "${config}" | awk ' { print $3 }') - 1))

if [ "${preceding_name}" != "" ] &&
   [ -d "path/to/backup/dir/${preceding_name}.${preceding_number}" ]; then
        # Perform rotation
fi

Of course this also means that “daily” doesn't necessarily mean exactly “daily” anymore. Now it rather means something like “backup from the day before the last push backup”.

The client side

We have the server side, now we need the client side, which is simple. Very simple. It is exactly the same as described in the last part of the series, you only need to replace the rdiff-backup command with this:

rsync \
    --rsh=ssh \
    --archive \
    --acls \
    --delete \
    --delete-excluded \
    --include-from="${home_dir}/.rsnapshot-backup-filelist" \
    --exclude="*" \
    / \
    "$(hostname)-${username}@${BACKUP_HOST}::push"

You do nothing more than rsync-ing your files to the server using the push module we defined in your rsync.conf on the server. Similarly you can pull files from there with rsync pretty much the same way using the pull module:

rsync \
    --rsh=ssh \
    --archive \
    "$(hostname)-${username}@${BACKUP_HOST}::pull/file/on/the/server" \
    "./where/the/file/should/go/to"

Of course you can also use the short options -e and -a instead of --rsh and --archive.
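Since the module address is the same for pushing and pulling apart from the module name, it can be handy to build it in one place. The helper below is a sketch with a made-up name; it only assembles the string used in the two commands above.

```sh
#!/bin/sh
# Hypothetical helper: build the rsync daemon target used in this article.
# $1 = local user name, $2 = backup host, $3 = module ("push" or "pull")
backup_target() {
    printf '%s-%s@%s::%s\n' "$(hostname)" "$1" "$2" "$3"
}

# Example: list the top level of the pull module (nothing is transferred):
#   rsync --rsh=ssh "$(backup_target johndoe bellatrix pull)/"
```

Listing the pull module like this is a quick way to see which increments exist on the server before you restore anything. I would not experiment with the push module this way, because every successful connection to it triggers the post-xfer hook and therefore a rotation.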

One more thing to note: the file that contains the list of files which should be backed up (~/.rsnapshot-backup-filelist in this case) works a little differently. By default all directories are excluded which are not explicitly included. That is also true for parent directories. So to include, e.g., /home/foo/bar/baz and all directories below you'd have to write:

/home
/home/foo
/home/foo/bar
/home/foo/bar/baz/***

Simply writing /home/foo/bar/baz/*** wouldn't be enough since /home is not included explicitly.

Note the slashes: no slash at the end means: “only the directory, not its contents”. A slash at the end, though, means: “the contents of this directory”. The version I used with the three asterisks means “this directory, the files inside it and inside all subdirectories”. So mind the little differences or otherwise your backup might be different from what you intended. For more information about rsync globbing patterns consult the FILTER RULES section in the rsync(1) man page.
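Writing out all the parent directories by hand is tedious, so here is a sketch of a helper that prints the include lines for one directory tree. The name print_include_rules is hypothetical; the output follows the pattern rules described above.

```sh
#!/bin/sh
# Hypothetical helper: print the include lines needed to back up one
# directory tree, i.e. every parent directory followed by "path/***".
print_include_rules() {
    path=${1%/}              # strip one trailing slash, if any
    rest=${path#/}
    prefix=""
    # Walk through the path components as long as a slash is left
    while [ "$rest" != "${rest#*/}" ]; do
        prefix="${prefix}/${rest%%/*}"
        printf '%s\n' "$prefix"
        rest=${rest#*/}
    done
    printf '%s/***\n' "$path"
}
```

print_include_rules /home/foo/bar/baz prints exactly the four lines from the example above, which you can append to your file list.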

For reference: my file currently looks about like this:

- *.swp
- *.tmp
- .directory
- Thumbs.db
- desktop.ini
- Desktop.ini
- .DS_Store
- *~
- .Trash/***

- /home/janek/.Xauthority
- /home/janek/.xsession-errors
- /home/janek/.cache
- /home/janek/.dbus
- /home/janek/.codeintel
- /home/janek/.zsh_history
- /home/janek/.bash_history
- /home/janek/.recently-used
- /home/janek/.pulse-cookie
- /home/janek/.config/pulse/***
- /home/janek/.local/tmp/***
- /home/janek/.dropbox/***
- /home-accel/janek/.kde4/cache-*/***
- /home-accel/janek/.kde4/socket-*/***
- /home-accel/janek/.kde4/tmp-*/***
- /home-accel/janek/.kde4/share/apps/nepomuk/***
- /home-accel/janek/.mozilla/firefox/*/Cache/***
- /home-accel/janek/.thunderbird/*/Cache/***

/home
/home/janek/***
/home-accel
/home-accel/janek/***
/srv
/srv/http
/srv/http/virtual
/srv/http/virtual/janek/***

You see: as with rdiff-backup you can also prefix lines with a - to specify explicit excludes. Lines starting with a + or no sign at all are treated as includes.

Some more polishing

That's pretty much it, but we can still tweak a thing or two.

Providing read-only SFTP access

Currently we can only restore files from the backup server using rsync. This is okay for the restoring process itself, but it makes it hard to browse your backups. Wouldn't it be convenient if we could also mount the backup folder from the server using SFTP/SSHFS? That's very much possible.

You only need to do two things: first of all modify the authorized_keys file on the server like this:

command="/usr/bin/rs-run-ssh-cmd '/bkp/altair-root'" ssh-rsa AAAAB3NzaC1yc...

and then create the script /usr/bin/rs-run-ssh-cmd:

#!/bin/sh

home_dir=$1

# Note: "=" instead of "==" so the test also works in plain POSIX shells like dash
if [ "${SSH_ORIGINAL_COMMAND}" = "internal-sftp" ] || [ "${SSH_ORIGINAL_COMMAND}" = "/usr/lib/ssh/sftp-server" ]; then
    cd "${home_dir}/files" || exit 1
    exec /usr/lib/ssh/sftp-server -R
else
    exec /usr/bin/rsync --server --daemon --config="${home_dir}/rsync.conf" .
fi

echo "Session failed." >&2

exit 1

This will start an SFTP server when needed, otherwise simply the rsync daemon as before.
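On the client you could then mount the backups with SSHFS. The following sketch only prints a suitable command; sshfs_mount_command is a made-up name and the account, host, key path and mount point are all assumptions you need to adapt.

```sh
#!/bin/sh
# Hypothetical helper: print an sshfs command that mounts the backups of one
# account read-only. All arguments are assumptions for illustration:
# $1 = backup account, $2 = backup host,
# $3 = absolute path of the private key, $4 = local mount point
sshfs_mount_command() {
    printf 'sshfs -o ro -o IdentityFile=%s %s@%s: %s\n' "$3" "$1" "$2" "$4"
}
```

For example, sshfs_mount_command altair-root bellatrix /root/.ssh/backup_key /mnt/backup prints the command for the root backups. The mount can be released again with fusermount -u /mnt/backup.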

Chroot users into /bkp

If you want a little more security, you can also chroot all backup users into /bkp. For this to work you need to add these lines to your /etc/ssh/sshd_config on the server:

Match Group backup 
        ChrootDirectory /bkp/

After you've done that you need to make all files from outside the directory that are necessary for the services to operate, available inside the chroot environment. That means you either need to copy them to /bkp or use bind mounts. Copying has the advantage that you can copy only those files that are really needed, but it also creates a lot of duplicate files. I prefer bind mounts most of the time. Hardlinks are usually not a very good idea.

For rsnapshot you need to bind mount at least /bin, /usr/bin, /lib, /usr/lib and /usr/share/perl5:

mkdir -p "/bkp/"{"bin","usr/bin","lib","usr/lib","usr/share/perl5"}
mount -o bind "/bin" "/bkp/bin"
mount -o bind "/usr/bin" "/bkp/usr/bin"
mount -o bind "/lib" "/bkp/lib"
mount -o bind "/usr/lib" "/bkp/usr/lib"
mount -o bind "/usr/share/perl5" "/bkp/usr/share/perl5"
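Bind mounts created this way are gone after a reboot. One way to make them permanent is to add matching lines to /etc/fstab. The helper below is a sketch with a hypothetical name that prints such lines for a list of directories.

```sh
#!/bin/sh
# Hypothetical helper: print /etc/fstab lines that recreate the bind mounts
# at boot. $1 = chroot directory, remaining arguments = directories to bind.
print_bind_fstab() {
    chroot_dir=${1%/}
    shift
    for dir in "$@"; do
        printf '%s\t%s%s\tnone\tbind\t0\t0\n' "$dir" "$chroot_dir" "$dir"
    done
}
```

Running print_bind_fstab /bkp /bin /usr/bin /lib /usr/lib /usr/share/perl5 gives you five lines which you can review and then append to /etc/fstab.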

If you're using the SFTP server, you also need /dev. Additionally a copy of the /etc/passwd file is necessary for the UID mapping, but you only need to keep those users which should be able to log into the chroot. So for your two backup users altair-root and altair-johndoe the following minimal /bkp/etc/passwd file would be enough:

altair-root:x:1001:1001::/bkp/altair-root:/bin/sh
altair-johndoe:x:1002:1002::/bkp/altair-johndoe:/bin/sh

(the UIDs and GIDs should of course correspond to the real UIDs and GIDs of the users)
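Instead of typing the entries by hand you could also extract them from the real passwd file, which keeps the UIDs and GIDs in sync automatically. A minimal sketch, assuming all backup accounts of a client share a common name prefix; filter_passwd is a made-up name.

```sh
#!/bin/sh
# Hypothetical helper: keep only the accounts whose name starts with a given
# prefix. $1 = passwd file to read, $2 = account name prefix, e.g. "altair-".
filter_passwd() {
    awk -F ':' -v prefix="$2" 'index($1, prefix) == 1' "$1"
}

# Possible use on the server, as root:
#   mkdir -p /bkp/etc && filter_passwd /etc/passwd altair- > /bkp/etc/passwd
```

If you back up several clients, you would call it once per prefix and append the results to the same file.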

Side note: OpenSSH provides an internal-sftp subsystem that works without any bind mounts and additional passwd files. But unfortunately, it is not possible to trigger it from a shell script. So in order to use it you'd either have to remove the command restriction from the authorized_keys file or use a completely different user for SFTP logins.

And finally: The GitHub project page

Because rsnapshot is what I finally settled upon, I needed a very flexible yet reliable backup system that is easy to maintain. So I created a bunch of large shell scripts which do all the work I described above and a lot more in a pretty convenient way.

I uploaded everything to GitHub where you can download it: rs-backup-suite on GitHub

Feel free to test it, modify it and redistribute it if you like.

Comments

Tim Press wrote on 04 Feb, 02:11 PM:

I was looking at making push backups to my home server a few years ago and stumbled across the mad-hacking.net rsync solution, which appeared to do the job for me. Working out why it worked was a bit of a learning curve in my case. I like your idea of setting up an sftp server to browse the backups, though I can ssh to my server to do that also. One thing that slightly concerned me was using the undocumented "--server" feature of rsync. It's what the method relies on, what are your views on it?

Janek Bevendorff wrote on 04 Feb, 02:18 PM:

Who says that --server is undocumented? It is documented, only discouraged for normal use. The man page says: "The options --server and --sender are used internally by rsync, and should never be typed by a user under normal circumstances. Some awareness of these options may be needed in certain scenarios, such as when setting up a login that can only run an rsync command. For instance, the support directory of the rsync distribution has an example script named rrsync (for restricted rsync) that can be used with a restricted ssh login." So although it is not recommended to use --server in most cases, it is a valid option for rsync-only logins. I've been using rs-backup-suite for years now and the script has matured quite a bit on GitHub. No issues so far and I don't expect --server to be dropped any time soon.

clawoo wrote on 02 Oct, 01:05 PM:

Just a note, on Ubuntu 17.04 following the guide leads to a "@ERROR: setgroups failed" error. Calling setcap cap_net_bind_service,cap_setgid=+ep /usr/bin/rsync fixed it.

Chit Ko Ko Win wrote on 23 May, 05:48 AM:

Great Article