Restic for fast backups & restores
Tuesday, 16 Jun 2020
restic is a high-speed backup and recovery tool written in Go. It supports multiple remote storage backends, and handles deduplication and encryption by default. It has a matching restserver remote backup daemon that lets you run your own remote storage, and restic can just as conveniently be used directly for local backups as well.
It is purely command line driven, which you may consider as an asset.
architecture and configuration
The architecture is boring and standard - multiple backup clients, writing over the network, to a single backup server with a large partition. Clients can work in parallel, each uses its own per-client encryption key, and they are unable to remove backed-up data themselves - an append-only configuration.
We’ll nominate a server, with a hostname of backup.example.net, as our backup server. It uses FreeBSD 12.1R and zfs by default, but this setup should work on any supported FreeBSD version, and you can just skip the zfs functions I use below.
Paranoid people can use zfs replication to a further offsite location, or to a tape drive, for additional safekeeping.
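For example, a minimal sketch of such replication, assuming the zroot/var/db/restserver dataset created further below, plus a hypothetical offsite host and pool (offsite.example.net, tank), might look like this:
# SNAP=$(date -u +%Y%m%d-%H%M)
# zfs snapshot zroot/var/db/restserver@${SNAP}
# zfs send -Lc zroot/var/db/restserver@${SNAP} \
| ssh offsite.example.net zfs receive -u tank/restserver-copy
Subsequent runs would use an incremental send (zfs send -i) from the last snapshot the offsite host already holds, rather than a full stream each time.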
backup server
The backup repository is accessible both locally (direct file system access) and over the network. This allows an admin to prune older backups, while preventing backup clients from doing so.
The restic(1) tool operates directly on the local repository, and restserver makes it available over the network to clients with appropriate credentials.
packages
Install packages sysutils/rest-server sysutils/restic, and note that it creates a user, group, and a directory /var/db/restserver where all restic backup blobs will be stored:
restserver:*:239:239::0:0:restic rest-server Daemon:/var/db/restserver:/usr/sbin/nologin
the dataset
The restserver daemon will store its tiny config files and secret key in its homedir, which by default is the same /var/db/restserver dir.
Let’s create a new zfs dataset, with a reasonably large record size, and clean up permissions:
# zfs create \
-o recordsize=1m \
-o exec=off \
-o setuid=off \
zroot/var/db/restserver
# chown root:restserver /var/db/restserver
# chmod 0750 /var/db/restserver
rc.conf settings
By default, we only allow remote backups to append data. It’s mildly hacky, but it gives us a little safety against a moderately malicious attacker or an incompetent sysadmin, and we use private repos, which effectively puts each backup client in its own tidy world.
# /etc/rc.conf.d/restserver
restserver_enable=YES
restserver_options="--no-auth --listen 127.0.0.1:8000 --private-repos --append-only"
Amend the IP address and port as you like. In my case I have spiped in front of restserver, to allow backups from my laptop over the internet, and haproxy for more discerning backups via our zerotier mesh vpn. You can read up on these options in the restserver docs.
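As an aside, the spiped leg of that can be as small as the following sketch; the key path, port, and key generation shown here are illustrative assumptions, not my actual configuration. On the backup server, spiped decrypts incoming traffic and hands it to restserver on localhost:
# dd if=/dev/urandom bs=32 count=1 of=/usr/local/etc/spiped/restic.key
# spiped -d -s '[0.0.0.0]:8443' -t '[127.0.0.1]:8000' \
-k /usr/local/etc/spiped/restic.key
On the laptop, the matching encrypting end (using a securely copied duplicate of the same key) listens locally, and the restic --repo URL then points at rest:http://127.0.0.1:8000/ instead of the server directly:
# spiped -e -s '[127.0.0.1]:8000' -t 'backup.example.net:8443' \
-k /usr/local/etc/spiped/restic.key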
initialising the backup repo
The following steps should be done as the root user, to ensure secured permissions. We’ll generate a key from random data, use this for the repository, and ensure that the restic user can’t manipulate the key or secret.
# cd /var/db/restserver
# umask 077
# head -c 1000000000 /dev/random \
| sha512 \
| tee .restic
# restic init --repo /var/db/restserver --password-file .restic
# chown root:restserver .restic config /var/db/restserver
# chmod 0750 /var/db/restserver
# chmod 0440 .restic config
# chown -R restserver:restserver /var/db/restserver
As noted, store this passphrase carefully, possibly duplicated – you cannot restore without it!
start the daemon
# service restserver start
You should confirm that the port (default 8000) is accessible over the network, amending firewall rules and docs as required.
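A couple of quick sanity checks - on the server, confirm something is listening, and from a client, confirm the port (or whatever proxy sits in front of it) answers:
# sockstat -l4 | grep 8000
# nc -z backup.example.net 8000 && echo reachable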
From here on in, almost everything can be done as the restserver user, or whatever backup account is appropriate for a given client.
adding a backup client
Similar to the backup server, we’ll make a random key, and create a user account specific to this backup client. During the server setup, we used a flag --private-repos which ensures that every client has its own namespace, and this is the other side of that configuration.
To ensure that the repository can de-duplicate across multiple clients, the master encryption key from our backup server is used to create a derived key specific to this client:
# cd /root
# umask 077
# head -c 1000000000 /dev/random \
| sha512 \
| tee .restic
# restic --verbose --no-cache \
--repo rest:http://backup.example.net:8000/ \
key add
enter password for repository: .....
repository e8831311 opened successfully, password is correct
enter password for new key:
enter password again:
saved new key as <Key of root@z01.example.net,
created on 2020-06-16 12:11:38.04829251 +0000 UTC m=+22.329739230>
On our server, we can see there’s a new user/host created:
$ sudo -u restserver \
restic --repo /var/db/restserver \
--password-file=/var/db/restserver/.restic \
key list
repository e8831311 opened successfully, password is correct
ID User Host Created
--------------------------------------------------------------
4a06463e root z01.example.net 2020-06-16 12:11:38
906d85ec root f01.example.net 2019-07-20 16:21:37
ace94bba root f02.example.net 2019-07-20 17:49:31
e2774077 root f03.example.net 2019-07-15 17:40:12
ff4224db root i09.example.net 2019-07-15 17:36:40
...
And now we can run a simple test backup:
# restic --repo rest:http://backup.example.net:8000/ \
--password-file=/root/.restic \
--no-cache \
--verbose \
--tag $(hostname -s) \
--tag $(date -u +%Y%m%d-%H%M) \
--tag config \
backup /efi \
/etc \
/root \
/usr/local/etc \
/var/db/zerotier-one \
/boot/loader.conf
open repository
repository e8831311 opened successfully, password is correct
lock repository
load index files
start scan on [/efi /etc /root /usr/local/etc /boot/loader.conf /var/db/zerotier-one]
start backup on [/efi /etc /root /usr/local/etc /boot/loader.conf /var/db/zerotier-one]
scan finished in 46.197s: 2166 files, 5.217 MiB
Files: 2166 new, 0 changed, 0 unmodified
Dirs: 5 new, 0 changed, 0 unmodified
Data Blobs: 1016 new
Tree Blobs: 6 new
Added to the repo: 2.871 MiB
processed 2166 files, 5.217 MiB in 0:49
snapshot 0c50950d saved
doing restores
There are numerous ways to restore, but I find this the easiest, given that in most of my cases the tarball is smaller than the available RAM on the system: I can simply mount the backup repository via restic’s fuse integration, and then pipe the backup of choice into tar or zfs as required.
The smaller config-tagged backups for each server can simply be copied out as usual, to get up and running quickly, followed by a full recovery using whatever approach is best suited to the disaster at hand.
# cd /
# REPO=http://backup:8000/
# TOKEN=/root/.restic
# restic --repo rest:${REPO} \
> --password-file=${TOKEN} \
> --no-cache \
> --verbose \
> mount /mnt
repository e8831311 opened successfully, password is correct
Now serving the repository at /mnt
When finished, quit with Ctrl-c or umount the mountpoint.
^C
signal interrupt received, cleaning up
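With the repository mounted, pulling a backup back out is just a pipe in the other direction. A sketch, using the snapshots/latest path that restic mount exposes, and hypothetical recovery targets (/recovered, a zrecover pool):
# cat /mnt/snapshots/latest/tarballs/z01.tar \
| tar xpf - -C /recovered
# cat /mnt/snapshots/latest/tarballs/z01.zfs \
| zfs receive -Fu zrecover
The config-tagged snapshots appear under the same tree as plain files and directories, so those can simply be copied out with cp.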
scripts and automation
In all cases, there is a simple script /root/bin/restic.sh run via cron, which emails its output daily. The OpenBSD boxes back up using tar(1), and on FreeBSD I use zfs(4).
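The cron side is nothing special; a placeholder sketch of the /etc/crontab entry (the schedule and recipient are made up), relying on cron to mail whatever the script prints:
MAILTO=ops@example.net
#minute hour mday month wday who command
20 3 * * * root /root/bin/restic.sh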
All systems grab the /efi partition and some useful configuration files before proceeding, as a separate config backup, so that I can quickly recover the bare bones of a system, including appropriate network and storage tunables, before restoring the full data.
I prefer to back up using OS-native tools and stream the result to restic, as this allows anybody to handle the local recovery, even if they’re not familiar with restic itself. All the backup operator needs to do is make the tarball or zfs stream available, and the other operator can do the rest.
Here are a couple of examples of more complicated backups. Note that --stdin-filename is used both to give each streamed backup type a PATH-like namespace, and to give the “file” a friendly name for recovery that stays the same across backups.
This allows us, later on, to forget older snapshots and then prune them, finally freeing up space in our disk repository. Use tags to make it easier to identify the various backups, and try to keep the name of each backup the same, to simplify pruning and maintaining the repo.
OpenBSD using tar(1)
#!/bin/sh -eu
PATH=$PATH:/usr/local/bin:/usr/local/sbin
cd /
NOW=$(date -u +%Y%m%d-%H%M)
HOSTNAME=$(hostname -s)
REPO=http://backup.example.net:8000/
TOKEN=/root/.restic
# do the bare minimum to recover the system first
restic --repo rest:${REPO} \
--password-file=${TOKEN} \
--no-cache \
--verbose \
--tag ${HOSTNAME} \
--tag ${NOW} \
--tag config \
backup /efi /etc /root /usr/local/etc /var/db/zerotier-one
tar cpf - / \
| restic --repo rest:${REPO} \
--password-file=/root/.restic \
--no-cache \
--verbose \
--tag ${HOSTNAME} \
--tag ${NOW} \
--tag tarball \
--stdin \
--stdin-filename=/tarballs/${HOSTNAME}.tar \
backup
FreeBSD using zfs(4)
#!/bin/sh -eu
PATH=$PATH:/usr/local/bin:/usr/local/sbin
cd /
NOW=$(date -u +%Y%m%d-%H%M)
HOSTNAME=$(hostname -s)
ZPOOL=zroot
REPO=http://backup.example.net:8000/
TOKEN=/root/.restic
# do the bare minimum to recover the system first
restic --repo rest:${REPO} \
--password-file=${TOKEN} \
--no-cache \
--verbose \
--tag ${HOSTNAME} \
--tag ${NOW} \
--tag config \
backup /efi /etc /root /usr/local/etc /boot/loader.conf /var/db/zerotier-one
zfs snapshot -r ${ZPOOL}@${NOW}
zfs send -LvcpR ${ZPOOL}@${NOW} \
| restic --repo rest:${REPO} \
--password-file=/root/.restic \
--no-cache \
--verbose \
--tag ${HOSTNAME} \
--tag ${NOW} \
--tag zfs \
--stdin \
--stdin-filename=/tarballs/${HOSTNAME}.zfs \
backup
The output will be similar to this - restic startup and initialisation at the beginning, and the piped output from zfs following:
# /root/bin/restic.sh
open repository
repository e8831311 opened successfully, password is correct
lock repository
load index files
using parent snapshot 028cc821
read data from stdin
start scan on [/tarballs/z01.zfs]
start backup on [/tarballs/z01.zfs]
scan finished in 46.298s: 1 files, 0 B
full send of zroot@migration estimated size is 12.1K
send from @migration to zroot@20200129-1237:12.1-RELEASE-p1 estimated size is 624
send from @20200129-1237:12.1-RELEASE-p1 to zroot@20200320-1544:12.1-RELEASE-p3 estimated size is 624
send from @20200320-1544:12.1-RELEASE-p3 to zroot@20200421-1735:12.1-RELEASE-p3 estimated size is 624
...
send from @20200616-1318 to zroot/ROOT/12.1-RELEASE-p2@20200616-1607 estimated size is 624
TIME SENT SNAPSHOT
Files: 1 new, 0 changed, 0 unmodified
Dirs: 1 new, 0 changed, 0 unmodified
Data Blobs: 11896 new
Tree Blobs: 2 new
Added to the repo: 5.999 GiB
processed 1 files, 8.147 GiB in 37:11
snapshot 454b1981 saved
469.977u 112.802s 38:13.53 25.4% 6397+710k 53047+0io 0pf+0w
Pruning Snapshots
Over time, snapshots accumulate. As with most systems, there are some additional tasks to perform – forgetting, which marks a given snapshot as superfluous and is a quick task, and pruning, which is, on my ~ 6TiB repo, a rather long one – over 80 minutes now. I run this weekly.
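The weekly run on the backup server is just another scheduled job; a placeholder sketch, assuming the forget, prune, and check commands below are wrapped up in a hypothetical /root/bin/restic-maintenance.sh:
#minute hour mday month wday who command
0 5 * * 0 root /root/bin/restic-maintenance.sh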
Forgetting Snapshots
We are tagging snapshots by short hostname, which makes cleaning up very easy. The restic docs cover this in great detail, so I’ll skip most of the explanation, but this pretty much does what you want – keep a handful of older backups around for ages, and a few more fresh ones just in case. Remember that we have ZFS for those little daily rm -rf transgressions.
$ time restic --repo /var/db/restserver \
--password-file=/var/db/restserver/.restic \
forget -v \
--keep-monthly 3 \
--keep-weekly 5 \
--keep-daily 10 \
--keep-last 5 \
--tag flatline
I also have a config backup for each server - there’s really no need to keep these longer than a week or so as they rarely change, and we’re only concerned about an occasional rollback:
$ time restic --repo /var/db/restserver \
--password-file=/var/db/restserver/.restic \
forget -v \
--keep-last 5 \
--tag config
This does the right thing, and forgets all but the last 5 snapshots tagged as config for each server we backed up.
As a small annoyance, there’s no easy way to say “forget all the snapshots from this host/tag” – restic insists that you delete the final snapshot manually. This is really the only annoyance I’ve had so far with restic, and for a small number of slowly changing hosts, I can live with it.
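The workaround is simply to forget that final snapshot by its ID once the retention policy has dealt with the rest; a sketch, with an illustrative tag and a placeholder ID:
$ restic --repo /var/db/restserver \
--password-file=/var/db/restserver/.restic \
snapshots --tag retiredhost
$ restic --repo /var/db/restserver \
--password-file=/var/db/restserver/.restic \
forget <final-snapshot-id>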
Pruning the Forgotten Snapshots
As you’ll notice, forgetting is quick and easy, but the space doesn’t come back:
$ zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
embiggen 7T 6.71T 300G - - 43% 95% 1.00x ONLINE -
So I run the very straightforward prune sub-command, and .. wait. Over an hour to complete, with a lot of disk activity.
$ time restic --repo /var/db/restserver \
--password-file=/var/db/restserver/.restic \
prune -v
repository e8831311 opened successfully, password is correct
counting files in repo
building new index for repo
...
removed <data/5209d03c2d>
removed <data/7797e1c567>
removed <data/fbfe0d9663>
removed <data/1faa3db324>
removed <data/47b49f75e6>
removed <data/b3199bdfc7>
removed <data/b13f44fe84>
removed <data/26a5648ffc>
removed <data/554ec26b70>
removed <data/038d2052af>
[7:28] 100.00% 1061893 / 1061893 files deleted
done
________________________________________________________
Executed in 86.14 mins fish external
usr time 352.00 secs 1.61 millis 352.00 secs
sys time 459.16 secs 0.56 millis 459.16 secs
Once the pruning is done, the restic docs recommend running a check, which is, after the pruning, quite a bit faster:
$ time restic --repo /var/db/restserver \
--password-file=/var/db/restserver/.restic \
check -v
using temporary cache in /tmp/restic-check-cache-022959336
repository e8831311 opened successfully, password is correct
created new cache in /tmp/restic-check-cache-022959336
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
no errors were found
________________________________________________________
Executed in 4.33 secs fish external
usr time 3.96 secs 1263.00 micros 3.96 secs
sys time 0.19 secs 0.00 micros 0.19 secs
However a quick check of my zpool shows that the space hasn’t yet been freed - which is obviously because we snapshotted our restic backup store as well.
For bonus points, here’s how I clean up that zfs space:
$ zfs list -t snapshot embiggen/var/db/restserver
NAME USED AVAIL REFER MOUNTPOINT
embiggen/var/db/restserver@20200615-0526 516K - 177G -
embiggen/var/db/restserver@20200615-1106 0B - 177G -
embiggen/var/db/restserver@20200615-1916 0B - 177G -
embiggen/var/db/restserver@20200616-1115 572K - 177G -
embiggen/var/db/restserver@20200622-1502 0B - 367G -
embiggen/var/db/restserver@20200622-1508 0B - 367G -
embiggen/var/db/restserver@20200715-0915 218M - 805G -
embiggen/var/db/restserver@20200805-0732 252M - 1.12T -
embiggen/var/db/restserver@20200827-0846 912K - 1.63T -
embiggen/var/db/restserver@20200828-1359 0B - 1.63T -
embiggen/var/db/restserver@20200828-2117 0B - 1.63T -
embiggen/var/db/restserver@20201001-1433 0B - 2.80T -
embiggen/var/db/restserver@20201001-1542 0B - 2.80T -
embiggen/var/db/restserver@20201022-1144 252M - 3.46T -
embiggen/var/db/restserver@20201022-2049 260M - 3.64T -
embiggen/var/db/restserver@20201103-1124 0B - 4.01T -
embiggen/var/db/restserver@20201103-1136 0B - 4.01T -
embiggen/var/db/restserver@20201105-0020 19.9M - 4.01T -
embiggen/var/db/restserver@20201109-0628 0B - 163M -
$ zfs destroy -vrn embiggen/var/db/restserver@%20201105-0020
$ zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
embiggen 7T 1.52T 5.48T - - 7% 36% 1.00x ONLINE -
Unanswered Questions
- do the presumed bandwidth economies of local caching outweigh another local stash, and more wasted space?
- what happens if a per-host key is compromised? Does this expose the whole main repo key, assuming this is somehow derived from the main key and the local host key?
- does sending the entire tarball (or zfs or postgres etc) impact deduplication opportunities in any way?
- can we do anything with zfs datasets to get better deduplication?
On Caching
You’ll notice above that I don’t use the local cache, and this may mean I transfer more data over the network than necessary. I may tweak this policy over time, but so far the duration of backups is not so long that I want the additional burden of managing a zfs cache dataset on every client. In addition, the convenience of backing up the whole zpool in one fell swoop (which would necessarily pick up a cache directory) is, in my experience, far better than a highly optimised local backup strategy that opts in each dataset individually. The latter carries a high risk of discovering, at the point of system recovery, that critical files and folders were never backed up.
A local cache can be added to any client thus:
# zfs create zroot/var/cache/restic
# chown -R root:wheel /var/cache/restic
# sed -i '' -E -e \
's|--no-cache |--cache-dir=/var/cache/restic |' \
/root/bin/restic.sh
On Performance
My initial experimentation suggests that even a small 4-core backup client can saturate a midrange ADSL connection, but beyond that I have not found any meaningful bottlenecks in network or storage yet.
It may be that large servers with 10GbE+ NICs can push enough data to force restserver to use multiple CPU cores when writing, but so far I have not found any significant limitations other than the generally shabby local internet.
On backing up ZFS streams
FreeBSD, and zfs, is my bread and butter. What we want is to rely on zfs snapshots for general “whoops lost some files” recovery, but have a full recovery option via a complete zroot/ROOT stream if needed.
To achieve this, we need to use only zfs send options that create a reproducible stream, so that restic can de-duplicate as much data as possible.
In old-school backups, this was called “synthetic full backups” or an “eternal incremental strategy”.
For example, here are 3 consecutive runs of backing up the same zfs datasets repeatedly, on a very inactive test server. We’re using exactly the same snapshot references, but there have been additional snapshots created on those datasets between and perhaps during the backups.
This backup command was executed 3 times in a row:
time zfs send -LvcpR zroot@20200616-1607 \
| restic --repo rest:http://backup.example.net:8000/ \
--password-file=/root/.restic \
--verbose \
--no-cache \
--tag z01 \
--tag 20200616-1607 \
--tag zfs \
--stdin \
--stdin-filename=/tarballs/z01.zfs \
backup
restic --repo /var/db/restserver --password-file=/var/db/restserver/.restic snapshots --tag zfs,z01
repository e8831311 opened successfully, password is correct
snapshots for (host [z01.example.net]):
ID Time Host Tags Paths
----------------------------------------------------------------------------------------
82db07d8 2020-06-16 13:19:11 z01.example.net z01,20200616-1318,zfs /tarballs/z01.zfs
454b1981 2020-06-16 16:08:30 z01.example.net z01,20200616-1607,zfs /tarballs/z01.zfs
d7d6fb8e 2020-06-16 19:13:00 z01.example.net z01,20200616-1607,zfs /tarballs/z01.zfs
9b157847 2020-06-16 19:55:07 z01.example.net z01,20200616-1607,zfs /tarballs/z01.zfs
----------------------------------------------------------------------------------------
4 snapshots
This backup is ~ 9GiB in total, and as it’s purely zfs snapshots, is not changing. The streamed data should be an ideal case for restic’s sliding-window block approach to shine, despite the zfs stream’s internal dedupe, and despite blocks that are compressed inside zfs staying compressed in the stream. I would have expected better results than only ~ 1/3 of the data shared between the 3 backups, but I cannot be sure how much impact the various zfs send parameters used (-LDevcpR), and the many snapshots, have on the zfs stream itself. Maybe that is the greater issue.
I used restic’s diff function to show me what it sees as common between the snapshots, and it’s really not a lot. This explains why I use so much space on my storage array!
$ restic --repo /var/db/restserver \
--password-file=/var/db/restserver/.restic \
diff 454b1981 d7d6fb8e
repository e8831311 opened successfully, password is correct
comparing snapshot 454b1981 to d7d6fb8e:
M /tarballs/z01.zfs
Files: 0 new, 0 removed, 1 changed
Dirs: 0 new, 0 removed
Others: 0 new, 0 removed
Data Blobs: 12794 new, 12794 removed
Tree Blobs: 1 new, 1 removed
Added: 6.455 GiB
Removed: 6.455 GiB
$ restic --repo /var/db/restserver \
--password-file=/var/db/restserver/.restic \
diff d7d6fb8e 9b157847
repository e8831311 opened successfully, password is correct
comparing snapshot d7d6fb8e to 9b157847:
M /tarballs/z01.zfs
Files: 0 new, 0 removed, 1 changed
Dirs: 0 new, 0 removed
Others: 0 new, 0 removed
Data Blobs: 12852 new, 12852 removed
Tree Blobs: 1 new, 1 removed
Added: 6.483 GiB
Removed: 6.483 GiB
$ restic --repo /var/db/restserver \
--password-file=/var/db/restserver/.restic \
diff 454b1981 9b157847
repository e8831311 opened successfully, password is correct
comparing snapshot 454b1981 to 9b157847:
M /tarballs/z01.zfs
Files: 0 new, 0 removed, 1 changed
Dirs: 0 new, 0 removed
Others: 0 new, 0 removed
Data Blobs: 12853 new, 12853 removed
Tree Blobs: 1 new, 1 removed
Added: 6.484 GiB
Removed: 6.484 GiB
For a ~ 8GiB backup of the same data, zfs and restic are not behaving well together at all!
I expect that with some alterations to how zfs generates the streams, particularly around the command-line arguments, this could be a lot more efficient in future.
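One variation I may try, untested and with an example dataset list, is sending each dataset as its own plain, non-recursive stream, so that a change in one dataset does not reshuffle the entire replication stream:
#!/bin/sh -eu
# untested sketch: one plain, non-recursive stream per dataset (dataset list is an example)
PATH=$PATH:/usr/local/bin:/usr/local/sbin
NOW=$(date -u +%Y%m%d-%H%M)
HOSTNAME=$(hostname -s)
REPO=http://backup.example.net:8000/
TOKEN=/root/.restic
for ds in zroot/ROOT/default zroot/usr/home; do
    zfs snapshot ${ds}@${NOW}
    zfs send -L ${ds}@${NOW} \
        | restic --repo rest:${REPO} \
            --password-file=${TOKEN} \
            --no-cache \
            --tag ${HOSTNAME} \
            --tag ${NOW} \
            --tag zfs \
            --stdin \
            --stdin-filename=/tarballs/${HOSTNAME}-$(echo ${ds} | tr / _).zfs \
            backup
done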
Closing Thoughts
I like it. The community forums and IRC channel are courteous & helpful, and I hope this post has given a little back for their time in writing and supporting this very neat piece of software.
restic is a pleasure to use, easy to set up, and screamingly fast. It fits my #1 use case which is having off-site backups of medium-size zfs streams (under 100GiB) in the very unlikely event of losing all servers and their replicated databases, and fast recovery of those streams.
I still use tarsnap for the business-critical data (invoices, accounting records, any legal stuff); from my somewhat cursory tests it manages compression and delta-blocks better than restic, and it has the advantage of being commercially supported by the very helpful Colin Percival. However, tarsnap’s single large-file restore performance isn’t sufficient for this use case.
I can see, in the unlikely event of a total multi-site remote server catastrophe, simply driving our offsite backup server to the nearest datacentre, plugging it into the internet, and recovering our stuff very very quickly, with minimal stress.
Additional Reading
- great interview with the lead dev Alexander Neumann on gotime podcast
- Filippo Valsorda has done a very interesting write-up on the crypto internals
- how the chunking is done internally using rabin chunks
- the restic docs have an extensive references section
- restserver has some docs in the README, it’s all you really need in practice
- restic(1) has a great man page too
- the forums have many more questions and answers