Big Backup Bonanza
Tuesday, Aug 8, 2023
Show Notes
In this episode:
- Alan is backing up with rsnapshot
- Martin suggests BorgBackup is better
- Mark is using rclone to keep family photos safe
You can send your feedback via show@linuxmatters.sh or the
Contact Form. If you’d like to hang out with other listeners and share your feedback with the community, you can
join us on:
- The Linux Matters Chatters on Telegram.
- The Linux Matters Subreddit.
If you enjoy the show, please consider supporting us.
Transcript
Alan Pope 0:00
So, a slight change to our regularly scheduled programming: I thought we’d all talk about the same topic this week, but from different angles. Okay? Yeah. Yeah, cool.
Backups. You should do them, you really should do them; that’s the overriding message you should get from this. I want to talk about how I do backups, and then you guys can talk about yours. So I’ll start with mine. Most of my systems are Linux: laptops, desktops, servers, they’re all Linux. They pretty much all reside in my house, or are remotely accessible via SSH, so I’ve got a few servers out there in the cloud. I only have one location; I’m not backing up from lots of places other than out there on the internet. For many years I’ve used the same tool for doing backups, and it’s called rsnapshot. It’s been around a long time, it’s a little bit quirky, and it’s a little bit weird to set up. But once it’s set up, it’s pretty much fire and forget. So I have a server upstairs in the loft at home, and that’s running rsnapshot. It has a regular cron job to run rsnapshot; in fact, it has four cron jobs to run rsnapshot. The configuration for rsnapshot tells it which machines to back up remotely, and it does that over SSH as root. It has the SSH keys all set up, so that machine can SSH to any system as root and access the file system of that remote system. It then copies all the files from that remote system onto the server at home, and it does that on a regular basis for multiple machines. So it wakes up every few hours and backs up my laptop, my desktop, my web server, a game server I’m running, and all kinds of other machines. The neat thing about rsnapshot is that it uses rsync over SSH, and it only backs up changed or new files, so it’s not doing a full backup every single time. The first backup takes a long time because it backs up everything that you specify, and you don’t have to back up the whole system; you can just back up certain folders, and I only back up certain important folders. But the important part is that it runs on a regular basis, backs up all the files on all the systems, and it works.
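For anyone who wants to try something like the setup Alan describes, here is a minimal sketch of an rsnapshot configuration that pulls selected folders from remote machines over SSH as root. The hostnames, paths and retention counts are illustrative, not Alan’s actual values, and rsnapshot is strict about its config format.

```bash
# Minimal /etc/rsnapshot.conf sketch (fields must be TAB-separated, not spaces).
# Illustrative values only: adjust retention counts, hosts and paths to taste.
sudo tee /etc/rsnapshot.conf > /dev/null <<'EOF'
config_version	1.2
snapshot_root	/srv/backups/
cmd_ssh	/usr/bin/ssh
# Keep 6 six-hourly, 7 daily, 4 weekly and 12 monthly snapshots
retain	alpha	6
retain	beta	7
retain	gamma	4
retain	delta	12
# Back up selected folders from remote machines, over SSH as root
backup	root@laptop.example.org:/home/alan/	laptop/
backup	root@webserver.example.org:/etc/	webserver/
EOF

# Check the configuration parses before trusting it with real backups
sudo rsnapshot configtest
```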
Mark Johnson 2:36
And do you just get one backup of whatever the latest state is? Or does it do some sort of time window, where you can go back and say, I need to know what it was yesterday, or something like that?
Alan Pope 2:48
Yeah, it’s configurable, right? So the way I have it set up is that it runs every six hours, and it keeps a certain number of those backups. When it gets to the limit of however many you set, let’s say it’s six, it renames the oldest one to another name. They’re called alpha, beta, gamma and delta. So I’ve got six alpha backups, and when it starts the next one and sees there are already six that have been done over the last six backups, it takes the oldest one and renames it beta.0, and then away we go again and it does another backup. The next time it runs, beta.0 gets renamed beta.1, until we have enough beta backups. Then the oldest beta backup gets renamed gamma, and then the oldest gamma one gets renamed delta. The net result of that is that I have backups going quite far back, but I only keep seven of the most recent ones; the beta ones are daily, the older ones are weekly, and then the oldest ones are monthly. So I effectively have hourly, daily, weekly, monthly, roughly. But because I don’t back up every hour, it’s not actually hourly: it’s every six hours, then every day, then every week, then every month. So I go back like six months, maybe a year of backups, but not every single backup. It’s not like ZFS snapshots; it’s a delta, a snapshot in time at some point over the last year. I’ve been doing this for years on all my systems, and it has saved my bacon a bunch of times, where I’ve aggressively deleted files off a laptop and then wanted to go and find them, or needed to find some configuration file from a year ago because I’ve had to redeploy something, or something like that. There are certainly better ways to do it. It’s not the fastest backup in the world, because of the way it copies the files from alpha.0 to alpha.1 and then alpha.1 to alpha.2; it takes a lot of time to do that copying. It’s basically cp’ing a folder to another folder. It does save space by using hard links, so you don’t end up with lots of duplicate copies of the same files. But if you modify a lot of files, then a lot of files get changed and a lot of files get backed up. I basically don’t look at it very often, because whenever I look at the log file, it’s just working, and it has done for years. So it’s good old, reliable rsnapshot, and I’m quite happy with it. Whenever I’ve needed to get files from it, they’ve been there, and that’s kept me very happy.
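The four cron jobs Alan mentions map onto those retention levels: rsnapshot itself does the rotation and promotion, cron just decides how often each level runs. A sketch of what that schedule might look like, assuming the alpha/beta/gamma/delta names from the config sketch above; the exact times are illustrative.

```bash
# /etc/cron.d/rsnapshot : illustrative schedule for the four levels
# Every six hours: the most frequent (alpha) snapshots do the actual rsync
0  */6 * * *  root  /usr/bin/rsnapshot alpha
# Daily, weekly and monthly runs promote the oldest lower-level snapshot
30 3   * * *  root  /usr/bin/rsnapshot beta
45 3   * * 1  root  /usr/bin/rsnapshot gamma
55 3   1 * *  root  /usr/bin/rsnapshot delta
```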
Martin Wimpress 5:34
And I think that’s a critical piece, right? Because people talk about making backups, and I’ve spoken to people about the rigour with which they take backups, and I ask, what’s the restore process like? And they’re like, no, I’ve never really tested it. Well, you should probably do that, because you may just have a load of garbage.
Alan Pope 5:56
Yeah. And I’ve restored entire folders full of music and, you know, folders full of PDFs and stuff. It’s not just individual files; it’s all there, going back over a year. I’ve recently been getting rid of some of my old backups, because I just don’t need them anymore. Like, I’m on my fourth generation of laptop that I’ve been using rsnapshot with on this server, and it’s keeping all those old laptop snapshots there. I call them snapshots; they’re backups, really. So every time it’s copying alpha.0 to alpha.1, and alpha.1 to alpha.2, it’s copying not only my current laptop but the previous laptop and the laptop before that, because all those backups are still sat on the server. They’re all in one big folder, which means it’s super time-consuming, copying a whole bunch of files that are never going to change, because those laptops don’t exist anymore. So I think there are ways I could improve this, but I’m too far down the line of using it; I’d have to just replace it completely, I think.
Martin Wimpress 6:57
Right. And you said you do your incrementals every six hours. Do you have any idea how long it takes your incremental backup to run?
Alan Pope 7:07
That’s a good question. So the most recent one, which started at four o’clock today, took an hour to rm the oldest alpha backup, and then it took less than a minute to move all of the alpha backups up, because it moves, not copies. Then to copy alpha.0 to alpha.1 took an hour and ten minutes. And then actually doing the backups took fifteen minutes, not long at all. So the actual backup, the diff of what has changed from midday to four o’clock on all of those systems, is fifteen minutes’ worth of copying files over the network, but the prep took over an hour to do. So it’s horribly inefficient. Now bear in mind, this is running on a Gen8 HP MicroServer with four ageing four-terabyte drives in it. I could probably buy a Synology with one 12-terabyte drive and it would just be insanely fast. But that’s where we are; that’s the system I’ve got, and I don’t really want to go buying a whole load more infrastructure for backups. I should note, I probably didn’t explain it particularly well. I have got a blog post: if you just do a search for “popey rsnapshot”, you’ll find my blog from a couple of years ago where I explain this in much more detail, how it all works and how it fits together. And I’ll link it in the show notes.
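What Alan is describing is roughly what rsnapshot does on every alpha run before any data moves over the network. A simplified sketch of the equivalent shell steps, reusing the illustrative paths and six alpha snapshots from the earlier config sketch:

```bash
# Roughly what `rsnapshot alpha` does on each run (simplified; paths illustrative)
rm -rf /srv/backups/alpha.5                        # delete the oldest alpha snapshot (slow)
mv /srv/backups/alpha.4 /srv/backups/alpha.5       # shuffle the rest up one (fast: just renames)
mv /srv/backups/alpha.3 /srv/backups/alpha.4
mv /srv/backups/alpha.2 /srv/backups/alpha.3
mv /srv/backups/alpha.1 /srv/backups/alpha.2
cp -al /srv/backups/alpha.0 /srv/backups/alpha.1   # hard-link copy: no file data duplicated
rsync -a --delete root@laptop.example.org:/home/alan/ \
      /srv/backups/alpha.0/laptop/                 # only changed or new files actually transfer
```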
Martin Wimpress 8:30
I too have used rsnapshot very happily for many years. But I’m here to tell you: resistance is futile, BorgBackup is here to replace rsnapshot. I have recently done the very thing that you’re positing: I have completely replaced my backup solution with BorgBackup. And I suppose it’s important to summarise why I decided to replace rsnapshot after so many years of being happy with it.
Alan Pope 9:01
Is it anything to do with the reasons that I gave, about it being slow?
Martin Wimpress 9:05
Yes, yeah. So I used rsnapshot to do two types of backups. One: backups of my home directory, and that is to protect me from when I do a silly and go and delete a bunch of stuff accidentally that I really didn’t mean to delete and need to recover. And two: I also used it to back up my servers, which is mostly static content, videos and music and things of that nature, and some other stuff, and I’ll get into that a bit later. But those workstation backups of my home directory: I’ve got terabytes of data and millions of files, and rsnapshot takes a long time to run, even on very, very fast all-NVMe storage for both the source where the data resides and the target where it is being backed up. So spinning rust is not the only limiting factor here; it is just an inefficient process. So I wanted to be able to take snapshots of my data at much more frequent intervals, to give me tighter restore points.
Alan Pope 10:18
The silly factor.
Martin Wimpress 10:21
For when I inevitably rm something in the wrong place and destroy hours of work. So I was thinking I would switch my file systems to ZFS and get into ZFS replication. And you know what, life’s too short for that; for your laptops and your trivial stuff at home, getting into balancing ZFS and doing storage architectures and all of this sort of thing. And I don’t trust Btrfs (send your hate mail to joe@latenightlinux.com); it has bitten me and I’m not going back there. And XFS I love, but XFS is missing some features that would be good for a backup solution: there’s no native compression on the file system, for example, something that you can do with Btrfs. So I wanted to find a solution that enabled me to keep XFS, which I trust and I like, but that is also faster than rsnapshot and more space-efficient than rsnapshot. But you talked about how convenient the restore is from rsnapshot, because you can just go to any point-in-time version and all of your stuff is there. Yeah. And I love that. And I also love, as you say, that when rsnapshot is configured, it just comes along and it does its job. And I wanted that.
So BorgBackup is all of these things. It is very fast, much faster than rsnapshot, and I’ll get to that in just a moment. But the space efficiency is interesting, because it does deduplication as well as compression. When you create a repository for your backups, you can say what compression you want to use on that repository. So I’ve created two repositories: one, which is a target for stuff that is probably going to compress well, like my documents and general data; and another one where I have no compression applied, which is where the music backups and the video backups and the photo backups go, because you get little to no benefit from re-compressing those things, and it just adds to the time the backups take. In addition, it has a very simple means of encrypting your data.
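As a rough illustration of the two-repository approach Martin describes, with encryption chosen at repository creation time; the repository paths, compression levels, source folders and archive naming here are assumptions for the sketch, not his actual settings.

```bash
# Two Borg repositories: one compressed, one not (illustrative paths and settings)
borg init --encryption=repokey-blake2 /srv/borg/documents
borg init --encryption=repokey-blake2 /srv/borg/media

# Documents and general data compress well, so use zstd
borg create --stats --compression zstd,3 \
    /srv/borg/documents::'{hostname}-{now}' ~/Documents ~/Projects

# Music, video and photos are already compressed, so skip compression entirely
borg create --stats --compression none \
    /srv/borg/media::'{hostname}-{now}' ~/Music ~/Videos ~/Pictures
```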
At some point in the future, I want to get to sending another copy of my backups to some off-site place. I’ve currently got the office in town and the server here, so I have two places, but I’d like something that’s completely separate as well. And the restore process for Borg is very elegant. It has what they call archives, but an archive is basically a point-in-time backup, and you can just say: go and mount that. It uses FUSE mounting, and it suddenly just appears, just like a directory, anywhere on your system. You can then use rsync to copy back a bunch of files, or the whole thing. It’s not like the old days of backup software, where you have some way of saying “extract this whole backup”; you just mount the point-in-time backup that you’re interested in, and then you can restore any files that you want out of there, back over the target, just like I would do with rsnapshot. So I love that.
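The mount-based restore works roughly like this; the archive name and mount point below are made up for illustration, and the real archive names come from the list command.

```bash
# List the point-in-time archives in a repository
borg list /srv/borg/documents

# FUSE-mount one archive and browse it like any other directory
mkdir -p ~/restore
borg mount /srv/borg/documents::mylaptop-2023-08-01 ~/restore

# Pull back whatever you need with rsync, then unmount
rsync -av ~/restore/home/martin/Documents/ ~/Documents/
borg umount ~/restore
```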
So, because of all of these reasons, it has replaced rsnapshot for me. I’m using it for my workstation backups, to do regular snapshots of my home directory, and that also includes backing up my Keybase folder. I have got data in Keybase for projects that I work on with other people, where we keep secrets that we want to share with members of the project, but I also want to have backups of that stuff in case Keybase ever disappears, right? So it’s able to peek inside there, but the backup is encrypted, so I now have a great way to back that stuff up. My workstation backups are many terabytes and many millions of files, and my incremental backups now take 30 seconds, so I am now doing hourly backups. Mark was asking about retention earlier: my retention for my home directory is 24 hourlies, seven dailies, four weeklies, six monthlies, and one annual. Now, I’ve only been using this for about four weeks at the moment, so I don’t have that history of stuff yet.
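That retention policy maps almost directly onto Borg’s pruning options. A sketch, reusing the illustrative repository path from the earlier examples:

```bash
# Keep 24 hourly, 7 daily, 4 weekly, 6 monthly and 1 yearly archive; drop the rest
borg prune --list \
    --keep-hourly 24 --keep-daily 7 --keep-weekly 4 \
    --keep-monthly 6 --keep-yearly 1 \
    /srv/borg/documents

# Reclaim the space freed by pruning (Borg 1.2 and later)
borg compact /srv/borg/documents
```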
And then I’m using it on the server as well, and the server is doing backups of the music and the videos. But I also have a headless version of Dropbox running on the server, and I back up our Dropbox folder as well, so if Dropbox ever goes away, or something disastrous happens, I now have our Dropbox stuff backed up. And I think what’s worth pointing out here is that you can completely automate Borg itself from the command line, in much the same way you would with rsnapshot, but there are some front ends that make it even more palatable and straightforward to use. So on my workstations I’m using a bit of software called Vorta, which is a graphical application that is dead simple to use. You can create backup archives very, very quickly and say what you want to back up. It has a simple mechanism for excluding: in any folders that I don’t want to back up, I just touch a hidden file called .nobackup, and Borg sees that and goes, oh well, I don’t back this one up. So if I’ve got things I want to exclude, it’s easy to do. So I’m using Vorta on the workstation. It sits in the indicator area with all of the other things, and it glows red when it’s doing its backups. You can click on it, and if you want to restore, you can just click an entry in the list of backups and say “mount it”, and it just appears as a FUSE file system in your file browser, and you can go and poke around, look at it, and all that sort of thing.
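The marker-file exclusion Martin mentions corresponds to Borg’s exclude-if-present option; the .nobackup name is the one he describes, and the paths are illustrative.

```bash
# Drop a marker file into any directory you never want backed up
touch ~/Videos/render-cache/.nobackup

# Tell borg to skip any directory containing that marker
borg create --exclude-if-present .nobackup \
    /srv/borg/documents::'{hostname}-{now}' ~/
```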
Alan Pope 16:46
That is super neat. I mean, Vorta does look… the video on the homepage is on a Mac, so I assume the Linux version looks exactly the same?
Martin Wimpress 16:54
Exactly the same, right. And as you point out, this will work on Mac, Linux and Windows. So like you, I only care about Linux, but if there are people out there who have got a mixture of machines in their home and they want a solution that can work everywhere, then Borg plus Vorta will give you that. And then on the server I’m using a utility called borgmatic, which basically presents as a YAML file where you configure what you want to back up; it’s just a much simpler interface to drive Borg itself. So that’s what I’m using. I don’t have ten years of experience with it, like I do with rsnapshot, but so far it’s giving me the rsnapshot experience that I’ve enjoyed, but quicker, faster, and using less disk space as well. And when you have Borg installed on remote endpoints over SSH, it can accelerate the backups over SSH by virtue of having Borg on both ends of the connection. So it’s not just doing a file system transaction; it actually has a protocol between the client and the server, so it’s very, very fast remotely as well.
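A minimal borgmatic configuration sketch of the kind Martin describes. The layout of the YAML has changed between borgmatic versions, and the paths, repository and retention values here are illustrative, so treat this as a shape rather than a drop-in file.

```bash
# Minimal borgmatic config sketch (illustrative; run `borgmatic config generate`
# to get the exact layout your installed borgmatic version expects)
mkdir -p ~/.config/borgmatic
cat > ~/.config/borgmatic/config.yaml <<'EOF'
source_directories:
    - /srv/music
    - /srv/videos
    - /srv/dropbox

repositories:
    - path: /srv/borg/media
      label: media

compression: none
keep_daily: 7
keep_weekly: 4
keep_monthly: 6
EOF

# Validate the config, then run a backup with some output
borgmatic config validate
borgmatic --verbosity 1
```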
Alan Pope 18:14
Is Borg installable as, like, a deb on Linux? Are there packages for everything?
Martin Wimpress 18:20
I’m going to say yes. I’m obviously using it on NixOS, and it’s working fine here. I have seen that the guides talk about the many Linuxes that it’s available for. And it’s sponsored by BorgBase.com, who are basically a hosting company that offer you remote storage, and Hetzner now offer Storage Box solutions which are Borg-enabled. So if you want an off-site solution, there are hosting providers that support the project by providing you with cloud-based storage that is accelerated for Borg.
Alan Pope 18:57
Okay, sold. Linux Matters is part of the Late Night Linux family. If you enjoy the show, please consider supporting us and the rest of the Late Night Linux team using the PayPal or Patreon links at linuxmatters.sh/support. For $5 a month on Patreon you can enjoy an ad-free feed of our show, or for $10 get access to all the Late Night Linux shows ad-free. You can get in touch with us via email, show@linuxmatters.sh, or chat with other listeners in our Telegram group. All the details are at linuxmatters.sh/contact.
Mark Johnson 19:37
My backup solutions are a bit different, because I think the main use case I’m thinking of is a bit different. I’m not thinking about backing up entire machines, or my entire home directory all the time in case I mess something up. I’m more thinking about: what are the things that are crucial for my family to make sure that we never lose? Which are things like our family photos. Currently we have quite a good system within the house, whereby our photos are instantly uploaded from our phones to our Nextcloud server and synced between all the devices which talk to that. So we’ve got plenty of copies of them around the house; if one machine goes pop, nothing’s getting lost. But were there to be a complete disaster and all the machines get lost somehow, I wanted to have some sort of further off-site backup that I could do. And I’d been thinking about the same sorts of things that we’ve already been talking about: not just how do I back it up, and do it easily, and make sure it’s encrypted, but also, what’s the recovery process like? How easy is it to test the recovery process after just doing your first backup?
And I recently discovered a tool called rclone. rclone is a similar sort of tool: it will synchronise files from machine to machine, but the way it does it is quite clever and really flexible, and it gives you a lot of interesting options to cover a lot of different cases. The way it works is that it gives you, essentially, a load of pluggable backends, and that could be somewhere on a file system, it could be somewhere over SSH or SFTP or any other sort of network protocol. But it could also be any other way there is of storing files remotely, and that could include something like S3 object storage, even something like WebDAV or IMAP; anything which could feasibly store a file, you could create an rclone backend to do it. So in my case, I’ve gone with an S3-compatible option.
I’ve actually gone with iDrive e2 storage, because they sponsor rclone, and I thought if I’m going to give money to someone, I may as well give it to people who support the project. But you can just use the generic S3 backend and point it at any provider that supports that API, or there are a few which support a version of the API, like Backblaze, that have their own backend. But because this is all private family stuff, I wanted to make sure it was encrypted, and the way that rclone handles things like that is, as well as these backends which actually store files somewhere, there is a set of sort of meta backends for doing other things. So there’s one called union, which will take a load of backends and stick them together as though they were one backend. So you might have a load of small accounts across various providers, but you want to somehow use that storage as one blob, or you have multiple accounts and want to back your stuff up to all of those at once; there are backends for doing that by putting them in front of the real backends. And similarly with encryption: the way that you do that is you create your backend for where you’re actually storing the files, then you create an encryption backend which points to the real backend, and then you back up to the encryption backend, which transparently does all of that, basically. And the process is exactly the same: you’ve got all of these commands in rclone for doing various things, and they all work the same, whichever backend you’re using.
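As a sketch of that layering, here is what an rclone configuration with an S3 backend and a crypt backend pointing at it might look like. The remote names, bucket, endpoint and credentials are made up for illustration, and in practice you would normally create this interactively with `rclone config` so the crypt passwords are generated and obscured properly.

```bash
# Illustrative rclone config: a crypt remote layered on top of an S3 remote
cat >> ~/.config/rclone/rclone.conf <<'EOF'
[familybackup]
type = s3
provider = IDrive
access_key_id = YOUR_ACCESS_KEY
secret_access_key = YOUR_SECRET_KEY
endpoint = example.idrivee2.example.com

[familybackup-crypt]
type = crypt
remote = familybackup:family-photos
password = SET_THIS_VIA_RCLONE_CONFIG
EOF

# Back up to the crypt remote; file names and contents are encrypted at rest
rclone sync ~/Pictures/Family familybackup-crypt:photos
```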
Martin Wimpress 23:01
I’ve been aware of rclone for some time, and it’s been on my list of things to take a proper look at, and I’ve never got around to it. But the way that I’ve understood it is that it’s like rsync for cloud storage.
Mark Johnson 23:14
Yes. I mean, that’s basically how it operates, but it’s cloud storage and any other storage, right? You could equally use it with another server on your LAN, or just files locally if you wanted to. So you could use it like rsync, as you said. Yeah, exactly.
Alan Pope 23:30
So the question is: what happens if you need to restore a file? How do you get stuff back?
Mark Johnson 23:36
So the backing up is simple. You know, I’ve got a cron job which says, take this directory and clone it to this backend, and that’s all I need to tell rclone; it does that however often I want it to. And if you look at the bucket in the object storage, you just see a bunch of encrypted file names. But here’s the clever bit: like Borg, you have the option, on the machine where rclone is running, to say “mount this backend”, and you get it mounted as a directory. So you can do that. You can also say, take this backend and expose it over whatever remote protocol; so you could have rclone running on your server, but say, actually, give me that as a WebDAV mount, and then from somewhere else on your network you could do the same thing. You could also take the same config file from the server that’s doing the backups, put it on another machine like your laptop, and then run the same command to mount that backend, and it will do that on your local machine.
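A sketch of those two restore paths, reusing the hypothetical remote names from the config example above:

```bash
# FUSE-mount the encrypted remote; rclone decrypts transparently as you browse
mkdir -p ~/family-photos-restore
rclone mount familybackup-crypt:photos ~/family-photos-restore --daemon

# Or expose the same remote over WebDAV for other machines on the network
rclone serve webdav familybackup-crypt:photos --addr :8080
```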
But aside from doing all of this on the command line, you also get the option of running a web GUI, which is actually what I use to do the configuration. So you have an interactive CLI for adding remotes, and then the CLI commands for running it, but you can also do it all through quite a decent, although it describes itself as experimental, web front end. You run rclone rcd --rc-web-gui on whichever machine has the config file that you want. And then, as well as setting it all up like that, you get the option to browse any of the remotes. So I can go in there and say, show me my backup destination, and it’ll show me the real file names and the real files, and then I can just click on one and download it. Or I can basically manage all of the mounts that I’ve got; from there I can say, mount this, and then that appears in my file manager and I can browse it. At the moment, in terms of historical backups, the way I’ve approached that is using S3 versioning. So if I have files that I delete, they will be retained as a version, or if they get updated, they’ll get a new version in the bucket, which I can then expose through rclone: the old versions as well as the current version. This is still evolving; I’ve only put this in place recently, and I’m currently only doing our family photos, to see how it goes and how much it’s going to cost. But it’s certainly something I’m considering expanding to the other important things.
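For the versioning part, rclone can also surface those older S3 object versions. A sketch, assuming versioning has been enabled on the bucket at the provider’s side and using the hypothetical remotes and a made-up file name from the earlier examples:

```bash
# List old S3 object versions on the raw remote
# (file names here are the encrypted ones, since the crypt layer sits on top)
rclone ls --s3-versions familybackup:family-photos

# More usefully: view the crypt remote as it was at a point in time,
# then copy back whatever is needed
rclone copy --s3-version-at "2023-08-01" \
    familybackup-crypt:photos/holiday.jpg ~/restored/
```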
Alan Pope 26:06
I find it interesting that you don’t care so much about, like, non-family photos, or your home directory, or anything like that.
Mark Johnson 26:15
If my home directory blows up, I can reinstall and start again. I don’t, you know, handcraft my dotfiles like Martin does.
Martin Wimpress 26:23
I used to do that, but now I don’t have to, because I have everything powered by Nix. So when I come to restore or recover a machine, which I did recently when I reinstalled my laptop, the install process gives me an install that is fully configured, including all of the directories in my home directory where my data lives. So then I just imported my Vorta backup profile and ran a restore, and I was back up and running. That whole process, from install to fully restored system, was under an hour, which is really nice. But like you, Mark, the server backup is backing up our important stuff. We have Syncthing on all of our laptops, with our documents and our photos, so like you, everything’s in multiple places. But because we’ve now got this magical internet LAN, if I delete something in Syncthing, it almost instantaneously gets deleted from everyone else’s machine, which is why I wanted this bulletproof backup sitting on top of it.
Alan Pope 27:25
It’s interesting that you say that about installing your system from scratch and then restoring from the most recent backup that you have in Borg. Now I think about it, if I get a new laptop, I do a clean install and cherry-pick a few things out of the backups of my previous laptop, like my music collection, for example, and maybe a couple of other folders, and everything else I use Syncthing for. I’m thinking I actually back up a tremendous amount of stuff that I am never, ever going to restore. There’s a whole truckload of stuff, like the cache folder and .local in my home; I am almost guaranteed never to want to restore an old browser profile cache folder. That’s just not going to happen. So I think I need to optimise my backups a little bit better.
Martin Wimpress 28:15
So I used to do exactly what you’re describing with rsnapshot: I would just say back up my home directory, dotfiles, the whole thing, and then, as you say, be selective in the restore process; I only need the Documents folder and the Downloads folder and this and that. But now, with Borg, I’ve changed my strategy to say: these are the directories that have data in them that I want to back up. And it really is data, you know, it’s user data, not configuration anymore, and definitely not caches. So when I do a restore, I know that’s all of the stuff that I want.
Alan Pope 28:46
I don’t trust myself. I don’t trust myself to pick all the right folders, and then somewhere down the line I would create a folder in my home and it’s never been backed up; that’s one problem. I don’t trust myself to put things in the right place. Maybe I need more discipline with my backups.
Martin Wimpress 29:02
So that’s how we’re all doing our backups. Is anyone else out there doing it differently? We’d be interested to hear what your solutions are. In particular, I would like to hear what people are doing for backing up and viewing their family photos; I have a suboptimal solution at the moment, and your help finding a new solution would be most welcome.

