Personal Data Backup Strategies

Theoretical options and practical solutions

“Yes honey.. I swear we need this many HDDs..”

We all accumulate digital files that we can’t bear to lose. Many people only realize the importance of backups after they lose something important. The most common case is lost baby pictures that were on someone’s phone.

In this blog post I aim to outline the available options and give some recommendations based on my many personal trials and tribulations on the subject.

Fundamentals

First let me refresh the fundamentals that you should be aware of:

All storage mediums will fail

It’s important to remember that all types of local storage media, like hard disks, pen drives and DVDs, will eventually fail. There are no exceptions. There is also bit rot (also called data rot) to contend with.
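One way to notice bit rot before it reaches your backups is to keep checksums of files that should never change (photos, for example) and re-check them now and then. Here is a minimal sketch of that idea using only the Python standard library. The function names are my own for illustration, and this is not a replacement for a checksumming file system.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_changed(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files present in both manifests whose checksum differs."""
    return sorted(name for name in old if name in new and old[name] != new[name])
```

A changed checksum does not prove corruption, since the file may simply have been edited. For files you never touch, though, it is a good hint to restore from another copy.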

3–2–1 backup rule

Keep at least 3 copies of your data, on 2 different types of storage media, with 1 copy offsite.

This rule of thumb is a good general guideline that will allow you to recover from most forms of failures. Think of it as a north star to guide you.
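If you like checklists, the usual reading of the rule (three copies, two kinds of media, one offsite) can be written as a tiny function. This is only a sketch: the `Copy` class and its field names are made up for illustration, and what counts as a "different medium" is a judgment call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a dataset (all field names here are illustrative)."""
    device: str    # e.g. "laptop", "external-hdd"
    medium: str    # e.g. "ssd", "hdd", "cloud"
    offsite: bool  # True if stored away from the primary location


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """Check: at least 3 copies, on at least 2 media types, at least 1 offsite."""
    devices = {c.device for c in copies}
    media = {c.medium for c in copies}
    return len(devices) >= 3 and len(media) >= 2 and any(c.offsite for c in copies)


# Example: a laptop, an external disk at home and a cloud copy.
example = [
    Copy("laptop", "ssd", offsite=False),
    Copy("external-hdd", "hdd", offsite=False),
    Copy("cloud", "cloud", offsite=True),
]
```

Here `satisfies_3_2_1(example)` is true, while dropping the cloud copy makes it false because nothing is left offsite.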

Convenience vs Cost

As with many other things, there is a clear trade-off between convenience and overall cost. A cloud storage solution is easy, but you have to pay a recurring subscription. A DIY NAS is cheaper to run, but you need to set everything up and do the maintenance yourself.
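To put a rough number on this trade-off, you can estimate how many months it takes for a DIY box to pay for itself compared with a subscription. The sketch below is simple arithmetic; the parameter names are my own, the figures in the example are invented placeholders rather than real prices, and it ignores things like disk replacements and your own time.

```python
import math


def break_even_months(nas_upfront: float, nas_monthly: float, cloud_monthly: float) -> float:
    """Months after which a NAS becomes cheaper than a cloud subscription.

    Returns math.inf if the NAS costs at least as much per month to run
    as the subscription, because it then never catches up.
    """
    saving_per_month = cloud_monthly - nas_monthly
    if saving_per_month <= 0:
        return math.inf
    return nas_upfront / saving_per_month


# Invented placeholder figures: 300 upfront, 5 per month to run, 10 per month for cloud.
months = break_even_months(nas_upfront=300, nas_monthly=5, cloud_monthly=10)
```

With these made-up numbers the NAS breaks even after 60 months, so plug in your own costs before deciding.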

Privacy tradeoff

Tech giants like Google provide “free” cloud storage and sometimes even unlimited storage for compressed photos. But of course, nothing is free; your data lets Google profile you in order to serve you ads.

RAID is not a backup

Don’t confuse RAID (an array of disks) with a backup. Data stored on a RAID only counts as a single copy, albeit a more robust copy.

Backup strategies

Basic : Cloud storage and backups


At the most basic level, use a cloud storage service (e.g. Google Drive) as a “virtual pen drive”. Put your most important files there and use the client apps to automate backups. The less ubiquitous cloud backup services (e.g. CrashPlan) are also an option; they are cheaper, but they don’t let you access and share files as easily as cloud storage services do. Note that you are at the mercy of these service providers not to lose your data.

Basic : Removable local storage media


The other basic option is to periodically attach an external hard disk and sync your files to it, using software like AOMEI Backupper. This works OK for a single computer, but even then you have to set reminders and schedule time to do the backup, so backups tend to be less frequent. Also, there is no offsite copy, so you can still lose your data in a disaster such as a burglary or house fire.
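If you would rather script the sync yourself, the core of it is a one-way copy of new and changed files. The following is a minimal sketch with the Python standard library, not how AOMEI Backupper or any other tool works internally. The function name and paths are hypothetical, and it deliberately never deletes anything on the backup disk.

```python
import filecmp
import shutil
from pathlib import Path


def sync_one_way(source: Path, target: Path) -> list[str]:
    """Copy new or changed files from source to target; never delete anything.

    Returns the relative paths that were copied.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = target / rel
        # shallow=False compares file contents, not just size and mtime.
        if dst.exists() and filecmp.cmp(src, dst, shallow=False):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        copied.append(rel.as_posix())
    return copied
```

You would call it with something like `sync_one_way(Path.home() / "Documents", Path("/mnt/backup/Documents"))`, where both paths are placeholders. Comparing full contents is slow on large collections; a real tool would usually compare size and modification time first.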

Combining removable local storage with cloud storage can get you very close to a robust backup strategy, although it doesn’t scale well beyond a single user and requires manual effort to keep things in sync.

Intermediate: Local NAS

The next step up is to use Network Attached Storage (NAS). For the uninitiated, think of it as an external hard disk attached to your network, or a private Google Drive on your local network. The key feature is that it is always available, so you can automate the backups. It is great at handling multiple users and is the ideal solution for a family.

For the hardware, I perceive three tiers:

  1. DIY low-cost options : Raspberry Pi or old laptop + USB HDD + a little bit of elbow grease.
  2. NAS appliances : Turnkey solutions from Synology, QNAP and even Western Digital. But you pay for the convenience.
  3. DIY NAS PC : Replicate the above-mentioned appliances with salvaged or cheap PC parts and an OS like TrueNAS. Requires a lot of elbow grease, but you get a lot of flexibility.

There is no ideal hardware solution here; you will have to make a trade-off between cost, flexibility, capacity and effort.

Your NAS can work as either a backup sink from other computers, or it can have the “master copy” where people work directly off of the NAS (e.g. via SMB share).

A NAS might have an array of redundant disks (e.g. via a software solution like Linux MD or ZFS), but it only counts as a single copy of your data. You still need to back up the data in line with the 3–2–1 rule. If you have a large NAS, you may need a second NAS for backups. For smaller amounts of data, an external HDD or a cloud backup solution is viable.

A bonus of running a NAS is that you can have it run other useful services like Plex for media streaming (a private Netflix of sorts) and whole network ad blocking.

Advanced: DIY offsite solution

Sometimes you can’t trust cloud backup service providers, or you have too much data for them to be economical. This is when you should consider a DIY solution. There are two main options here:

  1. DIY on public cloud, using something like Amazon Glacier. The cost per GB will be less, but you still need to upload everything and maintain the remote server instance.
  2. Use a second NAS offsite (at a friend’s or family member’s place) as a sink. The key advantage here is that you can make the initial replication locally, and then you only have to transmit new data over the internet to the offsite NAS.

NAS vendors like Synology offer services to easily back up to a second NAS offsite. Elbow grease will be required if you go the DIY NAS route.

Note: some cloud backup services let you mail in the initial data on HDDs, but this is not common outside of the USA.

My real world setup

The strategies I outlined above are relatively straightforward in theory. But in practice, the requirements can be complex, and other limitations (financial and technological) can complicate matters quite a lot. Let’s now look at my own setup as a case study of what a DIY solution looks like.

The solution

1. The Hardware

  • DIY local NAS running TrueNAS OS (Name: Greed) : Intel G4560, 16GB RAM and 3TB x 5 disks
  • DIY offsite NAS running TrueNAS OS (Name: Gluttony) : Intel i5 4590, 16GB RAM and 3TB x 3 disks
  • Primary desktop running Pop!_OS (Ubuntu derivative) (Name: Lust) : Intel Xeon E5 2680 V3, bunch of NVMe SSDs and 3TB x 1 disk

2. The Data

  • The master copy of all my less performance-critical data (media, documents, software, etc.) lives on Greed. I use TrueNAS’s built-in ZFS-based tools to manage snapshots. Let’s call this dataset Silver.
  • The master copy of my performance-critical data (photos and videos) lives on Lust’s SSDs. I use Sanoid on Lust to manage ZFS snapshots. Let’s call this dataset Gold.
  • Other computers in the household can access the data (Gold and Silver) via an SMB share on Greed.

3. The Data flow

  • Lust uses Syncoid to periodically push updates of Gold to Greed.
  • Lust also uses Syncoid to periodically pull updates of Silver from Greed.
  • Gluttony is placed offsite (at my mom’s place). It has SSH access to Greed via an OpenVPN server that I’ve set up on my Linode VPS (named Limbo). Limbo runs some of my other personal services as well.
  • Gluttony periodically pulls updates of Gold from Greed.
  • There is no second backup copy of Silver, as it is not that important (cost vs benefit wise).
High level overview of my setup for data backups
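To make the push and pull from Lust a bit more concrete: Syncoid takes a source and a target dataset, and a remote dataset is written as user@host:pool/dataset. The sketch below only builds the two command lines as Python lists. The pool names, dataset paths and the `backup` user are hypothetical stand-ins, not my actual configuration; only the host name Greed and the dataset names Gold and Silver come from this post.

```python
def syncoid_command(source: str, target: str, recursive: bool = False) -> list[str]:
    """Build the argument list for a `syncoid SOURCE TARGET` call.

    A remote dataset is written as user@host:pool/dataset.
    """
    command = ["syncoid"]
    if recursive:
        command.append("--recursive")  # also replicate child datasets
    command += [source, target]
    return command


# Hypothetical pool, dataset and user names; adjust to your own layout.
push_gold = syncoid_command("lustpool/gold", "backup@greed:tank/gold")
pull_silver = syncoid_command("backup@greed:tank/silver", "lustpool/silver")
```

Each list can be handed to `subprocess.run(..., check=True)` from a cron job or systemd timer to get the "periodically" part. Because Syncoid sends incremental ZFS snapshots, only the changes since the last run travel over the wire.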

Solution highlights

  • Total number of computers storing data (number of copies) : 3
  • Total usable storage space : 9 TB
  • Total HDD space used : 30TB (+ 6TB of cold spares)
  • Monthly recurring fees : $5 for Limbo (though I am using it for other services too)

PS: brownie points if you figured out the pattern in the machine names : Limbo, Lust, Greed & Gluttony

Conclusion

Backing up your data is important. There are inexpensive solutions available if you don’t have much data to back up. However, when your storage requirements increase, be prepared to shell out cash and/or get your hands dirty.

Happy data hoarding!

Embedded Systems & Linux Techie