ZFS on Ubuntu 20.04 LTS

Part 2 of my series on Tips and Tricks for the Linux Desktop

ZFS on Linux (ZOL) logo

ZFS is a “next gen” filesystem that brings many useful features to the table. With the Ubuntu 20.04 LTS release, I think it’s finally ready for tech-literate data hoarders to embrace!

This blog post aims to describe what is currently possible with ZFS and is not a step-by-step guide. Where possible, I have linked to more detailed instructions.

ZFS Primer

Before we begin, a quick overview of ZFS for the uninitiated.

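ZFS was originally developed by the folks at Sun for the Solaris OS. It was later open sourced, forked and ported to Linux. Curiously, though, due to licensing complications (ZFS is released under the CDDL, which is widely considered incompatible with the kernel’s GPLv2), it’s not part of the mainline Linux kernel.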

At a very high level, you can think of it as a volume manager (akin to LVM) and an advanced filesystem rolled into one. One of its numerous benefits is that it brings “Git-like” functionality to your storage :-)

Key features:

  • Volume management : This is the ability to create “virtual partitions” (volumes) from one or more physical storage devices. A cool feature of volumes is that you don’t have to fret about how many GB to allocate to each “partition”: the manager juggles the free space for you, so you no longer have to resize partitions when one of them starts running out of space. If you are using more than one disk, this also provides software RAID features like mirroring, striping and parity (RAID-Z).
  • Data integrity : Every block is checksummed, and each checksum is stored in the parent block that points to it, all the way up the tree, so silent corruption (bit rot) anywhere in the hierarchy is detected on read or during a scrub. If the pool has redundancy (mirror or RAID-Z), ZFS also repairs the bad block from a good copy (“self-healing”).
  • Atomic CoW (copy-on-write) Snapshots : This is the big feature folks!

if you ask a traditional filesystem to modify a file in-place, it does precisely what you asked it to. If you ask a copy-on-write filesystem to do the same thing, it says “okay” — but it’s lying to you.

Instead, the copy-on-write filesystem writes out a new version of the block you modified, then updates the file’s metadata to unlink the old block, and link the new block you just wrote. — Jim Salter (source)

So, what this means is that a CoW filesystem never overwrites live data in place. Unchanged blocks are shared between the current state and any older snapshots, and only changed blocks take up extra space, similar to how Git commits share unchanged objects. Because a new state only takes effect once all of its new blocks are safely written, this makes it robust against things like power failures, and makes it cheap to save states (called snapshots) and switch between them.

  • Fast asynchronous incremental replication

Asynchronous replication means that you can take an atomic snapshot of an entire filesystem and easily move the entire thing, block-by-block, on to a remote filesystem. Unlike traditional “synchronization” methods, where you would need to crawl over the filesystem on both sides first to figure out what’s changed. Computer A knows exactly what has changed between snapshots 1 and 2, and it can immediately begin squirting that data — and only that data — to computer B. — Jim Salter (source)

Key Drawbacks

  • The above-mentioned features come with an increased compute cost in terms of CPU and RAM utilization
  • Your storage drive’s effective throughput can drop noticeably. For example, when I tested a Samsung 970 EVO formatted as ext4, I got ~260MB/s doing 4KiB, queue-depth-1 writes. That went down to ~60MB/s with ZFS!
  • ZFS doesn’t let you easily grow a storage pool (a collection of drives) one disk at a time: you can’t simply add a single disk to an existing redundant group such as a RAID-Z vdev. To expand your storage you need to either replace all the drives in the existing group with larger ones, or add a new group of drives (a vdev) with its own redundancy
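If you want to run a 4KiB, queue-depth-1 write test like the one above on your own drives, fio is one option. This is a sketch, not the exact test behind my numbers: it assumes random writes with fio’s synchronous psync engine (one I/O in flight at a time), and the function name, file size and runtime are illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: 4 KiB, queue-depth-1 random write test with fio
# in the given directory (e.g. a mounted ZFS dataset or an ext4 mount).
bench_4k_qd1() {
    local dir="$1"
    # psync issues one synchronous I/O at a time, i.e. queue depth 1.
    fio --name=4k-qd1-randwrite \
        --directory="$dir" \
        --rw=randwrite \
        --bs=4k \
        --ioengine=psync \
        --iodepth=1 \
        --numjobs=1 \
        --size=1G \
        --runtime=30 \
        --time_based \
        --end_fsync=1
}

# Example: bench_4k_qd1 /my_pool/bench   (then repeat on an ext4 mount)
```

fio leaves its 1G test file in that directory, so delete it afterwards.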

Overall, I think that provided you have a relatively large SSD and a decent CPU, ZFS is the way to go over a legacy filesystem like ext4. The only caveat is compute workloads that are significantly bottlenecked on storage, e.g. large compile jobs. For such cases you are better off sticking with ext4, though consider using LVM or mdadm underneath it.

Further reading

Using ZFS on Ubuntu

At this point it would help if you took some time to read the “Intro to ZFS” article linked above; a basic grasp of what a zpool and a dataset are will be useful. If you haven’t and still insist on reading on, think of a zpool as a collection of storage media, and of a dataset as a filesystem built on top of a zpool. Let’s start off by installing the ZFS tools with the following command:

$ sudo apt install zfsutils-linux

Now the next step is to create a zpool. You can use an entire disk, a disk partition, an LVM volume or even sparse files. If you want disk redundancy, you can create a zpool from multiple disk drives. For this blog I’ll demonstrate using a virtual disk drive on a virtual machine.

Pro tip: Even when you are using an entire disk drive, it is better to create a single, slightly smaller partition and pass that partition to ZFS instead of the entire drive. This serves two purposes.

  1. It will be easier for you to mix drives from multiple brands, as there are slight variations in absolute capacity between drives sold as the same size
  2. If you are using an SSD, you should leave about 10% of it unused anyway (over-provisioning), so that the drive has enough spare area for its wear-leveling and garbage-collection algorithms. This makes write performance much more consistent.
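To turn the ~10% rule of thumb into an actual sgdisk command, you can compute the reserve from the drive’s size. A minimal sketch, assuming util-linux’s blockdev is available; the function name is hypothetical, and it only prints the command so you can double-check the device before running it as root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (don't run) an sgdisk command that creates
# partition 1 for ZFS, leaving RESERVE_PCT percent of the disk unpartitioned
# at the end. sgdisk's "M" suffix means MiB.
zfs_partition_cmd() {
    local disk="$1" reserve_pct="${2:-10}"
    local size_bytes reserve_mib
    size_bytes="$(blockdev --getsize64 "$disk")"    # disk size in bytes
    reserve_mib=$(( size_bytes * reserve_pct / 100 / 1024 / 1024 ))
    echo "sgdisk -n1:0:-${reserve_mib}M -t1:BF00 $disk"
}

# Example (as root): zfs_partition_cmd /dev/vdb 10
```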

For my virtual drive /dev/vdb, run the following command to delete all partitions and reset the partition table (this wipes the drive, so double-check the device name):

$ sudo sgdisk --zap-all /dev/vdb

Then run the following command to create a partition with 1GiB of trailing free space (adjust these values to suit your setup and drive capacity):

$ sudo sgdisk -n1:0:-1G -t1:BF00 /dev/vdb

This creates a /dev/vdb1 partition.

Now let’s create a zpool with this vdb1 partition:

$ sudo zpool create -f -o ashift=12 -O compression=lz4 my_pool /dev/vdb1

Here the ashift=12 argument tells ZFS to use 4K (2^12 = 4096-byte) sectors, and we are also enabling LZ4 compression on the pool. The name of my new zpool is “my_pool”.
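Since ashift is fixed once a vdev is created, it’s worth reading the properties back to confirm the pool got what you asked for. A small sketch; the helper name is hypothetical, and it uses zpool get / zfs get with -H (no header) and -o value (value column only).

```shell
#!/usr/bin/env bash
# Hypothetical helper: show a pool's ashift and compression, and return
# success only if they are 12 and lz4 respectively.
check_pool() {
    local pool="$1" ashift compression
    ashift="$(zpool get -H -o value ashift "$pool")"
    compression="$(zfs get -H -o value compression "$pool")"
    echo "ashift=$ashift compression=$compression"
    [ "$ashift" = "12" ] && [ "$compression" = "lz4" ]
}

# Example: check_pool my_pool
```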

Next, we create a dataset using:

$ sudo zfs create -o mountpoint=/home/me_doing/important_docs my_pool/docs

Here we are choosing to mount the new dataset at an arbitrary path. You can adjust this or leave it to its default.

Now let’s see how you can reap the benefits of ZFS using its snapshot feature. Create some files in the dataset and then take a snapshot to save that state using the command:

$ sudo zfs snapshot my_pool/docs@milestone_1

You can view the snapshots using the zfs list -t snapshot command (plain zfs list hides snapshots by default), as shown in the screenshot above.
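Because snapshots are cheap, it helps to give them names that sort and read well later. A sketch of a date-stamped naming scheme; the helper name and the label_YYYY-MM-DD_HHMM format are my own assumptions. In the listing, a snapshot’s USED column is the space held only by that snapshot, which starts near zero and grows as the live files diverge from it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: take a date-stamped snapshot of a dataset, then list
# that dataset's snapshots. Needs root (or a delegated "snapshot" permission).
snap_now() {
    local dataset="$1" label="${2:-manual}"
    local name="${dataset}@${label}_$(date +%Y-%m-%d_%H%M)"
    zfs snapshot "$name"
    # -d 1 limits the listing to this dataset's own snapshots.
    zfs list -t snapshot -o name,used,refer -d 1 "$dataset"
}

# Example: sudo bash -c '. ./snap.sh; snap_now my_pool/docs milestone'
```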

Now you can experiment with “accidentally” changing a file and then undoing the change using the command:

$ sudo zfs rollback my_pool/docs@milestone_1

Pretty cool huh? There are more powerful tools that let you view diffs and create clones from snapshots. But I won’t get into them here.
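Two things are worth knowing before reaching for rollback: it reverts the whole dataset, and zfs rollback only goes back to the most recent snapshot unless you add -r, which destroys every newer snapshot. To recover a single file instead, you can copy it out of the read-only .zfs/snapshot/<name>/ directory that ZFS exposes at the root of each mounted dataset (hidden from ls by default, but reachable by path). A minimal sketch; the helper name and the notes.txt file are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy one file back from a snapshot into the live dataset.
# MOUNTPOINT is where the dataset is mounted, e.g. /home/me_doing/important_docs.
restore_from_snapshot() {
    local mountpoint="$1" snap="$2" relpath="$3"
    local src="$mountpoint/.zfs/snapshot/$snap/$relpath"
    if [ ! -e "$src" ]; then
        echo "not found in snapshot: $src" >&2
        return 1
    fi
    # Recreate parent directories in case they were deleted too.
    mkdir -p "$(dirname "$mountpoint/$relpath")"
    # -a keeps the permissions and timestamps of the snapshot copy.
    cp -a "$src" "$mountpoint/$relpath"
}

# Example: restore_from_snapshot /home/me_doing/important_docs milestone_1 notes.txt
```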

As previously mentioned, another benefit of ZFS is its ability to create backups easily (quickly and with minimal data transfer). zfs send turns a snapshot into a stream that you can pipe over SSH into zfs receive on another machine, and it is ridiculously easy to use.

VM to VM data replication using ZFS over SSH

Replication is incredibly fast compared to traditional file based syncing methods like rsync. The command syntax is as follows:

$ zfs send <local_dataset>@<snapshot> | ssh <remote_user>@<remote_host> zfs receive <remote_dataset>
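The first transfer has to be a full stream; after that, zfs send -i sends only the blocks that changed between two snapshots, which is what makes repeat backups so quick. A sketch wrapping both cases; the helper name and argument order are my own. It assumes the older snapshot already exists on the receiving side and that the remote copy hasn’t been modified since, otherwise zfs receive refuses (unless forced with -F).

```shell
#!/usr/bin/env bash
# Fail the pipeline if either zfs send or the remote zfs receive fails.
set -o pipefail

# Hypothetical helper: replicate SRC@TO_SNAP to DST on REMOTE over SSH.
# With FROM_SNAP, only the changes since SRC@FROM_SNAP are sent (-i).
replicate() {
    local src="$1" to_snap="$2" remote="$3" dst="$4" from_snap="${5:-}"
    if [ -n "$from_snap" ]; then
        zfs send -i "$src@$from_snap" "$src@$to_snap" | ssh "$remote" zfs receive "$dst"
    else
        zfs send "$src@$to_snap" | ssh "$remote" zfs receive "$dst"
    fi
}

# Initial full copy, then an incremental one later:
#   replicate my_pool/docs milestone_1 me@backup-host backup_pool/docs
#   replicate my_pool/docs milestone_2 me@backup-host backup_pool/docs milestone_1
```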

Note that you may need to delegate permissions to non-root users, as summarized here: http://asvignesh.in/zfs-send-receive-non-root-account/
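Delegation itself is done with zfs allow. The permission sets below are an assumption based on what send and receive typically need (receive also needs create and mount, plus any properties carried in the stream); the user and dataset names are placeholders. On Linux, a non-root user generally can’t mount filesystems, so receiving with zfs receive -u (don’t mount) may be necessary.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: delegate just enough ZFS permissions to a
# non-root user for zfs send (source side) and zfs receive (target side).
allow_send() {
    local user="$1" dataset="$2"
    sudo zfs allow -u "$user" send,snapshot,hold "$dataset"
}

allow_receive() {
    local user="$1" dataset="$2"
    sudo zfs allow -u "$user" receive,create,mount,mountpoint,compression "$dataset"
}

# On the sending machine:   allow_send backup_user my_pool/docs
# On the receiving machine: allow_receive backup_user backup_pool
```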

I know what you are thinking... “OK, so checkpoints and fast replication are great, but do we really need to use the command line for everything?” A GUI would help hide some of the command-line complexity here, but there is nothing robust on Ubuntu yet. FreeNAS is ZFS on easy mode, but it targets NAS appliances rather than desktops. The most promising GUI tool I’ve come across is cockpit-zfs-manager.

For the time being, ZFS on Ubuntu is for command-line ninjas only. However, one way to get away from the CLI complexity is to automate everything, so that you don’t have to type commands by hand.

Just like with Git and commits, you don’t get to reap the benefits of ZFS without creating many snapshots. However, creating numerous snapshots by hand and later deleting them isn’t exactly convenient.

This is where a policy based snapshot manager like Sanoid comes in.

The recommended policy looks loosely like a Fibonacci sequence: snapshots that are close together at first, and then fewer and further apart as they age. E.g. 24 hourly snapshots, 7 daily snapshots, 3 monthly snapshots and 2 yearly snapshots. If this policy ran for 2+ years, you would end up with 24 + 7 + 3 + 2 = 36 snapshots spanning about 2 years, with most of them recent. For example, here is a screenshot of my automatically created snapshots:

My snapshot list spanning about a month

Once you define a policy that meets your requirements, Sanoid manages the automatic creation and deletion of snapshots. You only need to get your hands dirty with the command line when you need to revert to a saved snapshot. Read the Sanoid README to see how they use it to easily recover a VM that was hit by ransomware.
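As an illustration, the 24 hourly / 7 daily / 3 monthly / 2 yearly policy described above (at most 24 + 7 + 3 + 2 = 36 snapshots at a time) could be expressed as a Sanoid config like the one written below. The key names follow the template format shown in Sanoid’s README; the dataset and template names are placeholders, and the hypothetical helper writes to a path you pass in so you can review it before copying it to /etc/sanoid/sanoid.conf.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Sanoid policy for one dataset to OUT.
# Usage: write_sanoid_conf my_pool/docs ./sanoid.conf
write_sanoid_conf() {
    local dataset="$1" out="$2"
    cat > "$out" <<EOF
[$dataset]
        use_template = production

[template_production]
        frequently = 0
        hourly = 24
        daily = 7
        monthly = 3
        yearly = 2
        autosnap = yes
        autoprune = yes
EOF
}
```

Sanoid then needs to run periodically (sanoid --cron); depending on how you installed it, a systemd timer or cron entry for this may already be set up.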

Sanoid also includes a tool called Syncoid, which automates replication tasks, typically over SSH. Think of it as a wrapper around the zfs send and zfs receive commands we saw earlier. You can set it up as a cron job to run replication tasks automatically.
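A cron job for Syncoid could look like the following sketch. The schedule, target host, cron file name and helper name are placeholders; --no-sync-snap (replicate only existing snapshots, such as Sanoid’s, instead of creating Syncoid’s own) is a choice, not a requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a cron.d entry that runs Syncoid hourly as root.
# cron.d files must be root-owned, not writable by others, with no dots in the name.
# Usage (as root): write_syncoid_cron my_pool/docs root@backup-host:backup_pool/docs
write_syncoid_cron() {
    local src="$1" target="$2" out="${3:-/etc/cron.d/syncoid-docs}"
    local bin
    bin="$(command -v syncoid || echo /usr/sbin/syncoid)"
    cat > "$out" <<EOF
# m h dom mon dow user command
15 * * * * root $bin --no-sync-snap $src $target
EOF
    chmod 0644 "$out"
}
```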

With Sanoid and Syncoid together, you get an automated mechanism that takes snapshots and backs up your data!

Pro tip: if you want Syncoid to automatically prune snapshots on the target, use this Pull Request, which hasn’t yet made it into an official release.

ZFS on root

ZSys architecture

Ubuntu 19.10 added experimental support for ZFS on root. This means that your system (“/”) can be installed on ZFS. It also added support for ZSys (ZFS System). ZSys aims to make basic and advanced ZFS concepts easily accessible and transparent to anyone, providing automated snapshots, an easy way to roll back, offline instant updates, easy backup support and so on. The coolest user-facing feature is that it lets you quickly and easily revert your computer to an earlier state through the GRUB menu! Think of it as the Windows “System Restore” feature on steroids. You can learn more about ZSys here.

In many ways ZSys overlaps with Sanoid’s functionality, but its scope is broader, and it currently doesn’t quite do Sanoid’s job completely. Maybe in the near future, when ZSys matures, we won’t need Sanoid anymore.

With Ubuntu 20.04, you can manually save system states with the zsysctl save command, but then you would have to manage them manually, which is a bit too cumbersome. In my opinion it is better to leave ZSys on autopilot and let it do its thing with your “system”, while you use Sanoid to manage your important user files.

Now let’s focus on how to set up ZFS on root, and on its key features.

You need to do a clean install for this. If you intend to install Ubuntu on an entire drive, you can use the setup wizard. If you want to experiment with installing on multiple drives (mirrored, striped, etc.) or on only part of a drive, then you have to manually format the drives and install the OS following this official guide.

To install using the setup wizard, proceed as normal till you get to the “Installation type” page. Here, click “Advanced features” and select the ZFS option. That’s it!

Select the ZFS option here to enable ZFS on root

Note that ZFS on root is still marked as experimental. I hear that the experimental tag will be dropped in the 20.10 release and that the changes will be backported to 20.04.

After you boot into the OS, the changes won’t be apparent until you start looking under the hood. If you check the disk utility and/or run $ zfs mount, you will begin to see how ZFS’s basic features are leveraged to support an OS.

The datasets and mount points used to support ZFS on boot

Basically, your system is now made up of many ZFS datasets. ZSys is responsible for managing “system states”, which involves grouping snapshots from all of these datasets into one collective “state”. For the sake of brevity, I won’t go into more detail here.

The biggest change you will notice is that apt installs take a bit longer to finish. If you pay attention to the output, you will realize why.

Top: Regular install, Bottom: with ZFS boot

There is an extra line on the ZFS on root system saying that it’s updating GRUB. This is ZSys saving the state and updating GRUB with an option to revert back to it. The keen-eyed among you might also have noticed that the ZFS system is using more RAM. That is because ZFS uses an in-memory cache called the ARC (Adaptive Replacement Cache). If there is memory pressure, ZFS will release memory held by the ARC. Use the $ arc_summary command if you are curious about the ARC.
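If you only want the headline numbers rather than the full arc_summary report, the ZFS kernel module exposes ARC statistics in /proc/spl/kstat/zfs/arcstats, where size is the current ARC size and c_max its upper limit, both in bytes. A small sketch that reads them; the function name is hypothetical, and the optional argument just lets you point it at another file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the current ARC size and its maximum in MiB.
arc_usage() {
    local stats="${1:-/proc/spl/kstat/zfs/arcstats}"
    # Rows look like "<name> <type> <data>"; pick out size and c_max.
    awk '$1 == "size"  { size = $3 }
         $1 == "c_max" { max = $3 }
         END { printf "ARC size: %d MiB (max %d MiB)\n", size / 1048576, max / 1048576 }' "$stats"
}

# Example: arc_usage
```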

Interestingly, Docker is ZFS-aware: when it detects a ZFS filesystem, it uses datasets and snapshots to store images and their layers. This is transparent to users, and comes with some performance gains.

Docker images using ZFS for storage

I mentioned several times that ZSys lets you change states via the GRUB menu, so let’s see it in action. For this demo, I first installed some random app. Then I created some files in my home directory and proceeded to install yet another app. I therefore expect the system to have created two saved states (one before each install). Next, to make getting to the GRUB menu easier, open the /etc/default/grub file with sudo in your favorite CLI text editor and set GRUB_TIMEOUT_STYLE=menu. Then run $ sudo update-grub
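If you prefer to script the GRUB change, a small sed-based helper works. Setting GRUB_TIMEOUT as well is my own assumption (with a timeout of 0 the menu may vanish before you can pick an entry); the helper name and the value 10 are illustrative. Keep a backup of the file, and run update-grub afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in a GRUB defaults file, replacing an
# existing KEY= line or appending one if it is missing.
set_grub_var() {
    local key="$1" value="$2" file="${3:-/etc/default/grub}"
    if grep -q "^${key}=" "$file"; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        echo "${key}=${value}" >> "$file"
    fi
}

# As root:
#   cp /etc/default/grub /etc/default/grub.bak
#   set_grub_var GRUB_TIMEOUT_STYLE menu
#   set_grub_var GRUB_TIMEOUT 10
#   update-grub
```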

Now reboot. You will see an option called “History for Ubuntu 20.04 LTS”.

Select it and you will see the two saved states.

When you select a state, you can select to revert only the system or system and user data (data in the home directory).

These reverts go beyond the standard ZFS rollback feature, as they make revert a non-destructive action: current and intermediate states aren’t destroyed, and you could even revert the revert.

My Setup

To help reinforce what I’ve described above, let me share how I have set up my machine. I have one SSD as my boot drive with ZFS on root, and a separate drive for my important data.

In the above screenshot you can see the bpool and rpool zpools that are used by ZSys. lusty_pool in this case is the zpool I use for my important files. It is not managed by ZSys.

If you have only one drive, as in most laptops, I suggest following the manual ZFS on root install guide and then creating an extra partition to use as a zpool for your important files. Alternatively, you can use LVM volumes instead of partitions, which gives you some flexibility with volume sizes, at the cost of a performance hit from the ZFS + LVM overhead.

I’d recommend creating multiple datasets for your different types of important data; this allows you to have a different snapshot policy for each. Here you can see that I have created three datasets: docs, photos and VMs. I have set up Sanoid to create frequent snapshots of my VMs, but only spanning a short duration. Photos and docs, in contrast, get less frequent snapshots that span a longer duration. I have also set up Syncoid to back up only my photos and docs to a second PC running FreeNAS.
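A sketch of that kind of layout follows. The pool and dataset names match the ones described above, but the retention numbers in the two templates are made-up examples, not my actual settings, and both helper names are hypothetical. The second helper only prints Sanoid sections, so you can review them before appending to your sanoid.conf.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create one dataset per kind of data in POOL.
create_datasets() {
    local pool="$1"; shift
    local name
    for name in "$@"; do
        zfs create "$pool/$name"
    done
}

# Hypothetical helper: print Sanoid sections giving VMs frequent, short-lived
# snapshots and docs/photos sparser, longer-lived ones (numbers illustrative).
print_sanoid_policies() {
    local pool="$1"
    cat <<EOF
[$pool/VMs]
        use_template = short_term
[$pool/docs]
        use_template = long_term
[$pool/photos]
        use_template = long_term

[template_short_term]
        frequently = 4
        hourly = 24
        daily = 2
        monthly = 0
        yearly = 0
        autosnap = yes
        autoprune = yes

[template_long_term]
        hourly = 0
        daily = 30
        monthly = 12
        yearly = 3
        autosnap = yes
        autoprune = yes
EOF
}

# Example (as root):
#   create_datasets lusty_pool docs photos VMs
#   print_sanoid_policies lusty_pool >> /etc/sanoid/sanoid.conf
```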

I’ll dig deeper into my personal setup at a later date, when I hope to cover backup strategies with ZFS in a separate blog post.

Thank you for reading!

Embedded Systems & Linux Techie