A few months ago I finally took the plunge and moved all the data on our home network to ZFS. I have been meaning to write about this for some time, and in a recent weekly meeting with my department’s systems engineer I realised it’s way overdue. So here follows a brief introduction to the use of ZFS for data storage on Ubuntu.
Just to be clear, this is not about root on ZFS (which as I read is now much easier on recent Ubuntu versions). It’s also not about configuring a NAS server – there are several good guides for that already, and much of what I propose is not best-practice in that scenario. I also purposely stick with the current long-term-support version of Ubuntu, which will soon change. Anyone on LTS (like me) will remain with 18.04 at least until summer, after the first point release is out, and distribution upgrades are enabled. In any case I expect most of this will still work after the upgrade.
I also include details on a couple of advanced concepts. First, I discuss the use of an SSD to cache the data volumes, with similar functionality to what I had described in my earlier blog post for LVM. I also talk about automatic snapshots, which protect against user error, and are one of the main reasons I wanted to switch to ZFS. Eventually, I’ll also explain the process I’m using for backups, replicating entire ZFS filesystems on a separate server, with full snapshot history maintained at both ends. But that will have to wait for another day.
General Reasoning on Partitioning
My general reasoning remains similar to what it was on an LVM-only system. I assume a pure Ubuntu setup, and I still want all disks to use full-disk encryption (because computation is fast, and I don’t want to worry about any of my data when disks fail or get retired). I have now moved to a two-partition setup for the OS (just boot and root), and all this lives on a fast SSD. Any space on the SSD left over will be used only for caching. Data lives on separate hard disks, and the only change from my previous setup is that this is now based on ZFS rather than LVM.
At a minimum, this setup will require one SSD that is large enough for the OS and caching space, and one HDD for the data.
Partitioning for the Operating System
First of all, partition the SSD, which will also be our boot drive, as follows:
- If you’re using a GUID Partition Table (GPT) rather than a Master Boot Record (MBR) table, you will need to allocate a tiny (1M) BIOS boot partition, to allow GRUB to boot from that drive
- A small (512M) boot partition – this is needed because GRUB doesn’t support booting from an encrypted LVM partition, or at least it didn’t the last time I did an install.
- The rest is allocated to a second partition that will become an encrypted PV.
With LVM, create a system VG (let’s call it sysvg) on the encrypted second partition of the SSD. Within sysvg, create an LV for root, sized as needed. As a general guide, I tend to keep root at 64GiB, depending on usage. All told, this takes up a small fraction of the SSD (500GB in my case).
I also keep 256GiB of the SSD reserved for a scratch filesystem, just to have a dedicated area on the SSD that I can use for fast access to temporary data. For this I create a 256GiB LV (let’s call it scratch) that is completely independent of my data volume strategy. On it I create a standard ext4 filesystem, which I mount somewhere convenient (e.g. /opt/scratch).
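The steps above can be sketched roughly as follows. The device name (/dev/sda2 for the SSD’s second partition) and the exact sizes are assumptions; adapt them to your own layout, and note that on a fresh install much of this is normally done through the Ubuntu installer instead.

```shell
# Encrypt the SSD's second partition and open it (device name assumed)
cryptsetup luksFormat /dev/sda2
cryptsetup open /dev/sda2 syspv

# Create the system VG on the encrypted device, then the root LV
pvcreate /dev/mapper/syspv
vgcreate sysvg /dev/mapper/syspv
lvcreate -L 64G -n root sysvg

# Scratch LV with a plain ext4 filesystem, mounted for fast temporary data
lvcreate -L 256G -n scratch sysvg
mkfs.ext4 /dev/sysvg/scratch
mkdir -p /opt/scratch
mount /dev/sysvg/scratch /opt/scratch
```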
Partitioning for the Data Volumes
Since I want every drive to be full-disk-encrypted, I diverge from the usual ZFS practice, which recommends giving ZFS whole disks directly, without any partitioning. So I partition the HDD into a single GPT partition, which I encrypt using LUKS/dm-crypt. I add this partition to my crypttab using its UUID (which can be found with blkid); let’s call this datapv1.
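As a sketch, assuming the HDD partition is /dev/sdb1, this looks something like the following. The UUID in the crypttab line is a placeholder; use the one blkid actually reports.

```shell
# Encrypt the single data partition (device name /dev/sdb1 assumed)
cryptsetup luksFormat /dev/sdb1

# Find the partition's LUKS UUID
blkid /dev/sdb1

# Add an entry to /etc/crypttab so the device opens at boot as datapv1
# (the UUID below is a placeholder)
echo 'datapv1 UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx none luks' >> /etc/crypttab
```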
Now I’m ready to create a ZFS pool (let’s call it data) with this encrypted drive as follows:
zpool create data /dev/mapper/datapv1
If I wanted redundancy here, I would use a different command to create the pool. For example, to create a pool with two drives in mirror configuration:
zpool create data mirror /dev/mapper/datapv1 /dev/mapper/datapv2
where I’m assuming the existence of a second HDD (ideally identical to the first) with an encrypted partition called datapv2. This would be a suitable setup with a minimum number of drives and enough redundancy to recover from the failure of any one of those drives.
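Either way, you can confirm that the pool came up as expected. In a mirror setup, zpool status shows the two devices grouped under a mirror vdev:

```shell
# Show pool layout and health
zpool status data

# Overall capacity and usage
zpool list data
```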
Caching the Data Volumes
ZFS uses separate cache devices for writing (called the ZFS Intent Log or ZIL) and reading (called the Level 2 Adaptive Replacement Cache or L2ARC). The L2ARC, as its name implies, serves as a second level to the Adaptive Replacement Cache or ARC, which sits in RAM. This is why NAS servers based on ZFS need a large amount of RAM.
In my case, I want to use the main system SSD for both functions. The ZIL doesn’t need to be large: its useful size depends on the write rate the system is expected to sustain. In my case I create a 4GiB LV (which I call wcache) in sysvg for this purpose. The L2ARC also doesn’t need to be large, as this is a workstation rather than a server, and the machine is expected to be switched off daily (ZFS caching is not persistent across reboots). I decided to dedicate the remaining ~140GiB of the SSD to L2ARC, creating an LV of that size (which I call rcache).
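Creating the two LVs is straightforward; the 4GiB figure is my choice, and the second command simply takes whatever free space is left in sysvg:

```shell
# Small LV for the ZFS Intent Log (ZIL)
lvcreate -L 4G -n wcache sysvg

# Remaining free space in sysvg goes to the L2ARC
lvcreate -l 100%FREE -n rcache sysvg
```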
The rcache and wcache LVs need to be configured to act as L2ARC and ZIL respectively for the ZFS pool created earlier. This can be done with:
zpool add data cache /dev/sysvg/rcache
zpool add data log /dev/sysvg/wcache
Note that these devices can be removed at any time from the pool without affecting data integrity. This can be done with:
zpool remove data /dev/sysvg/rcache
zpool remove data /dev/sysvg/wcache
You can also visualise the cache usage with the following:
zpool iostat -v
So far I have only really talked about the creation and setup of the ZFS pool. However, ZFS is both a logical volume manager (which is as far as we’ve covered) and also a filesystem. In each pool we can create a number of filesystems, each of which inherits the properties of the pool and can also be set with its own properties. One nice thing about these filesystems is that they all share the same pool and there is no concept of size at filesystem level (though you can set quotas if this is important to you).
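As a hypothetical example of the quota mechanism, a music filesystem (the name is just illustrative) could be capped so it can’t crowd out the rest of the pool:

```shell
# Create a filesystem and cap its total size at 200G
zfs create data/music
zfs set quota=200G data/music

# Confirm the property took effect
zfs get quota data/music
```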
In my case, I create separate filesystems for user data (to mount on /home), music, etc. I’ll explain the creation of the user-data filesystem as an example; everything else follows the same process. Creating the filesystem is as simple as:
zfs create data/users
This automatically mounts the filesystem as /data/users. This is often good enough, but in this case I really want this to mount as /home. This can be done by setting the appropriate property of the filesystem using:
zfs set mountpoint=/home data/users
Observe that I didn’t mention /etc/fstab anywhere, as ZFS mounts its filesystems independently of this. ZFS also allows for automatic export of filesystems over NFS or Samba, but I won’t go into details here.
ZFS allows the user to create a cheap snapshot of a filesystem at any point. As the name implies, this is simply a view into the filesystem, frozen in time. Changes made after that point do not affect the snapshot – changed files are copied on write, and deleted files simply lose their entry in the current filesystem (without the underlying data being removed). This means that the total space used is whatever is needed to hold the union of the current data and all snapshots, across all filesystems. Snapshots are cheap (and effectively instantaneous) because taking one only touches the index.
All this means that snapshots are a convenient way to keep ‘backups’ of the filesystems at any point in time. On Ubuntu it is very straightforward to have these snapshots created on a rolling schedule, simply by installing a single package:
apt-get install zfs-auto-snapshot
By default this will take frequent (15-min), hourly, daily, weekly, and monthly snapshots, in each case keeping the last several ones as needed (e.g. for hourly it keeps the last 24). You can check on what snapshots exist (and how much space each takes) using:
zfs list -t snapshot
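Snapshots are also browsable read-only through the hidden .zfs/snapshot directory at the root of each mounted filesystem, which makes restoring an accidentally deleted file trivial. The snapshot and file names below are illustrative (the date-stamped name follows zfs-auto-snapshot’s naming scheme):

```shell
# Browse a snapshot read-only via the hidden .zfs directory
ls /home/.zfs/snapshot/zfs-auto-snap_daily-2019-01-01-0000/

# Restore a single deleted file by copying it back (paths illustrative)
cp /home/.zfs/snapshot/zfs-auto-snap_daily-2019-01-01-0000/alice/notes.txt /home/alice/

# Or roll the whole filesystem back to a snapshot (discards later changes)
zfs rollback data/users@zfs-auto-snap_daily-2019-01-01-0000

# Exclude a filesystem from automatic snapshots entirely, if desired
zfs set com.sun:auto-snapshot=false data/music
```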
Further Reading
- Aaron Toponce’s excellent and detailed pages on installing ZFS on Debian