[Note 9-Apr-2020: Since writing this post I migrated my data HDDs to ZFS; I write about ZFS, and on how to cache on SSD, in a more recent post. The OS remains on SSD using LVM.]
[Updated 7-Nov-2019: clarified need for thin-provisioning-tools package.]
[Updated 30-Aug-2019: added instructions for setting caching mode to writeback.]
For the last several years I have been using a combination of a fast SSD and a large (but still respectably fast) HDD strategy on my desktops. The main idea was to keep the OS and software on the SSD, speeding up boot times and general computer usage, while storing my data on a much larger HDD. In Ubuntu terms, I keep boot, root, usr, var on separate partitions on the SSD, while home sits on the HDD. (I also keep another HDD for backups, but that’s matter for another time.) This setup generally works well, and in fact leaves a significant amount of extra space on the SSD that I don’t really need for the OS and software. Up to now I have been using that in a scratch partition, mostly for my software builds.
Now I thought it would make more sense to remove the scratch partition and instead use that space (about 60GiB on my desktop) as a cache for the data partition. The only reason I even got to this point is that lately my PC has been a bit sluggish on initial login, which is when a bunch of software is loading and reading its settings from my home directory. So while the software loads fast, reading the settings takes time. I read up a bit, and found that the Linux device mapper has a module (dm-cache) specifically for this purpose. So I read the docs and a few blogs explaining how to set things up, and got going. Unfortunately, there are a few caveats that no one seems to mention, so I thought I’d document them here. These caveats stem from my particular setup, so perhaps no one else has run into them before. I have good reasons for keeping things the way I do, so I’ll document that reasoning as well, in the hope it may also be useful to someone else.
General Reasoning on Partitioning
First of all, my setups are always pure Ubuntu (I keep several VMs with Windows, for various tasks, and use those as needed), and disk usage is based on LVM over dm-crypt for full-disk encryption. Nowadays computation is fast, and disks always fail or get retired, plus I don’t want to worry about my data in case of theft, warranty returns, etc. LVM is great because it lets me resize and move my partitions as needed, mostly even on a live system. (Think migrating your data to a new bigger HDD when the time comes to upgrade/replace.) With support for RAID, caching, and much else, I generally can’t be bothered with hardware RAID any more.
I like to keep my OS and software separate, as it simplifies management and backups. Since I want these to live on the SSD, I’ll make that my boot drive, and partition as follows:
- A small (512M) boot partition – this is needed because grub doesn’t support booting from an encrypted LVM partition, or at least it didn’t the last time I did an install.
- The rest is allocated to a second partition (in my case sda5) that will become an encrypted PV.
With LVM, I create a system VG (let’s call it sysvg) on the encrypted second partition of the SSD. Within sysvg I then create LVs for root, usr, and var, sized as needed. As a general guide, I tend to keep root at 4GiB, usr at 32GiB, and var at 8–16GiB depending on usage. All told, the OS and software require less than 64GiB, so about half of even a (small, by today’s standards) 128GiB SSD.
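As a sketch, the SSD setup above boils down to the following commands (device, VG, and LV names are from my example; adjust as needed, and note that these are destructive and require root):

```
# encrypt the second SSD partition and open it
cryptsetup luksFormat /dev/sda5
cryptsetup open /dev/sda5 sda5_crypt

# turn it into a PV and create the system VG on it
pvcreate /dev/mapper/sda5_crypt
vgcreate sysvg /dev/mapper/sda5_crypt

# carve out the OS LVs with the sizes mentioned above
lvcreate -n root -L 4G sysvg
lvcreate -n usr -L 32G sysvg
lvcreate -n var -L 16G sysvg
```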
Any additional HDD simply gets everything in a single partition, which will also become an encrypted PV. I generally allocate these to different VGs depending on use, so for example I had a datavg for my data and a backupvg for my backups. That way I can have multiple drives per VG and easily move partitions between them, while being sure that each VG sits on separate drives. (For example, I’m certain that my backup and data partitions are on physically separate drives.)
Partitioning for Caching
Unfortunately, for caching, there is a caveat that no-one seems to mention. The cache volumes and data volumes that will be tied together need to be in the same VG. This makes perfect sense when you think about it, as they will become one logical unit. In my case this meant that I needed to have my SSD and data HDD in the same VG. (I had the SSD in sysvg and data HDD in datavg before, for logical separation between the two.)
To resolve this, I decided to start keeping my data in sysvg. This was relatively easy to accomplish by merging my datavg into sysvg, as follows:
- Unmount all mounted partitions from datavg. This was a bit complicated because /home sits there, so I couldn’t be logged in on my user account while doing it (or /home would report as busy). The easiest solution was to ssh in as root from my laptop (using a key, since root cannot log in with a password on the console).
- Merge the VGs using:
vgmerge -v sysvg datavg
- Update fstab to mount the data partitions from the correct VG (not necessary if you’re using UUID to determine the partitions).
- Reboot to check everything works fine.
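For illustration, a /home entry in /etc/fstab that used to reference datavg would end up looking something like this after the merge (the LV name users and the ext4 filesystem are from my setup):

```
# /etc/fstab – the data LV now lives in sysvg
/dev/mapper/sysvg-users  /home  ext4  defaults  0  2
```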
To create the cache volumes you’ll need space for three LVs: the cache data, the cache metadata, and a spare metadata partition. The latter can be disabled if you want, but given its relatively small size I opted to keep it; it is necessary if you ever need to repair the metadata volume. The data+metadata split is needed because the cache is effectively a thinly provisioned volume, which is an elegant way to implement this: the cache can appear the same size as the volume it is caching, while actually only having enough space for a small part of that data. The metadata volume keeps track of which parts of the main data volume are currently in the cache.
Assuming you want to use whatever remains on your SSD for the cache, the tricky bit is to figure out how much space to leave for the two metadata partitions. Rule of thumb is 1000 times less than the cache data, but that isn’t exact. Fortunately, the tools themselves will let you know what they want, so you don’t have to worry about a thing. The simplest way to create a cache pool using all remaining space on the encrypted SSD PV in sysvg is with the single command:
lvcreate -n cache -l 100%FREE --type cache-pool sysvg /dev/mapper/sda5_crypt
Obviously replace the PV path (and LV/VG names) as needed. This automatically determines how much space is available on the given PV (which must be part of the same VG, of course), splits it up into the three LVs needed, and binds them together as a cache pool.
Applying the Cache to the Data Volume
Once we have the cache pool created, we need to attach it to the data volume we want to cache. This is easily done using:
lvconvert --type cache --cachepool sysvg/cache sysvg/users
In this case, the users LV starts using the cache LV as a cache pool immediately. There is no need to unmount anything, as all this happens at a lower (device-mapper) level.
The caching mode defaults to writethrough, which means that all writes happen in both the cache and the underlying HDD. This increases write latency considerably, but is safer in the sense that the cache volume is always disposable. To speed up writes, we can change the caching mode to writeback, where writes happen only in the cache, and will be copied to the HDD asynchronously. This can be achieved with:
lvchange --cachemode writeback sysvg/users
Note that you can always change strategy back to writethrough at any time, with the same command, simply replacing writeback with writethrough.
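To see what the cache is doing, the usual lvs command works; with -a it also lists the hidden cache LVs, and the Data% column shows how full the cache pool is. (Report field names such as cache_mode vary a little between lvm2 versions, so check lvs -o help on your system.)

```
# list all LVs in sysvg, including the hidden [cache_cdata] and [cache_cmeta]
lvs -a sysvg
```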
Prepare for Reboot
At this point the cache is active and working. You can check on how much of the cache is in use using the usual lvs command. However, we need to be sure we’re able to boot up the system next time we try. A few steps are needed for this:
- Install the thin-provisioning-tools package, which contains the cache_check binary. Not entirely sure if this is the key dependency, but it’s a tool worth having once any volume is cached, anyway. We also had a case at work where the system was unable to activate or mount the cached volume; this was resolved when we installed this package and rebuilt initramfs.
- Disable any key file for the data PV (replace it with none in /etc/crypttab). The problem here is that on bootup you now have two PVs that are part of sysvg, the VG on which root resides. To activate the VG completely, both PVs need to be decrypted, and initramfs refuses to use a keyfile for this, because the keyfile would sit in the unencrypted boot partition. So if you used to have a keyfile for the data PV (as I did), you cannot use it any more. Instead, you will be asked to enter the passphrase interactively for this PV as well (i.e. to boot, you now need to enter two passphrases, one for sda5 and one for datapv1 in my case). Strangely, this problem does not show up unless you activate caching.
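For illustration, the relevant /etc/crypttab entries end up looking something like this (names are from my setup and the UUIDs are placeholders; the key file field for the data PV is replaced with none):

```
# /etc/crypttab – both PVs now prompt for a passphrase at boot
sda5_crypt     UUID=<ssd-partition-uuid>  none  luks,discard
datapv1_crypt  UUID=<hdd-partition-uuid>  none  luks
```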
- Update the initramfs, ensuring these settings are known at boot time, using the usual:
update-initramfs -u -k all
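To double-check that the rebuilt initramfs actually contains cache_check, you can list its contents with lsinitramfs (part of initramfs-tools on Ubuntu):

```
lsinitramfs /boot/initrd.img-$(uname -r) | grep cache_check
```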
Should you need to detach the caching at any point, all you need to do is to use the following command:
lvconvert --split-cache sysvg/users
This will keep the cache pool but simply detach it from the data volume. Any writes in the cache are flushed to the data partition before detaching, so again this should be safe to use on a live partition.
Detaching the cache is necessary, for example, if you need to resize the data volume, as it is not possible to resize a cached volume directly. The steps to follow are simple: detach cache, resize data volume, re-attach cache as before.
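As a sketch, growing the users LV by 100GiB (an arbitrary example size) would go like this; the -r flag asks lvextend to also resize the filesystem:

```
# detach the cache (dirty blocks are flushed first)
lvconvert --split-cache sysvg/users

# grow the data LV together with its filesystem
lvextend -r -L +100G sysvg/users

# re-attach the cache pool as before
lvconvert --type cache --cachepool sysvg/cache sysvg/users
```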
I’ve seen a number of sites documenting how to set up caching on the root partition, but frankly I can’t see the point. (In case you were wondering, that would require a few extra steps, because a few things are missing from the standard initramfs.) If you have an SSD and an HDD, why bother putting your OS and software on the HDD? Just keep those partitions (boot, root, var, usr in my case) on the SSD, and use whatever remains to cache the HDD, which then only contains your precious data. It’s not like the OS + software consume much space anyway, and having them on the SSD ensures that access to them is always fast.
Perhaps the reason many found themselves in that situation is that they already had a working system on an HDD and only later bought an SSD to speed things up. I was in the same situation myself when I got my SSD several years ago. But with LVM it’s trivial to simply add the SSD PV to your system VG, and then move the partitions from one device to the other (using pvmove). You can even do this on a live system.
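As a sketch, with hypothetical PV names, the migration goes like this (pvmove works per-LV with -n, and can run on a live system):

```
# add the (already encrypted and PV-initialised) SSD to the system VG
vgextend sysvg /dev/mapper/ssd_crypt

# move the OS LVs from the HDD PV to the SSD PV, one at a time
pvmove -n root /dev/mapper/hdd_crypt /dev/mapper/ssd_crypt
pvmove -n usr /dev/mapper/hdd_crypt /dev/mapper/ssd_crypt
```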