In multiple posts so far I've created a ZFS pool using pretty much the same parameters. But I never bothered to explain why I chose them. Until now...
From my latest ZFS-related post, I have the following pool creation command:
zpool create -o ashift=12 -o autotrim=on \
-O compression=lz4 -O normalization=formD \
-O acltype=posixacl -O xattr=sa -O dnodesize=auto -O atime=off \
-O encryption=aes-256-gcm -O keylocation=prompt -O keyformat=passphrase \
-O canmount=off -O mountpoint=none $POOL $DISK
The ashift setting controls the sector size ZFS assumes for your pool and should match whatever your (spinning) disk uses. Realistically, you'll probably use 4K sectors, so 12 is a good starting value. Why the heck 12? Well, the value is an exponent of 2, and 2¹² bytes is 4 KB. I like to force it because ZFS often wrongly auto-detects a value of 9 (512 bytes), which shouldn't really be used these days. This is not really ZFS' fault but a consequence of some disks being darn liars to preserve compatibility.
Even if you do have 512-byte disks today, any replacement down the road will be at least 4K. Since the only way to change this option is to recreate the pool, one should think ahead and go with 4K immediately.
When it comes to SSD setups, there might be some benefit in going even higher, since SSDs usually use 8K or even larger pages internally. However, since SSDs are much more forgiving when it comes to random access, most of the time it's simply not worth it, because larger sector sizes cause other issues (e.g. slack space).
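To make the arithmetic concrete, here is a minimal sketch in plain shell. The function names `ashift_to_bytes` and `allocated_bytes` are made up for this illustration, and the slack-space model is deliberately simplified: it only rounds a block up to whole sectors and ignores compression, parity, and metadata overhead.

```shell
# Sector size in bytes for a given ashift (sector size = 2^ashift).
ashift_to_bytes() {
    echo $(( 1 << $1 ))
}

# Space a block of $1 bytes occupies once rounded up to whole sectors
# for ashift $2. Simplified model, not how ZFS accounts for everything.
allocated_bytes() {
    sector=$(ashift_to_bytes "$2")
    echo $(( ( ($1 + sector - 1) / sector ) * sector ))
}

ashift_to_bytes 9          # prints 512
ashift_to_bytes 12         # prints 4096
allocated_bytes 1000 12    # prints 4096
allocated_bytes 1000 13    # prints 8192
```

Under this model a 1000-byte block wastes about 3 KB with 4K sectors and about 7 KB with 8K sectors, which is the slack-space cost of going higher.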
Support for trim (the autotrim setting) is really important for SSDs and completely irrelevant when it comes to spinning rust. Since my NAS uses good old hard drives, this setting really doesn't apply there. But I also use ZFS on my laptop, and there it makes a huge difference. So I always include it, just so I don't forget it by accident when it matters.
While zstd seems to be the compression darling, I still prefer lz4 for my local datasets because it's much easier on the CPU. There's also an option to turn off compression completely, but I honestly cannot measure any speed improvement in the general case. Using compression is like receiving free space, so why not?
As ZFS uses Unicode (UTF-8 more specifically), it has an interesting problem: two filenames might look the same but have two different encodings. The best-known example might be Å, which can be expressed either as a single Å character or as a combination of A and a separate ring mark. From the user's point of view, both are the same. But they have different binary representations (U+00C5 vs U+0041 U+030A).
Setting normalization explicitly just ensures file names are compared in their canonical Unicode representation (the names themselves are stored unmodified), and thus things that look the same are going to be treated as the same. I personally like formD on a philosophical level, but any normalization will do the same. Just don't stick with the default value of none.
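The difference between the two forms of Å is easy to see at the byte level. The following is a small sketch in plain shell; the variable names and the `byte_count` helper are made up for this illustration, and octal escapes are used so it works in any POSIX `printf`.

```shell
# Precomposed form: U+00C5, UTF-8 bytes C3 85 (octal 303 205).
composed=$(printf '\303\205')
# Decomposed form: U+0041 (A) followed by U+030A (combining ring),
# UTF-8 bytes 41 CC 8A (octal 101 314 212).
decomposed=$(printf 'A\314\212')

# Count the bytes in a string (tr strips padding some wc versions add).
byte_count() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

byte_count "$composed"      # prints 2
byte_count "$decomposed"    # prints 3

# A plain byte-wise comparison sees two different names.
if [ "$composed" = "$decomposed" ]; then
    echo "same"
else
    echo "different"        # this branch runs
fi
```

Without normalization, a file system comparing names byte by byte would happily keep both as separate files in the same directory.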
The acltype option enables POSIX ACLs, which let you store extra access attributes not covered by the "standard" user/group/world affair. Tools such as setfacl and getfacl rely on them, and security frameworks like SELinux lean on the closely related extended attributes. Even if you use neither today, you should enable it, as it doesn't really impact anything if not used. And you might consider using them in the future.
The xattr=sa option tells ZFS to store extended attributes (which is also where the ACLs from above end up) together with the file's metadata. This is a huge performance boost if you use them. If you don't use them, it has no effect, so you might as well future-proof your setup.
Assuming you already save all these extra attributes, it's obvious they cannot really fit nicely in one metadata node. Unless it's a big one. Once set, the dnodesize option (assuming feature@large_dnode=enabled) will allow larger-than-normal metadata nodes at the cost of some compatibility. Assuming you have ZFS 0.8.4 or above, you really have nothing to worry about.
The POSIX standard specifies that one should always update the access time whenever a file or directory is accessed. You went into your home directory - update. You opened a file without changing anything - update. These darn updates really stack up, and there is really no general use case where you would need to know when a file was last read. Setting atime=off turns off these updates.
I like my datasets encrypted. Ideally one would use full disk encryption, but ZFS native encryption is a close second, with unique benefits at the cost of minor data leaks (essentially only ZFS dataset names). And GCM encryption is usually the fastest here.
Call me old-fashioned, but I prefer a passphrase to a binary key. The reason is that I can enter a passphrase more easily in a pinch.
For my laptop I keep prompt as the key source so I can easily type it. For servers, I use the file:// syntax here, since I keep my passphrase on a TmpUsb USB drive. This allows me to reboot the server without entering the key every time, but in case it's ever stolen, my data is inaccessible.
As a rule, I try not to have the top-level dataset mountable. I just use it to set defaults, and data goes only into sub-datasets.
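As a rough sketch of what that looks like in practice, sub-datasets can be created with their own mount points while inheriting everything else from the top-level dataset. The pool name `Tank`, the dataset names, and the mount points below are all hypothetical, and the `run` helper is just a convenience invented for this example: it prints each command unless RUN=1 is set, so the snippet is safe to paste on a machine without ZFS.

```shell
# Hypothetical pool name; replace with your own.
POOL="${POOL:-Tank}"

# Print the command instead of executing it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# The parent has mountpoint=none, which children would inherit, so each
# sub-dataset gets an explicit mount point. Other properties set with -O
# at pool creation (compression, atime, encryption, ...) are inherited.
run zfs create -o mountpoint=/data "$POOL/Data"
run zfs create -o mountpoint=/home "$POOL/Home"
```

Since canmount is not an inherited property, the children remain mountable even though the top-level dataset is not.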
And that's all the explanation I'm ready to offer.