Debian 7 - Setup ZFS with RAIDZ pool on your Linux Server

A few days after the release of Debian Wheezy, I decided to set up a brand new HP N54L MicroServer as a small company NAS and server, and to use the ZFS filesystem for data storage. As I wanted to fully manage the system myself, I decided not to go for a NAS distribution, but to use a plain Debian install and set up a ZFS filesystem on top of it.

ZFS is a filesystem originally developed by Sun for the Solaris OS. It has been ported to Linux by the ZFS on Linux project.

Its most interesting functionalities are:

  • convergence of filesystem and volume manager
  • software RAIDZ (software RAID5 equivalent)
  • online data compression
  • snapshots

This filesystem is so simple, efficient and advanced that I'm sure it will become a Linux standard very, very soon. Other filesystems may become part of the past sooner than expected...

This guide explains how to install and configure a ZFS RAIDZ pool, how to set up snapshots and how to handle day-to-day maintenance. A prerequisite is to be running a Debian Wheezy server with a separate system disk (ZFS won't be on the boot device).

It doesn't explain all ZFS options and possibilities in detail, but it covers all the steps needed to get a fully running ZFS RAIDZ pool that will give you the flexibility of a professional grade NAS at the cost of a geek toolbox :-)

1. Prerequisite

ZFS is declared stable only on 64-bit operating systems.

So it is highly advisable to install it on a Debian Wheezy amd64 system.

2. Why RAIDZ & How Many Drives ?

With the ZFS filesystem, RAIDZ is very popular as it gives the best trade-off between protection against hardware failure and usable storage.

It is very similar to RAID5, but without the write-hole penalty that RAID5 encounters. It is popular for storage archives or light-traffic data.

To populate a RAIDZ pool, you need a minimum of 3 drives (same as RAID5).

But it is important to know that one main limitation of RAIDZ is that you can't add a new drive to an existing RAIDZ vdev. You can only add a new vdev (3 drives minimum) to an existing pool, as illustrated below.

So, if your server can accommodate 4 or 5 drives, it may be important to declare your RAIDZ pool with the maximum number of drives right from the beginning. This can save you from some extension headaches...
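
For illustration, here is what such an extension would look like: a second RAIDZ vdev is added alongside the first one. The command is a sketch only; it uses the pool name created later in this guide (naspool) and placeholder disk IDs that you would replace with your own.

Terminal
# zpool add naspool raidz /dev/disk/by-id/ata-DISK5 /dev/disk/by-id/ata-DISK6 /dev/disk/by-id/ata-DISK7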

3. Install Packages

ZFS on Linux is not available from the Debian repositories.

But the project provides its own repository. So, you just need to declare this repository and the key used to sign the packages, then install the needed packages:

Terminal
# wget http://archive.zfsonlinux.org/debian/pool/main/z/zfsonlinux/zfsonlinux_4_all.deb
# dpkg -i zfsonlinux_4_all.deb
# wget http://zfsonlinux.org/4D5843EA.asc -O - | apt-key add -
# apt-get update
# apt-get install debian-zfs parted ntfs-3g mountall

Now that ZFS is installed, we can check that it is operational and that no pool has been declared yet:

Terminal
# zpool status
no pools available

4. ZFS Partitioning

There are different /dev names that can be used when creating a ZFS pool.

Each option has advantages and drawbacks, the right choice for your ZFS pool really depends on your requirements.

  • /dev/sdX: Best for development/test pools, as these names are not persistent.
  • /dev/disk/by-id/: Nice for small systems with a single disk controller; it lets you move disks between ports without import problems.
  • /dev/disk/by-path/: Good for large pools, as the name describes the PCI bus number, enclosure name and port number.
  • /dev/disk/by-vdev/: Best for large pools, but relies on a /etc/zfs/vdev_id.conf file properly configured for your system (a minimal sketch is shown just after this list).
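
For the by-vdev option, here is a minimal sketch of what /etc/zfs/vdev_id.conf could look like. The aliases and PCI paths below are purely illustrative and must be adapted to your own hardware.

/etc/zfs/vdev_id.conf
# map a friendly name to each physical port (paths are examples only)
alias slot0    /dev/disk/by-path/pci-0000:00:11.0-scsi-0:0:0:0
alias slot1    /dev/disk/by-path/pci-0000:00:11.0-scsi-1:0:0:0

Once udev has processed this file, the friendly names should appear under /dev/disk/by-vdev/ and can be used in zpool commands.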

In my NAS environment, with a 4-slot disk controller, I will create the pool by disk ID. This allows disks to be swapped between slots without any side effect.

You can list the SATA drives (here the IDs start with ata-WDC_) that will be added to the RAIDZ pool.

Terminal
# ls -l /dev/disk/by-id/*
...
lrwxrwxrwx 1 root root 9 May 31 21:55 /dev/disk/by-id/ata-WDC_WD20EFRX-68AX9N0_WD-WMC301384865 -> ../../sdb
lrwxrwxrwx 1 root root 9 May 31 21:56 /dev/disk/by-id/ata-WDC_WD20EFRX-68AX9N0_WD-WMC301555904 -> ../../sda
lrwxrwxrwx 1 root root 9 May 31 21:56 /dev/disk/by-id/ata-WDC_WD20EFRX-68AX9N0_WD-WMC301557680 -> ../../sdd
lrwxrwxrwx 1 root root 9 May 31 21:56 /dev/disk/by-id/ata-WDC_WD20EFRX-68AX9N0_WD-WMC301571260 -> ../../sdc
...

If your drives have a capacity of more than 2 TB, their full size won't be recognised with a classic MBR partition table.

To go beyond that limit, you need to use a GUID Partition Table (GPT) scheme.

So, to be on the safe side, it's better to use GPT whatever your disk size. You can convert your drives to GPT using parted.

Terminal
# parted /dev/disk/by-id/ata-WDC_WD20EFRX-68AX9N0_WD-WMC301384865 mklabel gpt
# parted /dev/disk/by-id/ata-WDC_WD20EFRX-68AX9N0_WD-WMC301555904 mklabel gpt
# parted /dev/disk/by-id/ata-WDC_WD20EFRX-68AX9N0_WD-WMC301557680 mklabel gpt
# parted /dev/disk/by-id/ata-WDC_WD20EFRX-68AX9N0_WD-WMC301571260 mklabel gpt
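
If you want to double-check the result, parted can print the partition table of each drive; it should now report gpt as the partition table type.

Terminal
# parted /dev/disk/by-id/ata-WDC_WD20EFRX-68AX9N0_WD-WMC301384865 print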

Your disks are now ready to be added to a RAIDZ pool.

5. Pool creation

The next step is to create a ZFS pool using the RAIDZ format.

This format is roughly equivalent to RAID5. It needs a minimum of 3 drives and can handle a single drive failure in the pool.

Here, we will create a RAIDZ pool named naspool with a 4-disk array.

During pool creation, the option ashift=12 is very important, as it declares that the drives use 4096-byte sectors (which is the case for most modern high capacity drives).

It increases performance on these drives. This option cannot be set after the pool has been created.

After creating the pool, there are two important options to set:

  • atime=off disables access time recording. It slightly increases disk performance.
  • dedup=off disables deduplication. Deduplication can save some disk space, but at a very high cost in terms of RAM: as a rough rule of thumb, you need around 4 GB of RAM per TB of available space in the RAIDZ pool. Some detailed information is available in this Oracle article.

Terminal
# zpool create -m none -o ashift=12 naspool raidz ata-WDC_WD20EFRX-68AX9N0_WD-WMC301384865 ata-WDC_WD20EFRX-68AX9N0_WD-WMC301555904 ata-WDC_WD20EFRX-68AX9N0_WD-WMC301557680 ata-WDC_WD20EFRX-68AX9N0_WD-WMC301571260
# zfs set atime=off naspool
# zfs set dedup=off naspool
# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
naspool 7,25T 1,02M 7,25T 0% 1.00x ONLINE -
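
If you want to verify that the 4K alignment was really taken into account, zdb (shipped with the ZFS packages) can display the ashift value recorded in the pool configuration. This is only a quick sanity check, not a required step.

Terminal
# zdb -C naspool | grep ashift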

One of the main limitations of RAIDZ is that you can not add one more drive to an existing array.
You can replace your drives with higher capacity ones (a sketch is shown below), but you can't add a single drive.
So make sure that you declare the maximum number of drives in your RAIDZ array from day one.
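
For reference, here is how such a capacity upgrade would look: each drive is replaced and resilvered one at a time, and the autoexpand property lets the pool grow once every drive of the vdev has been swapped. The disk IDs below are placeholders only.

Terminal
# zpool set autoexpand=on naspool
# zpool replace naspool ata-OLD_DISK_ID ata-NEW_DISK_ID
# zpool status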

6. Mount Filesystem

A data filesystem can now be created in the new pool and mounted under /mnt/data.

This filesystem will inherit all the characteristics of naspool.

Terminal
# mkdir /mnt/data
# zfs create naspool/data
# zfs set mountpoint=/mnt/data naspool/data

The ZFS filesystem has been mounted manually under /mnt/data.

Terminal
# df
Filesystem 1K-blocks Used Available Use% Mounted on
rootfs 58998956 1173288 54828628 3% /
udev 10240 0 10240 0% /dev
tmpfs 192948 240 192708 1% /run
/dev/disk/by-uuid/e41e5f76-4b07-4e86-a6c7-00bef3b4720e 58998956 1173288 54828628 3% /
tmpfs 5120 0 5120 0% /run/lock
tmpfs 901760 0 901760 0% /run/shm
naspool/data 5567642752 256 5567642496 1% /mnt/data

As the mount point has been declared to ZFS, the filesystem can also be mounted automatically at boot time and unmounted at shutdown.

This is done through the ZFS configuration file:

/etc/default/zfs
...
ZFS_MOUNT='yes'
...
ZFS_UNMOUNT='yes'
...

Every ZFS filesystem with a known mount point will now be mounted automatically during the server boot process.
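
If you want to check this behaviour without rebooting, you can unmount the filesystem and ask ZFS to mount everything it knows about, which is roughly what the init script does at boot:

Terminal
# zfs umount naspool/data
# zfs mount -a
# df | grep naspool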

7. Setup Snapshots

Snapshots are one of ZFS's killer features.

Snapshots let you freeze the content of your files at any point in time, at almost no extra cost in terms of space.

They can be a life saver in case of file corruption or accidental deletion, as they let you easily retrieve previous versions of your precious files.

Some detailed explanations about the snapshot feature are available on the Oracle site.

Under Solaris, Time Slider provides the mechanism for snapshot automation. But at the time of this article, it is still not available under zfsonlinux.

As often happens in the Open Source community, a project has stepped in to fill this gap: zfs-auto-snapshot.

It extends zfsonlinux with a script that handles standard snapshot schedules (hourly, daily, ...) and provides an auto-rotate mechanism.

It is a very simple and efficient replacement for Time Slider.

7.1. Install zfs-auto-snapshot

As the complete zfs-auto-snapshot feature is handled by a single script, the first step is to download and install this script from the project site.

Terminal
# wget -O /usr/local/sbin/zfs-auto-snapshot.sh https://raw.github.com/zfsonlinux/zfs-auto-snapshot/master/src/zfs-auto-snapshot.sh
# chmod +x /usr/local/sbin/zfs-auto-snapshot.sh
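
Before wiring the script into cron, you can give it a manual test run. The version available at the time of writing supports a --dry-run option that prints the planned actions without creating anything; if your copy of the script differs, check its usage output first.

Terminal
# zfs-auto-snapshot.sh --dry-run --verbose --label=test --keep=1 naspool/data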

7.2. Configure Snapshots

One parameter in the script sets the prefix of the snapshot names. As the term zfs-auto-snap doesn't mean much to most users, we can change it to backup.

/usr/local/sbin/zfs-auto-snapshot.sh

# Set default program options.
...
opt_prefix='backup'
...

You can now:

  • declare snapshots as enabled through a user property (it's not compulsory under zfsonlinux ...)
  • make the .zfs directory visible (to get direct access to the snapshots when showing hidden files).

Terminal
# zfs set com.sun:auto-snapshot=true naspool/data
# zfs get all naspool/data | grep auto-snapshot
naspool/data com.sun:auto-snapshot true local
# zfs set snapdir=visible naspool/data
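
With snapdir=visible, recovering an old version of a file is as simple as copying it back from the read-only snapshot tree under /mnt/data/.zfs/snapshot/. In the example below, the snapshot name is taken from the listing shown later in this guide and the file path is purely illustrative.

Terminal
# ls /mnt/data/.zfs/snapshot/
# cp /mnt/data/.zfs/snapshot/backup_daily-2013-06-26-0628/documents/report.odt /mnt/data/documents/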

7.3. Automatic Snapshots

Now that the auto-snapshot feature is enabled, you can activate hourly, daily and weekly snapshots.

The simplest option is to use the default cron.hourly, cron.daily and cron.weekly mechanism provided by Debian.

Because of a Debian run-parts limitation, your cron script names should strictly respect these rules:

  • first letter should be [a-z] or [0-9]
  • all following letters should be [a-z], [0-9] or -

This means that your script names should never include a dot (.), or they will simply be ignored by run-parts, the tool cron uses to execute these directories.

Here are some sample scripts to handle the following snapshot policy:

  • every hour with a 1 day retention
  • every day with a 1 month retention
  • every week with a 2 month retention

/etc/cron.hourly/zfs-snapshot-hourly
#!/bin/bash

# set PATH
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# set filesystem name
ZFS_FS="naspool/data"

# run snapshot
zfs-auto-snapshot.sh --quiet --syslog --label=hourly --keep=24 "$ZFS_FS"

/etc/cron.daily/zfs-snapshot-daily
#!/bin/bash

# set PATH
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# set filesystem name
ZFS_FS="naspool/data"

# run snapshot
zfs-auto-snapshot.sh --quiet --syslog --label=daily --keep=31 "$ZFS_FS"

/etc/cron.weekly/zfs-snapshot-weekly
#!/bin/bash

# set PATH
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# set filesystem name
ZFS_FS="naspool/data"

# run snapshot
zfs-auto-snapshot.sh --quiet --syslog --label=weekly --keep=8 "$ZFS_FS"

All files should be executable.

Terminal
# wget -O /etc/cron.hourly/zfs-snapshot-hourly https://raw.githubusercontent.com/NicolasBernaerts/debian-scripts/master/zfs/zfs-snapshot-hourly
# wget -O /etc/cron.daily/zfs-snapshot-daily https://raw.githubusercontent.com/NicolasBernaerts/debian-scripts/master/zfs/zfs-snapshot-daily
# wget -O /etc/cron.weekly/zfs-snapshot-weekly https://raw.githubusercontent.com/NicolasBernaerts/debian-scripts/master/zfs/zfs-snapshot-weekly
# chmod +x /etc/cron.hourly/zfs-snapshot-hourly
# chmod +x /etc/cron.daily/zfs-snapshot-daily
# chmod +x /etc/cron.weekly/zfs-snapshot-weekly

They will now be run automatically by your server cron.
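
You can check that cron will really pick them up (correct names and permissions) with run-parts in test mode, which lists the scripts it would execute without actually running them:

Terminal
# run-parts --test /etc/cron.hourly
# run-parts --test /etc/cron.daily
# run-parts --test /etc/cron.weekly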

After some time, you can list the available snapshots:

Terminal
# zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
naspool/data@backup_daily-2013-06-18-0625 291K - 12,5G -
naspool/data@backup_daily-2013-06-19-0625 2,06M - 12,5G -
naspool/data@backup_daily-2013-06-20-0626 296K - 12,6G -
naspool/data@backup_daily-2013-06-21-0628 1,33M - 12,6G -
naspool/data@backup_daily-2013-06-22-0627 0 - 12,6G -
naspool/data@backup_daily-2013-06-23-0628 0 - 12,6G -
naspool/data@backup_weekly-2013-06-23-0647 0 - 12,6G -
naspool/data@backup_daily-2013-06-24-0628 0 - 12,6G -
naspool/data@backup_daily-2013-06-25-0628 500K - 12,6G -
naspool/data@backup_daily-2013-06-26-0628 2,50M - 12,6G -
naspool/data@backup_hourly-2013-06-26-2317 0 - 12,6G -
naspool/data@backup_hourly-2013-06-27-0017 0 - 12,6G -
naspool/data@backup_hourly-2013-06-27-0117 0 - 12,6G -

You can see that here I get about 7 MB of snapshot overhead for 13 snapshots of a 12.6 GB dataset...
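
The auto-rotate mechanism normally prunes old snapshots for you, but should you ever need to reclaim space manually, an individual snapshot can be destroyed by name (here the oldest one from the listing above):

Terminal
# zfs destroy naspool/data@backup_daily-2013-06-18-0625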

8. Periodic Scrub

A scrub is a specific ZFS procedure used to check the integrity of a ZFS pool.

It is one of the most important regular maintenance operations that should be performed on a ZFS pool.

A scrub reads every block in the pool and verifies it against its checksum, repairing corrupted data from the pool's redundancy whenever possible. This operation will detect unreadable sectors and silent consistency problems.

It is important to run it regularly so that you become aware of problems as they develop, before it is too late.

Some detailed explanations about the scrub process are available on the Oracle site.

The scrub command returns immediately. You can check its progress with the status command.

Terminal
# zpool scrub naspool
# zpool status
  pool: naspool
 state: ONLINE
  scan: scrub in progress since Tue Jun 11 08:56:05 2013
        52,3G scanned out of 2,62T at 367M/s, 2h2m to go
        0 repaired, 1,95% done
config:

        NAME                                          STATE     READ WRITE CKSUM
        naspool                                       ONLINE       0     0     0
          raidz1-0                                    ONLINE       0     0     0
            ata-WDC_WD20EFRX-68AX9N0_WD-WMC301384865  ONLINE       0     0     0
            ata-WDC_WD20EFRX-68AX9N0_WD-WMC301555904  ONLINE       0     0     0
            ata-WDC_WD20EFRX-68AX9N0_WD-WMC301557680  ONLINE       0     0     0
            ata-WDC_WD20EFRX-68AX9N0_WD-WMC301571260  ONLINE       0     0     0

errors: No known data errors

A good habit is to run a scrub once a week.

/etc/cron.weekly/zfs-scrub-weekly
#!/bin/bash

# set PATH
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# set pool name
ZFS_POOL="naspool"

# start scrub
zpool scrub "$ZFS_POOL"

Download the script (or create it from the listing above) and make it executable:

Terminal
# wget -O /etc/cron.weekly/zfs-scrub-weekly https://raw.githubusercontent.com/NicolasBernaerts/debian-scripts/master/zfs/zfs-scrub-weekly
# chmod +x /etc/cron.weekly/zfs-scrub-weekly

Your weekly maintenance is now planned.

Once a scrub has completed, the status command will give you a report:

Terminal
# zpool status
  pool: naspool
 state: ONLINE
  scan: scrub repaired 0 in 2h55m with 0 errors on Tue Jun 11 11:51:52 2013
config:

        NAME                                          STATE     READ WRITE CKSUM
        naspool                                       ONLINE       0     0     0
          raidz1-0                                    ONLINE       0     0     0
            ata-WDC_WD20EFRX-68AX9N0_WD-WMC301384865  ONLINE       0     0     0
            ata-WDC_WD20EFRX-68AX9N0_WD-WMC301555904  ONLINE       0     0     0
            ata-WDC_WD20EFRX-68AX9N0_WD-WMC301557680  ONLINE       0     0     0
            ata-WDC_WD20EFRX-68AX9N0_WD-WMC301571260  ONLINE       0     0     0

errors: No known data errors

9. Handle next update

Under Debian, when ZFSonLinux is updated, some ZFS libraries occasionally go through a major version upgrade.

In that case, for the new libraries to replace the previous ones, you need to upgrade your system with dist-upgrade (instead of a classic upgrade).

If you fail to do so, some libraries will be missing after the next reboot ... and the precious data on your ZFSonLinux partitions won't be accessible!

So, to be on the safe side, always upgrade your system with:

Terminal
# apt-get update
# apt-get dist-upgrade
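
Since the kernel modules are rebuilt by DKMS, it is also worth checking after a dist-upgrade, and before rebooting, that the zfs module has been built for your kernel. The commands below are a quick sanity check, assuming the standard DKMS tooling pulled in by debian-zfs.

Terminal
# dkms status
# modinfo zfs | grep -i version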

 

Your Debian server is now running a RAIDZ ZFS data pool that will provide you with some very high-end filesystem features.

Hope it helps.


This article is published "as is", without any warranty that it will work for your specific need.
If you think this article needs some complement, or simply if you think it saved you lots of time & trouble,
just let me know. Cheers !
