Seravo blog: Linux and open source – technology and strategy

The perfect Btrfs setup for a server

Btrfs is probably the most modern of all widely used filesystems on Linux. In this article we explain how to use Btrfs as the only filesystem on a server machine, and how that enables some sweet capabilities: very resilient RAID-1, flexible adding and replacing of disk drives, snapshots for quick backups, and so on.


The techniques described in this article were tested on an Ubuntu 16.04 server install, but are applicable to any system with roughly the same versions of btrfs-progs, Grub (2.02), the kernel (4.4) and the like.

The hardware requirements for a Btrfs based RAID-1 disk setup are very flexible. The number of disks can be anything from two upwards. The disks in the RAID array do not need to be identical in size, thanks to the flexibility of Btrfs RAID-1: it works on the data level, not just on the device level like traditional mdadm does. Btrfs also includes the features traditionally provided by LVM, so Btrfs conveniently replaces both mdadm and LVM with a single easy-to-use tool. A good practice is to start with a setup of 2–4 disks and later keep adding new disks as more space is needed, sized at whatever offers the best price/capacity ratio at that time.
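As a rule of thumb, the usable capacity of a Btrfs RAID-1 pool with mixed disk sizes is min(total/2, total minus the largest disk), since every chunk must land on two different devices. A small sketch of that arithmetic (a rough estimate only; btrfs fi usage reports the filesystem's own, slightly different estimate):

```shell
# usable_raid1 takes whole-number disk sizes (all in the same unit, e.g.
# GiB) and prints the approximate usable RAID-1 capacity:
# min(total / 2, total - largest disk).
usable_raid1() {
    total=0
    largest=0
    for s in "$@"; do
        total=$((total + s))
        if [ "$s" -gt "$largest" ]; then largest=$s; fi
    done
    half=$((total / 2))
    rest=$((total - largest))
    if [ "$half" -lt "$rest" ]; then echo "$half"; else echo "$rest"; fi
}
```

For the four disks in the example below (roughly 885, 265, 283 and 265 GiB), the large first disk is the limiting factor, not the total divided by two.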

The Btrfs setup

When the hardware is ready, the next step is to install the operating system (=Linux). During the partitioning phase, create one big partition that fills the whole disk. There is no need to create a /boot partition nor a swap partition. For Grub compatibility reasons we need to create a real partition (e.g. sda1, sdb1, …) on every disk rather than giving the whole raw disk to Btrfs, even though Btrfs would support that too. Remember to mark every primary partition (e.g. sda1, sdb1, …) bootable in the partition table.

After the partitioning step, select the first disk partition (e.g. sda1) as the root filesystem and use Btrfs as the filesystem type. Complete the installation and boot.

After boot you can expand the root filesystem to use all disks with the command:

sudo btrfs device add /dev/sdb1 /dev/sdc1 /dev/sdd1 /

You can check the status of the btrfs system with btrfs fi show (fi is short for filesystem):

$ sudo btrfs fi show
Label: 'root' uuid: 31e77d75-c07d-44dd-b969-d640dfdf5f81
Total devices 4 FS bytes used 1.78GiB
devid 1 size 884.94GiB used 4.02GiB path /dev/sda1
devid 2 size 265.42GiB used 0.00B path /dev/sdb1
devid 3 size 283.18GiB used 0.00B path /dev/sdc1
devid 4 size 265.42GiB used 0.00B path /dev/sdd1

This pools the devices together and creates a big root filesystem. To make it a RAID-1 system run:

sudo btrfs balance start -v -mconvert=raid1 -dconvert=raid1 /

After this, the available disk space halves, but the filesystem becomes resilient against single disk failures. Read speed may also increase a bit, as data can be accessed in parallel from at least two devices.
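To confirm that the conversion finished, the block group profiles can be read from btrfs fi df. A minimal sketch (the output format, e.g. 'Data, RAID1: total=..., used=...', is an assumption based on btrfs-progs of this era; GlobalReserve is internal bookkeeping, always reports 'single', and is deliberately skipped):

```shell
# profiles() reads `btrfs fi df` output on stdin and prints each block
# group type together with its profile, skipping the GlobalReserve line.
profiles() {
    awk -F'[, :]+' '/^(Data|Metadata|System)/ {print $1, $2}'
}

# In production:
#   sudo btrfs fi df / | profiles
# After a successful conversion every printed profile should be RAID1.
```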

The command btrfs fi usage explains how disk space is allocated and how much is estimated to be available:

$ sudo btrfs fi usage /
Overall:
 Device size: 1.66TiB
 Device allocated: 6.06GiB
 Device unallocated: 1.65TiB
 Device missing: 0.00B
 Used: 3.53GiB
 Free (estimated): 846.76GiB (min: 846.76GiB)
 Data ratio: 2.00
 Metadata ratio: 2.00
 Global reserve: 32.00MiB (used: 0.00B)

Data,RAID1: Size:2.00GiB, Used:1.69GiB
 /dev/sda1 2.00GiB
 /dev/sdc1 2.00GiB

Metadata,RAID1: Size:1.00GiB, Used:72.81MiB
 /dev/sda1 1.00GiB
 /dev/sdc1 1.00GiB

System,RAID1: Size:32.00MiB, Used:16.00KiB
 /dev/sda1 32.00MiB
 /dev/sdc1 32.00MiB

Unallocated:
 /dev/sda1 881.90GiB
 /dev/sdb1 265.42GiB
 /dev/sdc1 280.15GiB
 /dev/sdd1 265.42GiB

By default the Linux system boot will hang if any of the devices used by the Btrfs root filesystem is missing. This is not ideal in a server environment: we would rather have the system boot and continue operating in a degraded mode, so that services keep working and admins can log in remotely to assess the next steps.

To enable Btrfs to boot in degraded mode we need to add the ‘degraded‘ mount option in two locations. First we need to make sure that Grub can mount the root filesystem and access the kernel. To do that, edit the rootflags line in /etc/grub.d/10_linux to include the option ‘degraded‘ like this:

GRUB_CMDLINE_LINUX="rootflags=degraded,subvol=${rootsubvol} ${GRUB_CMDLINE_LINUX}"

For the Grub config to take effect we need to run ‘update-grub‘ and after that install the new Grub on the master boot record (MBR) of every disk. That can easily be scripted like this:

for x in a b c d; do sudo grub-install /dev/sd$x; done

Secondly, we need to allow the Linux system to mount its filesystems in degraded mode by adding the same option to /etc/fstab like this:

UUID=.... / btrfs degraded,noatime,nodiratime,subvol=@ 0 1

Note that noatime and nodiratime have also been selected. They increase performance with the drawback of not recording access times on files and directories, but that feature is almost never used by anything, so in practice there is no downside.

With the setup above we now have a system with 4 disks, each containing one partition, and those partitions pooled together with Btrfs RAID-1. If any of the disks fails, the system will continue to operate and can also resume operation after a reboot (thanks to the mount option ‘degraded’), and it does not matter which disk breaks, as any disk is good for booting (thanks to having Grub in every disk’s MBR). If a disk failure occurs, it is up to the system administrator to detect it (e.g. from syslog), then add a new disk and run ‘btrfs replace...‘ as explained in our Btrfs recovery article.
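Detection can be as simple as a cron job that checks the per-device error counters. A hedged sketch, assuming the `btrfs device stats` output format of one counter name and value per line:

```shell
# nonzero() reads `btrfs device stats` output on stdin and prints only
# the counters whose value (field 2) is non-zero.
nonzero() {
    awk '$2 != 0'
}

# In production, e.g. from a daily cron job:
#   sudo btrfs device stats / | nonzero
# and alert (mail, monitoring hook) whenever this produces any output.
```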

Using ZRAM for swap

Note that this setup does not have any swap partitions. We can’t put a swap partition on a raw disk, as there is no redundancy there: if any disk failed, the swap partition and all memory paged out to it would be lost, and the kernel would most likely panic and halt. As Btrfs RAID-1 does not operate on the block level, we cannot put a swap partition on it either. We could use a swap file, but Btrfs is not well suited for hosting swap files. Our solution was to have no traditional swap partition at all, and instead use ZRAM to store swapped-out memory in a compressed format in RAM.

To install zram simply run:

sudo apt install zram-config

After the next reboot there will automatically be a zram device that the system uses for swapping. It does not matter how much RAM a system has; at some point the kernel will anyway swap something out of active memory in order to use the active memory more efficiently. Using ZRAM for swap prevents those pages from going to a real disk, making both swap-out and swap-in faster (at the cost of some extra CPU use).

Using snapshots

Would you like to make a full system backup that does not consume any disk space? On a copy-on-write filesystem like Btrfs it is possible to create snapshots as a window into the filesystem state at a certain point in time.

A practical way to do it is to have a directory called /snapshots/ under the root filesystem and save snapshots there at regular intervals. With the -r option we make the snapshot read-only, which is ideal for backups.

$ sudo mkdir /snapshots
$ sudo btrfs subvolume snapshot -r / /snapshots/root.$(date +%Y%m%d-%H%M)
Create a readonly snapshot of '/' in '/snapshots/root.20160919-0954'

$ tree -L 3 /snapshots

`-- root.20160919-0954
 |-- bin
 |-- boot
 |-- dev
 |-- etc
 |-- home
 |-- initrd.img -> boot/initrd.img-4.4.0-36-generic
 |-- initrd.img.old -> boot/initrd.img-4.4.0-31-generic
 |-- lib
 |-- lib64
 |-- media
 |-- mnt
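Taking snapshots at regular intervals is easy to script from cron. A minimal sketch of a daily rotation; the /snapshots path matches the example above, while the 7-day retention and the helper names are assumptions for illustration:

```shell
# Helpers for a daily snapshot rotation. snapshot_name builds the dated
# name used above; snap_date extracts the YYYYMMDD part back out of a
# snapshot path such as /snapshots/root.20160919-0954.
snapshot_name() {
    echo "root.$(date +%Y%m%d-%H%M)"
}
snap_date() {
    d=${1##*/root.}   # strip the path prefix, keep YYYYMMDD-HHMM
    echo "${d%%-*}"   # keep only the YYYYMMDD part
}

# In production, run daily from cron as root:
#   btrfs subvolume snapshot -r / /snapshots/$(snapshot_name)
#   for snap in /snapshots/root.*; do
#       [ "$(snap_date "$snap")" -lt "$(date -d '7 days ago' +%Y%m%d)" ] \
#           && btrfs subvolume delete "$snap"
#   done
```

Since the snapshots are read-only, pruning them with btrfs subvolume delete is the only way they ever go away; the rotation keeps disk usage bounded.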


To track how much disk space a snapshot uses, or more exactly to view the amount of data that changed between two snapshots, we can use Btrfs quota groups. They are not enabled by default, so start by running:

$ sudo btrfs quota enable /

After that you can view the subvolumes’ (snapshots’) disk usage:

$ sudo btrfs qgroup show /

qgroupid rfer     excl
-------- ----     ----
0/5      16.00KiB 16.00KiB
0/257    1.75GiB  47.74MiB
0/258    48.00KiB 48.00KiB
0/267    0.00B    16.00EiB
0/268    48.00KiB 16.00EiB
0/269    1.75GiB  44.95MiB

To find out which subvolume ID is mounted as what, list them with:

$ sudo btrfs subvolume list /
ID 257 gen 5367 top level 5 path @
ID 258 gen 5366 top level 5 path @home
ID 269 gen 5354 top level 257 path snapshots/root.20160919-0954

To make a subvolume the new root (after a reboot), study the btrfs subvolume set-default command; to manipulate other properties of subvolumes, see the btrfs property command.


4 thoughts on “The perfect Btrfs setup for a server”

  1. Gerrit says:

    Hi there,
    do you know any reason why I should NOT use raw devices (i.e. “/dev/sdb”) with btrfs in RAID1? Besides the GRUB compatibility which is mentioned in the article, of course.
    I read somewhere that there was/is(?) a bug in the btrfs RAID implementation where you can lose data if one disk fails in a RAID1 on raw devices. Did you ever hear about that?

    Thanks for your reply.

    Best regards

    1. Greg says:

      I have been testing BTRFS in a single-disk, multiple-partition config and have had all sorts of problems with booting and mounting; even mounting with degraded read-only options has proven, at least for me, impossible at times. I have assigned multiple partitions mounted by UUID, and even with just one corrupt partition seen as a UUID device you get wrong fs and device missing errors, and I have not worked out cloning either.

      I was hoping to run BTRFS for all my Linux machines, with Windows ReFS as intermediate storage while processing photos, and use BTRFS for network storage for protection from corruption and bitrot. I’m using FreeNAS with ZFS and ECC memory for primary storage and NTFS, EXT4, BTRFS and sometimes XFS for laptops and desktops/workstations, and may soon deploy ReFS as intermediate storage. I don’t dedupe and get by fine with 16GB of memory; in fact I pulled out one of the CPUs on the ZFS machine since all 8 cores (two 4-cores) were idle during max Samba transfer. iSCSI proved unreliable for remounting.

      I would suggest looking at SuSE’s sub-volume setup, especially if you have a lot of writes. Test, test and then retest recovery/replacement scenarios before you even consider BTRFS for production use. I would want to see broad 3rd-party tool support before mission-critical deployment, or at least a proven track record like ZFS.

  2. Jamie says:

    BTRFS documentation states that RAID1 “Needs at least two available devices always. Can get stuck in irreversible read-only mode if only one device is present.”

    Specifically, degraded rw will work ONLY ONCE. Given this, would it be better NOT to mount degraded by default in order to make sure you are ready with a replacement disk?

  3. nero 50 says:

    would you show how you partitioned?
