# Storage in XCP-ng

Storage in XCP-ng is quite a large topic. This section is dedicated to it. Keywords are:

  • SR: Storage Repository, the place for your VM disks (VDI SR)
  • VDI: a virtual disk
  • ISO SR: special SR only for ISO files (in read only)

Please take into consideration, that Xen API (XAPI) via their storage module (SMAPI) is doing all the heavy lifting on your storage. You don't need to format drives manually.

TIP

We encourage people to use file based SR (local ext, NFS, XOSAN…) because it's easier to deal with. If you want to know more, read the rest.

# Storage types

There are two types of storage:

  • Thin Provisioned: you only use the space your VM has filled with data.
  • Thick Provisioned: you use the space of your VMs disk(s) size.

In addition to this, storage can be either local or shared between hosts of a pool.

There are storage types that are officially supported, and others that are provided as-is, in the hope that they are useful to you. Actually, we do maintain them too, but they receive less testing than the officially supported ones.

Type of Storage Repository Name Thin Provisioned Shared Storage Officially Supported
file based Local EXT X X
NFS X X X
File X X (use with caution)
XOSAN v2 X X Soon
ZFS X
XFS X
GlusterFS X X
CephFS X X
MooseFS X X
block based Local LVM X
iSCSI X X
HBA X X
Ceph iSCSI gateway X
CephRBD X

WARNING

Cost of thick provisioning is relatively high when you do snapshots (used for backup). If you can use a thin provisioned storage instead, such as Local EXT or NFS, you'll save a LOT of space.

# Local

A local SR is using a disk or a partition of your local disk, to create a space for your VM disks. Local LVM will use logical volumes, whereas Local EXT will create an ext4 filesystem and put .vhd files in it.

TIP

The concept is simple: tell XCP-ng which disk or partition you want to use, and it will do everything for you! Don't do anything yourself (no need to create a logical volume or a filesystem).

WARNING

As XCP-ng will handle everything for you, be aware that the device or partition will be formatted.

  • Don't create a SR over a device or partition that contains important data.
  • If you want to attach an existing SR to your pool, don't create a new local SR over it, else your virtual disks will be deleted. Instead, use the xe sr-introduce command. Further explanation can be found in the official XenServer documentation: https://support.citrix.com/article/CTX121896/how-to-introduce-a-local-storage-repository-in-xenserver

In Xen Orchestra:

Via xe CLI for a local EXT SR (where sdaX is a partition, but it can be the entire device e.g. sdc):

xe sr-create host-uuid=<host UUID> type=ext content-type=user name-label="Local Ext" device-config:device=/dev/sdaX

In addition to the two main, rock-solid, local storages (EXT and LVM), XCP-ng offers storage drivers for other types of local storage (ZFS, XFS, etc.).

# NFS

Shared, thin-provisioned storage. Efficient, recommended for ease of maintenance and space savings.

In Xen Orchestra, go in the "New" menu entry, then Storage, and select NFS. Follow instructions from there.

TIP

Your host will mount the top-level NFS share you provide initially (example: /share/xen), then create folder(s) inside of that, then mount those directly instead (example: /share/xen/515982ab-476e-17b7-0e61-e68fef8d7d31). This means your NFS server or appliance must be set to allow sub-directory mounts, or adding the SR will fail. In FreeNAS or TrueNAS, this checkbox is called "All dirs" in the NFS share properties.

# File

Local, thin-provisioned. Not recommended.

The file storage driver allows you to use any local directory as storage.

Example:

xe sr-create host-uuid=<host UUID> type=file content-type=user name-label="Local File SR" device-config:location=/path/to/storage

Avoid using it with mountpoints for remote storage: if for some reason the filesystem is not mounted when the SR is scanned for virtual disks, the file driver will believe that the SR is empty and drop all VDI metadata for that storage.

# XOSANv2

Shared, thin-provisioned storage.

XOSANv2 is an hyperconvergence solution. In short, your local storage are combined into a big shared storage.

TIP

XOSANv2 is coming soon in XCP-ng. Hang on!

# ZFS

Local, thin-provisioned. Available since XCP-ng 8.2.

TIP

Additional package required and available in our repositories: zfs. Then either reboot or run modprobe -v zfs to load the kernel module.

Due to the variety of parameters of ZFS, the SR driver does not automate everything. You need to create your ZFS pool and volumes yourself, e.g. on partition sda4:

zpool create -o ashift=12 -m /mnt/zfs tank /dev/sda4

Now you can create the SR on top of it:

xe sr-create host-uuid=<HOST_UUID> type=zfs content-type=user name-label=LocalZFS device-config:location=/mnt/zfs/

TIP

Please report any problems (performance or otherwise) you might encounter with ZFS. Our forum (opens new window) is here for that!

WARNING

Note: If you use ZFS, assign at least 16GB RAM to avoid swapping. ZFS (in standard configuration) uses half the Dom0 RAM as cache!

# ZFS Knowledge & status

Do not hesitate to take a look at these links for more advanced explanations:

You can monitor your ZFS pool using:

# Get the global status.
zpool status

# More info concerning the performance.
zpool iostat -v 1

# ZFS module parameters

To get the list of supported parameters, you can execute:

man zfs-module-parameters

It's possible to write/read parameters on the fly. For example:

# Read zfs_txg_timeout param.
cat /sys/module/zfs/parameters/zfs_txg_timeout
5
# Write zfs_txg_timeout param.
echo 10 > /sys/module/zfs/parameters/zfs_txg_timeout

# Better performance (advanced options)

There are many options to increase the performance of ZFS SRs:

  • Modify the module parameter zfs_txg_timeout: Flush dirty data to disk at least every N seconds (maximum txg duration). By default 5.
  • Disable sync to disk: zfs set sync=disabled tank/zfssr
  • Turn on compression (it's cheap but effective): zfs set compress=lz4 tank/zfssr
  • Disable accesstime log: zfs set atime=off tank/zfssr

Check ZFS documentation to understand the pros and cons of each optimization.

# XFS

Local, thin-provisioned storage.

TIP

Additional package required and available in our repositories: xfsprogs.

On XCP-ng before 8.2, you also need sm-additional-drivers.

Works in the same way as the Local EXT storage driver: you hand it a device and it will format it and prepare it for your VMs automatically.

Via xe CLI for a local XFS SR (where sdaX is a partition, but it can be the entire device e.g. sdc):

xe sr-create host-uuid=<host UUID> type=xfs content-type=user name-label="Local XFS" device-config:device=/dev/sdaX

# Glusterfs

Shared, thin-provisioned storage. Available since XCP-ng 8.2.

TIP

Additional package required and available in our repositories: glusterfs-server.

You can use this driver to connect to an existing Gluster storage (opens new window) volume and configure it as a shared SR for all your hosts in the pool. For example, a Gluster storage with 3 nodes (192.168.1.11, 192.168.1.12 and 192.168.1.13) and a volume name called glustervolume will be thin provisioned with the command:

xe sr-create content-type=user type=glusterfs name-label=GlusterSharedStorage shared=true device-config:server=192.168.1.11:/glustervolume device-config:backupservers=192.168.1.12:192.168.1.13

# CephFS

Shared, thin-provisioned storage. Available since XCP-ng 8.2.

WARNING

This way of using Ceph requires installing ceph-common inside dom0 from outside the official XCP-ng repositories. It is reported to be working by some users, but isn't recommended officially (see Additional packages). You will also need to be careful about system updates and upgrades.

You can use this driver to connect to an existing Ceph storage filesystem, and configure it as a shared SR for all your hosts in the pool. This driver uses mount.ceph from ceph-common package of centos-release-ceph-nautilus repo. So user needs to install it before creating the SR. Without it, the SR creation would fail with an error like below

Error code: SR_BACKEND_FAILURE_47
Error parameters: , The SR is not available [opterr=ceph is not installed],

Installation steps

# yum install centos-release-ceph-nautilus --enablerepo=extras
# yum install ceph-common --enablerepo=base

Create /etc/ceph/admin.secret with your access secret for CephFS.

# cat /etc/ceph/admin.secret
AQBX21dfVMJtBhAA2qthmLyp7Wxz+T5YgoxzeQ==

Now you can create the SR where server is your mon ip.

# xe sr-create type=cephfs name-label=ceph device-config:server=172.16.10.10 device-config:serverpath=/xcpsr device-config:options=name=admin,secretfile=/etc/ceph/admin.secret

TIP

  • For serverpath it would be good idea to use an empty folder from the CephFS instead of /.
  • You may specify serverport option if you are using any other port than 6789.
  • Do not use admin keyring for production, but make a separate key with only necessary privileges https://docs.ceph.com/en/latest/rados/operations/user-management/

# MooseFS

Shared, thin-provisioned storage. Available since XCP-ng 8.2.

MooseFS is a fault-tolerant, highly available, highly performing, scaling-out, network distributed file system. It is POSIX compliant and acts like any other Unix-like file system. SR driver was contributed directly by MooseFS Development Team.

WARNING

  • The MooseFS client is not included with XCP-ng, so it must be installed on dom0 from the official MooseFS repository.
  • By default, the MooseFS repository will be set as enabled. This means that any system update will also update the MooseFS client. Please, consider disabling the repository after installation.

Installation steps

curl "https://ppa.moosefs.com/RPM-GPG-KEY-MooseFS" > /etc/pki/rpm-gpg/RPM-GPG-KEY-MooseFS
curl "http://ppa.moosefs.com/MooseFS-3-el7.repo" > /etc/yum.repos.d/MooseFS.repo
yum install moosefs-client

TIP

  • By default, the moosefs storage driver is not enabled and must be whitelisted in XAPI's configuration.
  • The list of accepted storage drivers is defined in /etc/xapi.conf but we must never modify this file directly. Instead, copy the sm-plugins definition from it, add moosefs to the line, and write the resulting line to a new /etc/xapi.conf.d/99-enable-moosefs.conf file.
  • ⚠️ XAPI only takes the last definition of sm-plugins it reads into account. Files are read in alphabetical order. If there's already a configuration file in /etc/xapi.conf.d (to enable another additional storage driver), take it into consideration when you write your new definition of sm-plugins.

Now when the MooseFS client is installed you can connect to an existing MooseFS cluster (opens new window) and create a shared SR for all hosts in the pool.


# xe sr-create type=moosefs name-label=MooseFS-SR content-type=user shared=True device-config:masterhost=mfsmaster.host.name device-config:masterport=9421 device-config:rootpath=/xcp-ng

Basically, to connect the driver to our cluster we have to know two parameters:

  • masterhost - MooseFS master host name or IP, default mfsmaster
  • masterport - MooseFS master port, default 9421

We also suggest to use a folder on the MooseFS cluster as a root path rather than using the direct path of the cluster.

# iSCSI

Shared, thick-provisioned storage.

In Xen Orchestra, go in the "New" menu entry, then Storage, and select iSCSI. Follow instructions from there.

# HBA

Shared, thick-provisioned storage.

You can add a Host Bus Adapter (HBA) storage device with xe:

xe sr-create content-type=user shared=true type=lvmohba name-label=MyHBAStorage device-config:SCSIid=<the SCSI id>

This is great for passing through full hardware disks, such as an entire hard disk.

If you have a problem with the SCSIid, you can use this alternative, carefully selecting the right drive, and checking it's visible on all hosts with the same name:

xe sr-create content-type=user shared=true type=lvmohba name-label=MyHBAStorage device-config:device=/dev/<HBA drive>

# Ceph iSCSI gateway

WARNING

Experimental, this needs reliable testing to ensure no block corruption happens in regular use.

This is at this moment the only way to connect to Ceph with no modifications of dom0, it's possible to create multiple Ceph iSCSI gateways following this: https://docs.ceph.com/docs/master/rbd/iscsi-target-cli/ (opens new window)

Ceph iSCSI gateway node(s) sits outside dom0, probably another Virtual or Physical machine. The packages referred in the URL are to be installed on iSCSI gateway node(s). For XCP-ng dom0, no modifications are needed as it would use LVMoISCSISR (lvmoiscsi) driver to access the iSCSI LUN presented by these gateways.

For some reason the chap authentication between gwcli and XCP-ng doesn't seem to be working, so it's recommended to disable it (in case you use no authentication a dedicated network for storage should be used to ensure some security).

IMPORTANT: User had many weird glitches with iSCSI connection via ceph gateway in lab setup (3 gateways and 3 paths on each host) after several days of using it. So please keep in mind that this setup is experimental and unstable. This would have to be retested on recent XCP-ng.

# Ceph RBD

WARNING

This way of using Ceph requires installing ceph-common inside dom0 from outside the official XCP-ng repositories. It is reported to be working by some users, but isn't recommended officially (see Additional packages). You will also need to be careful about system updates and upgrades.

You can use this to connect to an existing Ceph storage over RBD, and configure it as a shared SR for all your hosts in the pool. This driver uses LVM (lvm) as generic driver and expects that the Ceph RBD volume is already connected to one or more hosts.

Known issue: this SR is not allowed to be used for HA state metadata due to LVM backend restrictions within XAPI drivers, so if you want to use HA, you will need to create another type of storage for HA metadata

Installation steps

# yum install centos-release-ceph-nautilus --enablerepo=extras
# yum install ceph-common --enablerepo=base

Create /etc/ceph/keyring with your access secret for Ceph.

# cat /etc/ceph/keyring 
[client.admin]
key = AQBX21dfVMJtJhAA2qthmLyp7Wxz+T5YgoxzeQ==

Create /etc/ceph/ceph.conf as your matching setup.

# cat /etc/ceph/ceph.conf 
[global]
mon_host = 10.10.10.10:6789

[client.admin]
keyring = /etc/ceph/keyring
rbd create --size 300G --image-feature layering pool/xen1

# Map it to all xen hosts in your pool
rbd map pool/xen1

# edit /etc/lvm/lvm.conf and /etc/lvm/master/lvm.conf on all nodes and add this option
# otherwise LVM will ignore the rbd block device
types = [ "rbd", 1024 ]

# create a shared LVM
xe sr-create name-label='CEPH' shared=true device-config:device=/dev/rbd/rbd/xen1 type=lvm content-type=user

You will probably want to configure ceph further so that the block device is mapped on reboot.

For the full discussion about Ceph in XCP-ng, see this forum thread: https://xcp-ng.org/forum/topic/4/ceph-on-xcp-ng (opens new window)

TIP

# ISO SR

You might be wondering how to upload an ISO. Unlike other solutions, you need to create a dedicated "space" for these, a specific ISO SR. To create an ISO SR, you have 2 possibilities:

  • Shared: A shared ISO SR is on a VM or on a dedicated storage server. It's accessible with an IP address, like 192.168.1.100 via SMB or NFS.
  • Local (not recommended for production): A local ISO SR is a directory created directly on the dom0 host. It's only accessible on the host where the directory was created.

# Create a Shared ISO SR

First, you need to create the NFS or SMB Share. There are plenty of options: from dedicated NAS hardware solutions and dedicated software solutions such as TrueNAS, to manual administration on any Linux/unix or Windows system.

You can find some tutorials on the internet to create an NFS Server, for exemple here https://wiki.linux-nfs.org/wiki/index.php/NFS_Howto (opens new window) or here https://wiki.archlinux.org/title/NFS (opens new window)

Then, in Xen Orchestra go into "New/Storage" and select "ISO SR":

# Create a Local ISO SR

From the CLI:

  1. Create a directory on the local filesystem to storage your ISOs
  2. Copy/move ISOs to this new location
  3. Create the ISO SR using xe sr-create
  4. You can add or update ISOs later by placing them into the directory you created in step 1
  5. Rescan the SR if you change the files stored in the ISO directory

Here's an example of how to create a Local ISO SR named "ISO Repository" that will be stored in /opt/var/iso_repository:

mkdir -p /opt/var/iso_repository

xe sr-create name-label="ISO Repository" type=iso device-config:location=/opt/var/iso_repository device-config:legacy_mode=true content-type=iso
a6732eb5-9129-27a7-5e4a-8784ac45df27 # this is the output

xe sr-scan uuid=a6732eb5-9129-27a7-5e4a-8784ac45df27

If your host is in a pool of several hosts, you need to add the host-uuid parameter to the xe sr-create command above. You can retrieve the host UUID with xe host-list.

You can then upload your ISO in /opt/var/iso_repository/

On Xen Orchestra, go into "New/Storage" and select "ISO SR"

  • Select "Local" instead of NFS/SMB
  • Enter the path created before
  • Upload ISOs on your host to the same path

WARNING

A local ISO SR will only be available on the host where it was created. Also, the dom0 filesystem is small with only about 15gb of space free for extra storage!

That's it!

TIP

Don't forget to rescan your SR after adding, changing, or deleting ISO files. Rescan is done automatically every 10 minutes otherwise.

# Storage API

Current storage stack on XCP-ng is called SMAPIv1. The VHD format is used, which has a maximum file size limitation of 2TiB. This means that when using this format your VM disk can't be larger than 2TiB.

# Why use VHD format?

Mostly for historical reasons. When standardization on VHD (opens new window) was decided, it was the only acceptable format that supported copy on write (opens new window), delta capabilities, and merge possibilities. Thanks to VHD format, you have:

  • snapshot support
  • delta backup
  • fast clone VM
  • live storage migration

# Using RAW format

Alternatively, you can decide to use a disk without 2TiB limitation, thanks to RAW format. However, the price to pay is to lose all VHD features.

To create a large VDI on a file based SR, it's trivial, for example:

xe vdi-create type=user sm-config:type=raw virtual-size=5TiB sr-uuid=<SR_UUID> name-label=test

On a block based storage, it's a bit more complicated:

  1. Create a small disk first: xe vdi-create type=user sm-config:type=raw virtual-size=1GiB sr-uuid=<SR_UUID> name-label=test
  2. Extend it with lvextend -L+5T /dev/VG_<whateverUUID>/LV-<VDI_UUID>
  3. Rescan SR

WARNING

You won't be able to live migrate storage on this disk or snapshot it anymore. Outside of this, it will work very well.

# SMAPIv3: the future

SMAPIv1 is the historical storage interface, and now a big spaghetti monster. That's why Citrix decided to create a new one, called SMAPIv3: it's far more flexible, and also support (partially) the qcow2 format. This format has the same concepts as VHD, but without its limitations.

Also, the storage API is far more agnostic and the code is better. So what's the catch? Problem is there's no Open Source implementation of SMAPIv3, also the current API state isn't really complete (doesn't support a lot of features). However, XCP-ng team is working on it too, because it's clearly the future!

# Coalesce

Coalesce process is an operation happening in your hosts as soon a snapshot is removed.

When you make a snapshot, a "base copy" is created (in read only), the "active" disk will live its own life, same for the freshly created snapshot. Example here: A is the parent, B the current/active disk and C is the snapshot:

That's OK. But what about creating a new snapshot on B after some data are written?

You got this:

When you make XO backup on regular basis, old/unused snapshots will be removed automatically. This will also happen if you create/delete snapshots manually. So in our case, C will disappear. And without this snapshot, XCP-ng will coalesce A and B:

This process will take some time to finish (especially if you VM stays up and worst if you have a lot of writes on its disks).

What about creating snapshot (ie call backup jobs) faster than XCP-ng can coalesce? Well, the chain will continue to grow. And more you have disks to merge, longer it will take.

You will hit a wall, 2 options here:

  • if your VM disks are small enough, you could reach the max chain length (30).
  • if your VM disks are big, you'll hit the SR space limit before.

# Xen Orchestra protection

Luckily, Xen Orchestra is able to detect an uncoalesced chain. It means it won't trigger a backup job for a VM if its disks are not coalesced yet: it will be skipped.

But more than that, Xen Orchestra is also able to show you uncoalesced disk in the SR view, in the Advanced tab.

More about this exclusive feature on https://xen-orchestra.com/blog/xenserver-coalesce-detection-in-xen-orchestra/ (opens new window)

# Modify an existing SR connection

The link between a host and an SR is called the PBD. A PBD basically stores how to access a storage repository (like the path to the drive or to an NFS share).

If you want to change how an SR is accessed (for example, if your NFS SR changed its IP), you must destroy and recreate the PBD with the new values. Let's use our example where an NFS SR has changed to a new IP:

  1. Double check you don't have running VMs on this SR. This is crucial as this operation cannot be performed live.
  2. Get the SR UUID (in XO, SR view, click on your NFS SR, the UUID is visible then)
  3. On your host console/terminal, find all the PBD UUIDs for this SR: xe sr-param-get param-name=PBDs uuid=<SR UUID>
  4. For each PBD UUID, run xe pbd-param-list uuid=<PBD UUID> and copy the output to a text editor so you have them "saved" elsewhere. Each record has the host UUID and SR UUID, which will be needed to recreate them. It will also contain the device-config, which is required to indicate how to access it (the NFS path).
  5. Now you need to edit this device-config field with the new values. In our example, I will change my device-config from serverpath: /mnt/xen; server: 192.168.1.2 to serverpath: /mnt/xen; server: 192.168.1.5 to reflect the new NFS IP. Have this text ready for the next commands.
  6. Remove each of these old PBDs with xe pbd-destroy uuid=<PBD UUID>.
  7. Recreate each of them using your new device-config info by running xe pbd-create host-uuid=<HOST UUID> sr-uuid=<SR UUID> device-config:<YOUR NEW CONFIG>
  8. When you're done and all PBDs are recreated, you can reconnect (in XO, SR view, "reconnect to all hosts" or do a xe pbd-plug uuid=<PBD UUID for each of them). Once reconnected, you can start your VMs as if nothing happened.