User stories #2 - Aqua Ray, Tier IV and XCP-ng

User story Feb 26, 2021

A new user story, and a new episode! This time, we'll explore a concrete use case of XCP-ng and Xen Orchestra used in a VPS and hosting business.

Note: all episodes can be found with the tag "user-story" at this URL.

The company

Aqua Ray is a French company created in 2003 that offers hosting services for physical or virtual servers, datacenters, infrastructure design and facilities management, with strong service guarantee commitments. It is a privately held (unlisted) company owned by its co-founders, Raphael Nicoud and Guillaume de Lafond, two French engineers.

There are no foreign shareholders, and no French or foreign investment funds hold shares. Aqua Ray asserts its technical and financial independence, positioning itself as an alternative to the GAFAM giants that lead the sector, and it controls its entire production chain: it owns and operates its own Tier IV datacenter in Val-de-Marne and its own fiber network in the Ile-de-France region (see below).

Aqua Ray offers a wide range of turnkey or customized services, with VPS hosting as its core business. Its specialty is the design and maintenance of infrastructure that hosts critical web services, in particular those based on Debian Linux systems and free software.

This name might be familiar to you, because we announced a partnership a few months ago!

The datacenter

To build a solid virtualized infrastructure, you need a resilient environment. And that's exactly the point of having a Tier IV datacenter!

Tier IV means full redundancy for power, cooling and network. For each of them, the redundant physical paths must never cross at any point, so that a single incident cannot disrupt both at once. In other words, it's entirely "fault tolerant". This level also means 99.995% guaranteed availability.
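To put the 99.995% figure in perspective, here is a small sketch that converts an availability percentage into a yearly downtime budget. It assumes a 365-day year and continuous measurement; the function name is ours, purely for illustration.

```python
# Assumption: a non-leap year of 365 days, measured in minutes.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def max_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


tier_iv_budget = max_downtime_minutes(99.995)
print(f"{tier_iv_budget:.2f} minutes of downtime per year at most")
```

Under these assumptions, 99.995% leaves a little over 26 minutes of downtime per year.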

You can find the list of Tier IV datacenters all around the world on this map: https://uptimeinstitute.com/tier-certification/tier-certification-list

The hardware

The network stack is mainly based on Cisco Nexus switches. Routers are also Cisco, with some Juniper as well. Each host has a bonded 2x10G network connection.

Regarding the compute machines themselves, they are SuperMicro servers.

The stack

For the application layer, Aqua Ray always favors Open Source projects, such as XCP-ng, Debian, Apache, HAProxy, MariaDB and so on (an HLAMP stack for their web services). This lets them easily audit the code to better understand potential malfunctions or incidents, and also contribute code back to the projects.

They have a long track record of using Xen (pre-Citrix), then the Open Source XenServer back in 2012, and finally XCP-ng, which was the logical move to keep using an Open Source yet powerful virtualization platform. Their infrastructure also relies heavily on live storage migration, to avoid any customer service interruption.

XCP-ng is today the main hypervisor used in Aqua Ray's infrastructure and therefore a critical part of it, which they trust entirely.

No shared storage

It's not a usual thing to do, but some of our users decide not to use any shared storage. Despite having hundreds of machines, Aqua Ray decided to rely on local SSDs. In this environment, any host can be considered disposable. By installing XCP-ng through a PXE answer file process, they can easily reinstall a machine from scratch, and move VMs around to stay on a fully up-to-date XCP-ng version!

Backup

They mix two different types of backup: at the application level (e.g. MariaDB dumps to dedicated backup storage), and at the VM level. For example, rolling snapshots are used to provide a way to go back in time. Mixing both makes sense: it combines the flexibility of snapshot rollback with the security of application data backups.
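The idea behind rolling snapshots can be sketched as a simple retention rule: keep the N most recent snapshots and drop the rest. The snippet below is only an illustration in plain Python. It does not use the Xen Orchestra API, and the retention value and snapshot names are hypothetical (the retention Aqua Ray actually uses isn't stated here).

```python
from datetime import datetime, timedelta


def prune_rolling_snapshots(snapshots, keep):
    """Split snapshots into (kept, removed), keeping the `keep` most recent ones."""
    ordered = sorted(snapshots, key=lambda s: s["taken_at"], reverse=True)
    return ordered[:keep], ordered[keep:]


# Hypothetical data: one snapshot per day over a week.
start = datetime(2021, 2, 20)
daily = [{"name": f"snap-{i}", "taken_at": start + timedelta(days=i)} for i in range(7)]

# Hypothetical retention of 3 snapshots.
kept, removed = prune_rolling_snapshots(daily, keep=3)
print("kept:", [s["name"] for s in kept])
print("removed:", [s["name"] for s in removed])
```

With a retention of 3, only the three latest snapshots remain available for rollback, which is why application-level dumps are still needed for longer-term data recovery.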

Continuous Replication

Like us (and we've done it countless times now), they also leverage XO Continuous Replication to provide a kind of "warm migration" system.

In short, when you can't live migrate for whatever reason (incompatible CPUs, network bandwidth…), you can run CR while the VM is running to send the first, large, full image. When it's done, you shut down the VM and start the CR job again. This time, only the (very small) diff is sent, and you can boot the VM on the destination with minimal total downtime!

Obviously, they also use Continuous Replication for customers asking for a BCP (Business Continuity Plan). This way, they can replicate VMs from their production racks to their "continuity planning" racks, without having to restore anything.
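The two-pass logic can be sketched with a toy model where a disk is a dictionary of blocks. This is not how Xen Orchestra implements Continuous Replication (which relies on snapshots and deltas handled by XO itself); the function names and block layout are hypothetical, and the point is only to show why the second pass is small.

```python
def full_copy(source):
    """First pass: copy every block while the VM keeps running."""
    replica = dict(source)
    return replica, len(source)


def delta_copy(source, replica):
    """Second pass: send only the blocks that differ from the replica."""
    changed = {i: data for i, data in source.items() if replica.get(i) != data}
    replica.update(changed)
    return len(changed)


# Toy disk of 1000 blocks (hypothetical size).
disk = {i: b"\x00" for i in range(1000)}

# Pass 1: the VM is still running, the full image is sent.
replica, sent_full = full_copy(disk)

# The VM keeps writing during and after the first pass.
disk[10] = b"\x01"
disk[42] = b"\x02"

# Pass 2: the VM is shut down, only the diff is sent.
sent_delta = delta_copy(disk, replica)

print(f"first pass: {sent_full} blocks, second pass: {sent_delta} blocks")
```

The downtime only covers the second pass, so it depends on how much the VM wrote since the first one, not on the total disk size.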

Cloudinit

They also rely on Cloudinit to facilitate VM deployment from a Cloudinit-ready template. It's faster than their own previous scripts, and it's entirely integrated in Xen Orchestra. If you don't know how it works, go take a look at our documentation.

Conclusion

As you can see, there are many different ways to run XCP-ng in production, to organize your storage and to perform backups. That's why this series will keep showing you various use cases, so you can choose the one that fits your needs!

Marc Pezin

Along with Olivier Lambert

CMO at Vates since 2017, I have been at the forefront of shaping and advancing the communication strategies and business development initiatives for Vates Virtualization Management Stack.