XCP-ng

    XOSTOR hyperconvergence preview

      BenHuzo @olivierlambert

      @olivierlambert Thank you, I have many questions - is there a call/demo you could do?

        olivierlambert Vates 🪐 Co-Founder CEO

        Go here and ask for preview access on your hardware: https://vates.tech/contact/

          BenHuzo @olivierlambert

          @olivierlambert Thank you for pointing me in that direction! I went ahead and made a request.

            JensH

            I have been working with XenServer/Citrix Hypervisor and Citrix products like Virtual Apps for years.
            I have also had XCP-ng running on a test server for a while.
            I have now decided to build a new small cluster with XCP-ng. One reason is the XOSTOR option.

            This new pool is planned with 3 nodes and multiple SSD disks (not yet NVMe) in each host.
            I am wondering how XOSTOR creates the LV on a VG with, let's say, 4 physical drives:
            Will it be a linear LV? Are any options for striping or other RAID levels available or planned?

            Looking forward to your reply.
            Thanks a lot for all the good work in a challenging environment.

              olivierlambert Vates 🪐 Co-Founder CEO

              We don't need or want RAID levels or anything like that: the data is already replicated to other hosts, so this would make it too redundant. So it will be like a linear LV, yes 🙂

                JensH @olivierlambert

                @olivierlambert Thank you for the quick answer.
                To be really on the safe side, this means a replication count of at least 3 would be useful (from my perspective).

                What would happen if a node of a 3 node cluster with replication count 3 (so all nodes have a copy) fails?
                Would everything stop because replication count is higher than available nodes?
                (I refer to post https://xcp-ng.org/forum/post/54086)

                  ronan-a Vates 🪐 XCP-ng Team @JensH

                  @JensH No. You can continue to use your pool. New resources can still be created and LINSTOR can sync volumes when the connection to the lost node is recreated.

                  As long as there is no split brain and you have 3 hosts online, it's OK; that's why we recommend using 4 machines.
                  With a pool of 3 machines, losing a node increases the risk of split brain on a resource, but you can continue to create and use resources.

                    olivierlambert Vates 🪐 Co-Founder CEO

                    Also, keep in mind that LINSTOR puts things in read-only mode as soon as you are under your replication target.

                    This means, in a 3-host scenario:

                    • with a replication count of 3, any unreachable host will trigger read-only on the 2 others
                    • with a replication count of 2, you can lose one host without any consequence

                    So for 3 machines, replication 2 is a sweet spot in terms of availability.
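
The rule described above can be sketched as a toy model. This is only an illustration of the statement in this post (read-only once fewer hosts are online than the replication target); the function name is hypothetical and it is not LINSTOR's or DRBD's actual quorum logic.

```python
def is_read_only(online_hosts: int, replication: int) -> bool:
    """Toy rule from the post: read-only once online hosts < replication target.

    Assumption: this simplification ignores how LINSTOR/DRBD really
    compute quorum; it only mirrors the two bullet points above.
    """
    return online_hosts < replication


# 3-host pool with one host unreachable, so 2 hosts remain online.
for replication in (2, 3):
    state = "read-only" if is_read_only(2, replication) else "read-write"
    print(f"replication {replication}, 2 of 3 hosts online: {state}")
```

Under this simplified rule, replication 3 goes read-only after a single host loss, while replication 2 stays writable.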

                      Wilken

                      Hi,

                      I've run the install script on an XCP-ng 8.2.1 host. The output of the following command:

                      rpm -qa | grep -E "^(sm|xha)-.*linstor.*"

                      shows:

                      sm-2.30.8-2.1.0.linstor.5.xcpng8.2.x86_64

                      but the package

                      xha-10.1.0-2.2.0.linstor.1.xcpng8.2.x86_64

                      is missing, because it is already installed in version:

                      xha-10.1.0-2.1.xcpng8.2.x86_64

                      from XCP-ng itself.
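
To see why the stock package does not show up in that check, here is a minimal sketch that applies the same kind of filter to the three package names quoted in this post. It assumes the intended pattern is `^(sm|xha)-.*linstor.*` (the asterisks may have been lost in forum formatting); the helper name is hypothetical.

```python
import re

# Assumed pattern, mirroring the grep -E expression from the post.
LINSTOR_PKG = re.compile(r"^(sm|xha)-.*linstor.*")

# Package names copied from the post.
packages = [
    "sm-2.30.8-2.1.0.linstor.5.xcpng8.2.x86_64",
    "xha-10.1.0-2.2.0.linstor.1.xcpng8.2.x86_64",
    "xha-10.1.0-2.1.xcpng8.2.x86_64",  # stock XCP-ng build, no linstor tag
]


def linstor_tagged(names):
    """Return only the package names that carry the linstor tag."""
    return [name for name in names if LINSTOR_PKG.search(name)]


for name in linstor_tagged(packages):
    print(name)
```

The stock `xha-10.1.0-2.1.xcpng8.2.x86_64` build is filtered out because its name contains no `linstor` tag.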

                      Is this package still needed from the linstor repo?
                      Should I uninstall it and re-run the install script?

                      BR,
                      Wilken

                        olivierlambert Vates 🪐 Co-Founder CEO

                        question for @ronan-a

                          ronan-a Vates 🪐 XCP-ng Team @Wilken

                          @Wilken The modified version of the xha package is no longer needed. You can use the latest version without the linstor tag.

                          It's not necessary to reinstall your XOSTOR SR.

                            Wilken

                            Thank you @olivierlambert and @ronan-a for the quick answer and clarification!

                            BR,
                            Wilken

                              gb.123

                              @ronan-a

                              Hi!
                              Before I test this, I have a small question:
                              If the VM is encrypted and an XOSTOR SR is enabled, are the VM and its memory replicated, or just the VDI?
                              Once the 1st node is down, will the 2nd node take over as-is, or will the 2nd node go to the 'boot' stage where it asks for the decryption password?

                              Thanks

                                ronan-a Vates 🪐 XCP-ng Team @gb.123

                                @gb-123 How is the VM encrypted? Only the VDIs are replicated.

                                  gb.123 @ronan-a

                                  @ronan-a

                                  VMs would be using LUKS encryption.

                                  So if only the VDI is replicated and, hypothetically, I lose the master node or any other node actually hosting the VM, will I have to create the VM again using the replicated disk? Or would it be something like DRBD, where there are actually 2 VMs running in Active/Passive mode with an automatic switchover? Or would it be that one VM is running and the second gets started automatically when the 1st is down?

                                  Sorry for the noob questions. I just wanted to be sure of the implementation.

                                    Maelstrom96 @gb.123

                                    The VM metadata is stored at the pool level, meaning that you wouldn't have to re-create the VM if the current VM host fails. However, memory isn't (and can't be) replicated in the cluster, unless you're doing a live migration, which temporarily replicates the VM memory to the new host so the VM can be moved.

                                    DRBD only replicates the VDI, in other words the disk data, across the active Linstor members. If the VM is stopped or terminated because of a host failure, you should be able to start it back up on another host in your pool. By default this requires manual intervention to start the VM, and you will have to enter your encryption password since it will be a cold boot.

                                    If you want the VM to restart automatically in case of failure, you can use the HA feature of XCP-ng. This wouldn't solve your issue of having to enter your encryption password since, as explained earlier, the memory isn't replicated and the VM would cold boot from the replicated VDI. Also, keep in mind that enabling HA adds maintenance complexity and might not be worth it.

                                      gb.123 @Maelstrom96

                                      @Maelstrom96

                                      Thanks for your clarification!
                                      I was thinking of testing HA with XOSTOR (if that is possible at all). XOSTOR would also be treated as a 'Shared SR', I guess?

                                        ronan-a Vates 🪐 XCP-ng Team @gb.123

                                        @gb-123 Yes, use the shared flag as in the sr-create example of the first post, and you can activate HA as with any shared SR.

                                          Swen

                                          Hi @ronan-a,
                                          we did some performance testing with the latest version and ran into a bottleneck that we are unable to identify in detail.

                                          Here is our setup:
                                          Dell R730
                                          CPU: 2x Intel E5-2680v4
                                          RAM: 384GB
                                          Storage: 2x NVMe Samsung PM9A3 3.84TB via U.2 PCIe 3 x16 Extender Card
                                          NICs: 2x 10G Intel, 2x 40G Intel

                                          We have 3 servers with the same configuration, installed as a cluster with a replica count of 2.
                                          XCP-ng 8.2 with the latest patches is installed. All servers are using the same switch (2x QFX5100-24Q, configured as a virtual chassis). We are using an LACP bond on the 40G interfaces.

                                          When using the 10G interfaces (XCP-ng uses those interfaces as management interfaces) for LINSTOR traffic, we run into a cap on the NIC bandwidth of around 4 Gbit/s (500MB/s).
                                          When using the bonded 40G interfaces, the cap is around 8 Gbit/s (1000MB/s).

                                          Only 1 VM is installed on the pool. We are using Ubuntu 22.04 LTS with latest updates installed from ISO using the template for Ubuntu 20.04.

                                          Here is the fio command we are using:
                                          fio --name=a --direct=1 --bs=1M --iodepth=32 --ioengine=libaio --rw=write --filename=/tmp/test.io --size=100G

                                          I would expect far more, because we are not hitting any known bottleneck of the interfaces, NVMe or PCIe slot. Am I missing something? Is this the expected performance? If not, any idea what the bottleneck is? Does anybody have some data we can compare with?

                                          regards,
                                          Swen

                                            olivierlambert Vates 🪐 Co-Founder CEO

                                            1. use iodepth of 128
                                            2. use 4 processes at the same time (numjobs=4)
                                            3. use io_uring if you can in the guest (and not libaio)
                                            4. don't use a test file but bench directly on a non-formatted device (like /dev/xvdb), this removes the filesystem layer
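
As a rough sketch, the four suggestions above can be folded into the fio command from the previous post. The helper name is hypothetical, `--group_reporting` is an addition not mentioned in the thread (it merges the per-job results into one summary), and the remaining options are carried over unchanged from the original command. Note that writing to a raw device destroys any data on it.

```python
import shlex


def build_fio_command(device: str = "/dev/xvdb") -> list[str]:
    """Assemble a fio invocation from the four recommendations above.

    WARNING: with --rw=write the target device is overwritten, so only
    point this at an empty, unformatted test disk.
    """
    return [
        "fio",
        "--name=a",
        "--direct=1",
        "--bs=1M",
        "--rw=write",
        "--iodepth=128",          # 1. deeper queue
        "--numjobs=4",            # 2. four processes at the same time
        "--ioengine=io_uring",    # 3. io_uring instead of libaio
        f"--filename={device}",   # 4. raw device, no filesystem layer
        "--group_reporting",      # assumption: not part of the thread
    ]


print(shlex.join(build_fio_command()))
```

If io_uring is not available in the guest, replacing the engine with `libaio` keeps the rest of the command valid.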

                                            With those settings in fio, I can reach nearly 2600MiB/s in read and 900MiB/s in write with 4 virtual disks in mdadm RAID0, in the guest (a test VM on Debian 12), on rather "old" Xeon CPUs and a consumer-grade NVMe SSD on a PCIe 3 port.

                                            Also, one last thing to know: if you use thin provisioning, you need to run the test twice, because the first run (while the VHD is growing) is always slower. This is not a problem in real life. You can run your tests two or three times and check the results, without counting the first run.

                                            I'm about to get more recent hardware (except the NVMe) to re-run some tests this week. But as you can see, you can go over a 20G network (I'm using a 25G NIC).
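
For comparison with the Gbit/s figures quoted earlier in the thread, the MiB/s results can be converted with a short calculation. This is only a unit conversion of the payload rate (the function name is hypothetical) and it ignores any protocol or replication overhead on the wire.

```python
MIB = 1024 ** 2  # bytes per MiB


def mib_per_s_to_gbit_per_s(mib_per_s: float) -> float:
    """Convert a throughput in MiB/s to Gbit/s (decimal gigabits)."""
    return mib_per_s * MIB * 8 / 1e9


for label, rate in (("read", 2600), ("write", 900)):
    print(f"{label}: {rate} MiB/s = {mib_per_s_to_gbit_per_s(rate):.1f} Gbit/s")
# read: 2600 MiB/s = 21.8 Gbit/s
# write: 900 MiB/s = 7.5 Gbit/s
```

The read figure works out to just under 22 Gbit/s, which is consistent with the remark about exceeding a 20G link.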
