XCP-ng

    XOSTOR hyperconvergence preview

    446 Posts 47 Posters 478.9k Views 48 Watching
    • olivierlambertO Offline
      olivierlambert Vates πŸͺ Co-Founder CEO
      last edited by ronan-a

      XOSTOR - Tech preview

      ⚠ Installation script is only compatible with XCP-ng 8.2

      ⚠ UPDATE to sm-2.30.7-1.3.0.linstor.7 ⚠

      Please read this: https://xcp-ng.org/forum/topic/5361/xostor-hyperconvergence-preview/224?_=1679390249707

      ⚠ UPDATE from an older version (before sm-2.30.7-1.3.0.linstor.3) ⚠

      Please read this: https://xcp-ng.org/forum/topic/5361/xostor-hyperconvergence-preview/177?_=1667938000897


      XOSTOR is a "disaggregated hyperconvergence storage solution". In plain English: you can assemble local storage of multiple hosts into one "fake" shared storage.

      The key to fast hyperconvergence is to take a different approach. We used GlusterFS for XOSAN, and it wasn't really fast for small random blocks (due to the nature of the global filesystem). XOSTOR works differently: unlike traditional hyperconvergence, it doesn't create a global clustered and shared filesystem. Instead, when you create a VM disk, it creates a "resource" that is replicated "n" times across multiple hosts (e.g. two or three times).

      So in the end, the number of resources depends on the number of VM disks (and snapshots).

      The technology we use is not invented from scratch: we are using LINSTOR from LINBIT, which is itself based on DRBD. See https://linbit.com/linstor/

      For you, it will be (ideally) transparent.

      Ultimate goals

      Our first goal here is to validate the technology at scale. If it works as we expect, we'll then add a complete automated solution and UI on top of it, and sell pro support to people who want a "turnkey" supported solution (Γ  la XOA).

      The manual/shell script installation as described here is meant to stay fully open/accessible with community support πŸ™‚

      Now I'll let @ronan-a write the rest of this message πŸ™‚ Thanks a lot for your hard work πŸ˜‰

      [Image: How-it-works-Isometric-Deck-5Integrations-VersionV2-1024x554.png]


      ⚠ Important ⚠

      Despite intensive testing of this technology over the last 2 years (!), it was really HARD to integrate it cleanly into SMAPIv1 (the legacy storage stack of XCP-ng), especially when you have to test all potential cases.

      The goal of this tech preview is to scale our testing to a LOT of users.

      Right now, this version should be installed on pools with 3 or 4 hosts. We plan to release another test release in one month to remove this limitation. Also, to ensure data integrity, using at least 3 hosts is strongly recommended.

      How to install XOSTOR on your pool?

      1. Download installation script

      First, make sure you have at least one free disk on each host of your pool.
      Then you can download the installation script using this command:

      wget https://gist.githubusercontent.com/Wescoeur/7bb568c0e09e796710b0ea966882fcac/raw/052b3dfff9c06b1765e51d8de72c90f2f90f475b/gistfile1.txt -O install && chmod +x install
      

      2. Install

      Then, on each host, execute the script with the disks to use, for example with a single disk:

      ./install --disks /dev/sdb
      

      If you have several disks you can use them all, BUT for optimal use the total disk capacity should be the same on each host:

      ./install --disks /dev/nvme0n1 /dev/nvme0n2 /dev/nvme0n3
      

      By default, thick provisioning is used; you can use thin instead:

      ./install --disks /dev/sdb --thin
      

      Note: You can use the --force flag if you already have a VG or PV on your hosts that must be overridden:

      ./install --disks /dev/sdb --thin --force
      

      3. Verify config

      With thin option

      On each host, lsblk should return an output similar to:

      > lsblk
      NAME                                                                              MAJ:MIN  RM   SIZE RO TYPE  MOUNTPOINT
      ...
      sdb                                                                                 8:16    0   1.8T  0 disk
      └─36848f690df82210028c2364008358dd7                                               253:0     0   1.8T  0 mpath
        β”œβ”€linstor_group-thin_device_tmeta                                               253:1     0   120M  0 lvm
        β”‚ └─linstor_group-thin_device-tpool                                             253:3     0   1.8T  0 lvm
        └─linstor_group-thin_device_tdata                                               253:2     0   1.8T  0 lvm
          └─linstor_group-thin_device-tpool                                             253:3     0   1.8T  0 lvm
      ...
      

      With thick option

      No LVM volume is created; only a new volume group should now be present, visible with the vgs command:

      > vgs
        VG                                                 #PV #LV #SN Attr   VSize   VFree  
        ...
        linstor_group                                        1   0   0 wz--n- 931.51g 931.51g
      

      And you must have the LINSTOR versions of the sm and xha packages:

      > rpm -qa | grep -E "^(sm|xha)-.*linstor.*"
      sm-2.30.4-1.1.0.linstor.8.xcpng8.2.x86_64
      xha-10.1.0-2.2.0.linstor.1.xcpng8.2.x86_64
      

      4. Finally, create the SR

      If you use thick provisioning:

      xe sr-create type=linstor name-label=<SR_NAME> host-uuid=<MASTER_UUID> device-config:group-name=linstor_group device-config:redundancy=<REDUNDANCY> shared=true device-config:provisioning=thick
      

      Otherwise with thin provisioning:

      xe sr-create type=linstor name-label=<SR_NAME> host-uuid=<MASTER_UUID> device-config:group-name=linstor_group/thin_device device-config:redundancy=<REDUNDANCY> shared=true device-config:provisioning=thin
      

      So, for example, if you have 4 hosts, a thin config, and you want a replication count of 3 for each disk:

      xe sr-create type=linstor name-label=XOSTOR host-uuid=bc3cd3af-3f09-48cf-ae55-515ba21930f5 device-config:group-name=linstor_group/thin_device device-config:redundancy=3 shared=true device-config:provisioning=thin
      
      

      5. Verification

      After that, you should have an XOSTOR SR visible in XOA with all PBDs attached.
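
      If you prefer to check from the CLI, here is a minimal sketch using standard xe commands (fill in your own SR UUID):

      xe sr-list type=linstor
      xe pbd-list sr-uuid=<SR_UUID> params=host-uuid,currently-attached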

      6. Update

      If you want to update LINSTOR and the other packages, you can run the install script on each host like this:

      ./install --update-only
      

      F.A.Q.

      How is the SR capacity calculated? πŸ€”

      If you can't create a VDI greater than the displayed size in the XO SR view, don't worry:

      • There are two important things to remember: the maximum size of a VDI that can be created is not necessarily equal to the capacity of the SR. In the XOSTOR context, the SR capacity is the maximum amount of space that can be used to store all VDI data.
      • Exception: if the replication count is equal to the number of hosts, the SR capacity is equal to the maximum VDI size, i.e. the capacity of the smallest disk in the pool.

      We use this formula to compute the SR capacity:

      sr_capacity = smallest_host_disk_capacity * host_count / replication_count
      

      For example, if you have a pool of 3 hosts with a replication count of 2 and a disk of 200 GiB on each host, the formula gives an SR capacity of 300 GiB (see the quick check after these notes). Notes:

      • You can't create a VDI greater than 200 GiB, because the replication is not block based but volume based.
      • If you create a volume of 200 GiB, 400 of the 600 GiB are physically used, and the remaining space can't be used because it becomes impossible to replicate it on two different disks.
      • If you create 3 volumes of 100 GiB, the SR becomes completely full: you have 300 GiB of unique data plus 300 GiB of replicas.
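
      A quick arithmetic check of the formula, using only the numbers from the example above (plain shell):

      # smallest_host_disk_capacity = 200 GiB, host_count = 3, replication_count = 2
      echo $(( 200 * 3 / 2 ))   # prints 300 (GiB)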

      How to properly destroy the SR after an SR.forget call?

      If you used a command like SR.forget, the SR is not actually removed properly. To finish removing it, you can execute these commands:

      # Create new UUID for the SR to reintroduce it.
      uuidgen
      
      # Reintroduce the SR.
      xe sr-introduce uuid=<UUID_of_uuidgen> type=linstor shared=true name-label="XOSTOR" content-type=user
      
      # Get host list to recreate PBD
      xe host-list
      ...
      
      # For each host, you must execute a `xe pbd-create` call.
      # Don't forget to use the correct SR/host UUIDs, and device-config parameters.
      xe pbd-create host-uuid=<host_uuid> sr-uuid=<UUID_of_uuidgen> device-config:provisioning=thick device-config:redundancy=<redundancy> device-config:group-name=<group_name>
      
      # After this point you can now destroy the SR properly using xe or XOA.
      

      Node auto-eviction and how to restore?

      If a node is no longer active (after 60 minutes by default), it is automatically evicted. This behavior can be changed.
      There is an advantage to auto-evict: if there are enough nodes in your cluster, LINSTOR will create new replicas of your disks.

      See: https://linbit.com/blog/linstors-auto-evict/

      Re-adding your node afterwards is not automatic. You can use a LINSTOR command to remove it: linstor node lost. Then you can recreate it. If there was no disk issue and it was only a network problem or similar, just run one command: linstor node restore.
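
      A minimal sketch of the commands mentioned above (run them where the LINSTOR controller is active; <NODE_NAME> is a placeholder):

      # List the nodes and their current state.
      linstor node list

      # If the node (or its storage) is really gone, remove it from the cluster:
      linstor node lost <NODE_NAME>

      # If it was only a temporary problem (network, reboot...), restore it instead:
      linstor node restore <NODE_NAME>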

      How to use a specific storage network?

      You can run a few specific LINSTOR commands to configure which NICs to use. By default, the XAPI management interface is used.

      For more info: https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-managing_network_interface_cards

      In case of failure with the preferred NIC, the default interface is used.
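
      As a rough sketch based on that guide (the interface name, node name and IP below are placeholders, not values created by the install script):

      # Declare the NIC of the storage network on each node:
      linstor node interface create <NODE_NAME> storage_nic <STORAGE_NETWORK_IP>

      # Then tell LINSTOR to prefer it for the DRBD traffic of a storage pool:
      linstor storage-pool set-property <NODE_NAME> <POOL_NAME> PrefNic storage_nic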

      How to replace drives?

      Take a look at the official documentation: https://kb.linbit.com/how-do-i-replace-a-failed-d

      XAPI plugin: linstor-manager

      It's possible to perform low-level tasks using the linstor-manager plugin.

      It can be executed using the following command:

      xe host-call-plugin host-uuid=<HOST_UUID> plugin=linstor-manager fn=<FUNCTION> args:<ARG_NAME_1>=<VALUE_1> args:<ARG_NAME_2>=<VALUE_2> ...
      

      Many functions are not documented here and are reserved for internal use by the smapi driver (LinstorSR).

      For each command, HOST_UUID is the UUID of a host of your pool, master or not.
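
      To list your host UUIDs, a standard xe command is enough:

      xe host-list params=uuid,name-label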

      Add a new host to an existing LINSTOR SR

      xe host-call-plugin host-uuid=<HOST_UUID> plugin=linstor-manager fn=addHost args:groupName=<THIN_OR_THICK_POOL_NAME>
      

      This command creates a new PBD on the SR and a new node in the LINSTOR database. It also starts what is necessary for the driver.
      After running this command, it's up to you to set up a new storage pool in the LINSTOR database with the same name used by the other nodes.
      So again, use pvcreate/vgcreate and then a basic "linstor storage-pool create", as sketched below.
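
      For example, a minimal sketch (the VG/thin-device names follow the install script defaults above; the storage pool name must match what linstor storage-pool list reports on the existing nodes):

      # On the new host: prepare the LVM layout like on the other nodes.
      pvcreate /dev/sdb
      vgcreate linstor_group /dev/sdb

      # Check the storage pool name used by the other nodes...
      linstor storage-pool list

      # ...then create the matching pool for the new node.
      # Thick setup:
      linstor storage-pool create lvm <NEW_NODE_NAME> <STORAGE_POOL_NAME> linstor_group
      # Thin setup (the thin pool LV must exist, as on the other hosts):
      linstor storage-pool create lvmthin <NEW_NODE_NAME> <STORAGE_POOL_NAME> linstor_group/thin_device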

      Remove a host from an existing LINSTOR SR

      xe host-call-plugin host-uuid=<HOST_UUID> plugin=linstor-manager fn=removeHost args:groupName=<THIN_OR_THICK_POOL_NAME>
      

      Check if the linstor controller is currently running on a specific host

      xe host-call-plugin host-uuid=<HOST_UUID> plugin=linstor-manager fn=hasControllerRunning
      

      Example:

      xe host-call-plugin host-uuid=ddcd3461-7052-4f5e-932c-e1ed75c192d6 plugin=linstor-manager fn=hasControllerRunning
      False
      

      Check if a DRBD volume is currently used by a process on a specific host

      xe host-call-plugin host-uuid=<HOST_UUID> plugin=linstor-manager fn=getDrbdOpeners args:resourceName=<RES_NAME> args:volume=0
      

      Example:

      xe host-call-plugin host-uuid=ddcd3461-7052-4f5e-932c-e1ed75c192d6 plugin=linstor-manager fn=getDrbdOpeners args:resourceName=xcp-volume-a10809db-bb40-43bd-9dee-22d70d781c45 args:volume=0
      {}
      

      List DRBD volumes

      xe host-call-plugin host-uuid=<HOST_UUID> plugin=linstor-manager fn=listDrbdVolumes args:groupName=<THIN_OR_THICK_POOL_NAME>
      

      Example:

      xe host-call-plugin host-uuid=ddcd3461-7052-4f5e-932c-e1ed75c192d6 plugin=linstor-manager fn=listDrbdVolumes args:groupName=linstor_group/thin_device
      {"linstor_group": [1000, 1005, 1001, 1007, 1006]}
      

      Force destruction of DRBD volumes

      Warning: In principle, the volumes created by the smapi driver (LinstorSR) must be destroyed using the XAPI or XOA. Only use these functions if you know what you are doing. Otherwise, forget them.

      # To destroy one volume:
      xe host-call-plugin host-uuid=<HOST_UUID> plugin=linstor-manager fn=destroyDrbdVolume args:minor=<MINOR>
      
      # To destroy all volumes:
      xe host-call-plugin host-uuid=<HOST_UUID> plugin=linstor-manager fn=destroyDrbdVolumes args:groupName=<THIN_OR_THICK_POOL_NAME>
      
      • J Offline
        JeffBerntsen Top contributor
        last edited by

        @olivierlambert @ronan-a

        I've been waiting for this for a while. Thank you!

        I've got a couple of questions about the test though.

        First, does it require an entire disk as its installation target? Is it possible to supply just a disk partition such as the boot drive partition normally reserved for SR use? I'm also interested in testing it on top of software RAID so the requirement, if any, for using a whole drive there could be important too. I know that DRBD in general doesn't care but didn't know if there might be some additional requirement for either LINSTOR or XOSTOR.

        Second, is there any specific networking requirement, i.e. does it need a dedicated network connection or will it work without?

        • olivierlambertO Offline
          olivierlambert Vates πŸͺ Co-Founder CEO
          last edited by

          Hi,

          1. A partition will work yes.
          2. On top of soft RAID: yes, should work too but I have no idea if it's a best practice or not. We can ask the LINSTOR guys πŸ™‚
          3. Bigger and faster the network, the better. But feel free to test whatever you like during this phase πŸ™‚
          • J Offline
            JeffBerntsen Top contributor @olivierlambert
            last edited by

            @olivierlambert said in XOSTOR hyperconvergence preview:

            Hi,

            1. A partition will work yes.
            2. On top of soft RAID: yes, should work too but I have no idea if it's a best practice or not. We can ask the LINSTOR guys πŸ™‚
            3. Bigger and faster the network, the better. But feel free to test whatever you like during this phase πŸ™‚

            Thanks for getting back to me so quickly! Those are pretty much the answers I expected based on my experience with DRBD and what little I know of LINSTOR but wanted to confirm it if possible. My planned test for software RAID is three servers each with a 4-drive soft RAID 10.

            • J Offline
              JeffBerntsen Top contributor @olivierlambert
              last edited by

              @olivierlambert @ronan-a

              This seems to be working for me. It's definitely working on top of software RAID. I'm able to move and copy VMs between XOSTOR and other SRs. Copies between a shared NFS v4 SR and XOSTOR are a little slow but I suspect that's more of a network bandwidth problem than anything else. If that's the case, being able to dedicate a network interface for XOSTOR use would probably take care of a lot of that.

              The only problem I ran into with the installation was that one of the three servers was disconnected from the SR after it was created. After reconnecting it, things seem to be working well.

              Speed seems to be a little better than running over the NFS SR.

              Next question is where to go from here. Is there any specific testing you would like done or any information you would like about my test systems?

              • olivierlambertO Offline
                olivierlambert Vates πŸͺ Co-Founder CEO
                last edited by

                Play with it, snapshot, backup whatever πŸ™‚ What matters is resiliency.

                Really happy to know it works well for you! More improvements are coming in January πŸ™‚

                • J Offline
                  JeffBerntsen Top contributor @olivierlambert
                  last edited by

                  @olivierlambert
                  After some more playing, I'm beginning to see some issues.

                  One is that running HA on top of it, I think, definitely causes some issues. After setting up HA and configuring it to use XOSTOR as the state/heartbeat SR, things appeared to work, but all three servers then experienced crashes in pretty short succession after running successfully for several hours; they have run fine for a day or two since then. While running with HA enabled and using XOSTOR, the logs fill up with DRBD state change messages for the xcp-persistent-ha-statefile.

                  If you're interested, I can gather up logs covering the day of the crashes or any other information you might want. Just let me know what you'd want for that and I'll be happy to collect it for you.

                  I've since disabled HA again and expect the servers will be as stable as they were before I enabled it (very stable from what I've seen so far except for the experiment with HA).

                  I suspect that HA will probably also work just fine as long as XOSTOR is not used for the heartbeat/metadata SR for it. This pool is also set up with a shared NFS v4 SR and I could experiment with using HA with that as the heartbeat SR but still using XOSTOR to house the VMs.

                  • olivierlambertO Offline
                    olivierlambert Vates πŸͺ Co-Founder CEO
                    last edited by

                    Hmm, weird. It would be great to have more details for @ronan-a

                    In theory, HA should work with LINSTOR.

                    • J Offline
                      JeffBerntsen Top contributor @olivierlambert
                      last edited by JeffBerntsen

                      @olivierlambert
                      No problem. Just let me know or have him let me know what he needs for information and I'll try to get it to you folks somehow.

                      The actual crashes seem to be related to fencing of machines and recovery after I intentionally forced an outage on one of the test servers. My test was making sure that the LINSTOR controller service was running on the server acting as the pool master and hosting a couple of the several test VMs with HA enabled. Then I forced a failure by pulling its power cord.

                       The servers recovered on their own from that, with one of the others in the pool taking over as pool master, and that one and the remaining host trying to restart the failed VMs, mostly successfully. The unsuccessful startups were due either to a lack of RAM or to not being able to find the VDI for the VM. The latter problem went away on its own, and I suspect it was due to HA not waiting long enough for LINSTOR to straighten out the storage situation before trying to restart the VM.

                      After that I restarted the "failed" server and it came up, rejoined the pool, and I was able to start VMs on it. As far as I can see, it looks like the servers crashed and mostly recovered on their own shortly after that.

                       That was three days ago and the pool has been up and running since then without problems with HA enabled. I've since disabled it after looking at the logs associated with the crash and seeing that having HA running with its heartbeat storage on top of LINSTOR was causing the logs to fill up at a rate of several lines per second.

                      • olivierlambertO Offline
                        olivierlambert Vates πŸͺ Co-Founder CEO
                        last edited by

                        @ronan-a will come Monday and ask you questions probably πŸ˜‰

                        • olivierlambertO Offline
                          olivierlambert Vates πŸͺ Co-Founder CEO
                          last edited by

                          Or Wednesday in fact, but he'll answer very soon πŸ˜‰

                          • E Offline
                            elialum
                            last edited by

                            @olivierlambert said in XOSTOR hyperconvergence preview:

                            shared=true device-config:provisioning=thin

                            Interesting... Thank you for this implementation, looks promising.

                             Out of curiosity, can we use this "technology" instead of XOA's CR job? That is, for a single-host setup with one local disk and, for example, a second network iSCSI/NAS drive, can we use XOSTOR to clone the data in the background?

                             This is more of a backup solution than an HA setup, and it can ensure up-to-date data on the iSCSI device.

                            • ronan-aR Offline
                              ronan-a Vates πŸͺ XCP-ng Team @JeffBerntsen
                              last edited by

                              @jeffberntsen Hello, so I'm available. ^^

                              If you're interested, I can gather up logs covering the day of the crashes or any other information you might want. Just let me know what you'd want for that and I'll be happy to collect it for you.

                              Yes! Could you send me your logs? (Old XCP-ng logs: xha.log, daemon.log, SMlog, kern.log..., and the logs of LINSTOR: /var/log/linstor-{controller/satellite})

                              The actual crashes seem to be related to fencing of machines and recovery after I intentionally forced an outage on one of the test servers. My test was making sure that the LINSTOR controller service was running on the server acting as the pool master and hosting a couple of the several test VMs with HA enabled. Then I forced a failure by pulling its power cord.

                              It can be a bad delay during the restart of the linstor-controller or a bad sync in the DRBD layer.
                              I have already observed a long delay with a similar test. But we would still have to check the logs.

                              In this situation you can execute this command where the current linstor controller is running: linstor resource list. It's useful to check the current state. πŸ™‚

                               That was three days ago and the pool has been up and running since then without problems with HA enabled. I've since disabled it after looking at the logs associated with the crash and seeing that having HA running with its heartbeat storage on top of LINSTOR was causing the logs to fill up at a rate of several lines per second.

                               Yeah, we are aware of this problem. We have discussed reducing the verbosity of the DRBD logs with the LINBIT team, and there is a new patch to test in the next CH release to compress log files more often. It would be interesting to reduce the space usage of /var/log.

                              • J Offline
                                JeffBerntsen Top contributor @ronan-a
                                last edited by

                                @ronan-a said in XOSTOR hyperconvergence preview:

                                @jeffberntsen Hello, so I'm available. ^^

                                If you're interested, I can gather up logs covering the day of the crashes or any other information you might want. Just let me know what you'd want for that and I'll be happy to collect it for you.

                                Yes! Could you send me your logs? (Old XCP-ng logs: xha.log, daemon.log, SMlog, kern.log..., and the logs of LINSTOR: /var/log/linstor-{controller/satellite})

                                Absolutely. I've grabbed all logs from the system including the XCP-ng crash log folder from the day I ran the test and a few days before and after. I've got .tar.gz files of the contents of the logs folders from each of the three servers in my test pool covering that period, about 250MB of compressed files total. What would be the best way to get them to you?

                                It can be a bad delay during the restart of the linstor-controller or a bad sync in the DRBD layer.
                                I have already observed a long delay with a similar test. But we would still have to check the logs.

                                In this situation you can execute this command where the current linstor controller is running: linstor resource list. It's useful to check the current state. πŸ™‚

                                I did that after everything came back up on its own and that reported all resources as up and healthy.

                                 Something I noticed is that the linstor command only works on the host running as the LINSTOR controller at the time, since the CLI looks for the controller on localhost.

                                I think the delay in my case was the controller coming back up on a different host. I didn't see any sign of a bad sync in DRBD. (I've used DRBD on and off quite a bit so have some experience with that but have very little with LINSTOR).

                                 Yeah, we are aware of this problem. We have discussed reducing the verbosity of the DRBD logs with the LINBIT team, and there is a new patch to test in the next CH release to compress log files more often. It would be interesting to reduce the space usage of /var/log.

                                I'm pretty sure that's just related to HA from what I could see. You've obviously worked with it more than I have so please correct me if I'm wrong but it looks like LINSTOR tries to switch the active copy of the data to whichever system tries to write to the resource at the time and in HA, all of the servers in the pool are constantly trying to write to the HA metadata and heartbeat VDIs, driving LINSTOR crazy trying to keep up. As far as I can see, that doesn't happen with normal VM use because they're normally opened, read, and written by just one system at a time.

                                • ronan-aR Offline
                                  ronan-a Vates πŸͺ XCP-ng Team @JeffBerntsen
                                  last edited by

                                  @jeffberntsen

                                  Absolutely. I've grabbed all logs from the system including the XCP-ng crash log folder from the day I ran the test and a few days before and after. I've got .tar.gz files of the contents of the logs folders from each of the three servers in my test pool covering that period, about 250MB of compressed files total. What would be the best way to get them to you?

                                  You can upload it where you want. Then you can send me a private message with the download link. Thank you. πŸ™‚

                                  I did that after everything came back up on its own and that reported all resources as up and healthy.

                                   Something I noticed is that the linstor command only works on the host running as the LINSTOR controller at the time, since the CLI looks for the controller on localhost.

                                   You can use the command linstor --controllers=<HOSTNAME_OR_IP> resource list when you are on another host. Note: the linstor-controller service is automatically started by a specific smapi daemon (minidrbdcluster), because we want to detect a host crash or reboot at any time and start a new controller if necessary. Also, the LINSTOR DB is shared using a VDI, so the controller service must always be run by XCP-ng and not by a user. πŸ˜‰

                                  I think the delay in my case was the controller coming back up on a different host. I didn't see any sign of a bad sync in DRBD. (I've used DRBD on and off quite a bit so have some experience with that but have very little with LINSTOR).
                                  I'm pretty sure that's just related to HA from what I could see. You've obviously worked with it more than I have so please correct me if I'm wrong but it looks like LINSTOR tries to switch the active copy of the data to whichever system tries to write to the resource at the time and in HA, all of the servers in the pool are constantly trying to write to the HA metadata and heartbeat VDIs, driving LINSTOR crazy trying to keep up. As far as I can see, that doesn't happen with normal VM use because they're normally opened, read, and written by just one system at a time.

                                   Very good analysis on your part. Indeed this VDI is shared, and DRBD prevents us from opening it on several hosts at once. We haven't found a better solution than to open, write and close it for the moment. That's why we must reduce the spam in the log files with a few patches.

                                  • J Offline
                                    JeffBerntsen Top contributor @ronan-a
                                    last edited by

                                    @ronan-a said in XOSTOR hyperconvergence preview:

                                    You can upload it where you want. Then you can send me a private message with the download link. Thank you. πŸ™‚

                                    Done. Let me know if you have any problems getting to it.

                                     You can use the command linstor --controllers=<HOSTNAME_OR_IP> resource list when you are on another host. Note: the linstor-controller service is automatically started by a specific smapi daemon (minidrbdcluster), because we want to detect a host crash or reboot at any time and start a new controller if necessary. Also, the LINSTOR DB is shared using a VDI, so the controller service must always be run by XCP-ng and not by a user. πŸ˜‰

                                     A little reading around in the LINSTOR documentation eventually helped me out with this. It's possible to set the environment variable LS_CONTROLLERS to a list of possible controller machines, and the linstor CLI will try all of the servers on the list until it finds the controller. On the three servers in my test pool, I can do something like LS_CONTROLLERS=server1,server2,server3 when I first get into a shell, and as long as the three server names can be resolved on all three hosts (either via DNS or because they're in the /etc/hosts file), the linstor command works from any of them no matter which one is the controller.
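
                                     For example (assuming, as above, that the three hostnames resolve on every host):

                                     export LS_CONTROLLERS=server1,server2,server3
                                     linstor resource list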

                                    • J Offline
                                      JeffBerntsen Top contributor @olivierlambert
                                      last edited by

                                      @olivierlambert said in XOSTOR hyperconvergence preview:

                                      Play with it, snapshot, backup whatever πŸ™‚ What matters is resiliency.

                                      More playing: I've just installed the latest set of updates as a rolling pool update and it handled things fine. No problems with XOSTOR shifting the VMs around during the update and no apparent problems afterward.

                                      • olivierlambertO Offline
                                        olivierlambert Vates πŸͺ Co-Founder CEO
                                        last edited by

                                         That's a great test indeed πŸ™‚ I have to say I'm impressed. Maybe it's because I'm so used to the corner cases I tested for months triggering various issues, but every time @ronan-a came up with a solution. Kudos to him!

                                        • Maelstrom96M Offline
                                          Maelstrom96 @olivierlambert
                                          last edited by Maelstrom96

                                           @olivierlambert This looks very promising. We're currently running K8s on top of XCP-ng hosts and deploying everything through XOA with Terraform adapters. It's been working well for us, but we're not using a shared SR, which we're looking into deploying. The nice thing is that it looks like we could actually use LINSTOR directly from K8s, removing two storage layers completely (OpenEBS + soft RAID 5 local SR) and making the whole thing work even better for both XCP-ng and K8s.

                                           I have a question before trying to deploy this: how would we go about changing the SR in case we need to add, remove, or replace an XCP-ng host? Should we be able to change the SR configuration while it's active?

                                          • olivierlambertO Offline
                                            olivierlambert Vates πŸͺ Co-Founder CEO
                                            last edited by olivierlambert

                                            Likely a question for @ronan-a πŸ™‚

                                            edit: however, I'd love to have a chat with you to discuss your existing k8s workflow with XCP-ng/XOA!
