XCP-ng

    XOSTOR hyperconvergence preview

    • stormi (Vates 🪐 XCP-ng Team) @Chr57

      @Chr57 I'm no XOSTOR expert, but AFAIK the total available space will depend on the replication factor.

      • gb.123 @Chr57

        @Chr57

        Storage between the servers is not shared as one large file system like Gluster, right?
        So, for example, if each of the 4 hosts has 2TB of storage, then the max disk space is 2TB.

        As @stormi mentioned, this depends on your replication factor. For your example, it works like this:
        Total space = (number of hosts × storage per host) / replication factor
        (assuming the same storage on all nodes)

        E.g. if the replication factor is 2, then:

        Total space = (4 × 2 TB) / 2 = 4 TB
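The rule of thumb above can be written as a small helper. This is a sketch under the same assumption (identical storage on every host); it ignores LINSTOR metadata and the small `xcp-persistent-database` volume, and the function name is hypothetical.

```python
def usable_capacity_tb(hosts: int, storage_per_host_tb: float, replication_factor: int) -> float:
    """Usable space under the rule of thumb: raw pool capacity divided by the number of copies."""
    if not 1 <= replication_factor <= hosts:
        # Each copy of a disk lives on a different host, so you cannot keep more copies than hosts.
        raise ValueError("replication factor must be between 1 and the number of hosts")
    raw_tb = hosts * storage_per_host_tb  # assumes the same amount of storage on every host
    return raw_tb / replication_factor


if __name__ == "__main__":
    print(usable_capacity_tb(4, 2.0, 2))  # 4 hosts x 2 TB, 2 copies -> 4.0
    print(f"{usable_capacity_tb(4, 2.0, 3):.2f}")  # same hosts, 3 copies -> 2.67
```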
        Note:
        Keep in mind, though, that it also depends on the individual SSDs. Say you put a 1.5 TB SSD and a 0.5 TB SSD in a node: although the node has 2 TB in total, if you create a VM with a 1 TB disk, you will not be able to create another VM with a 1 TB disk, since no single drive has enough free space left. In other words, XOSTOR will not split a VM disk across two separate drives in a JBOD setup.
        With RAID 0 at the BIOS level you may be able to get away with this, but RAID 0 is not recommended.
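A toy model of the caveat above, assuming (as the post describes for JBOD) that a virtual disk must fit entirely on one physical drive. It uses simple first-fit placement, which is not necessarily how LINSTOR or LVM choose where data goes; `place_disks` is a hypothetical name.

```python
from typing import List, Optional


def place_disks(drive_free_tb: List[float], disk_sizes_tb: List[float]) -> List[Optional[int]]:
    """First-fit placement where each virtual disk must fit on a single drive.

    Returns, for each virtual disk, the index of the drive it landed on,
    or None when no single drive has enough free space left.
    """
    free = list(drive_free_tb)  # copy so the caller's list is not modified
    placements: List[Optional[int]] = []
    for size in disk_sizes_tb:
        for i, available in enumerate(free):
            if available >= size:
                free[i] -= size
                placements.append(i)
                break
        else:
            # Enough space may remain in total, but it is split across drives.
            placements.append(None)
    return placements


if __name__ == "__main__":
    # One node with a 1.5 TB and a 0.5 TB SSD, then two 1 TB virtual disks
    print(place_disks([1.5, 0.5], [1.0, 1.0]))  # [0, None]
```

The second 1 TB disk fails even though 1 TB is still free in total, split as 0.5 TB on each drive.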

        Is the NIC speed of the storage network important? Is 2x40G on each server overkill for this?

        The question is generic; the answer depends on your workload and SSD speed (Gen4, Gen5, or an old Gen1). At the outset, 2x40G seems more than enough for most applications. If you have an old Gen1 SSD or a SATA SSD, you might not even reach the full bandwidth of 2x40G in practice.
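A back-of-the-envelope check of the 2x40G question. The per-drive figures are rough, assumed sequential-read numbers, not measurements, and the link figure is raw line rate with protocol overhead ignored; the helper names are hypothetical.

```python
# Rough, assumed sequential-read throughput of one drive in GB/s (not measured).
ASSUMED_DRIVE_GB_PER_S = {
    "SATA SSD": 0.55,
    "NVMe PCIe Gen3 x4": 3.5,
    "NVMe PCIe Gen4 x4": 7.0,
}


def link_gb_per_s(links: int, gbit_per_link: float) -> float:
    """Aggregate raw line rate in GB/s (8 bits per byte), ignoring protocol overhead."""
    return links * gbit_per_link / 8


def drives_to_fill(link_gb_s: float, drive_gb_s: float) -> float:
    """How many such drives per host it would take to saturate the links."""
    return link_gb_s / drive_gb_s


if __name__ == "__main__":
    network = link_gb_per_s(2, 40)  # 2x40G -> 10.0 GB/s
    for name, speed in ASSUMED_DRIVE_GB_PER_S.items():
        print(f"{name}: ~{drives_to_fill(network, speed):.1f} drives to fill 2x40G")
```

With these assumptions, 2x40G (10 GB/s raw) would take roughly 18 SATA SSDs, about 3 Gen3 NVMe drives, or 1 to 2 Gen4 NVMe drives per host to fill.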

        Which software RAID is recommended for the NVMe disks?

        For production with NVMe SSDs, I would not recommend RAID at all! JBOD works just fine (assuming you are running generic applications).

        • Flav

          Hello,

          I am new to XOSTOR.

          I followed the instructions to create a XOSTOR SR, but unfortunately I have been stuck for 2-3 weeks on an error when creating the SR. Here is the error:

          Error code: SR_BACKEND_FAILURE_5006
          Error parameters: , LINSTOR SR creation error [opterr=Could not create SP xcp-sr-linstor_group on node DEV-XCP02: The satellite does not support the device provider LVM],

          Here is the command I ran: xe sr-create type=linstor name-label=XOSTOR host-uuid=52c3a2bb-50a8-4700-a232-6e535e24d759 device-config:group-name=linstor_group device-config:redundancy=2 shared=true device-config:provisioning=thick

          Thank you in advance for your help, and for this great project.

          • tanonl

            Hi,
            I am actually new to XOSTOR and I have very basic questions to begin with.
            Does it only support pools? Can we attach such an SR to several independent XCP-ng hosts?

            Thanks again for this incredible project.

            • olivierlambert (Vates 🪐 Co-Founder CEO)

              Hi,

              It works only at the pool level: that is the only way to have coordination between hosts and to know which host holds the lock on which VM. This is essential to avoid booting the same VM/disk in two different places without knowing it, which would lead to data corruption.
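A toy model of why a single pool-wide coordinator matters: one authority records which host has each disk attached and refuses to attach it on a second host. This only illustrates the idea; it is not how XAPI implements its locking, and `VdiLockTable` is a hypothetical name.

```python
class VdiLockTable:
    """Toy pool-wide registry: which host currently has each VDI attached."""

    def __init__(self) -> None:
        self._owner = {}  # VDI UUID -> host name

    def attach(self, vdi: str, host: str) -> bool:
        """Record the attach, or refuse it if another host already holds the VDI."""
        owner = self._owner.get(vdi)
        if owner is not None and owner != host:
            return False  # booting it here too could corrupt the disk
        self._owner[vdi] = host
        return True

    def detach(self, vdi: str, host: str) -> None:
        """Release the VDI, but only if this host is the one holding it."""
        if self._owner.get(vdi) == host:
            del self._owner[vdi]


if __name__ == "__main__":
    locks = VdiLockTable()
    print(locks.attach("vdi-1", "host01"))  # True
    print(locks.attach("vdi-1", "host02"))  # False: still attached on host01
    locks.detach("vdi-1", "host01")
    print(locks.attach("vdi-1", "host02"))  # True
```

Independent hosts would each need their own table, and nothing would stop two of them from attaching the same disk at once.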

              • fatek @olivierlambert

                @olivierlambert said in XOSTOR hyperconvergence preview:

                1. FINALLY YOU CAN CREATE THE SR:
                  Otherwise with thin provisioning:

                xe sr-create type=linstor name-label=<SR_NAME> host-uuid=<MASTER_UUID> device-config:group-name=linstor_group/thin_device device-config:redundancy=<REDUNDANCY> shared=tru

                Is this part of the command still needed?

                device-config:hosts=XCP-01,XCP-02,XCP-xx
                
                • olivierlambert (Vates 🪐 Co-Founder CEO)

                  Question for @ronan-a.

                  • ronan-a (Vates 🪐 XCP-ng Team) @fatek

                    @fatek No. I removed this parameter; it's not needed anymore.
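Since `device-config:hosts` is no longer needed, a sketch that builds the `xe sr-create` arguments from the commands quoted in this thread can help avoid typos such as a stray space in `shared=true`. Passing `device-config:provisioning=thin` for the thin case is an assumption that mirrors the thick example in Flav's post, since the quoted thin command is cut off; check it against the current XOSTOR documentation. `sr_create_command` is a hypothetical helper.

```python
import shlex


def sr_create_command(name_label: str, master_uuid: str, redundancy: int, thin: bool = False) -> list[str]:
    """Build the xe sr-create argv for a LINSTOR SR, without device-config:hosts."""
    # Thin provisioning points group-name at the thin pool inside the volume group.
    group_name = "linstor_group/thin_device" if thin else "linstor_group"
    provisioning = "thin" if thin else "thick"  # "thin" is an assumption mirroring "thick"
    return [
        "xe", "sr-create", "type=linstor",
        f"name-label={name_label}",
        f"host-uuid={master_uuid}",
        f"device-config:group-name={group_name}",
        f"device-config:redundancy={redundancy}",
        "shared=true",
        f"device-config:provisioning={provisioning}",
    ]


if __name__ == "__main__":
    # Example host UUID taken from Flav's post; use your own pool master's UUID.
    argv = sr_create_command("XOSTOR", "52c3a2bb-50a8-4700-a232-6e535e24d759", 2)
    print(shlex.join(argv))  # copy-pasteable; or pass argv to subprocess.run on the master
```

Building the command as a list keeps each `key=value` pair a single argument, so no stray spaces can sneak in.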

                    • Maelstrom96

                      @ronan-a Since XOSTOR is supposed to be stable now, I figured I would try it out on a new setup of three freshly installed XCP-ng 8.2 nodes.

                      I used the CLI to deploy it. It all went well, and the SR was ready quickly. I was even able to migrate a disk to the LINSTOR SR and boot the VM. However, after rebooting the master, the SR no longer allows any disk migration, and manual scans fail. I've tried fully unmounting and remounting the SR and restarting the toolstack, but nothing seems to help. The disk that was on LINSTOR is still accessible and the VM is able to boot.

                      Here is the error I'm getting:

                      sr.scan
                      {
                        "id": "e1a9bf4d-26ad-3ef6-b4a5-db98d012e0d9"
                      }
                      {
                        "code": "SR_BACKEND_FAILURE_47",
                        "params": [
                          "",
                          "The SR is not available [opterr=Database is not mounted]",
                          ""
                        ],
                        "task": {
                          "uuid": "a467bd90-8d47-09cc-b8ac-afa35056ff25",
                          "name_label": "Async.SR.scan",
                          "name_description": "",
                          "allowed_operations": [],
                          "current_operations": {},
                          "created": "20240502T21:40:00Z",
                          "finished": "20240502T21:40:01Z",
                          "status": "failure",
                          "resident_on": "OpaqueRef:b3e2f390-f45f-4614-a150-1eee53f204e1",
                          "progress": 1,
                          "type": "<none/>",
                          "result": "",
                          "error_info": [
                            "SR_BACKEND_FAILURE_47",
                            "",
                            "The SR is not available [opterr=Database is not mounted]",
                            ""
                          ],
                          "other_config": {},
                          "subtask_of": "OpaqueRef:NULL",
                          "subtasks": [],
                          "backtrace": "(((process xapi)(filename lib/backtrace.ml)(line 210))((process xapi)(filename ocaml/xapi/storage_access.ml)(line 32))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 35))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 131))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/xapi/rbac.ml)(line 205))((process xapi)(filename ocaml/xapi/server_helpers.ml)(line 95)))"
                        },
                        "message": "SR_BACKEND_FAILURE_47(, The SR is not available [opterr=Database is not mounted], )",
                        "name": "XapiError",
                        "stack": "XapiError: SR_BACKEND_FAILURE_47(, The SR is not available [opterr=Database is not mounted], )
                          at Function.wrap (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/_XapiError.mjs:16:12)
                          at default (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/_getTaskResult.mjs:11:29)
                          at Xapi._addRecordToCache (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/index.mjs:1029:24)
                          at file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/index.mjs:1063:14
                          at Array.forEach (<anonymous>)
                          at Xapi._processEvents (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/index.mjs:1053:12)
                          at Xapi._watchEvents (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/index.mjs:1226:14)"
                      }
                      

                      I quickly glanced over the source code and the SM logs to see if I could identify what was going on, but it doesn't seem to be a simple issue.

                      Logs from SM:

                      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242] LinstorSR.scan for e1a9bf4d-26ad-3ef6-b4a5-db98d012e0d9
                      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242] Raising exception [47, The SR is not available [opterr=Database is not mounted]]
                      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242] lock: released /var/lock/sm/e1a9bf4d-26ad-3ef6-b4a5-db98d012e0d9/sr
                      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242] ***** generic exception: sr_scan: EXCEPTION <class 'SR.SROSError'>, The SR is not available [opterr=Database is not mounted]
                      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
                      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     return self._run_locked(sr)
                      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
                      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     rv = self._run(sr, target)
                      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/SRCommand.py", line 364, in _run
                      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     return sr.scan(self.params['sr_uuid'])
                      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/LinstorSR", line 536, in wrap
                      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     return load(self, *args, **kwargs)
                      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/LinstorSR", line 521, in load
                      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     return wrapped_method(self, *args, **kwargs)
                      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/LinstorSR", line 381, in wrapped_method
                      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     return method(self, *args, **kwargs)
                      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/LinstorSR", line 777, in scan
                      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     opterr='Database is not mounted'
                      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]
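To pull every raised exception out of a long SMlog, a small filter over lines in the format shown above may help. The regular expression is an assumption based only on these sample lines (syslog-style timestamp, hostname, `SM:`, PID in brackets), and `raised_exceptions` is a hypothetical helper.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed line shape: "May  2 13:22:02 <host> SM: [<pid>] <message>"
SMLOG_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) SM: \[(?P<pid>\d+)\] (?P<msg>.*)$"
)


class SmException(NamedTuple):
    timestamp: str
    host: str
    pid: int
    message: str


def raised_exceptions(lines: Iterable[str]) -> Iterator[SmException]:
    """Yield every 'Raising exception' entry found in SMlog-style lines."""
    for line in lines:
        m = SMLOG_LINE.match(line.rstrip("\n"))
        if m and m["msg"].startswith("Raising exception"):
            yield SmException(m["ts"], m["host"], int(m["pid"]), m["msg"])


if __name__ == "__main__":
    # /var/log/SMlog is where XCP-ng writes the storage manager log
    with open("/var/log/SMlog", errors="replace") as log:
        for exc in raised_exceptions(log):
            print(exc.timestamp, exc.host, exc.pid, exc.message)
```

The PID can then be used to grep the full traceback of that particular SM call.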
                      
                      • olivierlambert (Vates 🪐 Co-Founder CEO)

                        Have you restarted the satellites?

                        • ronan-a (Vates 🪐 XCP-ng Team) @Maelstrom96

                          @Maelstrom96 said in XOSTOR hyperconvergence preview:

                          However, after rebooting the master, it seems like the SR doesn't want to allow any disk migration, and manual Scan are failing.

                          What's the status of these commands on each host?

                          systemctl status linstor-controller
                          systemctl status linstor-satellite
                          systemctl status drbd-reactor
                          mountpoint /var/lib/linstor
                          drbdsetup events2
                          

                          Also please share your SMlog files. 🙂
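The checks above can be collected on each host with a short script. Two tweaks are assumptions: `--no-pager` is passed to `systemctl status`, and `--now` is added to `drbdsetup events2` so it prints the current state and exits instead of streaming events indefinitely (verify with `drbdsetup events2 --help` on your DRBD version). `run_checks` is a hypothetical helper.

```python
import subprocess

# Commands from the post; --no-pager and --now are assumed additions.
CHECKS = [
    ["systemctl", "status", "--no-pager", "linstor-controller"],
    ["systemctl", "status", "--no-pager", "linstor-satellite"],
    ["systemctl", "status", "--no-pager", "drbd-reactor"],
    ["mountpoint", "/var/lib/linstor"],
    ["drbdsetup", "events2", "--now"],
]


def run_checks(checks: list[list[str]] = CHECKS, timeout: float = 30.0) -> dict[str, tuple[int, str]]:
    """Run each command and return {command line: (exit code, combined output)}."""
    results: dict[str, tuple[int, str]] = {}
    for argv in checks:
        cmd = " ".join(argv)
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
            # A non-zero code is informative here, e.g. an inactive unit or "is not a mountpoint".
            results[cmd] = (proc.returncode, proc.stdout + proc.stderr)
        except (FileNotFoundError, subprocess.TimeoutExpired) as err:
            results[cmd] = (-1, str(err))  # command missing or hung
    return results


if __name__ == "__main__":
    for cmd, (code, output) in run_checks().items():
        print(f"$ {cmd}  (exit {code})\n{output}")
```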

                          • Maelstrom96 @ronan-a

                            @ronan-a said in XOSTOR hyperconvergence preview:

                            drbdsetup events2

                            Host1:

                            [09:49 xcp-ng-labs-host01 ~]# systemctl status linstor-controller
                            ● linstor-controller.service - drbd-reactor controlled linstor-controller
                               Loaded: loaded (/usr/lib/systemd/system/linstor-controller.service; disabled; vendor preset: disabled)
                              Drop-In: /run/systemd/system/linstor-controller.service.d
                                       └─reactor.conf
                               Active: active (running) since Thu 2024-05-02 13:24:32 PDT; 20h ago
                             Main PID: 21340 (java)
                               CGroup: /system.slice/linstor-controller.service
                                       └─21340 /usr/lib/jvm/jre-11/bin/java -Xms32M -classpath /usr/share/linstor-server/lib/conf:/usr/share/linstor-server/lib/* com.linbit.linstor.core.Controller --logs=/var/log/linstor-controller --config-directory=/etc/linstor
                            [09:49 xcp-ng-labs-host01 ~]# systemctl status linstor-satellite
                            ● linstor-satellite.service - LINSTOR Satellite Service
                               Loaded: loaded (/usr/lib/systemd/system/linstor-satellite.service; enabled; vendor preset: disabled)
                              Drop-In: /etc/systemd/system/linstor-satellite.service.d
                                       └─override.conf
                               Active: active (running) since Wed 2024-05-01 16:04:05 PDT; 1 day 17h ago
                             Main PID: 1947 (java)
                               CGroup: /system.slice/linstor-satellite.service
                                       ├─1947 /usr/lib/jvm/jre-11/bin/java -Xms32M -classpath /usr/share/linstor-server/lib/conf:/usr/share/linstor-server/lib/* com.linbit.linstor.core.Satellite --logs=/var/log/linstor-satellite --config-directory=/etc/linstor
                                       ├─2109 drbdsetup events2 all
                                       └─2347 /usr/sbin/dmeventd
                            [09:49 xcp-ng-labs-host01 ~]# systemctl status drbd-reactor
                            ● drbd-reactor.service - DRBD-Reactor Service
                               Loaded: loaded (/usr/lib/systemd/system/drbd-reactor.service; enabled; vendor preset: disabled)
                              Drop-In: /etc/systemd/system/drbd-reactor.service.d
                                       └─override.conf
                               Active: active (running) since Wed 2024-05-01 16:04:11 PDT; 1 day 17h ago
                                 Docs: man:drbd-reactor
                                       man:drbd-reactorctl
                                       man:drbd-reactor.toml
                             Main PID: 1950 (drbd-reactor)
                               CGroup: /system.slice/drbd-reactor.service
                                       ├─1950 /usr/sbin/drbd-reactor
                                       └─1976 drbdsetup events2 --full --poll
                            [09:49 xcp-ng-labs-host01 ~]# mountpoint /var/lib/linstor
                            /var/lib/linstor is a mountpoint
                            [09:49 xcp-ng-labs-host01 ~]# drbdsetup events2
                            exists resource name:xcp-persistent-database role:Primary suspended:no force-io-failures:no may_promote:no promotion_score:10103
                            exists connection name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 connection:Connected role:Secondary
                            exists connection name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 connection:Connected role:Secondary
                            exists device name:xcp-persistent-database volume:0 minor:1000 backing_dev:/dev/linstor_group/xcp-persistent-database_00000 disk:UpToDate client:no quorum:yes
                            exists peer-device name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
                            exists path name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 local:ipv4:10.100.0.200:7000 peer:ipv4:10.100.0.202:7000 established:yes
                            exists peer-device name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
                            exists path name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 local:ipv4:10.100.0.200:7000 peer:ipv4:10.100.0.201:7000 established:yes
                            exists resource name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 role:Secondary suspended:no force-io-failures:no may_promote:no promotion_score:10103
                            exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 connection:Connected role:Secondary
                            exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 connection:Connected role:Primary
                            exists device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 volume:0 minor:1001 backing_dev:/dev/linstor_group/xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0_00000 disk:UpToDate client:no quorum:yes
                            exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
                            exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 local:ipv4:10.100.0.200:7001 peer:ipv4:10.100.0.202:7001 established:yes
                            exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
                            exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 local:ipv4:10.100.0.200:7001 peer:ipv4:10.100.0.201:7001 established:yes
                            exists -
                            

                            Host2:

                            [09:51 xcp-ng-labs-host02 ~]# systemctl status linstor-controller
                            ● linstor-controller.service - drbd-reactor controlled linstor-controller
                               Loaded: loaded (/usr/lib/systemd/system/linstor-controller.service; disabled; vendor preset: disabled)
                              Drop-In: /run/systemd/system/linstor-controller.service.d
                                       └─reactor.conf
                               Active: inactive (dead)
                            [09:51 xcp-ng-labs-host02 ~]# systemctl status linstor-satellite
                            ● linstor-satellite.service - LINSTOR Satellite Service
                               Loaded: loaded (/usr/lib/systemd/system/linstor-satellite.service; enabled; vendor preset: disabled)
                              Drop-In: /etc/systemd/system/linstor-satellite.service.d
                                       └─override.conf
                               Active: active (running) since Thu 2024-05-02 10:26:59 PDT; 23h ago
                             Main PID: 1990 (java)
                               CGroup: /system.slice/linstor-satellite.service
                                       ├─1990 /usr/lib/jvm/jre-11/bin/java -Xms32M -classpath /usr/share/linstor-server/lib/conf:/usr/share/linstor-server/lib/* com.linbit.linstor.core.Satellite --logs=/var/log/linstor-satellite --config-directory=/etc/linstor
                                       ├─2128 drbdsetup events2 all
                                       └─2552 /usr/sbin/dmeventd
                            [09:51 xcp-ng-labs-host02 ~]# systemctl status drbd-reactor
                            ● drbd-reactor.service - DRBD-Reactor Service
                               Loaded: loaded (/usr/lib/systemd/system/drbd-reactor.service; enabled; vendor preset: disabled)
                              Drop-In: /etc/systemd/system/drbd-reactor.service.d
                                       └─override.conf
                               Active: active (running) since Thu 2024-05-02 10:27:07 PDT; 23h ago
                                 Docs: man:drbd-reactor
                                       man:drbd-reactorctl
                                       man:drbd-reactor.toml
                             Main PID: 1989 (drbd-reactor)
                               CGroup: /system.slice/drbd-reactor.service
                                       ├─1989 /usr/sbin/drbd-reactor
                                       └─2035 drbdsetup events2 --full --poll
                            [09:51 xcp-ng-labs-host02 ~]# mountpoint /var/lib/linstor
                            /var/lib/linstor is not a mountpoint
                            [09:51 xcp-ng-labs-host02 ~]# drbdsetup events2
                            exists resource name:xcp-persistent-database role:Secondary suspended:no force-io-failures:no may_promote:no promotion_score:10103
                            exists connection name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 connection:Connected role:Primary
                            exists connection name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 connection:Connected role:Secondary
                            exists device name:xcp-persistent-database volume:0 minor:1000 backing_dev:/dev/linstor_group/xcp-persistent-database_00000 disk:UpToDate client:no quorum:yes
                            exists peer-device name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
                            exists path name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 local:ipv4:10.100.0.201:7000 peer:ipv4:10.100.0.200:7000 established:yes
                            exists peer-device name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
                            exists path name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 local:ipv4:10.100.0.201:7000 peer:ipv4:10.100.0.202:7000 established:yes
                            exists resource name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 role:Primary suspended:no force-io-failures:no may_promote:no promotion_score:10103
                            exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 connection:Connected role:Secondary
                            exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 connection:Connected role:Secondary
                            exists device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 volume:0 minor:1001 backing_dev:/dev/linstor_group/xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0_00000 disk:UpToDate client:no quorum:yes
                            exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
                            exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 local:ipv4:10.100.0.201:7001 peer:ipv4:10.100.0.200:7001 established:yes
                            exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
                            exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 local:ipv4:10.100.0.201:7001 peer:ipv4:10.100.0.202:7001 established:yes
                            exists -
                            

                            Host3:

                            [09:51 xcp-ng-labs-host03 ~]# systemctl status linstor-controller
                            ● linstor-controller.service - drbd-reactor controlled linstor-controller
                               Loaded: loaded (/usr/lib/systemd/system/linstor-controller.service; disabled; vendor preset: disabled)
                              Drop-In: /run/systemd/system/linstor-controller.service.d
                                       └─reactor.conf
                               Active: inactive (dead)
                            [09:52 xcp-ng-labs-host03 ~]# systemctl status linstor-satellite
                            ● linstor-satellite.service - LINSTOR Satellite Service
                               Loaded: loaded (/usr/lib/systemd/system/linstor-satellite.service; enabled; vendor preset: disabled)
                              Drop-In: /etc/systemd/system/linstor-satellite.service.d
                                       └─override.conf
                               Active: active (running) since Thu 2024-05-02 10:10:16 PDT; 23h ago
                             Main PID: 1937 (java)
                               CGroup: /system.slice/linstor-satellite.service
                                       ├─1937 /usr/lib/jvm/jre-11/bin/java -Xms32M -classpath /usr/share/linstor-server/lib/conf:/usr/share/linstor-server/lib/* com.linbit.linstor.core.Satellite --logs=/var/log/linstor-satellite --config-directory=/etc/linstor
                                       ├─2151 drbdsetup events2 all
                                       └─2435 /usr/sbin/dmeventd
                            [09:52 xcp-ng-labs-host03 ~]# systemctl status drbd-reactor
                            ● drbd-reactor.service - DRBD-Reactor Service
                               Loaded: loaded (/usr/lib/systemd/system/drbd-reactor.service; enabled; vendor preset: disabled)
                              Drop-In: /etc/systemd/system/drbd-reactor.service.d
                                       └─override.conf
                               Active: active (running) since Thu 2024-05-02 10:10:26 PDT; 23h ago
                                 Docs: man:drbd-reactor
                                       man:drbd-reactorctl
                                       man:drbd-reactor.toml
                             Main PID: 1939 (drbd-reactor)
                               CGroup: /system.slice/drbd-reactor.service
                                       ├─1939 /usr/sbin/drbd-reactor
                                       └─1981 drbdsetup events2 --full --poll
                            [09:52 xcp-ng-labs-host03 ~]# mountpoint /var/lib/linstor
                            /var/lib/linstor is not a mountpoint
                            [09:52 xcp-ng-labs-host03 ~]# drbdsetup events2
                            exists resource name:xcp-persistent-database role:Secondary suspended:no force-io-failures:no may_promote:no promotion_score:10103
                            exists connection name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 connection:Connected role:Primary
                            exists connection name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 connection:Connected role:Secondary
                            exists device name:xcp-persistent-database volume:0 minor:1000 backing_dev:/dev/linstor_group/xcp-persistent-database_00000 disk:UpToDate client:no quorum:yes
                            exists peer-device name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
                            exists path name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 local:ipv4:10.100.0.202:7000 peer:ipv4:10.100.0.200:7000 established:yes
                            exists peer-device name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
                            exists path name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 local:ipv4:10.100.0.202:7000 peer:ipv4:10.100.0.201:7000 established:yes
                            exists resource name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 role:Secondary suspended:no force-io-failures:no may_promote:no promotion_score:10103
                            exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 connection:Connected role:Secondary
                            exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 connection:Connected role:Primary
                            exists device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 volume:0 minor:1001 backing_dev:/dev/linstor_group/xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0_00000 disk:UpToDate client:no quorum:yes
                            exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
                            exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 local:ipv4:10.100.0.202:7001 peer:ipv4:10.100.0.200:7001 established:yes
                            exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
                            exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 local:ipv4:10.100.0.202:7001 peer:ipv4:10.100.0.201:7001 established:yes
                            exists -
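Comparing three long `events2` dumps by eye is error-prone. The sketch below extracts each resource's local role, disk state and quorum flag from the `exists resource` and `exists device` lines, assuming the `key:value` format shown above; `parse_events2` and the script name in the usage comment are hypothetical. On the dumps above, only host01 is Primary for `xcp-persistent-database`, which matches it being the only host where `/var/lib/linstor` is a mountpoint and `linstor-controller` is active; every local device reports `disk:UpToDate` and `quorum:yes`.

```python
import sys


def parse_events2(text: str) -> dict:
    """Map each DRBD resource name to the key:value fields of its local resource and device lines."""
    resources = {}
    for line in text.splitlines():
        parts = line.split()
        # Only "exists resource ..." and "exists device ..." describe the local node;
        # connection, peer-device and path lines describe peers and are skipped.
        if len(parts) < 3 or parts[0] != "exists" or parts[1] not in ("resource", "device"):
            continue
        fields = dict(p.split(":", 1) for p in parts[2:] if ":" in p)
        resources.setdefault(fields["name"], {}).update(fields)
    return resources


if __name__ == "__main__":
    # Usage sketch: drbdsetup events2 --now | python3 events2_summary.py
    for name, f in sorted(parse_events2(sys.stdin.read()).items()):
        print(f"{name}: role={f.get('role')} disk={f.get('disk')} quorum={f.get('quorum')}")
```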
                            
                            

                            Will be sending the debug file as a DM.

                            Edit: Just as a sanity check, I tried rebooting the master instead of just restarting the toolstack, and the LINSTOR SR seems to be working as expected again. The XOSTOR tab in XOA now populates (it just errored out before) and the SR scan now goes through.

                            Edit 2: I was able to move a VDI, but then the exact same error started happening again. No idea why.

                            • Theoi-Meteoroi

                              You lost quorum.

                              I would start by looking at DRBD; that is the underlying part that isn't working at the moment. When I deployed this, I wanted to understand the parts. DRBD is key to the LINSTOR layer: it stores the cluster state and membership.

                              I'd advise reading the DRBD docs as well as the LINSTOR docs to find the commands you need to bring this back up. I really don't advise using spinning disks; SSDs and NVMe are the ticket. You can make spinning rust work, but it's terminally slow: I found 3 TB disks were OK (~60 MB/s), but the 9.1 TB (10 TB) disks were just awful, with 20-40 MB/s the best I saw. I removed all the XOSTOR stuff this week and may reinstall it on some 4 TB NVMe drives.
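To put those throughput figures in perspective, here is a back-of-the-envelope sketch (bash plus awk) of how long it takes just to stream a whole disk once at the quoted rates. transfer_hours is a hypothetical helper; it uses decimal units and ignores DRBD resync-rate settings and concurrent VM I/O, so the results are only rough figures.

```shell
#!/usr/bin/env bash
# Rough time to stream a whole disk once at a sustained rate (decimal units):
# hours = size_TB * 1,000,000 MB/TB / rate_MBps / 3600 s/h
# Ignores DRBD resync-rate settings and competing VM I/O.
transfer_hours() {
  local size_tb=$1 rate_mbps=$2
  awk -v s="$size_tb" -v r="$rate_mbps" 'BEGIN { printf "%.1f\n", s * 1000000 / r / 3600 }'
}

transfer_hours 3 60    # 3 TB disk at ~60 MB/s: prints 13.9
transfer_hours 9.1 40  # 9.1 TB disk at 40 MB/s: prints 63.2
transfer_hours 9.1 20  # 9.1 TB disk at 20 MB/s: prints 126.4
```

So a single full pass over one of the 9.1 TB disks takes roughly 63 to 126 hours at the quoted speeds, compared with about 14 hours for a 3 TB disk at ~60 MB/s.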

                              All that time learning DRBD and LINSTOR paid off when I decided to use the Piraeus operator at the Kubernetes level. It's basically all the same bits, built from source on the nodes you deploy on, and it includes a CSI driver.

                              • ronan-a Vates 🪐 XCP-ng Team @Theoi-Meteoroi

                                @Theoi-Meteoroi said in XOSTOR hyperconvergence preview:

                                You lost quorum.

                                Not a quorum issue:

                                exists device name:xcp-persistent-database volume:0 minor:1000 backing_dev:/dev/linstor_group/xcp-persistent-database_00000 disk:UpToDate client:no quorum:yes
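ronan-a's check reads the quorum: and disk: fields on the device line. The following minimal sketch does the same for every device in drbdsetup events2 --now output (the format quoted above). drbd_device_summary is a hypothetical helper, and its pass/fail rule (quorum:yes and a disk state of UpToDate or Diskless) is an assumption, not an official health check.

```shell
#!/usr/bin/env bash
# Summarise "exists device ..." lines from `drbdsetup events2 --now`-style input.
# Prints "<resource> disk=<state> quorum=<yes|no>" per device and returns
# non-zero if any device lacks quorum or has a disk state other than
# UpToDate/Diskless (assumed rule, adjust as needed).
drbd_device_summary() {
  awk '
    $1 == "exists" && $2 == "device" {
      name = "?"; disk = "?"; quorum = "?"
      for (i = 3; i <= NF; i++) {
        n = index($i, ":")          # split key:value on the first colon only
        if (n == 0) continue
        key = substr($i, 1, n - 1)
        val = substr($i, n + 1)
        if (key == "name") name = val
        else if (key == "disk") disk = val
        else if (key == "quorum") quorum = val
      }
      printf "%s disk=%s quorum=%s\n", name, disk, quorum
      if (quorum != "yes" || (disk != "UpToDate" && disk != "Diskless")) bad = 1
    }
    END { exit bad }
  '
}
```

Usage: drbdsetup events2 --now | drbd_device_summary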

                                • ronan-a Vates 🪐 XCP-ng Team @Maelstrom96

                                  @Maelstrom96 Thank you for the logs; I'm trying to understand the issue.
                                  For the moment, I don't see a problem with the status of the services.

                                  • ronan-a Vates 🪐 XCP-ng Team @Maelstrom96

                                    @Maelstrom96 It sounds like a race condition or a bad mount of the database, but I'm not sure, so I will add more logs to the next RPM. We plan to release it in a few weeks.

                                    • Maelstrom96 @ronan-a

                                      @ronan-a I will test my theory a bit later today, but I believe it might be a mismatch between the node name the driver expects in LINSTOR and the hostname now set in Dom0. We updated the node's hostname before the cluster was spun up, but I think the previous name was still active when the LINSTOR SR was created.

                                      This means that the node name doesn't match here:
                                      https://github.com/xcp-ng/sm/blob/e951676098c80e6da6de4d4653f496b15f5a8cb9/drivers/linstorvolumemanager.py#L2641C21-L2641C41

                                      I will try to revert the hostname and see if it fixes everything.

                                      Edit: I just tested reverting the hostname to the default one, which matches what's in LINSTOR, and it works again. So it seems that changing a hostname after the cluster is provisioned is a no-no.
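Because the SR breaks when the hostname no longer matches the LINSTOR node name, a quick check before or after renaming a host can save some debugging. This is a minimal sketch, assuming bash. node_name_matches is a hypothetical helper. Comparing against uname -n is an assumption; it has not been verified against the linked driver line.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeed only if the given hostname is one of
# the LINSTOR node names passed after it.
# Assumption: the node name must equal the hostname reported by `uname -n`.
node_name_matches() {
  local host=$1
  shift
  local node
  for node in "$@"; do
    if [ "$node" = "$host" ]; then
      return 0   # exact match found
    fi
  done
  return 1       # no LINSTOR node carries this hostname
}
```

Example usage, with node names copied by hand from linstor node list: node_name_matches "$(uname -n)" xcp-ng-labs-host01 xcp-ng-labs-host02 || echo "hostname does not match any LINSTOR node name"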

                                      • ronan-a Vates 🪐 XCP-ng Team @Maelstrom96

                                        @Maelstrom96 Oh! This explanation makes sense, thank you. 🙂 Yes, if the hostname changes, the LINSTOR node name must also be changed; otherwise the path to the database resource will not be found.

                                        • Maelstrom96 @ronan-a

                                          @ronan-a Do you know of a way to update a node name in LINSTOR? I've looked through their documentation and the CLI commands but couldn't find one.

                                          • ronan-a Vates 🪐 XCP-ng Team @Maelstrom96

                                            @Maelstrom96 Well, there is no simple helper to do that using the CLI.

                                            So you can create a new node using:

                                            linstor node create --node-type Combined <NAME> <IP>
                                            

                                            Then you must evacuate the old node to preserve the replication count:

                                            linstor node evacuate <OLD_NAME>
                                            

                                            Next, you can change your hostname and restart the services on each host:

                                            systemctl stop linstor-controller
                                            systemctl restart linstor-satellite
                                            

                                            Finally, you can delete the old node:

                                            linstor node delete <OLD_NAME>
                                            

                                            After that, you must recreate the diskless resources if necessary. Run linstor advise r to see the commands to execute.
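The create, evacuate and delete steps have to happen in order, with the hostname change in between. For that reason, it can help to generate the full sequence for a given host before touching anything. This is a minimal sketch, assuming bash. print_rename_plan is a hypothetical helper that only prints the commands from the steps above, with the satellite unit written as linstor-satellite. The note about letting new replicas finish syncing before deleting the old node is an added assumption, not part of the original instructions.

```shell
#!/usr/bin/env bash
# Print, without executing anything, the LINSTOR node-rename sequence so it
# can be reviewed and run one step at a time.
# Arguments (placeholders): old node name, new node name, new node IP.
print_rename_plan() {
  local old_name=$1 new_name=$2 new_ip=$3
  printf '%s\n' \
    "# 1. Create the node under its new name" \
    "linstor node create --node-type Combined $new_name $new_ip" \
    "# 2. Evacuate the old node to preserve the replication count" \
    "#    (assumption: let the new replicas finish syncing before step 4)" \
    "linstor node evacuate $old_name" \
    "# 3. Change the hostname, then on each host:" \
    "systemctl stop linstor-controller" \
    "systemctl restart linstor-satellite" \
    "# 4. Delete the old node" \
    "linstor node delete $old_name" \
    "# 5. List the commands needed to recreate diskless resources, if any" \
    "linstor advise r"
}
```

Example: print_rename_plan old-host new-host 192.0.2.10 prints the plan for renaming old-host to new-host. Printing rather than executing keeps the waiting and the per-host service restarts under manual control.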
