XCP-ng

    XOSTOR hyperconvergence preview

    XOSTOR
    457 Posts 50 Posters 531.9k Views 53 Watching
    • A Offline
      AudleyElwine
      last edited by AudleyElwine

      Hey @ronan-a, now all the VDIs on the device are broken. I tried to migrate them but I get errors such as:

      SR_BACKEND_FAILURE_1200(, Cannot update volume uuid 36a23780-2025-4f3f-bade-03c410e63368 to 45537c14-0125-4f6c-a1ad-476552888087: this last one is not empty, )
      
      SR_BACKEND_FAILURE_78(, VDI Creation failed [opterr=error Error: Could not set kv(/volume/9cdc83cc-0fd8-490e-a3af-2ca40c95f398/not-exists:2): ERRO:Exception thrown.], )
      
      SR_BACKEND_FAILURE_46(, The VDI is not available [opterr=Plugin linstor-manager failed], )
      

      I don't care about the broken VDIs' content, so no worries.
      It was fun experimenting with it, but I need more storage and will move the SSDs to my NAS and run my VMs on NFS there instead.
      Before I do so, I thought you might be interested in debugging the issues and getting my logs if that will help the project. Just let me know which files I need to send and I will be happy to do so.

      ronan-aR 1 Reply Last reply Reply Quote 0
      • olivierlambertO Online
        olivierlambert Vates 🪐 Co-Founder CEO
        last edited by

        Hey @AudleyElwine, Ronan will take a look next week. It might be a bug we already fixed for our next beta round. He'll tell you šŸ™‚

        1 Reply Last reply Reply Quote 0
        • Maelstrom96M Offline
          Maelstrom96
          last edited by

          Hi @ronan-a ,

          So as we said at some point, we're using a K8s cluster that connects to LINSTOR directly. It's actually going surprisingly well, and we've even deployed it in production with contingency plans in case of failure; it's been rock solid so far.

          We're working on setting up Velero to automatically back up all of our K8s cluster metadata along with the PVs for easy disaster recovery, but we've hit an unfortunate blocker. Here is what we're getting from Velero when attempting the backup/snapshot:

          error:
              message: 'Failed to check and update snapshot content: failed to take snapshot
                of the volume pvc-3602bca1-5b92-4fc7-96af-ce77f35e802c: "rpc error: code = Internal
                desc = failed to create snapshot: error creating S3 backup: Message: ''LVM_THIN
                based backup shipping requires at least version 2.24 for setsid from util_linux''
                next error: Message: ''LVM_THIN based backup shipping requires support for thin_send_recv''
                next error: Message: ''Backup shipping of resource ''pvc-3602bca1-5b92-4fc7-96af-ce77f35e802c''
                cannot be started since there is no node available that supports backup shipping.''"'
          

          It looks like when using thin volumes, we can't actually run a backup. We've checked, and the current version of setsid on XCP-ng is 2.23.2:

          [12:57 ovbh-pprod-xen12 ~]# setsid --v
          setsid from util-linux 2.23.2
          

          We know that updating a package directly is a pretty bad idea, so I'm wondering if you have an idea of what we could do to solve this, or whether this will be updated with other XCP-ng updates.
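
          For reference, a quick way to confirm which util-linux build ships setsid on a host (standard query commands, output omitted):

           rpm -q util-linux
           setsid --version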

          Thanks in advance for your time!

          P.S.: We're working on a full post on how we went about deploying our K8s LINSTOR CSI setup, if anyone is interested in that.

          ronan-aR 1 Reply Last reply Reply Quote 1
          • ronan-aR Offline
            ronan-a Vates 🪐 XCP-ng Team @AudleyElwine
            last edited by

            @AudleyElwine I think I fixed this issue recently; it's generally caused by a bad snapshot, after which there is a problem renaming it. I will update the packages, thank you for the report. šŸ˜‰

            1 Reply Last reply Reply Quote 0
            • ronan-aR Offline
              ronan-a Vates 🪐 XCP-ng Team @Maelstrom96
              last edited by

              @Maelstrom96 I'm not sure I have a solution: we use VHDs on top of the LVM/DRBD layer, so only vhd-util + linstor commands are required on our side to back up/snapshot data.
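
              For illustration only, a minimal sketch of inspecting one of these VHDs with vhd-util (the DRBD device path here is an assumption and will differ per volume):

               vhd-util query -n /dev/drbd/by-res/xcp-volume-<UUID>/0 -v   # virtual size
               vhd-util query -n /dev/drbd/by-res/xcp-volume-<UUID>/0 -p   # parent VHD, if any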

              1 Reply Last reply Reply Quote 0
              • ronan-aR Offline
                ronan-a Vates 🪐 XCP-ng Team
                last edited by ronan-a

                ⚠ UPDATE AND IMPORTANT INFO ⚠

                I am updating the LINSTOR packages on our repositories.
                This update fixes many issues, especially regarding the HA.

                However, this update is not compatible with the LINSTOR SRs already configured, so it is necessary to DELETE the existing SRs before installing this update.
                We are allowing ourselves to force a reinstallation during this beta, exceptionally, since we haven't officially released a production version yet.
                In theory, this should not happen again.

                To summarize:
                1 - Uninstall any existing LINSTOR SR.
                2 - Install the new sm package "sm-2.30.7-1.3.0.linstor.3.xcpng8.2.x86_64" on all hosts (a sketch follows below).
                3 - Reinstall the LINSTOR SR.
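
                For step 2, a minimal sketch of what the package update could look like on each host (assuming the package is already available from the configured repositories; adapt as needed):

                 yum clean metadata
                 yum install -y sm-2.30.7-1.3.0.linstor.3.xcpng8.2.x86_64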

                Thank you! šŸ™‚

                Maelstrom96M A 2 Replies Last reply Reply Quote 3
                • Maelstrom96M Offline
                  Maelstrom96 @ronan-a
                  last edited by

                  @ronan-a I've checked the commit history and saw that the breaking change seems to be related to the renaming of the KV store. I also just noticed that you renamed the volume namespace. Are there any other breaking changes that would require deleting the SR in order to update the sm package?

                  I've written a Python script that copies all of the old KV data to the new KV store name, renaming the key names for the volume data along the way, and I was wondering if that would be sufficient.

                  Thanks,

                  ronan-aR 1 Reply Last reply Reply Quote 1
                  • ronan-aR Offline
                    ronan-a Vates 🪐 XCP-ng Team @Maelstrom96
                    last edited by ronan-a

                    @Maelstrom96 There are two important changes, yes: the renaming of the KV store and of the XCP volume namespace. In theory you can copy the data of your old KV store to the new one; it should be enough.

                    However, I prefer to be sure that everyone testing the driver uses a stable version with a new SR, to avoid surprises I might have forgotten about. In practice, if you haven't had any problems, a migration script may suffice.

                    Also, you must be sure there are no running tasks like snapshots, coalesce, etc. Otherwise you can run into trouble during the update.
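
                    For example, a quick way to check for leftover activity before updating (just a sketch; the grep pattern is only an example):

                     xe task-list                                  # should show no pending storage tasks
                     grep -iE 'coalesce|gc' /var/log/SMlog | tail  # check that the garbage collector is idle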

                    Maelstrom96M 1 Reply Last reply Reply Quote 1
                    • Maelstrom96M Offline
                      Maelstrom96 @ronan-a
                      last edited by

                      @ronan-a Perfect, thanks a lot for your input šŸ™‚

                      1 Reply Last reply Reply Quote 0
                      • TheiLLeniumStudiosT Offline
                        TheiLLeniumStudios
                        last edited by

                        Is there a way to remove LINSTOR / XOSTOR entirely? I've been experimenting a little bit with the latest update, and it looks like it takes a really long time to run VMs on the shared SR created by XOSTOR; mass VM creation (tested with 6 VMs using Terraform) also fails with "code":"TOO_MANY_STORAGE_MIGRATES","params":["3"].

                        ronan-aR 1 Reply Last reply Reply Quote 0
                        • TheiLLeniumStudiosT Offline
                          TheiLLeniumStudios
                          last edited by

                          Or reprovision it? I removed the SR from XOA, but the actual disk still seems to have the filesystem intact. Also, I was running XOSTOR as the HA SR; maybe that caused it to become really slow. I also had the affinity set to disabled.

                          TheiLLeniumStudiosT 1 Reply Last reply Reply Quote 0
                          • TheiLLeniumStudiosT Offline
                            TheiLLeniumStudios @TheiLLeniumStudios
                            last edited by

                            I'm also seeing this in the xcp-rrdd-plugins.log:

                             Nov 12 03:49:42 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds...
                             Nov 12 03:49:57 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds...
                             Nov 12 03:50:12 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds...
                             [... same message repeated every 10-15 seconds ...]
                             Nov 12 04:01:55 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds...
                            

                            Not sure if it's related.

                            1 Reply Last reply Reply Quote 0
                            • ronan-aR Offline
                              ronan-a Vates 🪐 XCP-ng Team @TheiLLeniumStudios
                              last edited by

                              @TheiLLeniumStudios said in XOSTOR hyperconvergence preview:

                              Is there a way to remove Linstor / XOSTOR entirely?

                              If you destroyed the SR, you can remove the linstor packages and force a reinstallation of a stable sm package using yum.
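
                              Something like this, as a sketch only (the exact package names here are assumptions, so check what is actually installed first):

                               rpm -qa | grep -Ei 'linstor|drbd'   # see what is installed; names below are assumptions
                               yum remove -y linstor-controller linstor-satellite linstor-client drbd kmod-drbd
                               yum downgrade -y sm                 # or reinstall the stock sm package from the base repository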

                              I've been experimenting a little bit with the latest update and it looks like it takes a really really long time to run VMs on the shared SR created by XOSTOR and mass VM creation (tested with 6 VMs using terraform)

                              What do you mean? What's the performance problem? When the VM starts? During execution?

                              also fails with "code":"TOO_MANY_STORAGE_MIGRATES","params":["3"]

                              Do you migrate with XO? This exception is normally handled by it, and a new migration is restarted a few moments later in this case.

                              Or reprovision it? I removed the SR from the XOA but the actual disk still seems to have the filesystem intact.

                              Did you just execute an xe sr-forget command on the SR? In this case the volumes are not removed; xe sr-destroy must be used to remove them. So you can execute xe sr-introduce and then xe sr-destroy to clean your hosts.
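
                              In other words, roughly (the SR UUID is a placeholder, and the PBDs may need to be recreated/plugged first):

                               xe sr-introduce uuid=<OLD_SR_UUID> type=linstor name-label=XOSTOR content-type=user
                               xe sr-destroy uuid=<OLD_SR_UUID>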

                              Also I was running the XOSTOR as the HA SR. Maybe that caused it to become really slow and also I had the affinity set to disabled

                              I'm not sure (but HA was buggy before the last update).

                              I'm also seeing this in the xcp-rrdd-plugins.log:

                              No relation with LINSTOR here. šŸ˜‰

                              TheiLLeniumStudiosT 1 Reply Last reply Reply Quote 1
                              • A Offline
                                AudleyElwine
                                last edited by

                                Hey @ronan-a
                                I'm trying to remove the SR via XOA with the "Remove this SR" button, but I'm getting this error:

                                SR_NOT_EMPTY()
                                

                                And there is no VM connected to it. So I tried to delete the disks in it manually, but I get this error:

                                SR_HAS_NO_PBDS(OpaqueRef:6d7520e0-60fa-4b93-9dfe-aa7ceb3b17d2)
                                

                                Could you help me remove it? I don't care about its content, so any way is okay for me, since I want to reinstall it after I install the updates.

                                ronan-aR 1 Reply Last reply Reply Quote 0
                                • TheiLLeniumStudiosT Offline
                                  TheiLLeniumStudios @ronan-a
                                  last edited by

                                  @ronan-a said in XOSTOR hyperconvergence preview:

                                  Did you just execute a xe sr-forget command on the SR? In this case the volumes are not removed. xe sr-destroy must be used to remove the volumes. So you can execute xe sr-introduce and then xe sr-destroy to clean your hosts.

                                  I did it using the XO interface. Now it doesn't show up, and when I tried your suggestion of running xe sr-introduce, it just creates an "Unknown" SR and doesn't link it to the previous one. Running xe sr-create also doesn't help, since that errors out with LINSTOR SR creation error [opterr=LINSTOR SR must be unique in a pool].

                                  Can you elaborate on the steps for reintroducing a lost SR that is backed by LINSTOR? I ran this command:

                                   xe sr-introduce type=linstor name-label=XOSTOR uuid=41ba4c11-8c13-30b3-fcbb-7668a39825a6
                                  

                                  The UUID is the original UUID of the XOSTOR SR, which I found in my command history. Running xe sr-destroy after the above introduce leads to "The SR has no attached PBDs". My disks look like this at the moment:

                                  NAME                                                                                              MAJ:MIN  RM   SIZE RO TYPE MOUNTPOINT
                                  drbd1016                                                                                          147:1016  0    10G  0 disk
                                  drbd1014                                                                                          147:1014  0    10G  0 disk
                                  sdb                                                                                                 8:16    0 238.5G  0 disk
                                  |-linstor_group-thin_device_tmeta                                                                 253:1     0   120M  0 lvm
                                  | `-linstor_group-thin_device-tpool                                                               253:3     0 238.2G  0 lvm
                                  |   |-linstor_group-xcp--persistent--redo--log_00000                                              253:10    0   260M  0 lvm
                                  |   | `-drbd1002                                                                                  147:1002  0 259.7M  0 disk
                                  |   |-linstor_group-xcp--persistent--database_00000                                               253:8     0     1G  0 lvm
                                  |   | `-drbd1000                                                                                  147:1000  0     1G  0 disk /var/lib/linstor
                                  |   |-linstor_group-thin_device                                                                   253:4     0 238.2G  0 lvm
                                  |   |-linstor_group-xcp--volume--13a94a7a--d433--4426--8232--812e3c6dc52e_00000                   253:11    0    10G  0 lvm
                                  |   | `-drbd1004                                                                                  147:1004  0    10G  0 disk
                                  |   |-linstor_group-xcp--persistent--ha--statefile_00000                                          253:9     0     8M  0 lvm
                                  |   | `-drbd1001                                                                                  147:1001  0     8M  0 disk
                                  |   |-linstor_group-xcp--volume--70bf80a2--a008--469a--a7db--0ea92fcfc392_00000                   253:5     0    20M  0 lvm
                                  |   | `-drbd1009                                                                                  147:1009  0    20M  0 disk
                                  |   `-linstor_group-xcp--volume--4b70d69b--9cca--4aa3--842f--09366ac76901_00000                   253:12    0    10G  0 lvm
                                  |     `-drbd1006                                                                                  147:1006  0    10G  0 disk
                                  `-linstor_group-thin_device_tdata                                                                 253:2     0 238.2G  0 lvm
                                    `-linstor_group-thin_device-tpool                                                               253:3     0 238.2G  0 lvm
                                      |-linstor_group-xcp--persistent--redo--log_00000                                              253:10    0   260M  0 lvm
                                      | `-drbd1002                                                                                  147:1002  0 259.7M  0 disk
                                      |-linstor_group-xcp--persistent--database_00000                                               253:8     0     1G  0 lvm
                                      | `-drbd1000                                                                                  147:1000  0     1G  0 disk /var/lib/linstor
                                      |-linstor_group-thin_device                                                                   253:4     0 238.2G  0 lvm
                                      |-linstor_group-xcp--volume--13a94a7a--d433--4426--8232--812e3c6dc52e_00000                   253:11    0    10G  0 lvm
                                      | `-drbd1004                                                                                  147:1004  0    10G  0 disk
                                      |-linstor_group-xcp--persistent--ha--statefile_00000                                          253:9     0     8M  0 lvm
                                      | `-drbd1001                                                                                  147:1001  0     8M  0 disk
                                      |-linstor_group-xcp--volume--70bf80a2--a008--469a--a7db--0ea92fcfc392_00000                   253:5     0    20M  0 lvm
                                      | `-drbd1009                                                                                  147:1009  0    20M  0 disk
                                      `-linstor_group-xcp--volume--4b70d69b--9cca--4aa3--842f--09366ac76901_00000                   253:12    0    10G  0 lvm
                                        `-drbd1006                                                                                  147:1006  0    10G  0 disk
                                  drbd1012                                                                                          147:1012  0    10G  0 disk
                                  tda                                                                                               254:0     0    10G  0 disk
                                  drbd1015                                                                                          147:1015  0    10G  0 disk
                                  drbd1005                                                                                          147:1005  0    20M  0 disk
                                  sda                                                                                                 8:0     0 223.6G  0 disk
                                  |-sda4                                                                                              8:4     0   512M  0 part /boot/efi
                                  |-sda2                                                                                              8:2     0    18G  0 part
                                  |-sda5                                                                                              8:5     0     4G  0 part /var/log
                                  |-sda3                                                                                              8:3     0 182.1G  0 part
                                  | `-XSLocalEXT--712c1f83--d11f--ae07--d2b8--14a823761e6e-712c1f83--d11f--ae07--d2b8--14a823761e6e 253:0     0 182.1G  0 lvm  /run/sr-mount/712c1f83-d11f-ae07-d2b8-14a823761e6e
                                  |-sda1                                                                                              8:1     0    18G  0 part /
                                  `-sda6                                                                                              8:6     0     1G  0 part [SWAP]
                                  tdb                                                                                               254:1     0    50G  0 disk
                                  

                                  @ronan-a said in XOSTOR hyperconvergence preview:

                                  What do you mean? What's the performance problem? When the VM starts ? During execution?

                                  So, with XOSTOR created and the pool's HA using that SR, whenever I create a new VM on that SR without choosing an affinity host, it takes at least 10-15 minutes to run a migrate task, which shouldn't be necessary because XOSTOR is shared, right? Or is my assumption not correct? And once the job is done, it doesn't even start all the VMs, and some even disappeared from XO. I was creating the VMs using Terraform; it spun up 6 of them at a time, and since the migrate task started for all of them, I saw the error TOO_MANY_STORAGE_MIGRATES. Not really sure what's going on.

                                  I first thought it was because my template VDI wasn't on XOSTOR, but I re-uploaded the cloud image to XOSTOR and still got the same behavior. And I'm not even using spinning disks: 2 of the hosts have NVMe drives and the remaining host involved in XOSTOR has an mSATA one.

                                  ronan-aR 1 Reply Last reply Reply Quote 0
                                  • ronan-aR Offline
                                    ronan-a Vates 🪐 XCP-ng Team @TheiLLeniumStudios
                                    last edited by

                                    @TheiLLeniumStudios To explain better: to reintroduce a LINSTOR SR, you can use these commands with your own parameters.

                                    Generate a new SR UUID and introduce the SR:

                                    [10:18 r620-s1 ~]# uuidgen
                                    345adcd2-aa2b-44ad-9c25-788cf870db72
                                    
                                    [10:18 r620-s1 ~]# xe sr-introduce uuid=345adcd2-aa2b-44ad-9c25-788cf870db72 type=linstor name-label="XOSTOR" content-type=user
                                    345adcd2-aa2b-44ad-9c25-788cf870db72
                                    
                                    # Get host UUIDs.
                                    [10:18 r620-s1 ~]# xe host-list
                                    uuid ( RO)                : 888254e8-da05-4f86-ad37-979b8d6bad04
                                              name-label ( RW): R620-S2
                                        name-description ( RW): Default install
                                    
                                    
                                    uuid ( RO)                : c96ec4dd-28ac-4df4-b73c-4371bd202728
                                              name-label ( RW): R620-S1
                                        name-description ( RW): Default install
                                    
                                    
                                    uuid ( RO)                : ddcd3461-7052-4f5e-932c-e1ed75c192d6
                                              name-label ( RW): R620-S3
                                        name-description ( RW): Default install
                                    

                                    Create the PBDs using the same old config.

                                    [10:19 r620-s1 ~]# xe pbd-create sr-uuid=345adcd2-aa2b-44ad-9c25-788cf870db72 host-uuid=c96ec4dd-28ac-4df4-b73c-4371bd202728 device-config:hosts=r620-s1,r620-s2,r620-s3 device-config:group-name=linstor_group/thin_device device-config:redundancy=2 device-config:provisioning=thin
                                    1c5c030a-1823-d53a-d8df-6c50af6beb2b
                                    
                                    [10:19 r620-s1 ~]# xe pbd-create sr-uuid=345adcd2-aa2b-44ad-9c25-788cf870db72 host-uuid=888254e8-da05-4f86-ad37-979b8d6bad04 device-config:hosts=r620-s1,r620-s2,r620-s3 device-config:group-name=linstor_group/thin_device device-config:redundancy=2 device-config:provisioning=thin
                                    4c5df60a-f96d-19c2-44b0-f5951388d502
                                    
                                    [10:20 r620-s1 ~]# xe pbd-create sr-uuid=345adcd2-aa2b-44ad-9c25-788cf870db72 host-uuid=ddcd3461-7052-4f5e-932c-e1ed75c192d6 device-config:hosts=r620-s1,r620-s2,r620-s3 device-config:group-name=linstor_group/thin_device device-config:redundancy=2 device-config:provisioning=thin
                                    584d033c-7bad-ebc8-30dd-1888ea2bea29
                                    

                                    If you don't know what your group name is, you can use vgs/lvs. In my case I use thin provisioning, so I have an associated thin volume:

                                    [10:21 r620-s1 ~]# vgs
                                      VG            #PV #LV #SN Attr   VSize   VFree
                                      linstor_group   1   5   0 wz--n- 931.51g    0
                                    [10:21 r620-s1 ~]# lvs
                                      LV                                                    VG            Attr       LSize    Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
                                      thin_device                                           linstor_group twi-aotz-- <931.28g                    0.29   10.55
                                      ...
                                    

                                    I did it using XO interface. Now it doesn't show up and when I tried your suggestion of running xe sr-introduce, it just creates an "Unknown" SR and doesn't link it to the previous one. Running xe sr-create also doesn't help since that errors out with LINSTOR SR creation error [opterr=LINSTOR SR must be unique in a pool]

                                    If you have this error, the LINSTOR PBDs still exist. Are you sure you forgot the previous SR? šŸ˜›
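
                                    You can check that quickly like this (just a sketch):

                                     xe sr-list type=linstor params=uuid,name-label
                                     xe pbd-list params=uuid,sr-uuid,host-name-label,currently-attached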

                                    So, with XOSTOR created and making the Pool HA with that SR, whenever I created a new VM in that SR and not choose any affinity host, it takes atleast 10-15 minutes to run a migrate task which shouldn't be necessary because XOSTOR is shared right?

                                    Regarding the migration: it behaves as expected if the SR was created with shared=true, and if you migrate a VM between two hosts using the same SR, the migration should be short.
                                    After repairing your SR, you can do a VM migration and send me the logs of the machines if you want; I can take a look. šŸ˜‰
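
                                    (To verify the shared flag, a quick sketch, with the SR UUID as a placeholder:)

                                     xe sr-param-get uuid=<SR_UUID> param-name=shared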

                                    TheiLLeniumStudiosT 1 Reply Last reply Reply Quote 0
                                    • ronan-aR Offline
                                      ronan-a Vates 🪐 XCP-ng Team @AudleyElwine
                                      last edited by

                                      @AudleyElwine If you forgot your SR, you can follow the instructions I gave in the previous message. Otherwise, check whether the PBDs are correctly plugged to the SR; that's probably not the case. šŸ™‚

                                      A 1 Reply Last reply Reply Quote 0
                                      • A Offline
                                        AudleyElwine @ronan-a
                                        last edited by

                                        @ronan-a I did not forget the SR, so yeah, it is the PBDs.
                                        I tried to plug them back with

                                        xe pbd-plug uuid=...
                                        

                                        taking the uuid from the

                                        xe pbd-list sr-uuid=xostor-uuid
                                        

                                        I was able to plug three hosts; however, the fourth host says the following.

                                        Error code: SR_BACKEND_FAILURE_1200
                                        Error parameters: , Cannot update volume uuid 36a23780-2025-4f3f-bade-03c410e63368 to 45537c14-0125-4f6c-a1ad-476552888087: this last one is not empty,
                                        

                                        What do you think I should do to make the fourth host's PBD connect so I can delete the SR correctly?

                                        ronan-aR 1 Reply Last reply Reply Quote 0
                                        • TheiLLeniumStudiosT Offline
                                          TheiLLeniumStudios @ronan-a
                                          last edited by

                                          @ronan-a I just tried to reintroduce the SR and got no errors while running xe pbd-create, but it still shows up as an SR with a size of -1. I think I might have corrupted the metadata, as lvs, vgs and pvs throw errors:

                                          [11:09 xcp-ng-node-1 ~]# lvs
                                            /dev/drbd1014: open failed: No data available
                                            LV                                                    VG                                              Attr       LSize    Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
                                            712c1f83-d11f-ae07-d2b8-14a823761e6e                  XSLocalEXT-712c1f83-d11f-ae07-d2b8-14a823761e6e -wi-ao---- <182.06g                                                           
                                            thin_device                                           linstor_group                                   twi-aotz-- <238.24g                    1.64   11.27                           
                                            xcp-persistent-database_00000                         linstor_group                                   Vwi-aotz--    1.00g thin_device        0.84                                   
                                            xcp-persistent-ha-statefile_00000                     linstor_group                                   Vwi-aotz--    8.00m thin_device        6.25                                   
                                            xcp-persistent-redo-log_00000                         linstor_group                                   Vwi-aotz--  260.00m thin_device        0.53                                   
                                            xcp-volume-13a94a7a-d433-4426-8232-812e3c6dc52e_00000 linstor_group                                   Vwi-aotz--   10.03g thin_device        0.14                                   
                                            xcp-volume-4b70d69b-9cca-4aa3-842f-09366ac76901_00000 linstor_group                                   Vwi-aotz--   10.03g thin_device        38.67                                  
                                            xcp-volume-70bf80a2-a008-469a-a7db-0ea92fcfc392_00000 linstor_group                                   Vwi-aotz--   20.00m thin_device        71.88                                  
                                          [11:09 xcp-ng-node-1 ~]# vgs
                                            /dev/drbd1014: open failed: No data available
                                            VG                                              #PV #LV #SN Attr   VSize    VFree
                                            XSLocalEXT-712c1f83-d11f-ae07-d2b8-14a823761e6e   1   1   0 wz--n- <182.06g    0 
                                            linstor_group                                     1   7   0 wz--n-  238.47g    0 
                                          [11:09 xcp-ng-node-1 ~]# pvs
                                            /dev/drbd1014: open failed: No data available
                                            PV         VG                                              Fmt  Attr PSize    PFree
                                            /dev/sda3  XSLocalEXT-712c1f83-d11f-ae07-d2b8-14a823761e6e lvm2 a--  <182.06g    0 
                                            /dev/sdb   linstor_group                                   lvm2 a--   238.47g    0 
                                          [11:09 xcp-ng-node-1 ~]# lsblk
                                          NAME                                                     MAJ:MIN  RM   SIZE RO TYPE MOUNTPOINT
                                          drbd1016                                                 147:1016  0    10G  0 disk 
                                          drbd1014                                                 147:1014  0    10G  0 disk 
                                          sdb                                                        8:16    0 238.5G  0 disk 
                                          |-linstor_group-thin_device_tmeta                        253:1     0   120M  0 lvm  
                                          | `-linstor_group-thin_device-tpool                      253:3     0 238.2G  0 lvm  
                                          |   |-linstor_group-xcp--persistent--redo--log_00000     253:10    0   260M  0 lvm  
                                          |   | `-drbd1002                                         147:1002  0 259.7M  0 disk 
                                          |   |-linstor_group-xcp--persistent--database_00000      253:8     0     1G  0 lvm  
                                          |   | `-drbd1000                                         147:1000  0     1G  0 disk /var/lib/linstor
                                          |   |-linstor_group-thin_device                          253:4     0 238.2G  0 lvm  
                                          |   |-linstor_group-xcp--volume--13a94a7a--d433--4426--8232--812e3c6dc52e_00000
                                                                                                   253:11    0    10G  0 lvm  
                                          |   | `-drbd1004                                         147:1004  0    10G  0 disk 
                                          |   |-linstor_group-xcp--persistent--ha--statefile_00000 253:9     0     8M  0 lvm  
                                          |   | `-drbd1001                                         147:1001  0     8M  0 disk 
                                          |   |-linstor_group-xcp--volume--70bf80a2--a008--469a--a7db--0ea92fcfc392_00000
                                                                                                   253:5     0    20M  0 lvm  
                                          |   | `-drbd1009                                         147:1009  0    20M  0 disk 
                                          |   `-linstor_group-xcp--volume--4b70d69b--9cca--4aa3--842f--09366ac76901_00000
                                                                                                   253:12    0    10G  0 lvm  
                                          |     `-drbd1006                                         147:1006  0    10G  0 disk 
                                          `-linstor_group-thin_device_tdata                        253:2     0 238.2G  0 lvm  
                                            `-linstor_group-thin_device-tpool                      253:3     0 238.2G  0 lvm  
                                              |-linstor_group-xcp--persistent--redo--log_00000     253:10    0   260M  0 lvm  
                                              | `-drbd1002                                         147:1002  0 259.7M  0 disk 
                                              |-linstor_group-xcp--persistent--database_00000      253:8     0     1G  0 lvm  
                                              | `-drbd1000                                         147:1000  0     1G  0 disk /var/lib/linstor
                                              |-linstor_group-thin_device                          253:4     0 238.2G  0 lvm  
                                              |-linstor_group-xcp--volume--13a94a7a--d433--4426--8232--812e3c6dc52e_00000
                                                                                                   253:11    0    10G  0 lvm  
                                              | `-drbd1004                                         147:1004  0    10G  0 disk 
                                              |-linstor_group-xcp--persistent--ha--statefile_00000 253:9     0     8M  0 lvm  
                                              | `-drbd1001                                         147:1001  0     8M  0 disk 
                                              |-linstor_group-xcp--volume--70bf80a2--a008--469a--a7db--0ea92fcfc392_00000
                                                                                                   253:5     0    20M  0 lvm  
                                              | `-drbd1009                                         147:1009  0    20M  0 disk 
                                              `-linstor_group-xcp--volume--4b70d69b--9cca--4aa3--842f--09366ac76901_00000
                                                                                                   253:12    0    10G  0 lvm  
                                                `-drbd1006                                         147:1006  0    10G  0 disk 
                                          drbd1012                                                 147:1012  0    10G  0 disk 
                                          tda                                                      254:0     0    10G  0 disk 
                                          drbd1015                                                 147:1015  0    10G  0 disk 
                                          drbd1005                                                 147:1005  0    20M  0 disk 
                                          sda                                                        8:0     0 223.6G  0 disk 
                                          |-sda4                                                     8:4     0   512M  0 part /boot/efi
                                          |-sda2                                                     8:2     0    18G  0 part 
                                          |-sda5                                                     8:5     0     4G  0 part /var/log
                                          |-sda3                                                     8:3     0 182.1G  0 part 
                                          | `-XSLocalEXT--712c1f83--d11f--ae07--d2b8--14a823761e6e-712c1f83--d11f--ae07--d2b8--14a823761e6e
                                                                                                   253:0     0 182.1G  0 lvm  /run/sr-mount/712c1f83-d11f-ae07-d2b8-14a82376
                                          |-sda1                                                     8:1     0    18G  0 part /
                                          `-sda6                                                     8:6     0     1G  0 part [SWAP]
                                          tdb                                                      254:1     0    50G  0 disk 
                                          [11:09 xcp-ng-node-1 ~]# 
                                          

                                          Is it possible to clean up the partition table and recreate it some other way without having to reinstall XCP-ng on the machines? Using wipefs -a says that the device is in use, so I cannot wipe the partitions.

                                          ronan-aR 1 Reply Last reply Reply Quote 0
                                          • ronan-aR Offline
                                            ronan-a Vates 🪐 XCP-ng Team @AudleyElwine
                                            last edited by ronan-a

                                            @AudleyElwine said in XOSTOR hyperconvergence preview:

                                            Ho! Sounds like a bug fixed in the latest beta... In this case, ensure there are no VMs running, and download this script:

                                            wget https://gist.githubusercontent.com/Wescoeur/3b5c399b15c4d700b4906f12b51e2591/raw/452acd9ebcd52c62020e796302c681590b37cd3f/gistfile1.txt -O linstor-kv-tool && chmod +x linstor-kv-tool
                                            

                                            Find where the linstor-controller is running by executing this command on each host:

                                            [11:13 r620-s1 ~]# mountpoint /var/lib/linstor
                                            /var/lib/linstor is a mountpoint
                                            

                                            If it's a mountpoint, you found it. Now, execute the script using the local IP of this host, for example:

                                            ./linstor-kv-tool --dump-volumes -u 172.16.210.16 -g xcp-sr-linstor_group_thin_device
                                            

                                            The group to use is equal to <VG_name>_<LV_thin_name>, or just <VG_name> if you don't use thin provisioning.
                                            Note: due to a bug in the previous beta, you must double the xcp-sr- prefix (example: xcp-sr-xcp-sr-linstor_group_thin_device). šŸ˜‰

                                            So if the script's output contains many entries, you can run --remove-all-volumes instead of --dump-volumes. This command should remove the properties in the LINSTOR KV store. After that you can dump again to verify.
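
                                            For example (IP and group name taken from the dump example above; with the older beta, use the doubled prefix instead):

                                             ./linstor-kv-tool --remove-all-volumes -u 172.16.210.16 -g xcp-sr-linstor_group_thin_device
                                             ./linstor-kv-tool --dump-volumes -u 172.16.210.16 -g xcp-sr-linstor_group_thin_device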

                                            Now, you can execute a scan on the SR. After that, it's necessary to remove all resource definitions using the linstor binary.
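
                                            For the scan (the SR UUID is a placeholder):

                                             xe sr-scan uuid=<SR_UUID>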

                                            Get the list using:

                                            linstor --controllers=<CONTROLLER_IP> resource-definition list
                                            ╭───────────────────────────────────────────────────────────────────────────────────────────────────╮
                                            ā”Š ResourceName                                    ā”Š Port ā”Š ResourceGroup                    ā”Š State ā”Š
                                            ā•žā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•”
                                            ā”Š xcp-persistent-database                         ā”Š 7000 ā”Š xcp-sr-linstor_group_thin_device ā”Š ok    ā”Š
                                            ā”Š xcp-volume-0db304a1-89a2-45df-a39d-7c5c39a87c5f ā”Š 7006 ā”Š xcp-sr-linstor_group_thin_device ā”Š ok    ā”Š
                                            ā”Š xcp-volume-6289f306-ab2b-4388-a5a2-a20ba18698f8 ā”Š 7005 ā”Š xcp-sr-linstor_group_thin_device ā”Š ok    ā”Š
                                            ā”Š xcp-volume-73b9a396-c67f-48b3-8774-f60f1c2af598 ā”Š 7001 ā”Š xcp-sr-linstor_group_thin_device ā”Š ok    ā”Š
                                            ā”Š xcp-volume-a46393ef-428d-4af8-9c0e-30b0108bd21a ā”Š 7003 ā”Š xcp-sr-linstor_group_thin_device ā”Š ok    ā”Š
                                            ā”Š xcp-volume-b83db8cf-ea3b-47aa-ad77-89b5cd9a1853 ā”Š 7002 ā”Š xcp-sr-linstor_group_thin_device ā”Š ok    ā”Š
                                            ╰───────────────────────────────────────────────────────────────────────────────────────────────────╯
                                            

                                            Then execute linstor resource-definition delete <VOLUME> on each volume. But don't do that on the xcp-persistent-database, only on xcp-volume-XXX! šŸ™‚
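
                                            A rough way to script that (a sketch only: it filters the list output by name, deliberately skips xcp-persistent-database, and should be double-checked before running):

                                             linstor --controllers=<CONTROLLER_IP> resource-definition list | awk '/xcp-volume-/ { print $2 }' | \
                                               while read rd; do
                                                 linstor --controllers=<CONTROLLER_IP> resource-definition delete "$rd"
                                               done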

                                            Normally, after all these steps, you can destroy the SR properly! I think I will write an automated version later, something like linstor-emergency-destroy. šŸ˜‰

                                            A 1 Reply Last reply Reply Quote 0