
    XOSTOR hyperconvergence preview

    • olivierlambert (Vates 🪐 Co-Founder & CEO)

      Ping @ronan-a or @dthenot

    • ronan-a (Vates 🪐 XCP-ng Team) @peter_webbird

        @peter_webbird We've already had feedback on CBT with LINSTOR/DRBD, and we don't necessarily recommend enabling it. We have a blocking dev card regarding a bug where the LVM lvchange command may fail on CBT volumes used by a XOSTOR SR. We also have other issues related to migration with CBT.
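
        If CBT has already been enabled on VDIs stored on a XOSTOR SR, it can be audited and switched off again with the stock xe commands (the UUIDs below are placeholders):

        xe vdi-list sr-uuid=<sr-uuid> params=uuid,name-label,cbt-enabled
        xe vdi-disable-cbt uuid=<vdi-uuid>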

        • gb.123

          @ronan-a @dthenot @Team-Storage

          Guys, can you please clarify which method to use for installing XOSTOR on XCP-ng 8.3?

          Simple:

          yum install xcp-ng-linstor
          yum install xcp-ng-release-linstor
          ./install --disks /dev/nvme0n1 --thin
          

          Or the script in the first post?
          Or some other script?

          • dthenot (Vates 🪐 XCP-ng Team) @gb.123

            @gb.123 Hello,
            The instructions in the first post are still the way to go 🙂

            • JeffBerntsen (Top contributor) @dthenot

              @dthenot said in XOSTOR hyperconvergence preview:

              @gb.123 Hello,
              The instructions in the first post are still the way to go 🙂

              I'm curious about that as well, but the first post says the installation script is only compatible with 8.2 and doesn't mention 8.3. Is that still the case, or is the installation script now compatible with 8.3 as well? If not, is there an installation script that is compatible with 8.3?

              I know that using XO is the recommended method for installation, but I'm interested in an installation script, as I would like to integrate XOSTOR installation into an XCP-ng installation script I already have which runs via PXE boot.
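
              Roughly what I have in mind, as a sketch only (the script path is a placeholder, and I'm assuming the packages come from the linstor repository as in the first post):

              #!/bin/bash
              # Post-install fragment: pull in the XOSTOR packages, then prepare the disk
              yum install -y xcp-ng-release-linstor
              yum install -y xcp-ng-linstor
              # Run the preparation script from the first post (placeholder path)
              sh /root/xostor-install.sh --disks /dev/sdb --thin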

              • dthenot (Vates 🪐 XCP-ng Team) @JeffBerntsen

                @JeffBerntsen That's what I meant: the installation method written in the first post still works on 8.3, and the script still works as expected too; it basically only creates the VG/LV needed on the hosts before you create the SR.
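
                For reference, for the thin case that preparation boils down to roughly this (a sketch; the names match the SR creation examples in this thread):

                vgcreate linstor_group /dev/sdb
                # Thin provisioning: turn the VG's free space into a thin pool
                lvcreate -l 100%FREE -T linstor_group/thin_device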

                • JeffBerntsen (Top contributor) @dthenot

                  @dthenot said in XOSTOR hyperconvergence preview:

                  @JeffBerntsen That's what I meant: the installation method written in the first post still works on 8.3, and the script still works as expected too; it basically only creates the VG/LV needed on the hosts before you create the SR.

                  Got it. Thanks!

                  • henri9813 @JeffBerntsen

                    Hello,

                    I plan to install my XOSTOR cluster on a pool of 7 nodes with 3 replicas, but not on all nodes at once, because the disks are in use.
                    Consider:

                    • node 1
                    • node 2
                    • node ...
                    • node 5
                    • node 6
                    • node 7

                    each with 2 disks:

                    • sda: 128 GB for the OS
                    • sdb: 1 TB for a local SR (for now 😄)

                    I emptied nodes 6 & 7.

                    So, here is what I plan to do:

                    • On ALL NODES: set up the LINSTOR packages

                    Then run the install script on nodes 6 & 7 to add their disks, so:

                    node6# install.sh --disks /dev/sdb
                    node7# install.sh --disks /dev/sdb
                    

                    Then configure the SR and the LINSTOR manager plugin as follows:

                    xe sr-create \
                        type=linstor name-label=pool-01 \
                        host-uuid=XXXX \
                        shared=true \
                        device-config:group-name=linstor_group/thin_device \
                        device-config:redundancy=3 \
                        device-config:provisioning=thin
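
                    To sanity-check the cluster at that point, I'd run (assuming the linstor client is available on the node hosting the controller):

                    linstor node list
                    linstor storage-pool list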
                    

                    Normally, I should then have a LINSTOR cluster running with 2 nodes (2 satellites, and one controller placed randomly) with only 2 disks, and therefore only 2/3 working replicas.

                    The cluster SHOULD be usable (am I right on this point?).

                    The next step would be to move the VMs from node 5 onto it to evacuate node 5, and then add node 5 to the cluster with the following:

                    node5# install.sh --disks /dev/sdb
                    node5# xe host-call-plugin \
                      host-uuid=node5-uuid \
                      plugin=linstor-manager \
                      fn=addHost args:groupName=linstor_group/thin_device
                    

                    That should deploy the satellite on node 5 and add the disk.

                    I should then have 3/3 working replicas and can start deploying the other nodes progressively.
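
                    To confirm the replica states, I'd check the resource list (standard linstor client command):

                    linstor resource list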

                    Am I right about the process?

                    As mentioned on the Discord, I will post my feedback and results from my setup once I've finalized it (maybe through a blog post somewhere).

                    Thanks for providing XOSTOR as open source; it's clearly the missing piece for an open-source virtualization stack (vs. Proxmox).

                    • Jonathon

                      I have amazing news!

                      After the upgrade to XCP-ng 8.3, I retested Velero backup, and it all just works 😁

                      Completed Backup

                      jonathon@jonathon-framework:~$ velero --kubeconfig k8s_configs/production.yaml backup describe grafana-test
                      Name:         grafana-test
                      Namespace:    velero
                      Labels:       objectset.rio.cattle.io/hash=c2b5f500ab5d9b8ffe14f2c70bf3742291df565c
                                    velero.io/storage-location=default
                      Annotations:  objectset.rio.cattle.io/applied=H4sIAAAAAAAA/4SSQW/bPgzFvwvPtv9OajeJj/8N22HdBqxFL0MPlEQlWmTRkOhgQ5HvPsixE2yH7iji8ffIJ74CDu6ZYnIcoIMTeYpcOf7vtIICji4Y6OB/1MdxgAJ6EjQoCN0rYAgsKI5Dyk9WP0hLIqmi40qjiKfMcRlAq7pBY+py26qmbEi15a5p78vtaqe0oqbVVsO5AI+K/Ju4A6YDdKDXqrVtXaNqzU5traVVY9d6Uyt7t2nW693K2Pa+naABe4IO9hEtBiyFksClmgbUdN06a9NAOtvr5B4DDunA8uR64lGgg7u6rxMUYMji6OWZ/dhTeuIPaQ6os+gTFUA/tR8NmXd+TELxUfNA5hslHqOmBN13OF16ZwvNQShIqpZClYQj7qk6blPlGF5uzC/L3P+kvok7MB9z0OcCXPiLPLHmuLLWCfVfB4rTZ9/iaA5zHovNZz7R++k6JI50q89BXcuXYR5YT0DolkChABEPHWzW9cK+rPQx8jgsH/KQj+QT/frzXCdduc/Ca9u1Y7aaFvMu5Ang5Xz+HQAA//8X7Fu+/QIAAA
                                    objectset.rio.cattle.io/id=e104add0-85b4-4eb5-9456-819bcbe45cfc
                                    velero.io/resource-timeout=10m0s
                                    velero.io/source-cluster-k8s-gitversion=v1.33.4+rke2r1
                                    velero.io/source-cluster-k8s-major-version=1
                                    velero.io/source-cluster-k8s-minor-version=33
                      
                      Phase:  Completed
                      
                      
                      Namespaces:
                        Included:  grafana
                        Excluded:  <none>
                      
                      Resources:
                        Included cluster-scoped:    <none>
                        Excluded cluster-scoped:    volumesnapshotcontents.snapshot.storage.k8s.io
                        Included namespace-scoped:  *
                        Excluded namespace-scoped:  volumesnapshots.snapshot.storage.k8s.io
                      
                      Label selector:  <none>
                      
                      Or label selector:  <none>
                      
                      Storage Location:  default
                      
                      Velero-Native Snapshot PVs:  true
                      Snapshot Move Data:          true
                      Data Mover:                  velero
                      
                      TTL:  720h0m0s
                      
                      CSISnapshotTimeout:    30m0s
                      ItemOperationTimeout:  4h0m0s
                      
                      Hooks:  <none>
                      
                      Backup Format Version:  1.1.0
                      
                      Started:    2025-10-15 15:29:52 -0700 PDT
                      Completed:  2025-10-15 15:31:25 -0700 PDT
                      
                      Expiration:  2025-11-14 14:29:52 -0800 PST
                      
                      Total items to be backed up:  35
                      Items backed up:              35
                      
                      Backup Item Operations:  1 of 1 completed successfully, 0 failed (specify --details for more information)
                      Backup Volumes:
                        Velero-Native Snapshots: <none included>
                      
                        CSI Snapshots:
                          grafana/central-grafana:
                            Data Movement: included, specify --details for more information
                      
                        Pod Volume Backups: <none included>
                      
                      HooksAttempted:  0
                      HooksFailed:     0
                      

                      Completed Restore

                      jonathon@jonathon-framework:~$ velero --kubeconfig k8s_configs/production.yaml restore describe restore-grafana-test --details
                      Name:         restore-grafana-test
                      Namespace:    velero
                      Labels:       objectset.rio.cattle.io/hash=252addb3ed156c52d9fa9b8c045b47a55d66c0af
                      Annotations:  objectset.rio.cattle.io/applied=H4sIAAAAAAAA/3yRTW7zIBBA7zJrO5/j35gzfE2rtsomymIM45jGBgTjbKLcvaKJm6qL7kDwnt7ABdDpHfmgrQEBZxrJ25W2/85rSOCkjQIBrxTYeoIEJmJUyAjiAmiMZWRtTYhb232Q5EC88tquJDKPFEU6GlpUG5UVZdpUdZ6WZZ+niOtNWtR1SypvqC8buCYwYkfjn7oBwwAC8ipHpbqC1LqqZZWrtse228isrLqywapSdS0z7KPU4EQgwN+mSI8eezSYMgWG22lwKOl7/MgERzJmdChPs9veDL9IGfSbQRcGy+96IjszCCiyCRLQRo6zIrVd5AHEfuHhkIBmmp4d+a/3e9Dl8LPoCZ3T5hg7FvQRcR8nxt6XL7sAgv1MCZztOE+01P23cvmnPYzaxNtwuF4/AwAA//8k6OwC/QEAAA
                                    objectset.rio.cattle.io/id=9ad8d034-7562-44f2-aa18-3669ed27ef47
                      
                      Phase:                       Completed
                      Total items to be restored:  33
                      Items restored:              33
                      
                      Started:    2025-10-15 15:35:26 -0700 PDT
                      Completed:  2025-10-15 15:36:34 -0700 PDT
                      
                      Warnings:
                        Velero:     <none>
                        Cluster:    <none>
                        Namespaces:
                          grafana-restore:  could not restore, ConfigMap:elasticsearch-es-transport-ca-internal already exists. Warning: the in-cluster version is different than the backed-up version
                                            could not restore, ConfigMap:kube-root-ca.crt already exists. Warning: the in-cluster version is different than the backed-up version
                      
                      Backup:  grafana-test
                      
                      Namespaces:
                        Included:  grafana
                        Excluded:  <none>
                      
                      Resources:
                        Included:        *
                        Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io, csinodes.storage.k8s.io, volumeattachments.storage.k8s.io, backuprepositories.velero.io
                        Cluster-scoped:  auto
                      
                      Namespace mappings:  grafana=grafana-restore
                      
                      Label selector:  <none>
                      
                      Or label selector:  <none>
                      
                      Restore PVs:  true
                      
                      CSI Snapshot Restores:
                        grafana-restore/central-grafana:
                          Data Movement:
                            Operation ID: dd-ffa56e1c-9fd0-44b4-a8bb-8163f40a49e9.330b82fc-ca6a-423217ee5
                            Data Mover: velero
                            Uploader Type: kopia
                      
                      Existing Resource Policy:   <none>
                      ItemOperationTimeout:       4h0m0s
                      
                      Preserve Service NodePorts:  auto
                      
                      Restore Item Operations:
                        Operation for persistentvolumeclaims grafana-restore/central-grafana:
                          Restore Item Action Plugin:  velero.io/csi-pvc-restorer
                          Operation ID:                dd-ffa56e1c-9fd0-44b4-a8bb-8163f40a49e9.330b82fc-ca6a-423217ee5
                          Phase:                       Completed
                          Progress:                    856284762 of 856284762 complete (Bytes)
                          Progress description:        Completed
                          Created:                     2025-10-15 15:35:28 -0700 PDT
                          Started:                     2025-10-15 15:36:06 -0700 PDT
                          Updated:                     2025-10-15 15:36:26 -0700 PDT
                      
                      HooksAttempted:   0
                      HooksFailed:      0
                      
                      Resource List:
                        apps/v1/Deployment:
                          - grafana-restore/central-grafana(created)
                          - grafana-restore/grafana-debug(created)
                        apps/v1/ReplicaSet:
                          - grafana-restore/central-grafana-5448b9f65(created)
                          - grafana-restore/central-grafana-56887c6cb6(created)
                          - grafana-restore/central-grafana-56ddd4f497(created)
                          - grafana-restore/central-grafana-5f4757844b(created)
                          - grafana-restore/central-grafana-5f69f86c85(created)
                          - grafana-restore/central-grafana-64545dcdc(created)
                          - grafana-restore/central-grafana-69c66c54d9(created)
                          - grafana-restore/central-grafana-6c8d6f65b8(created)
                          - grafana-restore/central-grafana-7b479f79ff(created)
                          - grafana-restore/central-grafana-bc7d96cdd(created)
                          - grafana-restore/central-grafana-cb88bd49c(created)
                          - grafana-restore/grafana-debug-556845ff7b(created)
                          - grafana-restore/grafana-debug-6fb594cb5f(created)
                          - grafana-restore/grafana-debug-8f66bfbf6(created)
                        discovery.k8s.io/v1/EndpointSlice:
                          - grafana-restore/central-grafana-hkgd5(created)
                        networking.k8s.io/v1/Ingress:
                          - grafana-restore/central-grafana(created)
                        rbac.authorization.k8s.io/v1/Role:
                          - grafana-restore/central-grafana(created)
                        rbac.authorization.k8s.io/v1/RoleBinding:
                          - grafana-restore/central-grafana(created)
                        v1/ConfigMap:
                          - grafana-restore/central-grafana(created)
                          - grafana-restore/elasticsearch-es-transport-ca-internal(failed)
                          - grafana-restore/kube-root-ca.crt(failed)
                        v1/Endpoints:
                          - grafana-restore/central-grafana(created)
                        v1/PersistentVolume:
                          - pvc-e3f6578f-08b2-4e79-85f0-76bbf8985b55(skipped)
                        v1/PersistentVolumeClaim:
                          - grafana-restore/central-grafana(created)
                        v1/Pod:
                          - grafana-restore/central-grafana-cb88bd49c-fc5br(created)
                        v1/Secret:
                          - grafana-restore/fpinfra-net-cf-cert(created)
                          - grafana-restore/grafana(created)
                        v1/Service:
                          - grafana-restore/central-grafana(created)
                        v1/ServiceAccount:
                          - grafana-restore/central-grafana(created)
                          - grafana-restore/default(skipped)
                        velero.io/v2alpha1/DataUpload:
                          - velero/grafana-test-nw7zj(skipped)
                      

                      [Image: working restore pod, with correct data in the PV]

                      Velero installed from Helm: https://vmware-tanzu.github.io/helm-charts
                      Chart version: velero 11.1.0
                      Values:

                      ---
                      image:
                        repository: velero/velero
                        tag: v1.17.0
                      
                      # Whether to deploy the node-agent daemonset (formerly restic).
                      deployNodeAgent: true
                      
                      initContainers:
                         - name: velero-plugin-for-aws
                           image: velero/velero-plugin-for-aws:latest
                           imagePullPolicy: IfNotPresent
                           volumeMounts:
                             - mountPath: /target
                               name: plugins
                      
                      configuration:
                        defaultItemOperationTimeout: 2h
                        features: EnableCSI
                        defaultSnapshotMoveData: true
                      
                        backupStorageLocation:
                          - name: default
                            provider: aws
                            bucket: velero
                            config:
                              region: us-east-1
                              s3ForcePathStyle: true
                              s3Url: https://s3.location
                      
                        # Destination VSL points to LINSTOR snapshot class
                        volumeSnapshotLocation:
                          - name: linstor
                            provider: velero.io/csi
                            config:
                              snapshotClass: linstor-vsc
                      
                      credentials:
                        useSecret: true
                        existingSecret: velero-user
                      
                      
                      metrics:
                        enabled: true
                      
                        serviceMonitor:
                          enabled: true
                      
                        prometheusRule:
                          enabled: true
                          # Additional labels to add to deployed PrometheusRule
                          additionalLabels: {}
                          # PrometheusRule namespace. Defaults to Velero namespace.
                          # namespace: ""
                          # Rules to be deployed
                          spec:
                            - alert: VeleroBackupPartialFailures
                              annotations:
                                message: Velero backup {{ $labels.schedule }} has {{ $value | humanizePercentage }} partially failed backups.
                              expr: |-
                                velero_backup_partial_failure_total{schedule!=""} / velero_backup_attempt_total{schedule!=""} > 0.25
                              for: 15m
                              labels:
                                severity: warning
                            - alert: VeleroBackupFailures
                              annotations:
                                message: Velero backup {{ $labels.schedule }} has {{ $value | humanizePercentage }} failed backups.
                              expr: |-
                                velero_backup_failure_total{schedule!=""} / velero_backup_attempt_total{schedule!=""} > 0.25
                              for: 15m
                              labels:
                                severity: warning
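
                      With those values saved as values.yaml, the install itself is the usual Helm routine (release name and namespace are our choices):

                      helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
                      helm install velero vmware-tanzu/velero --namespace velero --create-namespace -f values.yaml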
                      

                      Also create the following VolumeSnapshotClass (the velero.io/csi-volumesnapshot-class label is what lets Velero pick it up):

                      apiVersion: snapshot.storage.k8s.io/v1
                      kind: VolumeSnapshotClass
                      metadata:
                        name: linstor-vsc
                        labels:
                          velero.io/csi-volumesnapshot-class: "true"
                      driver: linstor.csi.linbit.com
                      deletionPolicy: Delete
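
                      We apply it with plain kubectl (the filename is ours):

                      kubectl apply -f linstor-vsc.yaml
                      kubectl get volumesnapshotclass linstor-vsc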
                      

                      We are using the Piraeus operator to use XOSTOR in k8s:
                      https://github.com/piraeusdatastore/piraeus-operator.git
                      Version: v2.9.1
                      Values:

                      ---
                      operator: 
                        resources:
                          requests:
                            cpu: 250m
                            memory: 500Mi
                          limits:
                            memory: 1Gi
                      installCRDs: true
                      imageConfigOverride:
                      - base: quay.io/piraeusdatastore
                        components:
                          linstor-satellite:
                            image: piraeus-server
                            tag: v1.29.0
                      tls:
                        certManagerIssuerRef:
                          name: step-issuer
                          kind: StepClusterIssuer
                          group: certmanager.step.sm
                      

                      Then we just connect to the XOSTOR cluster as an external LINSTOR controller.
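
                      A minimal sketch of that hookup via the operator's LinstorCluster resource, assuming the XOSTOR controller's REST API is reachable on the default port 3370 (the IP is a placeholder):

                      apiVersion: piraeus.io/v1
                      kind: LinstorCluster
                      metadata:
                        name: linstorcluster
                      spec:
                        externalController:
                          url: http://192.0.2.10:3370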

                      • henri9813 @Jonathon

                        Hello,

                        I got my whole XOSTOR destroyed; I don't know precisely how.

                        I found some errors in the satellite logs:

                        Error context:
                                An error occurred while processing resource 'Node: 'host', Rsc: 'xcp-volume-e011c043-8751-45e6-be06-4ce9f8807cad''
                        ErrorContext:
                          Details:     Command 'lvcreate --config 'devices { filter=['"'"'a|/dev/md127|'"'"','"'"'a|/dev/md126p3|'"'"','"'"'r|.*|'"'"'] }' --virtualsize 52543488k linstor_primary --thinpool thin_device --name xcp-volume-e011c043-8751-45e6-be06-4ce9f8807cad_00000' returned with exitcode 5. 
                        
                        Standard out: 
                        
                        
                        Error message: 
                          WARNING: Remaining free space in metadata of thin pool linstor_primary/thin_device is too low (98.06% >= 96.30%). Resize is recommended.
                          Cannot create new thin volume, free space in thin pool linstor_primary/thin_device reached threshold.
                        

                        Of course, I checked: my SR was not full.
                        [Image: SR usage screenshot]
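
                        The message points at the thin pool's metadata volume rather than its data volume, which is why the SR can look far from full. One way to check, and to grow the metadata LV if the VG still has free extents (standard LVM, names taken from the error above; whether this would have saved the pool here is an open question):

                        lvs -a -o lv_name,lv_size,data_percent,metadata_percent linstor_primary
                        lvextend --poolmetadatasize +1G linstor_primary/thin_device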

                        And the controller crashed, and I couldn't make it work.

                        Here is the error I got:

                        ==========
                        
                        Category:                           RuntimeException
                        Class name:                         IllegalStateException
                        Class canonical name:               java.lang.IllegalStateException
                        Generated at:                       Method 'newIllegalStateException', Source file 'DataUtils.java', Line #870
                        
                        Error message:                      Reading from nio:/var/lib/linstor/linstordb.mv.db failed; file length 2293760 read length 384 at 2445540 [1.4.197/1]
                        

                        So I deduce the database was corrupted. I tried to open the file as explained in the documentation, but the LINSTOR schema was "not found" in the file, even though with cat I can see data about it.
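
                        For reference, the controller database is a plain H2 file, and what I tried was along the lines of H2's Recover tool (the jar path depends on the install; a sketch, not a confirmed fix):

                        cd /var/lib/linstor
                        java -cp /usr/share/linstor-server/lib/h2*.jar org.h2.tools.Recover -db linstordb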

                        For now, I'm leaving XOSTOR and going back to local storage until there is a "solution path" for what to do when this issue occurs.
