    XOSTOR hyperconvergence preview

    • ronan-a (Vates 🪐 XCP-ng Team), replying to @peter_webbird:

      @peter_webbird We've already had feedback on CBT and LINSTOR/DRBD, and we don't necessarily recommend enabling it. We have a blocking dev card about a bug where the LVM lvchange command may fail on CBT volumes used by a XOSTOR SR. We also have other issues related to migration with CBT.
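
      If you have already enabled CBT on VDIs of a XOSTOR SR and want to switch it off while these issues are open, a minimal sketch (the SR and VDI UUIDs are placeholders):

      # List the VDIs of the SR that currently have CBT enabled.
      xe vdi-list sr-uuid=<sr-uuid> cbt-enabled=true params=uuid,name-label
      # Disable CBT per VDI (this discards the changed-block tracking metadata for that VDI).
      xe vdi-disable-cbt uuid=<vdi-uuid>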

      • gb.123:

        @ronan-a @dthenot @Team-Storage

        Guys, can you please clarify which method to use for installing XOSTOR on XCP-ng 8.3?

        Simple:

        yum install xcp-ng-linstor
        yum install xcp-ng-release-linstor
        ./install --disks /dev/nvme0n1 --thin
        

        Or the script in the first post?
        Or some other script?

         • dthenot (Vates 🪐 XCP-ng Team), replying to @gb.123:

          @gb.123 Hello,
           The instructions in the first post are still the way to go 🙂

           • JeffBerntsen (Top contributor), replying to @dthenot:

            @dthenot said in XOSTOR hyperconvergence preview:

            @gb.123 Hello,
             The instructions in the first post are still the way to go 🙂

             I'm curious about that as well, but the first post says the installation script is only compatible with 8.2 and doesn't mention 8.3. Is that still the case, or is the installation script now compatible with 8.3 as well? If not, is there an installation script that is compatible with 8.3?

             I know that using XO is the recommended installation method, but I'm interested in a script because I'd like to integrate XOSTOR installation into an XCP-ng installation script I already have that runs via PXE boot.
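
             In case it helps, the shape of what I'd try to hook into a post-install step is roughly this, assuming the packages and the 8.3-compatible script discussed in this thread (the script path and the disk are placeholders on my side):

             # Enable the LINSTOR repository, then install the packages.
             yum install -y xcp-ng-release-linstor
             yum install -y xcp-ng-linstor
             # Prepare the local disk for XOSTOR (thin provisioning); the disk is only an example.
             /path/to/install.sh --disks /dev/nvme0n1 --thin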

             • dthenot (Vates 🪐 XCP-ng Team), replying to @JeffBerntsen:

               @JeffBerntsen That's what I meant: the installation method described in the first post still works on 8.3, and the script still works as expected too; it basically only creates the VG/LV needed on the hosts before you create the SR.
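
               Roughly, for a thin setup the per-host preparation amounts to something like this (illustrative only; the real script does more checks, and linstor_group/thin_device are just the names used elsewhere in this thread):

               # Create the volume group on the chosen disk (pvcreate is implied).
               vgcreate linstor_group /dev/sdb
               # Turn the whole VG into a thin pool for the future SR.
               lvcreate -l 100%FREE --thinpool thin_device linstor_group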

               • JeffBerntsen (Top contributor), replying to @dthenot:

                @dthenot said in XOSTOR hyperconvergence preview:

                 @JeffBerntsen That's what I meant: the installation method described in the first post still works on 8.3, and the script still works as expected too; it basically only creates the VG/LV needed on the hosts before you create the SR.

                Got it. Thanks!

                 • henri9813, replying to @JeffBerntsen:

                  Hello,

                   I plan to install my XOSTOR cluster on a pool of 7 nodes with 3 replicas, but not all nodes at once, because the disks are currently in use.
                   Consider:

                   • node 1
                   • node 2
                   • node ...
                   • node 5
                   • node 6
                   • node 7

                   with 2 disks in each:

                   • sda: 128 GB for the OS
                   • sdb: 1 TB for the local SR (for now 😄)

                  I emptied node 6 & 7.

                   So, here is what I plan to do:

                   • On ALL NODES: install the linstor packages

                   Run the install script on nodes 6 & 7 to add their disks:

                  node6# install.sh --disks /dev/sdb
                  node7# install.sh --disks /dev/sdb
                  

                   Then, create the SR and configure the linstor plugin manager as follows:

                   xe sr-create \
                       type=linstor name-label=pool-01 \
                       host-uuid=XXXX \
                       shared=true \
                       device-config:group-name=linstor_group/thin_device \
                       device-config:redundancy=3 \
                       device-config:provisioning=thin
                  

                   Normally, I should then have a linstor cluster of 2 nodes running (2 satellites, with the controller placed randomly on one of them), with only 2 disks and therefore only 2/3 working replicas.

                   The cluster SHOULD be usable (am I right on this point?).

                   The next step would be to move the VMs from node 5 onto it to evacuate node 5, and then add node 5 to the cluster with the following:

                  node5# install.sh --disks /dev/sdb
                  node5# xe host-call-plugin \
                    host-uuid=node5-uuid \
                    plugin=linstor-manager \
                    fn=addHost args:groupName=linstor_group/thin_device
                  

                   That should deploy the satellite on node 5 and add its disk.

                   I should then have 3/3 working replicas and can start to add the other nodes progressively.
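
                   To check that, I plan to look at the state from the host currently running the controller, with something like (assuming the linstor client is available there):

                   # Confirm that all satellites are online and that each node exposes its storage pool.
                   linstor node list
                   linstor storage-pool list
                   # Each xcp-volume-* resource should report 3 healthy replicas.
                   linstor resource list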

                   Am I right about the process?

                   As mentioned on the Discord, I will post my feedback and results from my setup once I have finalized it (maybe through a blog post somewhere).

                   Thanks for providing XOSTOR as open source; it's clearly the missing piece for an open-source virtualization stack (vs. Proxmox).

                   • Jonathon:

                    I have amazing news!

                     After the upgrade to XCP-ng 8.3, I retested Velero backup, and it all just works 😁
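
                     For reference, a backup like the one below can be created with roughly the following (a sketch; the exact flags aren't shown in the describe output, and with defaultSnapshotMoveData: true in the values further down the --snapshot-move-data flag is redundant):

                     velero backup create grafana-test \
                       --include-namespaces grafana \
                       --snapshot-move-data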

                    Completed Backup

                    jonathon@jonathon-framework:~$ velero --kubeconfig k8s_configs/production.yaml backup describe grafana-test
                    Name:         grafana-test
                    Namespace:    velero
                    Labels:       objectset.rio.cattle.io/hash=c2b5f500ab5d9b8ffe14f2c70bf3742291df565c
                                  velero.io/storage-location=default
                    Annotations:  objectset.rio.cattle.io/applied=H4sIAAAAAAAA/4SSQW/bPgzFvwvPtv9OajeJj/8N22HdBqxFL0MPlEQlWmTRkOhgQ5HvPsixE2yH7iji8ffIJ74CDu6ZYnIcoIMTeYpcOf7vtIICji4Y6OB/1MdxgAJ6EjQoCN0rYAgsKI5Dyk9WP0hLIqmi40qjiKfMcRlAq7pBY+py26qmbEi15a5p78vtaqe0oqbVVsO5AI+K/Ju4A6YDdKDXqrVtXaNqzU5traVVY9d6Uyt7t2nW693K2Pa+naABe4IO9hEtBiyFksClmgbUdN06a9NAOtvr5B4DDunA8uR64lGgg7u6rxMUYMji6OWZ/dhTeuIPaQ6os+gTFUA/tR8NmXd+TELxUfNA5hslHqOmBN13OF16ZwvNQShIqpZClYQj7qk6blPlGF5uzC/L3P+kvok7MB9z0OcCXPiLPLHmuLLWCfVfB4rTZ9/iaA5zHovNZz7R++k6JI50q89BXcuXYR5YT0DolkChABEPHWzW9cK+rPQx8jgsH/KQj+QT/frzXCdduc/Ca9u1Y7aaFvMu5Ang5Xz+HQAA//8X7Fu+/QIAAA
                                  objectset.rio.cattle.io/id=e104add0-85b4-4eb5-9456-819bcbe45cfc
                                  velero.io/resource-timeout=10m0s
                                  velero.io/source-cluster-k8s-gitversion=v1.33.4+rke2r1
                                  velero.io/source-cluster-k8s-major-version=1
                                  velero.io/source-cluster-k8s-minor-version=33
                    
                    Phase:  Completed
                    
                    
                    Namespaces:
                      Included:  grafana
                      Excluded:  <none>
                    
                    Resources:
                      Included cluster-scoped:    <none>
                      Excluded cluster-scoped:    volumesnapshotcontents.snapshot.storage.k8s.io
                      Included namespace-scoped:  *
                      Excluded namespace-scoped:  volumesnapshots.snapshot.storage.k8s.io
                    
                    Label selector:  <none>
                    
                    Or label selector:  <none>
                    
                    Storage Location:  default
                    
                    Velero-Native Snapshot PVs:  true
                    Snapshot Move Data:          true
                    Data Mover:                  velero
                    
                    TTL:  720h0m0s
                    
                    CSISnapshotTimeout:    30m0s
                    ItemOperationTimeout:  4h0m0s
                    
                    Hooks:  <none>
                    
                    Backup Format Version:  1.1.0
                    
                    Started:    2025-10-15 15:29:52 -0700 PDT
                    Completed:  2025-10-15 15:31:25 -0700 PDT
                    
                    Expiration:  2025-11-14 14:29:52 -0800 PST
                    
                    Total items to be backed up:  35
                    Items backed up:              35
                    
                    Backup Item Operations:  1 of 1 completed successfully, 0 failed (specify --details for more information)
                    Backup Volumes:
                      Velero-Native Snapshots: <none included>
                    
                      CSI Snapshots:
                        grafana/central-grafana:
                          Data Movement: included, specify --details for more information
                    
                      Pod Volume Backups: <none included>
                    
                    HooksAttempted:  0
                    HooksFailed:     0
                    

                    Completed Restore

                    jonathon@jonathon-framework:~$ velero --kubeconfig k8s_configs/production.yaml restore describe restore-grafana-test --details
                    Name:         restore-grafana-test
                    Namespace:    velero
                    Labels:       objectset.rio.cattle.io/hash=252addb3ed156c52d9fa9b8c045b47a55d66c0af
                    Annotations:  objectset.rio.cattle.io/applied=H4sIAAAAAAAA/3yRTW7zIBBA7zJrO5/j35gzfE2rtsomymIM45jGBgTjbKLcvaKJm6qL7kDwnt7ABdDpHfmgrQEBZxrJ25W2/85rSOCkjQIBrxTYeoIEJmJUyAjiAmiMZWRtTYhb232Q5EC88tquJDKPFEU6GlpUG5UVZdpUdZ6WZZ+niOtNWtR1SypvqC8buCYwYkfjn7oBwwAC8ipHpbqC1LqqZZWrtse228isrLqywapSdS0z7KPU4EQgwN+mSI8eezSYMgWG22lwKOl7/MgERzJmdChPs9veDL9IGfSbQRcGy+96IjszCCiyCRLQRo6zIrVd5AHEfuHhkIBmmp4d+a/3e9Dl8LPoCZ3T5hg7FvQRcR8nxt6XL7sAgv1MCZztOE+01P23cvmnPYzaxNtwuF4/AwAA//8k6OwC/QEAAA
                                  objectset.rio.cattle.io/id=9ad8d034-7562-44f2-aa18-3669ed27ef47
                    
                    Phase:                       Completed
                    Total items to be restored:  33
                    Items restored:              33
                    
                    Started:    2025-10-15 15:35:26 -0700 PDT
                    Completed:  2025-10-15 15:36:34 -0700 PDT
                    
                    Warnings:
                      Velero:     <none>
                      Cluster:    <none>
                      Namespaces:
                        grafana-restore:  could not restore, ConfigMap:elasticsearch-es-transport-ca-internal already exists. Warning: the in-cluster version is different than the backed-up version
                                          could not restore, ConfigMap:kube-root-ca.crt already exists. Warning: the in-cluster version is different than the backed-up version
                    
                    Backup:  grafana-test
                    
                    Namespaces:
                      Included:  grafana
                      Excluded:  <none>
                    
                    Resources:
                      Included:        *
                      Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io, csinodes.storage.k8s.io, volumeattachments.storage.k8s.io, backuprepositories.velero.io
                      Cluster-scoped:  auto
                    
                    Namespace mappings:  grafana=grafana-restore
                    
                    Label selector:  <none>
                    
                    Or label selector:  <none>
                    
                    Restore PVs:  true
                    
                    CSI Snapshot Restores:
                      grafana-restore/central-grafana:
                        Data Movement:
                          Operation ID: dd-ffa56e1c-9fd0-44b4-a8bb-8163f40a49e9.330b82fc-ca6a-423217ee5
                          Data Mover: velero
                          Uploader Type: kopia
                    
                    Existing Resource Policy:   <none>
                    ItemOperationTimeout:       4h0m0s
                    
                    Preserve Service NodePorts:  auto
                    
                    Restore Item Operations:
                      Operation for persistentvolumeclaims grafana-restore/central-grafana:
                        Restore Item Action Plugin:  velero.io/csi-pvc-restorer
                        Operation ID:                dd-ffa56e1c-9fd0-44b4-a8bb-8163f40a49e9.330b82fc-ca6a-423217ee5
                        Phase:                       Completed
                        Progress:                    856284762 of 856284762 complete (Bytes)
                        Progress description:        Completed
                        Created:                     2025-10-15 15:35:28 -0700 PDT
                        Started:                     2025-10-15 15:36:06 -0700 PDT
                        Updated:                     2025-10-15 15:36:26 -0700 PDT
                    
                    HooksAttempted:   0
                    HooksFailed:      0
                    
                    Resource List:
                      apps/v1/Deployment:
                        - grafana-restore/central-grafana(created)
                        - grafana-restore/grafana-debug(created)
                      apps/v1/ReplicaSet:
                        - grafana-restore/central-grafana-5448b9f65(created)
                        - grafana-restore/central-grafana-56887c6cb6(created)
                        - grafana-restore/central-grafana-56ddd4f497(created)
                        - grafana-restore/central-grafana-5f4757844b(created)
                        - grafana-restore/central-grafana-5f69f86c85(created)
                        - grafana-restore/central-grafana-64545dcdc(created)
                        - grafana-restore/central-grafana-69c66c54d9(created)
                        - grafana-restore/central-grafana-6c8d6f65b8(created)
                        - grafana-restore/central-grafana-7b479f79ff(created)
                        - grafana-restore/central-grafana-bc7d96cdd(created)
                        - grafana-restore/central-grafana-cb88bd49c(created)
                        - grafana-restore/grafana-debug-556845ff7b(created)
                        - grafana-restore/grafana-debug-6fb594cb5f(created)
                        - grafana-restore/grafana-debug-8f66bfbf6(created)
                      discovery.k8s.io/v1/EndpointSlice:
                        - grafana-restore/central-grafana-hkgd5(created)
                      networking.k8s.io/v1/Ingress:
                        - grafana-restore/central-grafana(created)
                      rbac.authorization.k8s.io/v1/Role:
                        - grafana-restore/central-grafana(created)
                      rbac.authorization.k8s.io/v1/RoleBinding:
                        - grafana-restore/central-grafana(created)
                      v1/ConfigMap:
                        - grafana-restore/central-grafana(created)
                        - grafana-restore/elasticsearch-es-transport-ca-internal(failed)
                        - grafana-restore/kube-root-ca.crt(failed)
                      v1/Endpoints:
                        - grafana-restore/central-grafana(created)
                      v1/PersistentVolume:
                        - pvc-e3f6578f-08b2-4e79-85f0-76bbf8985b55(skipped)
                      v1/PersistentVolumeClaim:
                        - grafana-restore/central-grafana(created)
                      v1/Pod:
                        - grafana-restore/central-grafana-cb88bd49c-fc5br(created)
                      v1/Secret:
                        - grafana-restore/fpinfra-net-cf-cert(created)
                        - grafana-restore/grafana(created)
                      v1/Service:
                        - grafana-restore/central-grafana(created)
                      v1/ServiceAccount:
                        - grafana-restore/central-grafana(created)
                        - grafana-restore/default(skipped)
                      velero.io/v2alpha1/DataUpload:
                        - velero/grafana-test-nw7zj(skipped)
                    

                     Image of the working restore pod, with correct data in the PV: [screenshot]
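
                     A restore like the one above is created with roughly (again a sketch inferred from the describe output):

                     velero restore create restore-grafana-test \
                       --from-backup grafana-test \
                       --namespace-mappings grafana:grafana-restore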

                     Velero installed from Helm: https://vmware-tanzu.github.io/helm-charts
                     Version: velero chart 11.1.0
                     Values:

                    ---
                    image:
                      repository: velero/velero
                      tag: v1.17.0
                    
                     # Whether to deploy the node-agent (formerly restic) daemonset.
                    deployNodeAgent: true
                    
                    initContainers:
                       - name: velero-plugin-for-aws
                         image: velero/velero-plugin-for-aws:latest
                         imagePullPolicy: IfNotPresent
                         volumeMounts:
                           - mountPath: /target
                             name: plugins
                    
                    configuration:
                      defaultItemOperationTimeout: 2h
                      features: EnableCSI
                      defaultSnapshotMoveData: true
                    
                      backupStorageLocation:
                        - name: default
                          provider: aws
                          bucket: velero
                          config:
                            region: us-east-1
                            s3ForcePathStyle: true
                            s3Url: https://s3.location
                    
                      # Destination VSL points to LINSTOR snapshot class
                      volumeSnapshotLocation:
                        - name: linstor
                          provider: velero.io/csi
                          config:
                            snapshotClass: linstor-vsc
                    
                    credentials:
                      useSecret: true
                      existingSecret: velero-user
                    
                    
                    metrics:
                      enabled: true
                    
                      serviceMonitor:
                        enabled: true
                    
                      prometheusRule:
                        enabled: true
                        # Additional labels to add to deployed PrometheusRule
                        additionalLabels: {}
                        # PrometheusRule namespace. Defaults to Velero namespace.
                        # namespace: ""
                        # Rules to be deployed
                        spec:
                          - alert: VeleroBackupPartialFailures
                            annotations:
                               message: Velero backup {{ $labels.schedule }} has {{ $value | humanizePercentage }} partially failed backups.
                            expr: |-
                              velero_backup_partial_failure_total{schedule!=""} / velero_backup_attempt_total{schedule!=""} > 0.25
                            for: 15m
                            labels:
                              severity: warning
                          - alert: VeleroBackupFailures
                            annotations:
                              message: Velero backup {{ $labels.schedule }} has {{ $value | humanizePercentage }} failed backups.
                            expr: |-
                              velero_backup_failure_total{schedule!=""} / velero_backup_attempt_total{schedule!=""} > 0.25
                            for: 15m
                            labels:
                              severity: warning
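
                     These values are applied with a standard chart install, something like this (release and namespace names are just examples):

                     helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
                     helm upgrade --install velero vmware-tanzu/velero \
                       --namespace velero --create-namespace \
                       --version 11.1.0 -f values.yaml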
                    

                     Also create the following VolumeSnapshotClass:

                    apiVersion: snapshot.storage.k8s.io/v1
                    kind: VolumeSnapshotClass
                    metadata:
                      name: linstor-vsc
                      labels:
                        velero.io/csi-volumesnapshot-class: "true"
                    driver: linstor.csi.linbit.com
                    deletionPolicy: Delete
                    

                     We are using the Piraeus operator to consume XOSTOR from Kubernetes:
                     https://github.com/piraeusdatastore/piraeus-operator.git
                     Version: v2.9.1
                     Values:

                    ---
                    operator: 
                      resources:
                        requests:
                          cpu: 250m
                          memory: 500Mi
                        limits:
                          memory: 1Gi
                    installCRDs: true
                    imageConfigOverride:
                    - base: quay.io/piraeusdatastore
                      components:
                        linstor-satellite:
                          image: piraeus-server
                          tag: v1.29.0
                    tls:
                      certManagerIssuerRef:
                        name: step-issuer
                        kind: StepClusterIssuer
                        group: certmanager.step.sm
                    

                     Then we just connect to the XOSTOR cluster as an external LINSTOR controller.
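
                     Concretely, with the v2 operator that connection is just a LinstorCluster resource pointing at the existing controller, something like this (a sketch; the controller address is a placeholder and 3370 is the default LINSTOR REST port, so check the Piraeus docs for the exact field names in your version):

                     # Point the Piraeus operator at the XOSTOR controller instead of deploying its own.
                     kubectl apply -f - <<'EOF'
                     apiVersion: piraeus.io/v1
                     kind: LinstorCluster
                     metadata:
                       name: linstorcluster
                     spec:
                       externalController:
                         url: http://<xostor-controller-ip>:3370
                     EOF
                     # Quick connectivity check from any machine that can reach the pool:
                     linstor --controllers=<xostor-controller-ip>:3370 node list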

                     • henri9813, replying to @Jonathon:

                      Hello,

                       My whole XOSTOR got destroyed, and I don't know precisely how.

                       I found some errors in the satellite logs:

                      Error context:
                              An error occurred while processing resource 'Node: 'host', Rsc: 'xcp-volume-e011c043-8751-45e6-be06-4ce9f8807cad''
                      ErrorContext:
                        Details:     Command 'lvcreate --config 'devices { filter=['"'"'a|/dev/md127|'"'"','"'"'a|/dev/md126p3|'"'"','"'"'r|.*|'"'"'] }' --virtualsize 52543488k linstor_primary --thinpool thin_device --name xcp-volume-e011c043-8751-45e6-be06-4ce9f8807cad_00000' returned with exitcode 5. 
                      
                      Standard out: 
                      
                      
                      Error message: 
                        WARNING: Remaining free space in metadata of thin pool linstor_primary/thin_device is too low (98.06% >= 96.30%). Resize is recommended.
                        Cannot create new thin volume, free space in thin pool linstor_primary/thin_device reached threshold.
                      

                       Of course, I checked: my SR was not full.
                       [screenshot of the SR usage view]

                       Then the controller crashed, and I couldn't make it work again.

                       Here is the error I got:

                      ==========
                      
                      Category:                           RuntimeException
                      Class name:                         IllegalStateException
                      Class canonical name:               java.lang.IllegalStateException
                      Generated at:                       Method 'newIllegalStateException', Source file 'DataUtils.java', Line #870
                      
                      Error message:                      Reading from nio:/var/lib/linstor/linstordb.mv.db failed; file length 2293760 read length 384 at 2445540 [1.4.197/1]
                      

                       So I deduce the database was corrupted. I tried to open the file as explained in the documentation, but the LINSTOR schema was "not found" in the file, even though with cat I can see data about it.
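
                       (For reference, the file is an H2 database, so H2's own Recover tool can at least attempt a dump; a sketch, where the jar path is a guess and you should only ever work on a copy of the file:)

                       # Work on a copy, never on the live database.
                       cp /var/lib/linstor/linstordb.mv.db /root/linstordb.mv.db.bak
                       # If it can read the file, this writes a linstordb.h2.sql dump next to it.
                       java -cp /usr/share/linstor-server/lib/h2*.jar org.h2.tools.Recover \
                         -dir /var/lib/linstor -db linstordb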

                       For now, I'm leaving XOSTOR and going back to local storage until there is a "solution path" for when this issue occurs.

                       • ronan-a (Vates 🪐 XCP-ng Team), replying to @henri9813:

                        @henri9813 said in XOSTOR hyperconvergence preview:

                         Of course, I checked: my SR was not full.

                        The visual representation of used space is for informational purposes only; it's an approximation that takes into account replication, disks in use, etc. For more information: https://docs.xcp-ng.org/xostor/#how-a-linstor-sr-capacity-is-calculated

                         We plan to someday display a complete view of each physical disk's space on each host to provide a more detailed overview. In any case, if you run "lvs"/"vgs" on each machine, you should indeed see the actual disk space used.
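
                         For example (a sketch; replace the VG name with yours, e.g. linstor_primary from the error above):

                         # Physical space actually allocated in the volume group.
                         vgs linstor_primary
                         # Data and metadata fill levels of the thin pool.
                         # If metadata_percent is close to 100, the thin pool metadata itself is full even though the SR looks far from full.
                         lvs -a -o lv_name,lv_size,data_percent,metadata_percent linstor_primary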

                         • henri9813, replying to @ronan-a:

                           Hello @ronan-a,

                           But how do I recover from this situation?

                           Thanks!
