XCP-ng
    Jonathon

    @Jonathon


    Best posts made by Jonathon

    • RE: XOSTOR hyperconvergence preview

      I have amazing news!

      After upgrading to XCP-ng 8.3, I retested Velero backups, and it all just works 😁

      Completed Backup

      jonathon@jonathon-framework:~$ velero --kubeconfig k8s_configs/production.yaml backup describe grafana-test
      Name:         grafana-test
      Namespace:    velero
      Labels:       objectset.rio.cattle.io/hash=c2b5f500ab5d9b8ffe14f2c70bf3742291df565c
                    velero.io/storage-location=default
      Annotations:  objectset.rio.cattle.io/applied=H4sIAAAAAAAA/4SSQW/bPgzFvwvPtv9OajeJj/8N22HdBqxFL0MPlEQlWmTRkOhgQ5HvPsixE2yH7iji8ffIJ74CDu6ZYnIcoIMTeYpcOf7vtIICji4Y6OB/1MdxgAJ6EjQoCN0rYAgsKI5Dyk9WP0hLIqmi40qjiKfMcRlAq7pBY+py26qmbEi15a5p78vtaqe0oqbVVsO5AI+K/Ju4A6YDdKDXqrVtXaNqzU5traVVY9d6Uyt7t2nW693K2Pa+naABe4IO9hEtBiyFksClmgbUdN06a9NAOtvr5B4DDunA8uR64lGgg7u6rxMUYMji6OWZ/dhTeuIPaQ6os+gTFUA/tR8NmXd+TELxUfNA5hslHqOmBN13OF16ZwvNQShIqpZClYQj7qk6blPlGF5uzC/L3P+kvok7MB9z0OcCXPiLPLHmuLLWCfVfB4rTZ9/iaA5zHovNZz7R++k6JI50q89BXcuXYR5YT0DolkChABEPHWzW9cK+rPQx8jgsH/KQj+QT/frzXCdduc/Ca9u1Y7aaFvMu5Ang5Xz+HQAA//8X7Fu+/QIAAA
                    objectset.rio.cattle.io/id=e104add0-85b4-4eb5-9456-819bcbe45cfc
                    velero.io/resource-timeout=10m0s
                    velero.io/source-cluster-k8s-gitversion=v1.33.4+rke2r1
                    velero.io/source-cluster-k8s-major-version=1
                    velero.io/source-cluster-k8s-minor-version=33
      
      Phase:  Completed
      
      
      Namespaces:
        Included:  grafana
        Excluded:  <none>
      
      Resources:
        Included cluster-scoped:    <none>
        Excluded cluster-scoped:    volumesnapshotcontents.snapshot.storage.k8s.io
        Included namespace-scoped:  *
        Excluded namespace-scoped:  volumesnapshots.snapshot.storage.k8s.io
      
      Label selector:  <none>
      
      Or label selector:  <none>
      
      Storage Location:  default
      
      Velero-Native Snapshot PVs:  true
      Snapshot Move Data:          true
      Data Mover:                  velero
      
      TTL:  720h0m0s
      
      CSISnapshotTimeout:    30m0s
      ItemOperationTimeout:  4h0m0s
      
      Hooks:  <none>
      
      Backup Format Version:  1.1.0
      
      Started:    2025-10-15 15:29:52 -0700 PDT
      Completed:  2025-10-15 15:31:25 -0700 PDT
      
      Expiration:  2025-11-14 14:29:52 -0800 PST
      
      Total items to be backed up:  35
      Items backed up:              35
      
      Backup Item Operations:  1 of 1 completed successfully, 0 failed (specify --details for more information)
      Backup Volumes:
        Velero-Native Snapshots: <none included>
      
        CSI Snapshots:
          grafana/central-grafana:
            Data Movement: included, specify --details for more information
      
        Pod Volume Backups: <none included>
      
      HooksAttempted:  0
      HooksFailed:     0
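      If you would rather script this check than read the describe output, here is a minimal Python sketch. It assumes that velero backup get NAME -o json prints the Backup object (or a list object with an items array, which the helper also accepts) and that status.phase and status.progress.totalItems/itemsBackedUp are populated, as in recent Velero releases. The helper names are hypothetical.

```python
import json
import subprocess
from typing import Any


def load_backups(raw: str) -> list[dict[str, Any]]:
    """Accept a single Backup object, a list object with "items", or a JSON array."""
    data = json.loads(raw)
    if isinstance(data, list):
        return data
    return data.get("items", [data])


def backup_ok(backup: dict[str, Any]) -> bool:
    """True when the backup completed and every counted item was backed up."""
    status = backup.get("status", {})
    progress = status.get("progress", {})
    total = progress.get("totalItems", 0)
    done = progress.get("itemsBackedUp", 0)
    return status.get("phase") == "Completed" and total == done


def fetch_backup_json(name: str, kubeconfig: str) -> str:
    """Shell out to the velero CLI; assumes "-o json" is supported by "backup get"."""
    result = subprocess.run(
        ["velero", "--kubeconfig", kubeconfig, "backup", "get", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

      For the backup above, backup_ok(load_backups(fetch_backup_json("grafana-test", "k8s_configs/production.yaml"))[0]) should return True, since the backup completed with 35 of 35 items.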
      

      Completed Restore

      jonathon@jonathon-framework:~$ velero --kubeconfig k8s_configs/production.yaml restore describe restore-grafana-test --details
      Name:         restore-grafana-test
      Namespace:    velero
      Labels:       objectset.rio.cattle.io/hash=252addb3ed156c52d9fa9b8c045b47a55d66c0af
      Annotations:  objectset.rio.cattle.io/applied=H4sIAAAAAAAA/3yRTW7zIBBA7zJrO5/j35gzfE2rtsomymIM45jGBgTjbKLcvaKJm6qL7kDwnt7ABdDpHfmgrQEBZxrJ25W2/85rSOCkjQIBrxTYeoIEJmJUyAjiAmiMZWRtTYhb232Q5EC88tquJDKPFEU6GlpUG5UVZdpUdZ6WZZ+niOtNWtR1SypvqC8buCYwYkfjn7oBwwAC8ipHpbqC1LqqZZWrtse228isrLqywapSdS0z7KPU4EQgwN+mSI8eezSYMgWG22lwKOl7/MgERzJmdChPs9veDL9IGfSbQRcGy+96IjszCCiyCRLQRo6zIrVd5AHEfuHhkIBmmp4d+a/3e9Dl8LPoCZ3T5hg7FvQRcR8nxt6XL7sAgv1MCZztOE+01P23cvmnPYzaxNtwuF4/AwAA//8k6OwC/QEAAA
                    objectset.rio.cattle.io/id=9ad8d034-7562-44f2-aa18-3669ed27ef47
      
      Phase:                       Completed
      Total items to be restored:  33
      Items restored:              33
      
      Started:    2025-10-15 15:35:26 -0700 PDT
      Completed:  2025-10-15 15:36:34 -0700 PDT
      
      Warnings:
        Velero:     <none>
        Cluster:    <none>
        Namespaces:
          grafana-restore:  could not restore, ConfigMap:elasticsearch-es-transport-ca-internal already exists. Warning: the in-cluster version is different than the backed-up version
                            could not restore, ConfigMap:kube-root-ca.crt already exists. Warning: the in-cluster version is different than the backed-up version
      
      Backup:  grafana-test
      
      Namespaces:
        Included:  grafana
        Excluded:  <none>
      
      Resources:
        Included:        *
        Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io, csinodes.storage.k8s.io, volumeattachments.storage.k8s.io, backuprepositories.velero.io
        Cluster-scoped:  auto
      
      Namespace mappings:  grafana=grafana-restore
      
      Label selector:  <none>
      
      Or label selector:  <none>
      
      Restore PVs:  true
      
      CSI Snapshot Restores:
        grafana-restore/central-grafana:
          Data Movement:
            Operation ID: dd-ffa56e1c-9fd0-44b4-a8bb-8163f40a49e9.330b82fc-ca6a-423217ee5
            Data Mover: velero
            Uploader Type: kopia
      
      Existing Resource Policy:   <none>
      ItemOperationTimeout:       4h0m0s
      
      Preserve Service NodePorts:  auto
      
      Restore Item Operations:
        Operation for persistentvolumeclaims grafana-restore/central-grafana:
          Restore Item Action Plugin:  velero.io/csi-pvc-restorer
          Operation ID:                dd-ffa56e1c-9fd0-44b4-a8bb-8163f40a49e9.330b82fc-ca6a-423217ee5
          Phase:                       Completed
          Progress:                    856284762 of 856284762 complete (Bytes)
          Progress description:        Completed
          Created:                     2025-10-15 15:35:28 -0700 PDT
          Started:                     2025-10-15 15:36:06 -0700 PDT
          Updated:                     2025-10-15 15:36:26 -0700 PDT
      
      HooksAttempted:   0
      HooksFailed:      0
      
      Resource List:
        apps/v1/Deployment:
          - grafana-restore/central-grafana(created)
          - grafana-restore/grafana-debug(created)
        apps/v1/ReplicaSet:
          - grafana-restore/central-grafana-5448b9f65(created)
          - grafana-restore/central-grafana-56887c6cb6(created)
          - grafana-restore/central-grafana-56ddd4f497(created)
          - grafana-restore/central-grafana-5f4757844b(created)
          - grafana-restore/central-grafana-5f69f86c85(created)
          - grafana-restore/central-grafana-64545dcdc(created)
          - grafana-restore/central-grafana-69c66c54d9(created)
          - grafana-restore/central-grafana-6c8d6f65b8(created)
          - grafana-restore/central-grafana-7b479f79ff(created)
          - grafana-restore/central-grafana-bc7d96cdd(created)
          - grafana-restore/central-grafana-cb88bd49c(created)
          - grafana-restore/grafana-debug-556845ff7b(created)
          - grafana-restore/grafana-debug-6fb594cb5f(created)
          - grafana-restore/grafana-debug-8f66bfbf6(created)
        discovery.k8s.io/v1/EndpointSlice:
          - grafana-restore/central-grafana-hkgd5(created)
        networking.k8s.io/v1/Ingress:
          - grafana-restore/central-grafana(created)
        rbac.authorization.k8s.io/v1/Role:
          - grafana-restore/central-grafana(created)
        rbac.authorization.k8s.io/v1/RoleBinding:
          - grafana-restore/central-grafana(created)
        v1/ConfigMap:
          - grafana-restore/central-grafana(created)
          - grafana-restore/elasticsearch-es-transport-ca-internal(failed)
          - grafana-restore/kube-root-ca.crt(failed)
        v1/Endpoints:
          - grafana-restore/central-grafana(created)
        v1/PersistentVolume:
          - pvc-e3f6578f-08b2-4e79-85f0-76bbf8985b55(skipped)
        v1/PersistentVolumeClaim:
          - grafana-restore/central-grafana(created)
        v1/Pod:
          - grafana-restore/central-grafana-cb88bd49c-fc5br(created)
        v1/Secret:
          - grafana-restore/fpinfra-net-cf-cert(created)
          - grafana-restore/grafana(created)
        v1/Service:
          - grafana-restore/central-grafana(created)
        v1/ServiceAccount:
          - grafana-restore/central-grafana(created)
          - grafana-restore/default(skipped)
        velero.io/v2alpha1/DataUpload:
          - velero/grafana-test-nw7zj(skipped)
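      The Resource List is easy to summarize in a script. Here is a minimal Python sketch that assumes the plain-text layout printed by restore describe --details above: a group/version/kind header followed by "- namespace/name(status)" entries. The function name is hypothetical.

```python
import re
from collections import Counter

# Entries such as "- grafana-restore/kube-root-ca.crt(failed)".
ENTRY = re.compile(r"^\s*-\s+(?P<name>\S+?)\((?P<status>[a-z]+)\)\s*$")
# Kind headers such as "v1/ConfigMap:" or "apps/v1/Deployment:" (no spaces, nothing after the colon).
KIND = re.compile(r"^\s*(?P<kind>[\w./-]+):\s*$")


def tally_restore(text: str) -> tuple[Counter, list[str]]:
    """Count entries per status and collect failed ones as 'kind name'."""
    counts: Counter = Counter()
    failed: list[str] = []
    kind = None
    for line in text.splitlines():
        if m := ENTRY.match(line):
            counts[m["status"]] += 1
            if m["status"] == "failed":
                failed.append(f"{kind} {m['name']}")
        elif m := KIND.match(line):
            kind = m["kind"]
    return counts, failed
```

      Run over the output above, it would report the two ConfigMaps (elasticsearch-es-transport-ca-internal and kube-root-ca.crt) as failed, which matches the "already exists" warnings.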
      

      [Image: the working restore pod, with the correct data in the PV]

      Velero was installed from the Helm repository https://vmware-tanzu.github.io/helm-charts
      Chart version: velero 11.1.0
      Values:

      ---
      image:
        repository: velero/velero
        tag: v1.17.0
      
      # Whether to deploy the node-agent daemonset (formerly the restic daemonset); the Velero data mover runs there.
      deployNodeAgent: true
      
      initContainers:
         - name: velero-plugin-for-aws
           image: velero/velero-plugin-for-aws:latest
           imagePullPolicy: IfNotPresent
           volumeMounts:
             - mountPath: /target
               name: plugins
      
      configuration:
        defaultItemOperationTimeout: 2h
        features: EnableCSI
        defaultSnapshotMoveData: true
      
        backupStorageLocation:
          - name: default
            provider: aws
            bucket: velero
            config:
              region: us-east-1
              s3ForcePathStyle: true
              s3Url: https://s3.location
      
        # Destination VSL points to LINSTOR snapshot class
        volumeSnapshotLocation:
          - name: linstor
            provider: velero.io/csi
            config:
              snapshotClass: linstor-vsc
      
      credentials:
        useSecret: true
        existingSecret: velero-user
      
      
      metrics:
        enabled: true
      
        serviceMonitor:
          enabled: true
      
        prometheusRule:
          enabled: true
          # Additional labels to add to deployed PrometheusRule
          additionalLabels: {}
          # PrometheusRule namespace. Defaults to Velero namespace.
          # namespace: ""
          # Rules to be deployed
          spec:
            - alert: VeleroBackupPartialFailures
              annotations:
                message: Velero backup {{ $labels.schedule }} has {{ $value | humanizePercentage }} partially failed backups.
              expr: |-
                velero_backup_partial_failure_total{schedule!=""} / velero_backup_attempt_total{schedule!=""} > 0.25
              for: 15m
              labels:
                severity: warning
            - alert: VeleroBackupFailures
              annotations:
                message: Velero backup {{ $labels.schedule }} has {{ $value | humanizePercentage }} failed backups.
              expr: |-
                velero_backup_failure_total{schedule!=""} / velero_backup_attempt_total{schedule!=""} > 0.25
              for: 15m
              labels:
                severity: warning
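      Both alerts compare a failure counter against the attempt counter for the same schedule, and they fire once that ratio has stayed above 25% for 15 minutes. Here is a minimal Python sketch of just the threshold test. It does not model the for: 15m hold, and returning False when there are no attempts is our own assumption, not PromQL behavior.

```python
THRESHOLD = 0.25  # same threshold as the "> 0.25" in both alert expressions


def ratio_alert(failures: float, attempts: float, threshold: float = THRESHOLD) -> bool:
    """Mirror `failures / attempts > threshold` for one schedule's counters."""
    if attempts <= 0:
        # Assumption: no recorded attempts means there is nothing to alert on.
        return False
    return failures / attempts > threshold
```

      Because velero_backup_*_total are cumulative counters, a single early failure on a schedule with few attempts can keep the ratio above 25% for some time. Wrapping both sides in increase(...[1d]) is one way to evaluate a recent window instead.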
      

      Also create the following VolumeSnapshotClass (its name matches snapshotClass in the values above):

      apiVersion: snapshot.storage.k8s.io/v1
      kind: VolumeSnapshotClass
      metadata:
        name: linstor-vsc
        labels:
          velero.io/csi-volumesnapshot-class: "true"
      driver: linstor.csi.linbit.com
      deletionPolicy: Delete
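      Velero picks the VolumeSnapshotClass for a CSI-backed PVC by matching the class's driver against the volume's CSI driver and preferring a class that carries the velero.io/csi-volumesnapshot-class label, which is why the label is set above. Below is a simplified Python model of that selection. It ignores any override annotations Velero supports, and the single-class fallback is an assumption you should check against the Velero CSI documentation for your version.

```python
from dataclasses import dataclass, field
from typing import Optional

LABEL = "velero.io/csi-volumesnapshot-class"


@dataclass
class SnapshotClass:
    name: str
    driver: str
    labels: dict[str, str] = field(default_factory=dict)


def pick_class(classes: list[SnapshotClass], driver: str) -> Optional[SnapshotClass]:
    """Simplified selection: a labeled class for the driver first, else the only one."""
    candidates = [c for c in classes if c.driver == driver]
    for c in candidates:
        if LABEL in c.labels:  # in this model, presence of the label is what counts
            return c
    # Assumption: with exactly one unlabeled class for the driver, fall back to it.
    return candidates[0] if len(candidates) == 1 else None
```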
      

      We use the Piraeus Operator to consume XOSTOR from Kubernetes:
      https://github.com/piraeusdatastore/piraeus-operator.git
      Version: v2.9.1
      Values:

      ---
      operator: 
        resources:
          requests:
            cpu: 250m
            memory: 500Mi
          limits:
            memory: 1Gi
      installCRDs: true
      imageConfigOverride:
      - base: quay.io/piraeusdatastore
        components:
          linstor-satellite:
            image: piraeus-server
            tag: v1.29.0
      tls:
        certManagerIssuerRef:
          name: step-issuer
          kind: StepClusterIssuer
          group: certmanager.step.sm
      

      Then we simply connect to the XOSTOR cluster as an external LINSTOR controller.
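      In Piraeus Operator v2, an external controller is typically configured through a LinstorCluster resource whose spec.externalController.url points at the LINSTOR controller's REST endpoint. The Python sketch below renders such a manifest as JSON, which kubectl apply -f - accepts. The URL is a placeholder. The field names follow the Piraeus external-controller how-to as we understand it, so verify them against the CRD installed by v2.9.1.

```python
import json


def linstor_cluster_manifest(controller_url: str, name: str = "linstorcluster") -> dict:
    """Build a LinstorCluster object pointing Piraeus at an external LINSTOR controller."""
    if not controller_url.startswith(("http://", "https://")):
        raise ValueError("controller_url must be an http(s) URL")
    return {
        "apiVersion": "piraeus.io/v1",
        "kind": "LinstorCluster",
        "metadata": {"name": name},
        "spec": {"externalController": {"url": controller_url}},
    }


# Placeholder address for the XOSTOR controller's REST API.
print(json.dumps(linstor_cluster_manifest("http://linstor-controller.example:3370"), indent=2))
```

      To create the resource, pipe the printed JSON into kubectl apply -f -.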

      posted in XOSTOR
      Jonathon
    • RE: XOSTOR hyperconvergence preview

      @stormi

      The problem was the yum cache. If I ran yum update right after yum update xcp-ng-release-linstor, it would still fail. To get it working right away, I did the following:

      yum update xcp-ng-release-linstor
      yum clean all
      yum update
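      Here is the same sequence as a small Python helper, for example to rerun it across hosts. This is only a sketch. It assumes it runs as root on each XCP-ng host, and it adds -y so the commands run unattended (the post ran them interactively). The dry_run parameter is hypothetical and only returns the commands.

```python
import subprocess

# Order matters: update the LINSTOR release package first, then drop cached
# yum metadata so the final update does not reuse stale repository data.
STEPS = [
    ["yum", "update", "-y", "xcp-ng-release-linstor"],
    ["yum", "clean", "all"],
    ["yum", "update", "-y"],
]


def refresh_linstor_repo(dry_run: bool = False) -> list[str]:
    """Run (or just list) the steps; check=True stops at the first failing command."""
    ran = []
    for cmd in STEPS:
        ran.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return ran
```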
      
      posted in XOSTOR
    • RE: XOSTOR hyperconvergence preview

      OK, I figured it out! I made an init container that reads a manually created label from the node the pod is running on. The label's value is the bare-metal host for that k8s node. The init container then takes that value, generates a wrapper script, and calls linstor-csi with the correct values. After making these changes, all the linstor-csi containers are running with no errors.

      The current problem is deploying and using a StorageClass. I started with a basic one that failed, and realized I did not know the correct storage_pool_name, so I went to http://IP:3370/v1/nodes/NODE/storage-pools and http://IP:3370/v1/nodes/NODE to get that information.

      Still troubleshooting, but wanted to provide info.
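      You can pull the pool names for a StorageClass from the same REST endpoint. Here is a minimal Python sketch. It assumes the controller is reachable over plain HTTP on port 3370 and that each entry returned by /v1/nodes/NODE/storage-pools carries the storage_pool_name and provider_kind fields, as in the LINSTOR REST API. The controller and node names are placeholders.

```python
import json
import urllib.request


def storage_pools_url(controller: str, node: str) -> str:
    # Same endpoint as in the post: http://IP:3370/v1/nodes/NODE/storage-pools
    return f"http://{controller}:3370/v1/nodes/{node}/storage-pools"


def pool_names(pools: list[dict]) -> list[str]:
    """Return pool names, skipping LINSTOR's built-in diskless pool."""
    return [p["storage_pool_name"] for p in pools if p.get("provider_kind") != "DISKLESS"]


def fetch_pools(controller: str, node: str, timeout: float = 10.0) -> list[dict]:
    """GET the storage pools of one node and decode the JSON list."""
    with urllib.request.urlopen(storage_pools_url(controller, node), timeout=timeout) as resp:
        return json.load(resp)
```

      Use one of the names returned by pool_names(fetch_pools(...)) as the storage pool in the StorageClass.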

      posted in XOSTOR
    • RE: DevOps Megathread: what you need and how we can help!

      @andrewperry I migrated our Rancher management cluster from the original RKE to a new RKE2 cluster using this plan not too long ago, so you should not have much trouble. Feel free to ask questions 🙂

      posted in Infrastructure as Code
    • RE: DevOps Megathread: what you need and how we can help!

      @nathanael-h Nice 😄

      If you have any questions, let me know; I have been using this for all our on-prem clusters for a while now.

      posted in Infrastructure as Code
    • RE: DevOps Megathread: what you need and how we can help!

      I do not have any asks at the moment, but I thought I would share the Terraform plan we have been using for a while now to create our k8s clusters.

      It has grown over time and may be a bit messy, but I figured it is better than nothing. We use it for RKE2 Rancher k8s clusters deployed onto our XCP-ng cluster. We use XOSTOR for the drives, and the Piraeus Operator uses the VLAN5 network for PVs. We also use IPVS, and the VMs are built from a Rocky Linux 9 template.

      If these are useful to anyone and they have questions I will do my best to answer.

      variable "pool" {
        default = "OVBH-PROD-XENPOOL04"
      }
      
      variable "network0" {
        default = "Native vRack"
      }
      variable "network1" {
        default = "VLAN80"
      }
      variable "network2" {
        default = "VLAN5"
      }
      
      variable "cluster_name" {
        default = "Production K8s Cluster"
      }
      
      variable "enrollment_command" {
        default = "curl -fL https://rancher.<redacted>.net/system-agent-install.sh | sudo  sh -s - --server https://rancher.<redacted>.net --label 'cattle.io/os=linux' --token <redacted>"
      }
      
      
      variable "node_type" {
        description = "Node type flag"
        default = {
          "1" = "--etcd --controlplane",
          "2" = "--etcd --controlplane",
          "3" = "--etcd --controlplane",
          "4" = "--worker",
          "5" = "--worker",
          "6" = "--worker",
          "7" = "--worker --taints smtp=true:NoSchedule",
          "8" = "--worker --taints smtp=true:NoSchedule",
          "9" = "--worker --taints smtp=true:NoSchedule"
        }
      }
      variable "node_networks" {
        description = "Node network flag"
        default = {
          "1" = "--internal-address 10.1.8.100 --address <redacted>",
          "2" = "--internal-address 10.1.8.101 --address <redacted>",
          "3" = "--internal-address 10.1.8.102 --address <redacted>",
          "4" = "--internal-address 10.1.8.103 --address <redacted>",
          "5" = "--internal-address 10.1.8.104 --address <redacted>",
          "6" = "--internal-address 10.1.8.105 --address <redacted>",
          "7" = "--internal-address 10.1.8.106 --address <redacted>",
          "8" = "--internal-address 10.1.8.107 --address <redacted>",
          "9" = "--internal-address 10.1.8.108 --address <redacted>"
        }
      }
      
      
      variable "vm_name" {
        description = "VM name per node"
        default = {
          "1" = "OVBH-VPROD-K8S01-MASTER01",
          "2" = "OVBH-VPROD-K8S01-MASTER02",
          "3" = "OVBH-VPROD-K8S01-MASTER03",
          "4" = "OVBH-VPROD-K8S01-WORKER01",
          "5" = "OVBH-VPROD-K8S01-WORKER02",
          "6" = "OVBH-VPROD-K8S01-WORKER03",
          "7" = "OVBH-VPROD-K8S01-WORKER04",
          "8" = "OVBH-VPROD-K8S01-WORKER05",
          "9" = "OVBH-VPROD-K8S01-WORKER06"
        }
      }
      
      variable "preferred_host" {
        default = {
          "1" = "85838113-e4b8-4520-9f6d-8f3cf554c8f1",
          "2" = "783c27ac-2dcb-4798-9ca8-27f5f30791f6",
          "3" = "c03e1a45-4c4c-46f5-a2a1-d8de2e22a866",
          "4" = "85838113-e4b8-4520-9f6d-8f3cf554c8f1",
          "5" = "783c27ac-2dcb-4798-9ca8-27f5f30791f6",
          "6" = "c03e1a45-4c4c-46f5-a2a1-d8de2e22a866",
          "7" = "85838113-e4b8-4520-9f6d-8f3cf554c8f1",
          "8" = "783c27ac-2dcb-4798-9ca8-27f5f30791f6",
          "9" = "c03e1a45-4c4c-46f5-a2a1-d8de2e22a866"
        }
      }
      
      variable "xoa_admin_password" {
      }
      
      variable "host_count" {
        description = "Storage repository (SR) UUID per node; all drives go to XOSTOR"
        default = {
          "1" = "479ca676-20a1-4051-7189-a4a9ca47e00d",
          "2" = "479ca676-20a1-4051-7189-a4a9ca47e00d",
          "3" = "479ca676-20a1-4051-7189-a4a9ca47e00d",
          "4" = "479ca676-20a1-4051-7189-a4a9ca47e00d",
          "5" = "479ca676-20a1-4051-7189-a4a9ca47e00d",
          "6" = "479ca676-20a1-4051-7189-a4a9ca47e00d",
          "7" = "479ca676-20a1-4051-7189-a4a9ca47e00d",
          "8" = "479ca676-20a1-4051-7189-a4a9ca47e00d",
          "9" = "479ca676-20a1-4051-7189-a4a9ca47e00d"
        }
      }
      
      variable "network1_ip_mapping" {
        description = "Mapping for network1 ips, vlan80"
        default = {
          "1" = "10.1.8.100",
          "2" = "10.1.8.101",
          "3" = "10.1.8.102",
          "4" = "10.1.8.103",
          "5" = "10.1.8.104",
          "6" = "10.1.8.105",
          "7" = "10.1.8.106",
          "8" = "10.1.8.107",
          "9" = "10.1.8.108"
        }
      }
      
      variable "network1_gateway" {
        description = "Gateway for network1 (VLAN80)"
        default     = "10.1.8.1"
      }
      
      variable "network1_prefix" {
        description = "Prefix for the network used"
        default     = "22"
      }
      
      variable "network2_ip_mapping" {
        description = "Mapping for network2 ips, VLAN5"
        default = {
          "1" = "10.2.5.30",
          "2" = "10.2.5.31",
          "3" = "10.2.5.32",
          "4" = "10.2.5.33",
          "5" = "10.2.5.34",
          "6" = "10.2.5.35",
          "7" = "10.2.5.36",
          "8" = "10.2.5.37",
          "9" = "10.2.5.38"
        }
      }
      
      
      variable "network2_prefix" {
        description = "Prefix for the network used"
        default     = "22"
      }
      
      variable "network0_ip_mapping" {
        description = "Mapping for network0 ips, public"
        default = {
      <redacted>
        }
      }
      
      variable "network0_gateway" {
        description = "Mapping for public ip gateways, from hosts"
        default = {
      <redacted>
        }
      }
      
      variable "network0_prefix" {
        description = "Prefix for the network used"
        default = {
      <redacted>
        }
      }
      
      # Instruct terraform to download the provider on `terraform init`
      terraform {
        required_providers {
          xenorchestra = {
            source  = "vatesfr/xenorchestra"
            version = "~> 0.29.0"
          }
        }
      }
      
      # Configure the Xen Orchestra provider
      provider "xenorchestra" {
        # Must be ws or wss
        url      = "ws://10.2.0.5"        # Or set XOA_URL environment variable
        username = "admin@admin.net"      # Or set XOA_USER environment variable
        password = var.xoa_admin_password # Or set XOA_PASSWORD environment variable
      }
      
      data "xenorchestra_pool" "pool" {
        name_label = var.pool
      }
      
      data "xenorchestra_template" "template" {
        name_label = "Rocky Linux 9 Template"
        pool_id    = data.xenorchestra_pool.pool.id
      }
      
      data "xenorchestra_network" "net1" {
        name_label = var.network1
        pool_id    = data.xenorchestra_pool.pool.id
      }
      data "xenorchestra_network" "net2" {
        name_label = var.network2
        pool_id    = data.xenorchestra_pool.pool.id
      }
      data "xenorchestra_network" "net0" {
        name_label = var.network0
        pool_id    = data.xenorchestra_pool.pool.id
      }
      
      resource "xenorchestra_cloud_config" "node" {
        count    = 9
        name     = "${lower(lookup(var.vm_name, count.index + 1))}_cloud_config"
        template = <<EOF
      
      #cloud-config
      ssh_authorized_keys:
        - ssh-rsa <redacted>
      
      write_files:
        - path: /etc/NetworkManager/conf.d/rke2-canal.conf
          permissions: '0755'
          owner: root
          content: |
            [keyfile]
            unmanaged-devices=interface-name:cali*;interface-name:flannel*
        - path: /tmp/selinux_kmod_drbd.log
          permissions: '0640'
          owner: root
          content: |
            type=AVC msg=audit(1661803314.183:778): avc:  denied  { module_load } for  pid=148256 comm="insmod" path="/tmp/ko/drbd.ko" dev="overlay" ino=101839829 scontext=system_u:system_r:unconfined_service_t:s0 tcontext=system_u:object_r:var_lib_t:s0 tclass=system permissive=0
            type=AVC msg=audit(1661803314.185:779): avc:  denied  { module_load } for  pid=148257 comm="insmod" path="/tmp/ko/drbd_transport_tcp.ko" dev="overlay" ino=101839831 scontext=system_u:system_r:unconfined_service_t:s0 tcontext=system_u:object_r:var_lib_t:s0 tclass=system permissive=0
        - path: /etc/sysconfig/modules/ipvs.modules
          permissions: 0755
          owner: root
          content: |
            #!/bin/bash
            modprobe -- ip_vs
            modprobe -- ip_vs_rr
            modprobe -- ip_vs_wrr
            modprobe -- ip_vs_sh
            modprobe -- nf_conntrack
        - path: /etc/modules-load.d/ipvs.conf
          permissions: 0755
          owner: root
          content: |
            ip_vs
            ip_vs_rr
            ip_vs_wrr
            ip_vs_sh
            nf_conntrack
      
      #cloud-init
      runcmd:
        - sudo hostnamectl set-hostname --static ${lower(lookup(var.vm_name, count.index + 1))}.<redacted>.com
        - sudo hostnamectl set-hostname ${lower(lookup(var.vm_name, count.index + 1))}.<redacted>.com
        - nmcli -t -f NAME con show | xargs -d '\n' -I {} nmcli con delete "{}"
        - nmcli con add type ethernet con-name public ifname enX0
        - nmcli con mod public ipv4.address '${lookup(var.network0_ip_mapping, count.index + 1)}/${lookup(var.network0_prefix, count.index + 1)}'
        - nmcli con mod public ipv4.method manual
        - nmcli con mod public ipv4.ignore-auto-dns yes
        - nmcli con mod public ipv4.gateway '${lookup(var.network0_gateway, count.index + 1)}'
        - nmcli con mod public ipv4.dns "8.8.8.8 8.8.4.4"
        - nmcli con mod public connection.autoconnect true
        - nmcli con up public
        - nmcli con add type ethernet con-name vlan80 ifname enX1
        - nmcli con mod vlan80 ipv4.address '${lookup(var.network1_ip_mapping, count.index + 1)}/${var.network1_prefix}'
        - nmcli con mod vlan80 ipv4.method manual
        - nmcli con mod vlan80 ipv4.ignore-auto-dns yes
        - nmcli con mod vlan80 ipv4.ignore-auto-routes yes
        - nmcli con mod vlan80 ipv4.gateway '${var.network1_gateway}'
        - nmcli con mod vlan80 ipv4.dns "${var.network1_gateway}"
        - nmcli con mod vlan80 connection.autoconnect true
        - nmcli con mod vlan80 ipv4.never-default true
        - nmcli con mod vlan80 ipv6.never-default true
        - nmcli con mod vlan80 ipv4.routes "10.0.0.0/8 ${var.network1_gateway}"
        - nmcli con up vlan80
        - nmcli con add type ethernet con-name vlan5 ifname enX2
        - nmcli con mod vlan5 ipv4.address '${lookup(var.network2_ip_mapping, count.index + 1)}/${var.network2_prefix}'
        - nmcli con mod vlan5 ipv4.method manual
        - nmcli con mod vlan5 ipv4.ignore-auto-dns yes
        - nmcli con mod vlan5 ipv4.ignore-auto-routes yes
        - nmcli con mod vlan5 connection.autoconnect true
        - nmcli con mod vlan5 ipv4.never-default true
        - nmcli con mod vlan5 ipv6.never-default true
        - nmcli con up vlan5
        - systemctl restart NetworkManager
        - dnf upgrade -y
        - dnf install ipset ipvsadm -y
        - bash /etc/sysconfig/modules/ipvs.modules
        - dnf install chrony -y
        - sudo systemctl enable --now chronyd
        - yum install kernel-devel kernel-headers -y
        - yum install elfutils-libelf-devel -y
        - swapoff -a
        - modprobe -- ip_tables
        - systemctl disable --now firewalld.service
        - systemctl disable --now rngd
        - dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
        - dnf install containerd.io tar -y
        - dnf install policycoreutils-python-utils -y
        - cat /tmp/selinux_kmod_drbd.log | sudo audit2allow -M insmoddrbd
        - sudo semodule -i insmoddrbd.pp
        - ${var.enrollment_command} ${lookup(var.node_type, count.index + 1)} ${lookup(var.node_networks, count.index + 1)}
      
      bootcmd:
        - swapoff -a
        - modprobe -- ip_tables
      EOF
      }
      
      resource "xenorchestra_vm" "master" {
        count            = 3
        cpus             = 4
        memory_max       = 8589934592
        cloud_config     = xenorchestra_cloud_config.node[count.index].template
        name_label       = lookup(var.vm_name, count.index + 1)
        name_description = "${var.cluster_name} master"
        template         = data.xenorchestra_template.template.id
        auto_poweron     = true
        affinity_host    = lookup(var.preferred_host, count.index + 1)
      
        network {
          network_id = data.xenorchestra_network.net0.id
        }
        network {
          network_id = data.xenorchestra_network.net1.id
        }
        network {
          network_id = data.xenorchestra_network.net2.id
        }
        disk {
          sr_id      = lookup(var.host_count, count.index + 1)
          name_label = "Terraform_disk_imavo"
          size       = 107374182400
        }
      }
      
      
      resource "xenorchestra_vm" "worker" {
        count            = 3
        cpus             = 32
        memory_max       = 68719476736
        cloud_config     = xenorchestra_cloud_config.node[count.index + 3].template
        name_label       = lookup(var.vm_name, count.index + 3 + 1)
        name_description = "${var.cluster_name} worker"
        template         = data.xenorchestra_template.template.id
        auto_poweron     = true
        affinity_host    = lookup(var.preferred_host, count.index + 3 + 1)
        
        network {
          network_id = data.xenorchestra_network.net0.id
        }
        network {
          network_id = data.xenorchestra_network.net1.id
        }
        network {
          network_id = data.xenorchestra_network.net2.id
        }
        disk {
          sr_id      = lookup(var.host_count, count.index + 3 + 1)
          name_label = "Terraform_disk_imavo"
          size       = 322122547200
        }
      }
      
      resource "xenorchestra_vm" "smtp" {
        count            = 3
        cpus             = 4
        memory_max       = 8589934592
        cloud_config     = xenorchestra_cloud_config.node[count.index + 6].template
        name_label       = lookup(var.vm_name, count.index + 6 + 1)
        name_description = "${var.cluster_name} smtp worker"
        template         = data.xenorchestra_template.template.id
        auto_poweron     = true
        affinity_host    = lookup(var.preferred_host, count.index + 6 + 1)
        
        network {
          network_id = data.xenorchestra_network.net0.id
        }
        network {
          network_id = data.xenorchestra_network.net1.id
        }
        network {
          network_id = data.xenorchestra_network.net2.id
        }
        disk {
          sr_id      = lookup(var.host_count, count.index + 6 + 1)
          name_label = "Terraform_disk_imavo"
          size       = 53687091200
        }
      }
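      The three VM resources share the 1-9 maps by offsetting count.index (0 for masters, 3 for workers, 6 for the SMTP workers). The 0-based xenorchestra_cloud_config.node list uses the same offset without the +1. The memory and disk sizes are plain byte counts. The small Python sketch below reproduces both, which can help when checking edits to the maps. The helper names are hypothetical and not part of the plan.

```python
GIB = 1024 ** 3  # the byte values in the plan are whole GiB

# (resource, count.index offset, memory_max bytes, disk size bytes), copied from the plan above
ROLES = [
    ("master", 0, 8589934592, 107374182400),
    ("worker", 3, 68719476736, 322122547200),
    ("smtp", 6, 8589934592, 53687091200),
]


def map_keys(offset: int, count: int = 3) -> list[str]:
    """Keys hit by lookup(var.X, count.index + offset + 1) for count.index in 0..count-1."""
    return [str(i + offset + 1) for i in range(count)]


def to_gib(size_bytes: int) -> float:
    return size_bytes / GIB


for role, offset, mem, disk in ROLES:
    print(f"{role:6} keys={map_keys(offset)} memory={to_gib(mem):g} GiB disk={to_gib(disk):g} GiB")
```

      It prints 8 GiB RAM / 100 GiB disk for the masters, 64 GiB / 300 GiB for the workers, and 8 GiB / 50 GiB for the SMTP workers.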
      
      posted in Infrastructure as Code

    Latest posts made by Jonathon

    • RE: xcp-ng server crashed/rebooted due to issues with drbd/linstor?

      I've installed the latest updates on all hosts and restarted everything. I also have all XCP-ng logs going into Loki now, so next time something happens I will see everything lol.


      posted in XOSTOR
    • RE: xcp-ng server crashed/rebooted due to issues with drbd/linstor?

      Here is the log from the last time xen04 crashed: xen04-jun12-journalctl-crash-summary.txt

      posted in XOSTOR
    • xcp-ng server crashed/rebooted due to issues with drbd/linstor?

      Hello, for the last few weeks we have been having random server crashes/reboots. The most recent crash happened on a host running the June 2026 Updates #1 for XCP-ng 8.3 LTS. I see a newer update just came out.

      I can share more logs if desired.
      xen03-jun20-journalctl-crash-summary.txt

      The timeline is a bit fuzzy. We first ran into vendor issues with one server, and the vendor stated that there was a problem with its power supply, but that server has continued to have issues. Now a different server (xen03) has crashed/rebooted. I will add more to this thread as I collect and go through the logs.

      posted in XOSTOR
    • RE: Ran into a new auth issue with xostor?

      @Mathieu-L

      linstor n l (linstor node list) was included in my original post.
      All nodes were updated to the May 2026 Security and Maintenance Updates for XCP-ng 8.3 LTS, and all nodes were restarted.
      May 2026 Updates #2 for XCP-ng 8.3 LTS was then released, and a couple of days later I installed it on all hosts. No hosts were restarted after that update.

      This issue appeared when xen04 was restarted.
      To restart the controller, I used systemctl restart linstor-controller (https://xcp-ng.org/forum/post/105309).

      posted in XOSTOR
    • RE: Ran into a new auth issue with xostor?

      After looking at things some more and not seeing anything else I could do, I restarted the controller and satellites. This allowed things to recover.

      posted in XOSTOR
    • Ran into a new auth issue with xostor?

      Something went wrong with a (xeno4) host and it rebooted. After reboot it is behaving weirdly. Rebooting again does not resolve the issue.

      Attempting to start a VM with a XOSTOR VDI results in the following:

      vm.start
      {
        "id": "3db40547-fcbf-35b1-4f1d-fc29ca851a57",
        "bypassMacAddressesCheck": false,
        "force": false,
        "host": "3aa66f69-ea6f-465a-83a7-c2c1c43eb3e3"
      }
      {
        "code": "SR_BACKEND_FAILURE_1200",
        "params": [
          "",
          "[Errno 30] Read-only file system: '/dev/drbd/by-res/xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c/0'",
          ""
        ],
        "task": {
          "uuid": "484997e0-b959-3f38-5711-0e1f14031fea",
          "name_label": "Async.VM.start_on",
          "name_description": "",
          "allowed_operations": [],
          "current_operations": {},
          "created": "20260511T18:43:18Z",
          "finished": "20260511T18:44:44Z",
          "status": "failure",
          "resident_on": "OpaqueRef:1f61b22b-05b3-4724-9805-284d1079c6f7",
          "progress": 1,
          "type": "<none/>",
          "result": "",
          "error_info": [
            "SR_BACKEND_FAILURE_1200",
            "",
            "[Errno 30] Read-only file system: '/dev/drbd/by-res/xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c/0'",
            ""
          ],
          "other_config": {
            "debug_info:cancel_points_seen": "1"
          },
          "subtask_of": "OpaqueRef:NULL",
          "subtasks": [],
          "backtrace": "(((process xapi)(filename ocaml/xapi-client/client.ml)(line 7))((process xapi)(filename ocaml/xapi-client/client.ml)(line 19))((process xapi)(filename ocaml/xapi-client/client.ml)(line 7879))((process xapi)(filename ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml)(line 39))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 144))((process xapi)(filename ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml)(line 39))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 1990))((process xapi)(filename ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml)(line 39))((process xapi)(filename ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml)(line 39))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 1974))((process xapi)(filename ocaml/xapi/rbac.ml)(line 228))((process xapi)(filename ocaml/xapi/rbac.ml)(line 238))((process xapi)(filename ocaml/xapi/server_helpers.ml)(line 78)))"
        },
        "message": "SR_BACKEND_FAILURE_1200(, [Errno 30] Read-only file system: '/dev/drbd/by-res/xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c/0', )",
        "name": "XapiError",
        "stack": "XapiError: SR_BACKEND_FAILURE_1200(, [Errno 30] Read-only file system: '/dev/drbd/by-res/xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c/0', )
          at XapiError.wrap (file:///opt/xo/xo-builds/xen-orchestra-202605041856/packages/xen-api/_XapiError.mjs:16:12)
          at default (file:///opt/xo/xo-builds/xen-orchestra-202605041856/packages/xen-api/_getTaskResult.mjs:13:29)
          at Xapi._addRecordToCache (file:///opt/xo/xo-builds/xen-orchestra-202605041856/packages/xen-api/index.mjs:1078:24)
          at file:///opt/xo/xo-builds/xen-orchestra-202605041856/packages/xen-api/index.mjs:1112:14
          at Array.forEach (<anonymous>)
          at Xapi._processEvents (file:///opt/xo/xo-builds/xen-orchestra-202605041856/packages/xen-api/index.mjs:1102:12)
          at Xapi._watchEvents (file:///opt/xo/xo-builds/xen-orchestra-202605041856/packages/xen-api/index.mjs:1275:14)"
      }
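The DRBD resource name inside this error is what ties the XAPI failure back to LINSTOR/DRBD. A small sketch for extracting it so it can be passed to linstor or drbdadm; drbd_res_from_error is a hypothetical name, and it assumes the /dev/drbd/by-res/<resource>/<volume> path format seen in the message above.

```shell
# Hypothetical helper: print the DRBD resource name from an error message
# containing a /dev/drbd/by-res/<resource>/<volume> path. Reads stdin.
drbd_res_from_error() {
  sed -n 's|.*/dev/drbd/by-res/\([^/]*\)/.*|\1|p' | head -n 1
}

# Example usage (controller address as used in this thread):
#   res=$(printf '%s\n' "$msg" | drbd_res_from_error)
#   linstor --controllers=10.2.0.11 r l | grep -e "$res"
#   drbdadm status "$res"
```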
      

      However, another VM with a XOSTOR VDI did start:
      7af8a0b6-3276-4fd0-81f0-f669ff93d5aa-image.jpeg

      This is what the failing resource looks like in LINSTOR/XOSTOR:

      jonathon@jonathon-framework:~$ linstor --controllers=10.2.0.11 r l | grep -e 'xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c'
      | xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c | ovbh-pprod-xen01                         | DRBD,STORAGE | Unused | Connecting(ovbh-pprod-xen04)                                                           |     UpToDate | 2025-05-23 13:49:57 |
      | xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c | ovbh-pprod-xen02                         | DRBD,STORAGE | Unused | Connecting(ovbh-pprod-xen04)                                                           |     UpToDate | 2025-05-23 13:49:57 |
      | xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c | ovbh-pprod-xen04                         | DRBD,STORAGE | Unused | StandAlone(ovbh-pprod-xen02,ovbh-pprod-xen01)                                          |     UpToDate | 2025-05-23 13:49:57 |
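With many volumes it gets tedious to scan this table by eye. A sketch that prints only the rows whose connection column is not Ok; it assumes the ASCII |-separated layout shown above, with the connection state in the fifth column as in these rows. unhealthy_resources is a hypothetical name.

```shell
# Hypothetical helper: from `linstor r l` output (|-separated), print
# "resource node conns" for every row whose Conns column is not "Ok".
unhealthy_resources() {
  awk -F'|' 'NF >= 8 {
    res = $2; node = $3; conns = $6
    # trim the padding around each cell
    gsub(/^ +| +$/, "", res); gsub(/^ +| +$/, "", node); gsub(/^ +| +$/, "", conns)
    # skip healthy rows and the header row
    if (conns != "Ok" && conns != "Conns") print res, node, conns
  }'
}

# Example usage:
#   linstor --controllers=10.2.0.11 r l | unhealthy_resources
```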
      

      Restarting the satellite on xen04 does not help.

      jonathon@jonathon-framework:~$ linstor --controllers=10.2.0.11 n l
      ╭──────────────────────────────────────────────────────────────────────────────────────────╮
      ┊ Node                                     ┊ NodeType  ┊ Addresses               ┊ State   ┊
      ╞══════════════════════════════════════════════════════════════════════════════════════════╡
      ┊ ovbh-pprod-xen01                         ┊ COMBINED  ┊ 10.2.0.10:3366 (PLAIN)  ┊ Online  ┊
      ┊ ovbh-pprod-xen02                         ┊ COMBINED  ┊ 10.2.0.11:3366 (PLAIN)  ┊ Online  ┊
      ┊ ovbh-pprod-xen03                         ┊ COMBINED  ┊ 10.2.0.12:3366 (PLAIN)  ┊ Online  ┊
      ┊ ovbh-pprod-xen04                         ┊ COMBINED  ┊ 10.2.0.13:3366 (PLAIN)  ┊ Online  ┊
      ┊ ovbh-pprod-xen05                         ┊ COMBINED  ┊ 10.2.0.14:3366 (PLAIN)  ┊ Online  ┊
      ┊ ovbh-vprod-k8s01-worker01.example.com    ┊ SATELLITE ┊ 10.1.8.103:3366 (PLAIN) ┊ Online  ┊
      ┊ ovbh-vprod-k8s01-worker02.example.com    ┊ SATELLITE ┊ 10.1.8.104:3366 (PLAIN) ┊ Online  ┊
      ┊ ovbh-vprod-k8s01-worker03.example.com    ┊ SATELLITE ┊ 10.1.8.105:3366 (PLAIN) ┊ Online  ┊
      ┊ ovbh-vprod-k8s01-worker10.example.com    ┊ SATELLITE ┊ 10.1.8.112:3366 (PLAIN) ┊ OFFLINE ┊
      ┊ ovbh-vprod-k8s01-worker13.example.com    ┊ SATELLITE ┊ 10.1.8.115:3366 (PLAIN) ┊ Online  ┊
      ┊ ovbh-vprod-rancher01.example.com         ┊ SATELLITE ┊ 10.1.8.41:3366 (PLAIN)  ┊ Online  ┊
      ┊ ovbh-vprod-rancher02.example.com         ┊ SATELLITE ┊ 10.1.8.42:3366 (PLAIN)  ┊ Online  ┊
      ┊ ovbh-vprod-rancher03.example.com         ┊ SATELLITE ┊ 10.1.8.43:3366 (PLAIN)  ┊ Online  ┊
      ┊ ovbh-vtest-k8s01-worker01.example.com    ┊ SATELLITE ┊ 10.1.8.64:3366 (PLAIN)  ┊ Online  ┊
      ┊ ovbh-vtest-k8s01-worker02.example.com    ┊ SATELLITE ┊ 10.1.8.65:3366 (PLAIN)  ┊ Online  ┊
      ┊ ovbh-vtest-k8s01-worker03.example.com    ┊ SATELLITE ┊ 10.1.8.66:3366 (PLAIN)  ┊ Online  ┊
      ┊ ovbh-vtest-k8s01-worker04.example.com    ┊ SATELLITE ┊ 10.1.8.60:3366 (PLAIN)  ┊ OFFLINE ┊
      ┊ ovbh-vtest-k8s01-worker05.example.com    ┊ SATELLITE ┊ 10.1.8.59:3366 (PLAIN)  ┊ Online  ┊
      ╰──────────────────────────────────────────────────────────────────────────────────────────╯
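The two OFFLINE entries appear to be Kubernetes worker satellites rather than XCP-ng hosts. A sketch for listing only the OFFLINE nodes and their addresses from this output; it relies on the whitespace layout shown (name in the second field, address in the sixth, state just before the closing ┊). offline_nodes is a hypothetical name.

```shell
# Hypothetical helper: print name and address of OFFLINE nodes from
# `linstor n l` output (┊-separated table). With default whitespace
# splitting the name is field 2, the address field 6, and the state is
# the field just before the closing ┊.
offline_nodes() {
  awk '$1 == "┊" && $(NF - 1) == "OFFLINE" { print $2, $6 }'
}

# Example usage:
#   linstor --controllers=10.2.0.11 n l | offline_nodes
```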
      

      Looking at the logs on xen04:

      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen01: conn( StandAlone -> Unconnected ) [connect]
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen01: Starting receiver thread (peer-node-id 0)
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen01: conn( Unconnected -> Connecting ) [connecting]
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen02: conn( StandAlone -> Unconnected ) [connect]
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen02: Starting receiver thread (peer-node-id 1)
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen02: conn( Unconnected -> Connecting ) [connecting]
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen01: Handshake to peer 0 successful: Agreed network protocol version 123
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen01: Feature flags enabled on protocol level: 0x7f TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES RESYNC_DAGTAG
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen01: expected AuthChallenge packet, received: P_PROTOCOL (0x000b)
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen01: Authentication of peer failed
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen01: conn( Connecting -> Disconnecting )
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen01: Terminating sender thread
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen01: Starting sender thread (peer-node-id 0)
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen02: Handshake to peer 1 successful: Agreed network protocol version 123
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen02: Feature flags enabled on protocol level: 0x7f TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES RESYNC_DAGTAG
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen02: expected AuthChallenge packet, received: P_PROTOCOL (0x000b)
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen02: Authentication of peer failed
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen02: conn( Connecting -> Disconnecting )
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen02: Terminating sender thread
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen02: Starting sender thread (peer-node-id 1)
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen01: Connection closed
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen01: helper command: /sbin/drbdadm disconnected
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen02: Connection closed
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen02: helper command: /sbin/drbdadm disconnected
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen01: helper command: /sbin/drbdadm disconnected exit code 0
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen01: conn( Disconnecting -> StandAlone ) [disconnected]
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen01: Terminating receiver thread
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen02: helper command: /sbin/drbdadm disconnected exit code 0
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen02: conn( Disconnecting -> StandAlone ) [disconnected]
      [Mon May 11 00:46:28 2026] drbd xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c ovbh-pprod-xen02: Terminating receiver thread
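Each peer here logs "expected AuthChallenge packet, received: P_PROTOCOL" followed by "Authentication of peer failed". One possible reading (not confirmed in this thread) is that xen04 expected its peers to answer DRBD's shared-secret challenge and they did not, meaning the two sides disagree on the authentication settings for this resource. To see how widespread it is, a sketch that counts these failures per resource and peer from dmesg -T style output; drbd_auth_failures is a hypothetical name.

```shell
# Hypothetical helper: count DRBD "Authentication of peer failed" kernel
# messages per resource and peer. Reads `dmesg -T`-style lines on stdin.
drbd_auth_failures() {
  awk '/Authentication of peer failed/ {
    for (i = 1; i < NF - 1; i++) {
      if ($i == "drbd") {
        peer = $(i + 2)
        sub(/:$/, "", peer)   # strip the colon after the peer name
        count[$(i + 1) " " peer]++
        break
      }
    }
  }
  END { for (k in count) print count[k], k }' | sort
}

# Example usage on xen04:
#   dmesg -T | drbd_auth_failures
```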
      
      [00:44 ovbh-pprod-xen04 ~]# drbdadm status xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c
      xcp-volume-dc9de686-850a-4c6f-896f-0dacc2e1e35c role:Secondary suspended:quorum
        disk:UpToDate quorum:no open:no blocked:upper
        ovbh-pprod-xen01 connection:StandAlone
        ovbh-pprod-xen02 connection:StandAlone
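This status most likely explains the read-only error: I/O on the resource is suspended because quorum is lost, and with both peer connections StandAlone, xen04 cannot reach quorum on its own. A sketch that flags these conditions across all resources from drbdadm status output; it assumes resource lines start in column 0 and per-disk/per-peer lines are indented, as drbdadm prints them (the forum copy above is indented as a whole). drbd_status_problems is a hypothetical name.

```shell
# Hypothetical helper: read `drbdadm status` output on stdin and report
# conditions that block I/O (quorum loss, suspension, StandAlone peers).
drbd_status_problems() {
  awk '
    /^[^ \t]/ { res = $1 }                 # unindented line = resource name
    /quorum:no/ { print res ": no quorum" }
    /suspended:/ {
      for (i = 1; i <= NF; i++) if ($i ~ /^suspended:/) print res ": " $i
    }
    /connection:StandAlone/ { print res ": peer " $1 " is StandAlone" }
  '
}

# Example usage (no resource name means all resources):
#   drbdadm status | drbd_status_problems
```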
      

      I had updated all servers with this patch: https://xcp-ng.org/blog/2026/05/05/april-2026-security-and-maintenance-updates-for-xcp-ng-8-3-lts-2/
      All hosts were then restarted and everything was healthy. Shortly after, I saw this release, https://xcp-ng.org/blog/2026/05/07/may-2026-updates-2-for-xcp-ng-8-3-lts/, and installed it on all hosts.
      b6333da8-1667-4863-9a15-1452d9803dd0-image.jpeg
      xen01 is the current master
      61a98b9d-ac8d-4493-9e59-dcf5883a2a0b-image.jpeg

      Has anyone seen this before?

      posted in XOSTOR
      Jonathon
    • RE: Attempting to add new host fail on xoa and on server, worked on xcp-ng center

      xe pool-enable-tls-verification was exactly what I needed, thanks! Adding the host worked after that.

      posted in Management
      Jonathon
    • RE: Attempting to add new host fail on xoa and on server, worked on xcp-ng center

      @psafont Sorry, I was swamped with other things. As listed above, I get the same error, forced or not, from XCP-ng Center, from the XCP-ng host, and from XOA.

      1fdda333-0842-4281-ae69-e6c886ec1542-image.png
      TLS verification has always been off, and in the past we have not had any issues adding new hosts to the pool.

      I have taken no other actions since my last posting.

      posted in Management
      Jonathon
    • RE: Attempting to add new host fail on xoa and on server, worked on xcp-ng center

      I see. Mine also shows
      name ( RO): sdn-controller-ca.pem
      host ( RO): <not in database>
      as in the issue, but the file exists.

      [11:28 ovbh-pprod-xen05 ~]# xe certificate-list
      uuid ( RO)           : afdd9c8e-dcae-17c7-c35c-0fd7cebd387a
                 type ( RO): host
                 name ( RO): 
                 host ( RO): f0cec10f-ad05-48e4-893c-414b3a3e15be
           not-before ( RO): 20251110T23:15:51Z
            not-after ( RO): 20351108T23:15:51Z
          fingerprint ( RO): BF:83:23:BB:7B:E9:30:DE:86:EA:9D:AF:DF:F8:BA:34:39:D0:81:AD:34:E5:C6:AB:0C:49:41:7B:4A:3C:8B:9E
      
      
      uuid ( RO)           : b8dcd1f0-ef65-e762-f189-46bb78766c6b
                 type ( RO): ca
                 name ( RO): sdn-controller-ca.pem
                 host ( RO): <not in database>
           not-before ( RO): 20200416T00:17:31Z
            not-after ( RO): 20470901T00:17:31Z
          fingerprint ( RO): 63:1F:89:3F:0E:1F:86:52:34:95:3C:6C:3F:9C:C8:B3:5A:61:6B:4D:EE:8F:A7:11:F0:BA:79:8B:C7:15:A0:E0
      
      
      uuid ( RO)           : e7daedf2-7f35-ba40-093a-e0c011d91633
                 type ( RO): host_internal
                 name ( RO): 
                 host ( RO): f0cec10f-ad05-48e4-893c-414b3a3e15be
           not-before ( RO): 20251110T23:15:46Z
            not-after ( RO): 20351108T23:15:46Z
          fingerprint ( RO): 71:41:B0:25:88:AA:E4:56:EE:F7:A9:8E:0A:A9:FE:C5:6A:0D:D5:37:30:BF:C8:81:C2:D7:B8:20:7A:6C:7F:B7
      
      
      [13:50 ovbh-pprod-xen05 ~]# ll /etc/stunnel/certs/sdn-controller-ca.pem
      -rw-r--r-- 1 root root 1907 Nov 12 09:45 /etc/stunnel/certs/sdn-controller-ca.pem
      

      Removing it did not help; I still get the same error.

      [13:54 ovbh-pprod-xen05 ~]# xe certificate-list
      uuid ( RO)           : afdd9c8e-dcae-17c7-c35c-0fd7cebd387a
                 type ( RO): host
                 name ( RO): 
                 host ( RO): f0cec10f-ad05-48e4-893c-414b3a3e15be
           not-before ( RO): 20251110T23:15:51Z
            not-after ( RO): 20351108T23:15:51Z
          fingerprint ( RO): BF:83:23:BB:7B:E9:30:DE:86:EA:9D:AF:DF:F8:BA:34:39:D0:81:AD:34:E5:C6:AB:0C:49:41:7B:4A:3C:8B:9E
      
      
      uuid ( RO)           : e7daedf2-7f35-ba40-093a-e0c011d91633
                 type ( RO): host_internal
                 name ( RO): 
                 host ( RO): f0cec10f-ad05-48e4-893c-414b3a3e15be
           not-before ( RO): 20251110T23:15:46Z
            not-after ( RO): 20351108T23:15:46Z
          fingerprint ( RO): 71:41:B0:25:88:AA:E4:56:EE:F7:A9:8E:0A:A9:FE:C5:6A:0D:D5:37:30:BF:C8:81:C2:D7:B8:20:7A:6C:7F:B7
      

      I also confirmed that all the certs for the hosts are current and not expired.
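The expiry check can also be run directly against the xe certificate-list output. A sketch that prints certificates whose not-after is in the past, plus entries whose host is <not in database>; it relies on xe's timestamp format (YYYYMMDDTHH:MM:SSZ, which sorts correctly as plain text). cert_problems and the NOW override are hypothetical names.

```shell
# Hypothetical helper: flag certificates from `xe certificate-list` that are
# expired or whose host is "<not in database>". Reads the listing on stdin.
# NOW (optional) overrides the current UTC time, in xe's timestamp format.
cert_problems() {
  cp_now="${NOW:-$(date -u +%Y%m%dT%H:%M:%SZ)}"
  awk -v now="$cp_now" '
    /^ *uuid \( RO\)/ { uuid = $NF }
    /^ *name \( RO\)/ { name = (NF > 3 ? $NF : "-") }   # "-" when name is empty
    /^ *host \( RO\)/ && /<not in database>/ { print uuid, name, "host not in database" }
    /^ *not-after \( RO\)/ && $NF < now { print uuid, name, "expired " $NF }
  '
}

# Example usage on a host:
#   xe certificate-list | cert_problems
```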

      posted in Management
      Jonathon
    • RE: Attempting to add new host fail on xoa and on server, worked on xcp-ng center

      eee8bee1-ce6f-47c2-b5f0-1cd9b942db79-image.png
      9eea1860-e725-4e3c-85ff-0c3351beff45-image.png

      Boo

      posted in Management
      Jonathon