XCP-ng
    MajorP93

    Posts
    • Xen Orchestra OpenMetrics Plugin - Grafana Dashboard

      Hello XCP-ng community!

      Since Vates released the new OpenMetrics plugin for Xen Orchestra, we now have an official, built-in exporter for Prometheus metrics!

      I was using xen-exporter before to expose the hypervisor's internal RRD database as Prometheus metrics.
      I migrated to the new plugin, which works just fine.

      I updated the Grafana dashboard I was using to be compatible with the official OpenMetrics plugin and thought: why not share it with other users?

      In case you are interested you can find my dashboard JSON here: https://gist.github.com/MajorP93/3a933a6f03b4c4e673282fb54a68474b

      It is based on the xen-exporter dashboard made by MikeDombo: https://grafana.com/grafana/dashboards/16588-xen/

      If you also use Prometheus to scrape the Xen Orchestra OpenMetrics plugin and visualize the data in Grafana, you can copy the JSON from my gist, import it, and you are ready to go!
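      If you prefer to script the import instead of using the Grafana UI, something along these lines should work via the Grafana HTTP API (the hostname, API token and file name are placeholders for your environment, and depending on how the JSON was exported you may still need to adjust the data source after the first import):

      # sketch: import the dashboard JSON via the Grafana HTTP API
      # (URL, token and file name are placeholders)
      curl -X POST "https://grafana.example.local/api/dashboards/db" \
        -H "Authorization: Bearer $GRAFANA_API_TOKEN" \
        -H "Content-Type: application/json" \
        -d "{\"dashboard\": $(cat xen-orchestra-openmetrics-dashboard.json), \"overwrite\": true}"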

      Hope it helps!

      Might even be a good idea to include the dashboard as an example in the Xen Orchestra documentation. 🙂

      Best regards

      posted in Infrastructure as Code
    • RE: XO5 breaks after defaulting to XO6 (from source)

      @MathieuRA I disabled Traefik, reverted to my old XO config (port 443, SSL encryption, HTTP-to-HTTPS redirection), rebuilt the Docker container using your branch, and tested:

      it is working fine on my end now 🙂

      Thank you very much!

      I did not expect this to get fixed so fast!

      posted in Xen Orchestra
    • RE: Xen Orchestra OpenMetrics Plugin - Grafana Dashboard

      @Mang0Musztarda said in Xen Orchestra OpenMetrics Plugin - Grafana Dashboard:

      @MajorP93 hi, how can I scrape the openmetrics endpoint?
      I set up the openmetrics plugin prometheus secret, enabled it, and then tried to use curl like that: curl -H "Authorization: Bearer abc123" http://localhost:9004
      but the response I got was
      {"error":"Query authentication does not match server setting"}
      what am I doing wrong?

      Hey!
      I scrape it like so:

      root@prometheus01:~# cat /etc/prometheus/scrape_configs/xen-orchestra-openmetrics.yml 
      scrape_configs:
        - job_name: xen-orchestra
          honor_labels: true
          scrape_interval: 30s
          scrape_timeout: 20s
          scheme: https
          tls_config:
            insecure_skip_verify: true
          bearer_token_file: /etc/prometheus/bearer.token
          metrics_path: /openmetrics/metrics
          static_configs:
          - targets:
            - xen-orchestra.domain.local
      

      The /etc/prometheus/bearer.token file contains the bearer token as configured in the Xen Orchestra OpenMetrics plugin.
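      If you want to test the endpoint manually first, a request along these lines should return the metrics (it simply mirrors the scrape config above; the hostname is a placeholder and -k corresponds to insecure_skip_verify):

      # sketch: query the XO OpenMetrics endpoint directly with the bearer token
      curl -k \
        -H "Authorization: Bearer $(cat /etc/prometheus/bearer.token)" \
        https://xen-orchestra.domain.local/openmetrics/metrics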

      Best regards

      posted in Infrastructure as Code
    • RE: Remote syslog broken after update/reboot? - Changing it away, then back fixes.

      @rzr Thank you very much!

      @michmoor0725 Absolutely! The community is another aspect of why working with XCP-ng is a lot more fun than working with VMware!

      posted in Compute
    • RE: [VDDK V2V] Migration of VM that had more than 1 snapshot creates multiple VHDs

      @florent said in [VDDK V2V] Migration of VM that had more than 1 snapshot creates multiple VHDs:

      @MajorP93 the sizes are different between the disks, did you modify them since the snapshots?

      would it be possible to take one new snapshot with the same disk structure?

      Sorry, it was my bad indeed.
      On the VMware side there are two VMs with almost exactly the same name.
      When I checked the disk layout to verify whether this was an issue, I looked at the wrong VM. 🤦

      I checked again and can confirm that the VM in question has 1x 60GiB and 1x 25GiB VMDK.

      So this is not an issue. It is working as intended.

      Thread can be closed / deleted.
      Sorry again and thanks for the replies.

      Best regards
      MajorP

      posted in Xen Orchestra
    • RE: Xen Orchestra Node 24 compatibility

      said in Xen Orchestra Node 24 compatibility:

      After moving from Node 22 to Node 24 on my XO instance I started to see more "Error: ENOMEM: not enough memory, close" for my backup jobs even though my XO VM has 8GB of RAM...

      I will revert back to Node 22 for now.

      I did some further troubleshooting and was able to pin the problem down to SMB encryption on Xen Orchestra backup remotes (the "seal" CIFS mount flag).
      The "ENOMEM" errors seem to occur only when I enable that option.
      It appears to be related to buffering in the Linux kernel's CIFS implementation that fails when SMB encryption is used:
      the CIFS operation gets killed due to buffer exhaustion caused by encryption, and Xen Orchestra reports "ENOMEM".
      Somehow the issue is more visible with Node 24 than with Node 22, which is why I initially thought it was caused by the Node + XO version combination; I had switched the Node version at the same time I enabled SMB encryption.
      However, this does not seem to be directly related to Xen Orchestra; it looks more like a Node / Linux kernel CIFS issue.
      Apparently not a Xen Orchestra bug per se.
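      For context, "seal" is the CIFS mount option that requests SMB-layer encryption; a remote mounted with encryption enabled looks roughly like this (server, share, mount point and credentials file are placeholders):

      # sketch: CIFS mount with SMB encryption requested via the "seal" option
      mount -t cifs //fileserver.example.local/xo-backups /mnt/xo-backup \
        -o credentials=/root/.smbcredentials,vers=3.1.1,seal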

      posted in Xen Orchestra
    • RE: Long backup times via NFS to Data Domain from Xen Orchestra

      Hey,
      small update:
      while adding the backup section and "diskPerVmConcurrency" option to "/etc/xo-server/config.diskConcurrency.toml" or "~/.config/xo-server/config.diskConcurrency.toml" had no effect for me, I was able to get this working by adding it at the end of my main XO config file at "/etc/xo-server/config.toml".
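      For reference, the addition looks roughly like this (a sketch; I am assuming the section header is spelled [backups], and 2 is just an example value):

      # sketch only: append the backup concurrency setting to the main xo-server config
      # (assumes the option lives in a [backups] section; 2 is an example value)
      printf '\n[backups]\ndiskPerVmConcurrency = 2\n' >> /etc/xo-server/config.toml

      xo-server only reads its config at startup, so restart it afterwards for the change to take effect.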

      Best regards

      posted in Backup
    • RE: Potential bug with Windows VM backup: "Body Timeout Error"

      I worked around this issue by changing my full backup job to "delta backup" and enabling "force full backup" in the schedule options.

      Delta backup seems more reliable as of now.

      Looking forward to a fix as Zstd compression is an appealing feature of the full backup method.

      posted in Backup
    • RE: Potential bug with Windows VM backup: "Body Timeout Error"

      I can imagine that a fix could be to send "keepalive" packets in addition to the XCP-ng VM export data stream, so that the timeout on the XO side does not occur 🤔

      posted in Backup
    • Restoring folder via backup file restore feature broken for .tar.gz

      Hello XCP-ng community and Vates-Team,

      I just observed a weird behavior of Xen Orchestra during backup file restore.

      Background: I had to restore a directory that got deleted on a small file server Windows VM by accident.

      I used Xen Orchestra's file restore menu to select the VM, the restore point and the path of the directory in question.
      Initially I selected .tar.gz as the export format and started the restore process.
      A new browser tab opened, and after a few minutes it showed "Error proxying request".
      Xen Orchestra then became almost fully unresponsive for about 5 minutes, but started to behave normally again after that.

      I then tried the same thing again: same VM, restore point, path, etc., but this time I opted for the ".zip (slow)" option as the export format.
      That worked without any issues; the download started after about 5 seconds.

      Did somebody else encounter similar issues?
      Just wanted to report this and ask, since maybe the .tar.gz functionality of Xen Orchestra needs investigation.

      Thanks and best regards

      //EDIT: oh, I forgot to mention: I am running a fully patched XCP-ng 8.3 pool and the latest XO CE on a Debian 13 VM. The Node.js version is 24 LTS.

      posted in Backup
    • RE: "NOT_SUPPORTED_DURING_UPGRADE()" error after yesterday's update

      @magicker said in "NOT_SUPPORTED_DURING_UPGRADE()" error after yesterday's update:

      @olivierlambert said in "NOT_SUPPORTED_DURING_UPGRADE()" error after yesterday's update:

      Because doing an update without rebooting doesn't reload the updated main programs, like XAPI. A host is only updated after a full reboot.


      Hi there
      Is it just me, or is this a chicken-and-egg situation?

      You upgrade the master... now the pool is in the NOT_SUPPORTED_DURING_UPGRADE() state. You can't move VMs off the master, so all you can do is shut down VMs.. reboot.. pray.

      Then move to a non-master.. you can't move the VMs off here either, NOT_SUPPORTED_DURING_UPGRADE(). So you have to do the same..

      Needless to say, I hit issues on each reboot which caused 30-60 min delays in getting VMs back up and running.

      Can you warm migrate, or is this dead also (too scared to test)?

      For me this workflow worked every time there were upgrades available:

      - disable HA on pool level
      - disable load balancer plugin
      - upgrade master
      - upgrade all other nodes
      - restart toolstack on master
      - restart toolstack on all other nodes
      - live migrate all VMs running on master to other node(s)
      - reboot master
      - reboot next node (live migrate all VMs running on that particular node away before doing so)
      - repeat until all nodes have been rebooted (one node at a time)
      - re-enable HA on pool level
      - re-enable load balancer plugin

      Never had any issues with that, and no downtime for any of the VMs. A rough xe CLI sketch of these steps is below.
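      A rough sketch of the xe CLI side of this workflow (UUIDs are placeholders; the load balancer plugin itself is toggled in the XO UI, and host updates are installed with yum as usual):

      xe pool-ha-disable                              # disable HA at the pool level
      yum update                                      # install updates on the master, then on every other host
      xe-toolstack-restart                            # restart the toolstack on the master, then on the other hosts

      # then, for the master and afterwards one host at a time:
      xe host-disable uuid=<host-uuid>                # keep new VMs from starting on this host
      xe host-evacuate uuid=<host-uuid>               # live migrate all running VMs away
      xe host-reboot uuid=<host-uuid>                 # reboot the host
      xe host-enable uuid=<host-uuid>                 # re-enable it once it is back up

      xe pool-ha-enable heartbeat-sr-uuids=<sr-uuid>  # re-enable HA when all hosts are done

      The important part is the same as in the list above: evacuate and reboot only one host at a time.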

      posted in Backup
    • RE: Potential bug with Windows VM backup: "Body Timeout Error"

      @andriy.sultanov said in Potential bug with Windows VM backup: "Body Timeout Error":

      xe-toolstack-restart

      Okay, I was able to replicate the issue.
      This is the setup I used that resulted in the "Body Timeout Error" previously discussed in this thread:

      OS: Windows Server 2019 Datacenter
      [screenshots: 1.png, 2.png]

      The versions of the packages in question that were used to replicate the issue (XCP-ng 8.3, fully up to date):

      [11:58 dat-xcpng-test01 ~]# rpm -q xapi-core
      xapi-core-25.27.0-2.2.xcpng8.3.x86_64
      [11:59 dat-xcpng-test01 ~]# rpm -q qcow-stream-tool
      qcow-stream-tool-25.27.0-2.2.xcpng8.3.x86_64
      [11:59 dat-xcpng-test01 ~]# rpm -q vhd-tool
      vhd-tool-25.27.0-2.2.xcpng8.3.x86_64
      

      Result:
      [screenshot: 3.png]
      Backup log:

      {
        "data": {
          "mode": "full",
          "reportWhen": "failure"
        },
        "id": "1764585634255",
        "jobId": "b19ed05e-a34f-4fab-b267-1723a7195f4e",
        "jobName": "Full-Backup-Test",
        "message": "backup",
        "scheduleId": "579d937a-cf57-47b2-8cde-4e8325422b15",
        "start": 1764585634255,
        "status": "failure",
        "infos": [
          {
            "data": {
              "vms": [
                "36c492a8-e321-ef2b-94dc-a14e5757d711"
              ]
            },
            "message": "vms"
          }
        ],
        "tasks": [
          {
            "data": {
              "type": "VM",
              "id": "36c492a8-e321-ef2b-94dc-a14e5757d711",
              "name_label": "Win2019_EN_DC_TEST"
            },
            "id": "1764585635692",
            "message": "backup VM",
            "start": 1764585635692,
            "status": "failure",
            "tasks": [
              {
                "id": "1764585635919",
                "message": "snapshot",
                "start": 1764585635919,
                "status": "success",
                "end": 1764585644161,
                "result": "0f548c1f-ce5c-56e3-0259-9c59b7851a17"
              },
              {
                "data": {
                  "id": "f1bc8d14-10dd-4440-bb1d-409b91f3b550",
                  "type": "remote",
                  "isFull": true
                },
                "id": "1764585644192",
                "message": "export",
                "start": 1764585644192,
                "status": "failure",
                "tasks": [
                  {
                    "id": "1764585644201",
                    "message": "transfer",
                    "start": 1764585644201,
                    "status": "failure",
                    "end": 1764586308921,
                    "result": {
                      "name": "BodyTimeoutError",
                      "code": "UND_ERR_BODY_TIMEOUT",
                      "message": "Body Timeout Error",
                      "stack": "BodyTimeoutError: Body Timeout Error\n    at FastTimer.onParserTimeout [as _onTimeout] (/opt/xo/xo-builds/xen-orchestra-202511080402/node_modules/undici/lib/dispatcher/client-h1.js:646:28)\n    at Timeout.onTick [as _onTimeout] (/opt/xo/xo-builds/xen-orchestra-202511080402/node_modules/undici/lib/util/timers.js:162:13)\n    at listOnTimeout (node:internal/timers:588:17)\n    at process.processTimers (node:internal/timers:523:7)"
                    }
                  }
                ],
                "end": 1764586308922,
                "result": {
                  "name": "BodyTimeoutError",
                  "code": "UND_ERR_BODY_TIMEOUT",
                  "message": "Body Timeout Error",
                  "stack": "BodyTimeoutError: Body Timeout Error\n    at FastTimer.onParserTimeout [as _onTimeout] (/opt/xo/xo-builds/xen-orchestra-202511080402/node_modules/undici/lib/dispatcher/client-h1.js:646:28)\n    at Timeout.onTick [as _onTimeout] (/opt/xo/xo-builds/xen-orchestra-202511080402/node_modules/undici/lib/util/timers.js:162:13)\n    at listOnTimeout (node:internal/timers:588:17)\n    at process.processTimers (node:internal/timers:523:7)"
                }
              },
              {
                "id": "1764586443440",
                "message": "clean-vm",
                "start": 1764586443440,
                "status": "success",
                "end": 1764586443459,
                "result": {
                  "merge": false
                }
              },
              {
                "id": "1764586443624",
                "message": "snapshot",
                "start": 1764586443624,
                "status": "success",
                "end": 1764586451966,
                "result": "c3e9736e-d6eb-3669-c7b8-f603333a83bf"
              },
              {
                "data": {
                  "id": "f1bc8d14-10dd-4440-bb1d-409b91f3b550",
                  "type": "remote",
                  "isFull": true
                },
                "id": "1764586452003",
                "message": "export",
                "start": 1764586452003,
                "status": "success",
                "tasks": [
                  {
                    "id": "1764586452008",
                    "message": "transfer",
                    "start": 1764586452008,
                    "status": "success",
                    "end": 1764586686887,
                    "result": {
                      "size": 10464489322
                    }
                  }
                ],
                "end": 1764586686900
              },
              {
                "id": "1764586690122",
                "message": "clean-vm",
                "start": 1764586690122,
                "status": "success",
                "end": 1764586690140,
                "result": {
                  "merge": false
                }
              }
            ],
            "warnings": [
              {
                "data": {
                  "attempt": 1,
                  "error": "Body Timeout Error"
                },
                "message": "Retry the VM backup due to an error"
              }
            ],
            "end": 1764586690142
          }
        ],
        "end": 1764586690143
      }
      

      I then enabled your test repository and installed the packages that you mentioned:

      [12:01 dat-xcpng-test01 ~]# rpm -q xapi-core
      xapi-core-25.27.0-2.3.0.xvafix.1.xcpng8.3.x86_64
      [12:08 dat-xcpng-test01 ~]# rpm -q vhd-tool
      vhd-tool-25.27.0-2.3.0.xvafix.1.xcpng8.3.x86_64
      [12:08 dat-xcpng-test01 ~]# rpm -q qcow-stream-tool
      qcow-stream-tool-25.27.0-2.3.0.xvafix.1.xcpng8.3.x86_64
      

      I restarted the toolstack and re-ran the backup job.
      Unfortunately it did not solve the issue and made the backup behave very strangely:
      [screenshot: 9c9e9fdc-8385-4df2-9d23-7b0e4ecee0cd-grafik.png]
      The backup job ran for only a few seconds and reported "success", but only 10.83 KiB were transferred. There are 18 GB of used space on this VM, so unfortunately the data was not transferred by the backup job.

      [screenshot: 25deccb4-295e-4ce1-a015-159780536122-grafik.png]

      Here is the backup log:

      {
        "data": {
          "mode": "full",
          "reportWhen": "failure"
        },
        "id": "1764586964999",
        "jobId": "b19ed05e-a34f-4fab-b267-1723a7195f4e",
        "jobName": "Full-Backup-Test",
        "message": "backup",
        "scheduleId": "579d937a-cf57-47b2-8cde-4e8325422b15",
        "start": 1764586964999,
        "status": "success",
        "infos": [
          {
            "data": {
              "vms": [
                "36c492a8-e321-ef2b-94dc-a14e5757d711"
              ]
            },
            "message": "vms"
          }
        ],
        "tasks": [
          {
            "data": {
              "type": "VM",
              "id": "36c492a8-e321-ef2b-94dc-a14e5757d711",
              "name_label": "Win2019_EN_DC_TEST"
            },
            "id": "1764586966983",
            "message": "backup VM",
            "start": 1764586966983,
            "status": "success",
            "tasks": [
              {
                "id": "1764586967194",
                "message": "snapshot",
                "start": 1764586967194,
                "status": "success",
                "end": 1764586975429,
                "result": "ebe5c4e2-5746-9cb3-7df6-701774a679b5"
              },
              {
                "data": {
                  "id": "f1bc8d14-10dd-4440-bb1d-409b91f3b550",
                  "type": "remote",
                  "isFull": true
                },
                "id": "1764586975453",
                "message": "export",
                "start": 1764586975453,
                "status": "success",
                "tasks": [
                  {
                    "id": "1764586975473",
                    "message": "transfer",
                    "start": 1764586975473,
                    "status": "success",
                    "end": 1764586981992,
                    "result": {
                      "size": 11093
                    }
                  }
                ],
                "end": 1764586982054
              },
              {
                "id": "1764586985271",
                "message": "clean-vm",
                "start": 1764586985271,
                "status": "success",
                "end": 1764586985290,
                "result": {
                  "merge": false
                }
              }
            ],
            "end": 1764586985291
          }
        ],
        "end": 1764586985292
      }
      

      If you need me to test something else, or if I should provide some log file from the XCP-ng system, please let me know.

      Best regards

      posted in Backup
    • RE: Potential bug with Windows VM backup: "Body Timeout Error"

      @andriy.sultanov I created a small test setup in our lab: a Windows VM with a lot of free disk space (2 virtual disks, 2.5 TB free space in total). Hopefully that way I will be able to replicate the full backup timeout issue for VMs with a lot of free space that occurred in our production environment.
      The backup job is currently running. I will report back once it has failed and I have had a chance to test whether your fix solves the issue.

      posted in Backup
    • RE: Async.VM.pool_migrate stuck at 57%

      @wmazren I had a similar issue which cost me many hours to troubleshoot.

      I'd advise you to check the "dmesg" output within the VM that cannot be live migrated.

      XCP-ng / Xen behaves differently from VMware regarding live migration.

      XCP-ng interacts with the Linux kernel upon live migration, and the kernel will try to freeze all processes before performing the migration.

      In my case a "fuse" process blocked the graceful freezing of all processes, and my live migration task was also stuck in the task view, similar to your case.

      After solving the fuse process issue, and therefore making the system able to live migrate, the issue was gone.

      All of this can be seen in dmesg, as the kernel tells you what is being done during a live migration via XCP-ng.
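      For example, right after a stuck migration attempt you can look inside the guest for freeze/suspend messages (a quick sketch, nothing XCP-ng specific):

      # inside the guest: check whether the kernel managed to freeze all tasks
      dmesg -T | grep -iE 'freez|suspend'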

      //EDIT: another thing you might want to try is toggling "migration compression" in the pool settings, as well as making sure you have a dedicated connection / VLAN configured for live migration. Those two things also made my live migrations faster and more robust.

      posted in Management
    • RE: Remote syslog broken after update/reboot? - Changing it away, then back fixes.

      @gduperrey Hey,
      I tested it and can confirm that after applying the latest set of patches and rebooting, remote syslog is still working fine.
      It appears to be fixed, good job guys 🙂

      posted in Compute
    • RE: Remote syslog broken after update/reboot? - Changing it away, then back fixes.

      @gduperrey Thanks!
      Will test tomorrow as our internal lab / test environment is currently unavailable.
      I will inform you about the results of my testing here.

      Best regards

      posted in Compute
    • RE: Xen Orchestra Node 24 compatibility

      For everyone hitting the "ENOMEM" error on a Debian 13 system when using SMB/CIFS encryption for transferring backups:
      you might want to try a newer kernel.

      I was able to solve the issue on my end by moving from kernel 6.12 (the default in Debian 13) to kernel 6.17 from Debian backports by executing

      apt install -t trixie-backports linux-image-amd64
      

      You can find a changelog for the SMB/CIFS kernel module here: https://wiki.samba.org/index.php/LinuxCIFSKernel

      In Linux kernel 6.14 they fixed the SMB encryption caching issue that was causing "ENOMEM" in Xen Orchestra on my end.
      Linux kernel 6.17 from Debian backports includes that fix.
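      After rebooting into the backports kernel, a quick check confirms which version is actually running:

      # confirm the running kernel after the reboot
      uname -r    # should report a 6.17.x version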

      Best regards

      posted in Xen Orchestra