XCP-ng
    MajorP93: best posts

    Posts

    • Xen Orchestra OpenMetrics Plugin - Grafana Dashboard

      Hello XCP-ng community!

      Since Vates released the new OpenMetrics plugin for Xen Orchestra we now have an official, built-in exporter for Prometheus metrics!

      I was previously using xen-exporter to expose the hypervisor's internal RRD database as Prometheus metrics.
      I have since migrated to the new plugin, which works just fine.

      I updated the Grafana dashboard I was using so that it is compatible with the official OpenMetrics plugin and thought "why not share it with other users?"

      In case you are interested you can find my dashboard JSON here: https://gist.github.com/MajorP93/3a933a6f03b4c4e673282fb54a68474b

      It is based on the xen-exporter dashboard made by MikeDombo: https://grafana.com/grafana/dashboards/16588-xen/

      If you also use Prometheus to scrape the Xen Orchestra OpenMetrics plugin and visualize the data in Grafana, you can copy the JSON from my gist, import it, and you are ready to go!
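      For anyone who prefers scripting the import over pasting into the UI, here is a minimal sketch that uses Grafana's HTTP API (POST /api/dashboards/db). The Grafana URL, the API token and the helper names are assumptions for illustration and are not part of the gist. If the dashboard JSON contains data source placeholders, the UI import is probably the simpler path.

```python
import json
import urllib.request


def build_import_payload(dashboard: dict, overwrite: bool = True) -> dict:
    """Wrap a dashboard model in the body expected by POST /api/dashboards/db."""
    model = dict(dashboard)  # shallow copy so the caller's dict is not modified
    # A null id asks Grafana to create the dashboard rather than update by numeric id.
    model["id"] = None
    return {"dashboard": model, "overwrite": overwrite}


def import_dashboard(grafana_url: str, token: str, dashboard: dict) -> int:
    """Send the dashboard to Grafana and return the HTTP status code.

    grafana_url and token are placeholders for your own instance and credentials.
    """
    request = urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/dashboards/db",
        data=json.dumps(build_import_payload(dashboard)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

      Load the gist JSON with json.load() and pass the resulting dict to import_dashboard().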

      Hope it helps!

      Might even be a good idea to include the dashboard as an example in the Xen Orchestra documentation. 🙂

      Best regards

      posted in Infrastructure as Code
      MajorP93
    • RE: XO5 breaks after defaulting to XO6 (from source)

      @MathieuRA I disabled Traefik, reverted to my old XO config (port 443, SSL encryption, HTTP to HTTPS redirection), rebuilt the Docker container using your branch and tested:

      it is working fine on my end now 🙂

      Thank you very much!

      I did not expect this to get fixed so fast!

      posted in Xen Orchestra
      MajorP93
    • RE: [VDDK V2V] Migration of VM that had more than 1 snapshot creates multiple VHDs

      @florent said in [VDDK V2V] Migration of VM that had more than 1 snapshot creates multiple VHDs:

      @MajorP93 the sizes are different between the disks. Did you modify them since the snapshots?

      Would it be possible to take one new snapshot with the same disk structure?

      Sorry, it was indeed my mistake.
      On the VMware side there are 2 VMs that have almost exactly the same name.
      When I checked the disk layout to verify that this was an issue, I looked at the wrong VM. 🤦

      I checked again and can confirm that the VM in question has 1x 60GiB and 1x 25GiB VMDK.

      So this is not an issue. It is working as intended.

      Thread can be closed / deleted.
      Sorry again and thanks for the replies.

      Best regards
      MajorP

      posted in Xen Orchestra
      MajorP93
    • RE: Xen Orchestra Node 24 compatibility

      said in Xen Orchestra Node 24 compatibility:

      After moving from Node 22 to Node 24 on my XO instance I started to see more "Error: ENOMEM: not enough memory, close" for my backup jobs even though my XO VM has 8GB of RAM...

      I will revert back to Node 22 for now.

      I did some further troubleshooting and was able to narrow it down to SMB encryption on Xen Orchestra backup remotes (the "seal" CIFS mount flag).
      The "ENOMEM" errors seem to occur only when I enable that option.
      This seems to be related to buffering in the Linux kernel's CIFS implementation, which fails when SMB encryption is in use.
      The CIFS operation gets killed due to buffer exhaustion caused by encryption, and Xen Orchestra shows "ENOMEM".
      Somehow the issue is more visible with Node 24 than with Node 22, which is why I thought it was caused by the combination of Node version and XO version: I switched the Node version at the same time as I enabled SMB encryption.
      However, this does not seem to be directly related to Xen Orchestra; it is more of a Node / Linux kernel CIFS implementation issue.
      Apparently not a Xen Orchestra bug per se.
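      To check whether a remote is actually mounted with SMB encryption, one option is to look for the "seal" flag in /proc/mounts on the XO VM. This is a rough sketch; the function name is made up for illustration, and it only reports the mount options and says nothing about the ENOMEM errors themselves.

```python
def cifs_mounts_with_seal(mounts_text: str) -> list[str]:
    """Return the mount points of CIFS mounts whose options include 'seal'."""
    sealed = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("cifs", "smb3"):
            continue
        if "seal" in fields[3].split(","):
            sealed.append(fields[1])
    return sealed
```

      On the XO VM you would call it with the contents of /proc/mounts, for example cifs_mounts_with_seal(open("/proc/mounts").read()).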

      posted in Xen Orchestra
      MajorP93
    • RE: Long backup times via NFS to Data Domain from Xen Orchestra

      Hey,
      small update:
      while adding the backup section and "diskPerVmConcurrency" option to "/etc/xo-server/config.diskConcurrency.toml" or "~/.config/xo-server/config.diskConcurrency.toml" had no effect for me, I was able to get this working by adding it at the end of my main XO config file at "/etc/xo-server/config.toml".

      Best regards

      posted in Backup
      MajorP93
    • RE: Potential bug with Windows VM backup: "Body Timeout Error"

      I worked around this issue by changing my full backup job to "delta backup" and enabling "force full backup" in the schedule options.

      Delta backup seems more reliable as of now.

      Looking forward to a fix as Zstd compression is an appealing feature of the full backup method.

      posted in Backup
      MajorP93
    • RE: Potential bug with Windows VM backup: "Body Timeout Error"

      I can imagine that a fix could be to send "keepalive" packets in addition to the XCP-ng export-VM-data-stream so that the timeout on XO side does not occur 🤔

      posted in Backup
      MajorP93
    • RE: "NOT_SUPPORTED_DURING_UPGRADE()" error after yesterday's update

      @magicker said in "NOT_SUPPORTED_DURING_UPGRADE()" error after yesterday's update:

      @olivierlambert said in "NOT_SUPPORTED_DURING_UPGRADE()" error after yesterday's update:

      Because doing an update without rebooting doesn't reload the updated main programs, like XAPI. A host is only updated after a full reboot.


      Hi there
      Is it just me, or is this a chicken-and-egg situation?

      You upgrade the master... now the pool is in the NOT_SUPPORTED_DURING_UPGRADE() state. You can't move VMs off the master, so all you can do is shut down VMs... reboot... pray.

      Then you move to a non-master... you can't move the VMs off there either: NOT_SUPPORTED_DURING_UPGRADE(). So you have to do the same.

      Needless to say, I hit issues on each reboot, which caused 30-60 minute delays in getting VMs back up and running.

      Can you warm migrate, or is this dead also? (Too scared to test.)

      For me this workflow worked every time there were upgrades available:

      - disable HA on pool level
      - disable load balancer plugin
      - upgrade master
      - upgrade all other nodes
      - restart toolstack on master
      - restart toolstack on all other nodes
      - live migrate all VMs running on master to other node(s)
      - reboot master
      - reboot next node (live migrate all VMs running on that particular node away before doing so)
      - repeat until all nodes have been rebooted (one node at a time)
      - re-enable HA on pool level
      - re-enable load balancer plugin

      Never had any issues with that. No downtime for any of the VMs.
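      The ordering is the important part of that workflow, so here is a small sketch that expands it into an explicit checklist for a given pool. The host names are placeholders, and the function only produces text; it does not run anything against XAPI.

```python
def rolling_upgrade_plan(master: str, others: list[str]) -> list[str]:
    """Expand the rolling-upgrade workflow into an ordered checklist."""
    hosts = [master, *others]  # the master always goes first
    steps = ["disable HA on pool level", "disable load balancer plugin"]
    # Upgrade packages everywhere before restarting any toolstack.
    steps += [f"upgrade {host}" for host in hosts]
    steps += [f"restart toolstack on {host}" for host in hosts]
    # Reboot one host at a time, evacuating its VMs first.
    for host in hosts:
        steps.append(f"live migrate all VMs off {host}")
        steps.append(f"reboot {host}")
    steps += ["re-enable HA on pool level", "re-enable load balancer plugin"]
    return steps
```

      Printing rolling_upgrade_plan("master", ["node2", "node3"]) line by line gives a checklist you can tick off during the maintenance window.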

      posted in Backup
      MajorP93
    • RE: Potential bug with Windows VM backup: "Body Timeout Error"

      @andriy.sultanov said in Potential bug with Windows VM backup: "Body Timeout Error":

      xe-toolstack-restart

      Okay I was able to replicate the issue.
      This is the setup that I used and that resulted in the "body timeout error" previously discussed in this thread:

      OS: Windows Server 2019 Datacenter
      1.png
      2.png

      The versions of the packages in question that were used in order to replicate the issue (XCP-ng 8.3, fully upgraded):

      [11:58 dat-xcpng-test01 ~]# rpm -q xapi-core
      xapi-core-25.27.0-2.2.xcpng8.3.x86_64
      [11:59 dat-xcpng-test01 ~]# rpm -q qcow-stream-tool
      qcow-stream-tool-25.27.0-2.2.xcpng8.3.x86_64
      [11:59 dat-xcpng-test01 ~]# rpm -q vhd-tool
      vhd-tool-25.27.0-2.2.xcpng8.3.x86_64
      

      Result:
      3.png
      Backup log:

      {
        "data": {
          "mode": "full",
          "reportWhen": "failure"
        },
        "id": "1764585634255",
        "jobId": "b19ed05e-a34f-4fab-b267-1723a7195f4e",
        "jobName": "Full-Backup-Test",
        "message": "backup",
        "scheduleId": "579d937a-cf57-47b2-8cde-4e8325422b15",
        "start": 1764585634255,
        "status": "failure",
        "infos": [
          {
            "data": {
              "vms": [
                "36c492a8-e321-ef2b-94dc-a14e5757d711"
              ]
            },
            "message": "vms"
          }
        ],
        "tasks": [
          {
            "data": {
              "type": "VM",
              "id": "36c492a8-e321-ef2b-94dc-a14e5757d711",
              "name_label": "Win2019_EN_DC_TEST"
            },
            "id": "1764585635692",
            "message": "backup VM",
            "start": 1764585635692,
            "status": "failure",
            "tasks": [
              {
                "id": "1764585635919",
                "message": "snapshot",
                "start": 1764585635919,
                "status": "success",
                "end": 1764585644161,
                "result": "0f548c1f-ce5c-56e3-0259-9c59b7851a17"
              },
              {
                "data": {
                  "id": "f1bc8d14-10dd-4440-bb1d-409b91f3b550",
                  "type": "remote",
                  "isFull": true
                },
                "id": "1764585644192",
                "message": "export",
                "start": 1764585644192,
                "status": "failure",
                "tasks": [
                  {
                    "id": "1764585644201",
                    "message": "transfer",
                    "start": 1764585644201,
                    "status": "failure",
                    "end": 1764586308921,
                    "result": {
                      "name": "BodyTimeoutError",
                      "code": "UND_ERR_BODY_TIMEOUT",
                      "message": "Body Timeout Error",
                      "stack": "BodyTimeoutError: Body Timeout Error\n    at FastTimer.onParserTimeout [as _onTimeout] (/opt/xo/xo-builds/xen-orchestra-202511080402/node_modules/undici/lib/dispatcher/client-h1.js:646:28)\n    at Timeout.onTick [as _onTimeout] (/opt/xo/xo-builds/xen-orchestra-202511080402/node_modules/undici/lib/util/timers.js:162:13)\n    at listOnTimeout (node:internal/timers:588:17)\n    at process.processTimers (node:internal/timers:523:7)"
                    }
                  }
                ],
                "end": 1764586308922,
                "result": {
                  "name": "BodyTimeoutError",
                  "code": "UND_ERR_BODY_TIMEOUT",
                  "message": "Body Timeout Error",
                  "stack": "BodyTimeoutError: Body Timeout Error\n    at FastTimer.onParserTimeout [as _onTimeout] (/opt/xo/xo-builds/xen-orchestra-202511080402/node_modules/undici/lib/dispatcher/client-h1.js:646:28)\n    at Timeout.onTick [as _onTimeout] (/opt/xo/xo-builds/xen-orchestra-202511080402/node_modules/undici/lib/util/timers.js:162:13)\n    at listOnTimeout (node:internal/timers:588:17)\n    at process.processTimers (node:internal/timers:523:7)"
                }
              },
              {
                "id": "1764586443440",
                "message": "clean-vm",
                "start": 1764586443440,
                "status": "success",
                "end": 1764586443459,
                "result": {
                  "merge": false
                }
              },
              {
                "id": "1764586443624",
                "message": "snapshot",
                "start": 1764586443624,
                "status": "success",
                "end": 1764586451966,
                "result": "c3e9736e-d6eb-3669-c7b8-f603333a83bf"
              },
              {
                "data": {
                  "id": "f1bc8d14-10dd-4440-bb1d-409b91f3b550",
                  "type": "remote",
                  "isFull": true
                },
                "id": "1764586452003",
                "message": "export",
                "start": 1764586452003,
                "status": "success",
                "tasks": [
                  {
                    "id": "1764586452008",
                    "message": "transfer",
                    "start": 1764586452008,
                    "status": "success",
                    "end": 1764586686887,
                    "result": {
                      "size": 10464489322
                    }
                  }
                ],
                "end": 1764586686900
              },
              {
                "id": "1764586690122",
                "message": "clean-vm",
                "start": 1764586690122,
                "status": "success",
                "end": 1764586690140,
                "result": {
                  "merge": false
                }
              }
            ],
            "warnings": [
              {
                "data": {
                  "attempt": 1,
                  "error": "Body Timeout Error"
                },
                "message": "Retry the VM backup due to an error"
              }
            ],
            "end": 1764586690142
          }
        ],
        "end": 1764586690143
      }
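      Since these logs nest tasks several levels deep, a short helper makes it easier to see where a job failed and how long the failing step ran. This is a sketch based only on the fields visible in the log above ("tasks", "status", "message", "result", "start", "end"); the function name is made up, and the timestamps are treated as milliseconds.

```python
def failed_tasks(task: dict, path: tuple = ()) -> list[dict]:
    """Walk a backup log recursively and collect every task with status 'failure'."""
    here = path + (task.get("message", "?"),)
    found = []
    if task.get("status") == "failure":
        result = task.get("result")
        # "result" is an object for errors but a plain string for snapshots.
        code = result.get("code") if isinstance(result, dict) else None
        seconds = None
        if "start" in task and "end" in task:
            seconds = (task["end"] - task["start"]) / 1000
        found.append({"path": " > ".join(here), "code": code, "seconds": seconds})
    for child in task.get("tasks", []):
        found.extend(failed_tasks(child, here))
    return found
```

      Applied to the log above (after json.loads), the innermost entry is the "transfer" task with code UND_ERR_BODY_TIMEOUT, which ran for roughly 665 seconds before the timeout was raised.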
      

      I then enabled your test repository and installed the packages that you mentioned:

      [12:01 dat-xcpng-test01 ~]# rpm -q xapi-core
      xapi-core-25.27.0-2.3.0.xvafix.1.xcpng8.3.x86_64
      [12:08 dat-xcpng-test01 ~]# rpm -q vhd-tool
      vhd-tool-25.27.0-2.3.0.xvafix.1.xcpng8.3.x86_64
      [12:08 dat-xcpng-test01 ~]# rpm -q qcow-stream-tool
      qcow-stream-tool-25.27.0-2.3.0.xvafix.1.xcpng8.3.x86_64
      

      I restarted the toolstack and re-ran the backup job.
      Unfortunately it did not solve the issue and made the backup behave very strangely:
      9c9e9fdc-8385-4df2-9d23-7b0e4ecee0cd-grafik.png
      The backup job ran for only a few seconds and reported that it was "successful", but only 10.83 KiB were transferred. There is 18 GB of used space on this VM, so the data was unfortunately not transferred by the backup job.

      25deccb4-295e-4ce1-a015-159780536122-grafik.png

      Here is the backup log:

      {
        "data": {
          "mode": "full",
          "reportWhen": "failure"
        },
        "id": "1764586964999",
        "jobId": "b19ed05e-a34f-4fab-b267-1723a7195f4e",
        "jobName": "Full-Backup-Test",
        "message": "backup",
        "scheduleId": "579d937a-cf57-47b2-8cde-4e8325422b15",
        "start": 1764586964999,
        "status": "success",
        "infos": [
          {
            "data": {
              "vms": [
                "36c492a8-e321-ef2b-94dc-a14e5757d711"
              ]
            },
            "message": "vms"
          }
        ],
        "tasks": [
          {
            "data": {
              "type": "VM",
              "id": "36c492a8-e321-ef2b-94dc-a14e5757d711",
              "name_label": "Win2019_EN_DC_TEST"
            },
            "id": "1764586966983",
            "message": "backup VM",
            "start": 1764586966983,
            "status": "success",
            "tasks": [
              {
                "id": "1764586967194",
                "message": "snapshot",
                "start": 1764586967194,
                "status": "success",
                "end": 1764586975429,
                "result": "ebe5c4e2-5746-9cb3-7df6-701774a679b5"
              },
              {
                "data": {
                  "id": "f1bc8d14-10dd-4440-bb1d-409b91f3b550",
                  "type": "remote",
                  "isFull": true
                },
                "id": "1764586975453",
                "message": "export",
                "start": 1764586975453,
                "status": "success",
                "tasks": [
                  {
                    "id": "1764586975473",
                    "message": "transfer",
                    "start": 1764586975473,
                    "status": "success",
                    "end": 1764586981992,
                    "result": {
                      "size": 11093
                    }
                  }
                ],
                "end": 1764586982054
              },
              {
                "id": "1764586985271",
                "message": "clean-vm",
                "start": 1764586985271,
                "status": "success",
                "end": 1764586985290,
                "result": {
                  "merge": false
                }
              }
            ],
            "end": 1764586985291
          }
        ],
        "end": 1764586985292
      }
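      Because the job reports "success" here, the status alone is not enough to catch this case. A rough sketch of a sanity check on the transferred size, again using only fields that appear in the log; the helper names and the threshold are arbitrary assumptions.

```python
def transfer_sizes(task: dict) -> list[int]:
    """Collect the 'size' of every 'transfer' task in a backup log, in log order."""
    sizes = []
    result = task.get("result")
    if task.get("message") == "transfer" and isinstance(result, dict) and "size" in result:
        sizes.append(result["size"])
    for child in task.get("tasks", []):
        sizes.extend(transfer_sizes(child))
    return sizes


def suspicious_transfers(log: dict, min_bytes: int) -> list[int]:
    """Return transfer sizes below min_bytes, e.g. a 'successful' but near-empty export."""
    return [size for size in transfer_sizes(log) if size < min_bytes]
```

      With a threshold of 1 MiB, the log above is flagged for its 11093-byte transfer, while the 10464489322-byte transfer from the earlier retry is not.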
      

      If you need me to test something else or if I should provide some log file from the XCP-ng system please let me know.

      Best regards

      posted in Backup
      MajorP93
    • RE: Potential bug with Windows VM backup: "Body Timeout Error"

      @andriy.sultanov I created a small test setup in our lab with a Windows VM that has a lot of free disk space (2 virtual disks, 2.5 TB of free space in total). Hopefully that way I will be able to replicate the full backup timeout that occurred in our production environment for VMs with a lot of free space.
      The backup job is currently running. I will report back once it has failed and I have had a chance to test whether your fix solves the issue.

      posted in Backup
      MajorP93
    • RE: Async.VM.pool_migrate stuck at 57%

      @wmazren I had a similar issue which cost me many hours to troubleshoot.

      I'd advise you to check the "dmesg" output within the VM that cannot be live migrated.

      XCP-ng / Xen behaves differently from VMware regarding live migration.

      XCP-ng will interact with the Linux kernel upon live migration, and the kernel will try to freeze all processes before performing the live migration.

      In my case a "fuse" process blocked the graceful freezing of all processes, and my live migration task was also stuck in the task view, similar to your case.

      After solving the fuse process issue, the system was able to freeze its processes again and the live migration problem was gone.

      All of this can be viewed in dmesg, as the kernel will tell you what is being done during live migration via XCP-ng.
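      If the dmesg output is long, a small filter can pull out the lines where the kernel reports that freezing did not complete. This is a sketch under the assumption that the guest kernel logs a message of the form "Freezing ... failed after N seconds"; the exact wording differs between kernel versions, so adjust the pattern to what your guest prints.

```python
import re

# Assumed message shape; kernels word this slightly differently across versions.
FREEZE_PROBLEM = re.compile(r"Freezing .*(failed|aborted) after")


def freeze_failures(dmesg_text: str) -> list[str]:
    """Return the dmesg lines that report a failed or aborted task freeze."""
    return [line for line in dmesg_text.splitlines() if FREEZE_PROBLEM.search(line)]
```

      Feed it the text of dmesg captured inside the guest; the lines that follow a match in the full output usually name the tasks that refused to freeze.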

      //EDIT: another thing you might want to try is toggling "migration compression" in the pool settings, as well as making sure you have a dedicated connection / VLAN configured for live migration. Those 2 things also helped make my live migrations faster and more robust.

      posted in Management
      MajorP93