Our future backup code: test it!
-
@florent yep, now it's equal. Maybe it's my hw bottleneck? I can also check with SSD storage to see the max speed.
Duration: 3 minutes
Size: 26.53 GiB
Speed: 157.78 MiB/s
Speed: 149.39 MiB/s
Speed: 163.76 MiB/s
No more "incorrect backup size in metadata" errors.
But still no NBD(
-
@Tristis-Oris that is already good news.
I pushed an additional fix: the NBD info was not shown in the UI.
-
well, that was my CPU bottleneck. XO lives at the most stable DC, but the oldest one.
- Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
flash:
Speed: 151.36 MiB/s
summary: { duration: '3m', cpuUsage: '131%', memoryUsage: '162.19 MiB' }
hdd:
Speed: 152 MiB/s
summary: { duration: '3m', cpuUsage: '201%', memoryUsage: '314.1 MiB' }
- Intel(R) Xeon(R) Gold 5215 CPU @ 2.50GHz
flash:
Speed: 196.78 MiB/s
summary: { duration: '3m', cpuUsage: '129%', memoryUsage: '170.8 MiB' }
hdd:
Speed: 184.72 MiB/s
summary: { duration: '3m', cpuUsage: '198%', memoryUsage: '321.06 MiB' }
- Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
flash:
Speed: 222.32 MiB/s
Speed: 220 MiB/s
summary: { duration: '2m', cpuUsage: '155%', memoryUsage: '183.77 MiB' }
hdd:
Speed: 185.63 MiB/s
Speed: 185.21 MiB/s
summary: { duration: '3m', cpuUsage: '196%', memoryUsage: '315.87 MiB' }
Look at the high memory usage with hdd.
Sometimes I still get errors.
"id": "1744875242122:0", "message": "export", "start": 1744875242122, "status": "success", "tasks": [ { "id": "1744875245258", "message": "transfer", "start": 1744875245258, "status": "success", "end": 1744875430762, "result": { "size": 28489809920 } }, { "id": "1744875432586", "message": "clean-vm", "start": 1744875432586, "status": "success", "warnings": [ { "data": { "path": "/xo-vm-backups/d4950e88-f6aa-dbc1-e6fe-e3c73ebe9904/20250417T073405Z.json", "actual": 28489809920, "expected": 28496828928 }, "message": "cleanVm: incorrect backup size in metadata" }
"id": "1744876967012:0", "message": "export", "start": 1744876967012, "status": "success", "tasks": [ { "id": "1744876970075", "message": "transfer", "start": 1744876970075, "status": "success", "end": 1744877108146, "result": { "size": 28489809920 } }, { "id": "1744877119430", "message": "clean-vm", "start": 1744877119430, "status": "success", "warnings": [ { "data": { "path": "/xo-vm-backups/d4950e88-f6aa-dbc1-e6fe-e3c73ebe9904/20250417T080250Z.json", "actual": 28489809920, "expected": 28496828928 }, "message": "cleanVm: incorrect backup size in metadata" }
-
I tried to move the tests to another VM, but again can't build it with the same commands(
yarn start
yarn run v1.22.22
$ node dist/cli.mjs
node:internal/modules/esm/resolve:275
    throw new ERR_MODULE_NOT_FOUND(
    ^
Error [ERR_MODULE_NOT_FOUND]: Cannot find module '/opt/xen-orchestra/@xen-orchestra/xapi/disks/XapiProgress.mjs' imported from /opt/xen-orchestra/@xen-orchestra/xapi/disks/Xapi.mjs
    at finalizeResolution (node:internal/modules/esm/resolve:275:11)
    at moduleResolve (node:internal/modules/esm/resolve:860:10)
    at defaultResolve (node:internal/modules/esm/resolve:984:11)
    at ModuleLoader.defaultResolve (node:internal/modules/esm/loader:685:12)
    at #cachedDefaultResolve (node:internal/modules/esm/loader:634:25)
    at ModuleLoader.resolve (node:internal/modules/esm/loader:617:38)
    at ModuleLoader.getModuleJobForImport (node:internal/modules/esm/loader:273:38)
    at ModuleJob._link (node:internal/modules/esm/module_job:135:49) {
  code: 'ERR_MODULE_NOT_FOUND',
  url: 'file:///opt/xen-orchestra/@xen-orchestra/xapi/disks/XapiProgress.mjs'
}
Node.js v22.14.0
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
-
@Tristis-Oris thanks, I missed a file.
I pushed it just now -
@florent I finally got the new code running and I tested a Delta Backup (full first run) with NBD x3 enabled and it's leaving
NBD transfer (on xcp1) 99%
connected after a run. The backup does complete but the task is stuck. -
@Andrew nice catch Andrew, I will look into it.
Is it keeping a disk attached to dom0? (in Dashboard -> Health) -
@florent No. The dashboard health is clean.
No VDIs attached to control domain
-
so that is probably only an off-by-one error in the task code.
Thanks Andrew -
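To illustrate what an off-by-one like the one mentioned above can look like (purely a hypothetical sketch; none of these names come from the XO code base): if progress is reported before each chunk rather than once more after the last one, the task tops out at 99% even though the transfer itself finished.

// Hypothetical sketch: progress reported before each chunk never reaches 100%.
async function writeChunk(chunk) {
  // stand-in for the real NBD write
}

async function transfer(chunks) {
  const total = chunks.length
  for (let done = 0; done < total; done++) {
    console.log(`NBD transfer ${Math.floor((done / total) * 100)}%`) // last value printed: 99%
    await writeChunk(chunks[done])
  }
  // the missing final update -- without it the task looks stuck just under 100%:
  // console.log('NBD transfer 100%')
}

transfer(Array.from({ length: 100 }, (_, i) => i))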
@florent code is now merged into master
-
@florent updated my homelab XO instance this morning to try this out since it's been merged.
Seems like concurrency has changed if the field is left empty? It used to default to 2, but now I see it trying to back up every VM in the job at once. (I don't think this is the case anymore; it's just that the completed tasks do not update their progress and clear properly.)
Under the "Backups" tab in XO the backup says successful, however none of the concurrent backup tasks have completed under Tasks. Looking at the backup "Remote" storage I see it still appears to be writing data, so perhaps the job is not actually complete as XO states. -- This was the merge operation happening in the background; as it turns out the job was actually complete, however the tasks were not clearing. -
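As an aside, and purely as a hypothetical illustration of the first suspicion above (which, as noted, turned out not to be the cause here): an empty form field can easily end up meaning "no limit" rather than "default of 2", depending on how the fallback is applied. None of the names below come from the XO code base.

// Hypothetical: how an empty concurrency field can silently become "unlimited".
function effectiveConcurrency(fieldValue) {
  // an empty UI field typically arrives as undefined (or 0)
  // return fieldValue ?? 2        // old behaviour: empty field -> default of 2
  return fieldValue || Infinity    // empty field -> no limit -> every VM starts at once
}

console.log(effectiveConcurrency(undefined)) // Infinity
console.log(effectiveConcurrency(2))         // 2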
@florent Backup did seem to run, however the tasks never cleared. I restarted the toolstack on both hosts, which cleared the tasks.
Set job concurrency to 2 manually on the job and ran it again. Once again the task list filled with all the VMs from the job at once; the job seems to process and succeed, but the tasks never clear and you can't really tell what's happening since the task window becomes cluttered with VMs from the job in a random state of progress that never seems to clear on its own. I think I'll have to roll back to a previous release for now. Hopefully the plan isn't to push this to the XOA appliance just yet!
Stuck tasks:
Yet completed and successful job:
-
@florent @olivierlambert Not good!... I reverted back due to problems.
Replication does run. It seems to work. But it leaves lots of zombie export tasks, like:
Exporting content of VDI ftp_root_jgfkd through NBD+CBT (on xcp1) 17%
I get about 80% stuck tasks. A toolstack restart clears the tasks....
CR settings: CBT enabled. NBD=2. Purge enabled. -
@Andrew I am having the exact same issue! Also rolling back! Glad it's not just me haha
-
@flakpyro @florent I had the same problem with the test branch, and it did not get solved.
-
thanks for the test
are you doing replication only? or is a job doing replication and backup? -
@florent I am just running a "Delta Backup" to an NFS SR. I haven't rolled back yet, so if there are any logs I can provide, let me know. This is my "home lab" at home where I test new versions of XOA and XCP-ng updates before we end up seeing new features in production with our full XOA appliance at work.
-
@florent The job is just Continuous Replication to another local host. These were normal hourly delta transfers.