Full backup - VM failed in first attempt but successful in second attempt
-
xcp-ng:
8.2 latest patched

xen-orchestra:
from sources on bare metal (Xen Orchestra, commit d796e; Master, commit f1780; You are not up to date with master. 38 commits behind)

added fix for inactivity in /etc/xo-server/config.httpInactivityTimeout.toml:

# Work-around HTTP timeout issue during backups
[xapiOptions]
httpInactivityTimeout = 1800000 # 30 mins
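Side note in case someone copies this: xo-server only reads its config at startup, so the file is picked up after a restart. On my from-sources install it runs as a systemd unit called xo-server (that unit name may differ on other setups):

# restart xo-server so the new config.*.toml is loaded
systemctl restart xo-server
# check it came back up cleanly
journalctl -u xo-server -n 20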
I have a full backup job with 9 VMs
Concurrency: 10
Number of retries if VM backup fails: 5
Timeout: 72
Compression: zstd

Backup is run weekly, on Sunday at 12:15, when there is no load on xcp-ng or the VMs.
8 VMs finish their backup normally, but 1 VM fails on the first attempt and only manages to finish on the second attempt. This also marks the entire backup job as failed.
Retry the VM backup due to an error
attempt 1 error "VDI must be free or attached to exactly one VM"

Snapshot
Start: 2024-09-01 12:15
End: 2024-09-01 12:16

full transfer
Start: 2024-09-01 12:16
End: 2024-09-01 13:23
Duration: an hour
Error: Body Timeout Error

Start: 2024-09-01 12:16
End: 2024-09-01 13:23
Duration: an hour
Error: Body Timeout Error

Snapshot
Start: 2024-09-01 22:16
End: 2024-09-01 22:16

full transfer
Start: 2024-09-01 22:16
End: 2024-09-01 23:26
Duration: an hour
Size: 17.59 GiB
Speed: 4.31 MiB/s

Start: 2024-09-01 22:16
End: 2024-09-01 23:26
Duration: an hour

Start: 2024-09-01 12:15
End: 2024-09-01 23:26
Duration: 11 hours
Type: full
This has been present for a couple of months now, and it continues after 2 XO updates from sources. The only thing that changed was the error message in the XO web interface.
The strange thing is that it's always the same VM. There are 2 similar VMs; both were freshly installed in the spring.
There are no errors at that time on the xcp-ng server.
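(For reference, this is roughly how I looked on the host side; xensource.log is where XAPI logs the export activity:)

# on the xcp-ng host, around the backup window
grep -iE "vm export|backtrace|error" /var/log/xensource.log | less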
The only errors available are on the XO server, in syslog:
root@backup:/var/log# grep "xo-server" syslog.1
2024-09-01T12:15:00.735286+02:00 backup xo-server[142373]: 2024-09-01T10:15:00.733Z xo:backups:worker INFO starting backup
2024-09-01T13:23:20.144978+02:00 backup xo-server[142373]: 2024-09-01T11:23:20.141Z xo:backups:worker WARN possibly unhandled rejection {
2024-09-01T13:23:20.145550+02:00 backup xo-server[142373]: error: BodyTimeoutError: Body Timeout Error
2024-09-01T13:23:20.145592+02:00 backup xo-server[142373]: at Timeout.onParserTimeout [as callback] (/opt/xen-orchestra/node_modules/undici/lib/dispatcher/client-h1.js:626:28)
2024-09-01T13:23:20.145617+02:00 backup xo-server[142373]: at Timeout.onTimeout [as _onTimeout] (/opt/xen-orchestra/node_modules/undici/lib/util/timers.js:22:13)
2024-09-01T13:23:20.145639+02:00 backup xo-server[142373]: at listOnTimeout (node:internal/timers:581:17)
2024-09-01T13:23:20.145660+02:00 backup xo-server[142373]: at process.processTimers (node:internal/timers:519:7) {
2024-09-01T13:23:20.145681+02:00 backup xo-server[142373]: code: 'UND_ERR_BODY_TIMEOUT'
2024-09-01T13:23:20.145703+02:00 backup xo-server[142373]: }
2024-09-01T13:23:20.145723+02:00 backup xo-server[142373]: }
2024-09-01T13:23:21.160056+02:00 backup xo-server[142373]: 2024-09-01T11:23:21.156Z xo:backups:AbstractVmRunner WARN writer step failed {
2024-09-01T13:23:21.160144+02:00 backup xo-server[142373]: error: BodyTimeoutError: Body Timeout Error
2024-09-01T13:23:21.160170+02:00 backup xo-server[142373]: at Timeout.onParserTimeout [as callback] (/opt/xen-orchestra/node_modules/undici/lib/dispatcher/client-h1.js:626:28)
2024-09-01T13:23:21.160191+02:00 backup xo-server[142373]: at Timeout.onTimeout [as _onTimeout] (/opt/xen-orchestra/node_modules/undici/lib/util/timers.js:22:13)
2024-09-01T13:23:21.160212+02:00 backup xo-server[142373]: at listOnTimeout (node:internal/timers:581:17)
2024-09-01T13:23:21.160232+02:00 backup xo-server[142373]: at process.processTimers (node:internal/timers:519:7) {
2024-09-01T13:23:21.160253+02:00 backup xo-server[142373]: code: 'UND_ERR_BODY_TIMEOUT'
2024-09-01T13:23:21.160285+02:00 backup xo-server[142373]: },
2024-09-01T13:23:21.160312+02:00 backup xo-server[142373]: step: 'writer.run()',
2024-09-01T13:23:21.160336+02:00 backup xo-server[142373]: writer: 'FullRemoteWriter'
2024-09-01T13:23:21.160357+02:00 backup xo-server[142373]: }
2024-09-01T23:26:54.251959+02:00 backup xo-server[142373]: 2024-09-01T21:26:54.250Z xo:backups:worker INFO backup has ended
2024-09-01T23:26:54.383030+02:00 backup xo-server[142373]: 2024-09-01T21:26:54.380Z xo:backups:worker INFO process will exit {
2024-09-01T23:26:54.383086+02:00 backup xo-server[142373]: duration: 40313647176,
2024-09-01T23:26:54.383115+02:00 backup xo-server[142373]: exitCode: 0,
2024-09-01T23:26:54.383154+02:00 backup xo-server[142373]: resourceUsage: {
2024-09-01T23:26:54.383175+02:00 backup xo-server[142373]: userCPUTime: 34734731925,
2024-09-01T23:26:54.383200+02:00 backup xo-server[142373]: systemCPUTime: 4723503045,
2024-09-01T23:26:54.383224+02:00 backup xo-server[142373]: maxRSS: 70968,
2024-09-01T23:26:54.383247+02:00 backup xo-server[142373]: sharedMemorySize: 0,
2024-09-01T23:26:54.383272+02:00 backup xo-server[142373]: unsharedDataSize: 0,
2024-09-01T23:26:54.383294+02:00 backup xo-server[142373]: unsharedStackSize: 0,
2024-09-01T23:26:54.383315+02:00 backup xo-server[142373]: minorPageFault: 2966203,
2024-09-01T23:26:54.383339+02:00 backup xo-server[142373]: majorPageFault: 0,
2024-09-01T23:26:54.383364+02:00 backup xo-server[142373]: swappedOut: 0,
2024-09-01T23:26:54.383384+02:00 backup xo-server[142373]: fsRead: 105920,
2024-09-01T23:26:54.383411+02:00 backup xo-server[142373]: fsWrite: 3910825184,
2024-09-01T23:26:54.383437+02:00 backup xo-server[142373]: ipcSent: 0,
2024-09-01T23:26:54.383466+02:00 backup xo-server[142373]: ipcReceived: 0,
2024-09-01T23:26:54.383487+02:00 backup xo-server[142373]: signalsCount: 0,
2024-09-01T23:26:54.383507+02:00 backup xo-server[142373]: voluntaryContextSwitches: 112085178,
2024-09-01T23:26:54.383532+02:00 backup xo-server[142373]: involuntaryContextSwitches: 79583
2024-09-01T23:26:54.383557+02:00 backup xo-server[142373]: },
2024-09-01T23:26:54.383587+02:00 backup xo-server[142373]: summary: { duration: '11h', cpuUsage: '98%', memoryUsage: '69.3 MiB' }
2024-09-01T23:26:54.383614+02:00 backup xo-server[142373]: }
The same VMs are successfully finishing delta backups.
What am I missing?
-
@hbogner said in Full backup - VM failed in first attempt but successful in second attempt:
You are not up to date with master. 38 commits behind
Hi,
This is the first thing to do before reporting a problem, as per https://xen-orchestra.com/docs/community.html#report-a-bug, because it's impossible to guess which bugs have been fixed since then.
edit: also you might try to disable zstd compression, as I understand it's a full backup (XVA), right?
-
@hbogner This is an old error, as @olivierlambert observed. With the latest build you should not see this error anymore.
-
I updated to the latest.

xen-orchestra:
from sources (Xen Orchestra, commit dfe69; Master, commit dfe69; Your Xen Orchestra is up to date)

Started the backup job manually; the error is still happening:
Retry the VM backup due to an error
attempt 1 error "Body Timeout Error"

Snapshot
Start: 2024-09-02 14:06
End: 2024-09-02 14:08

full transfer
Start: 2024-09-02 14:08
End: 2024-09-02 15:16
Duration: an hour
Error: Body Timeout Error

Start: 2024-09-02 14:08
End: 2024-09-02 15:16
Duration: an hour
Error: Body Timeout Error

Start: 2024-09-02 14:06
Type: full
root@backup:~# grep "xo-server" /var/log/syslog
2024-09-02T14:06:01.524416+02:00 backup xo-server[152409]: 2024-09-02T12:06:01.522Z xo:backups:worker INFO starting backup
2024-09-02T15:16:10.845741+02:00 backup xo-server[152409]: 2024-09-02T13:16:10.842Z xo:backups:worker WARN possibly unhandled rejection {
2024-09-02T15:16:10.845964+02:00 backup xo-server[152409]: error: BodyTimeoutError: Body Timeout Error
2024-09-02T15:16:10.845997+02:00 backup xo-server[152409]: at Timeout.onParserTimeout [as callback] (/opt/xen-orchestra/node_modules/undici/lib/dispatcher/client-h1.js:626:28)
2024-09-02T15:16:10.846022+02:00 backup xo-server[152409]: at Timeout.onTimeout [as _onTimeout] (/opt/xen-orchestra/node_modules/undici/lib/util/timers.js:22:13)
2024-09-02T15:16:10.846046+02:00 backup xo-server[152409]: at listOnTimeout (node:internal/timers:581:17)
2024-09-02T15:16:10.846068+02:00 backup xo-server[152409]: at process.processTimers (node:internal/timers:519:7) {
2024-09-02T15:16:10.846088+02:00 backup xo-server[152409]: code: 'UND_ERR_BODY_TIMEOUT'
2024-09-02T15:16:10.846107+02:00 backup xo-server[152409]: }
2024-09-02T15:16:10.846136+02:00 backup xo-server[152409]: }
2024-09-02T15:16:11.325447+02:00 backup xo-server[152409]: 2024-09-02T13:16:11.324Z xo:backups:AbstractVmRunner WARN writer step failed {
2024-09-02T15:16:11.325536+02:00 backup xo-server[152409]: error: BodyTimeoutError: Body Timeout Error
2024-09-02T15:16:11.325561+02:00 backup xo-server[152409]: at Timeout.onParserTimeout [as callback] (/opt/xen-orchestra/node_modules/undici/lib/dispatcher/client-h1.js:626:28)
2024-09-02T15:16:11.325583+02:00 backup xo-server[152409]: at Timeout.onTimeout [as _onTimeout] (/opt/xen-orchestra/node_modules/undici/lib/util/timers.js:22:13)
2024-09-02T15:16:11.325604+02:00 backup xo-server[152409]: at listOnTimeout (node:internal/timers:581:17)
2024-09-02T15:16:11.325625+02:00 backup xo-server[152409]: at process.processTimers (node:internal/timers:519:7) {
2024-09-02T15:16:11.325644+02:00 backup xo-server[152409]: code: 'UND_ERR_BODY_TIMEOUT'
2024-09-02T15:16:11.325664+02:00 backup xo-server[152409]: },
2024-09-02T15:16:11.325690+02:00 backup xo-server[152409]: step: 'writer.run()',
2024-09-02T15:16:11.325717+02:00 backup xo-server[152409]: writer: 'FullRemoteWriter'
2024-09-02T15:16:11.325737+02:00 backup xo-server[152409]: }
The entire task will finish tomorrow, and I can paste the whole output then.
-
@hbogner can you check whether there are still VDIs attached to the control domain (Health page)? It seems more people are facing this kind of error; maybe that is related?
-
Full report after the first failure and the second success:
Retry the VM backup due to an error
attempt 1 error "Body Timeout Error"

Snapshot
Start: 2024-09-02 14:06
End: 2024-09-02 14:08

full transfer
Start: 2024-09-02 14:08
End: 2024-09-02 15:16
Duration: an hour
Error: Body Timeout Error

Start: 2024-09-02 14:08
End: 2024-09-02 15:16
Duration: an hour
Error: Body Timeout Error

Snapshot
Start: 2024-09-03 00:07
End: 2024-09-03 00:08

full transfer
Start: 2024-09-03 00:08
End: 2024-09-03 01:17
Duration: an hour
Size: 17.62 GiB
Speed: 4.34 MiB/s

Start: 2024-09-03 00:08
End: 2024-09-03 01:17
Duration: an hour

Start: 2024-09-02 14:06
End: 2024-09-03 01:17
Duration: 11 hours
Type: full
This is the rest of syslog regarding the full backup; a delta backup job started and finished while this full backup was running.
root@backup:~# grep "xo-server" /var/log/syslog
2024-09-03T01:17:34.085428+02:00 backup xo-server[152409]: 2024-09-02T23:17:34.084Z xo:backups:worker INFO backup has ended
2024-09-03T01:17:34.242886+02:00 backup xo-server[152409]: 2024-09-02T23:17:34.240Z xo:backups:worker INFO process will exit {
2024-09-03T01:17:34.242985+02:00 backup xo-server[152409]: duration: 40292717752,
2024-09-03T01:17:34.243018+02:00 backup xo-server[152409]: exitCode: 0,
2024-09-03T01:17:34.243045+02:00 backup xo-server[152409]: resourceUsage: {
2024-09-03T01:17:34.243067+02:00 backup xo-server[152409]: userCPUTime: 34830207909,
2024-09-03T01:17:34.243098+02:00 backup xo-server[152409]: systemCPUTime: 5032889601,
2024-09-03T01:17:34.243121+02:00 backup xo-server[152409]: maxRSS: 75204,
2024-09-03T01:17:34.243155+02:00 backup xo-server[152409]: sharedMemorySize: 0,
2024-09-03T01:17:34.243180+02:00 backup xo-server[152409]: unsharedDataSize: 0,
2024-09-03T01:17:34.243200+02:00 backup xo-server[152409]: unsharedStackSize: 0,
2024-09-03T01:17:34.243220+02:00 backup xo-server[152409]: minorPageFault: 3320295,
2024-09-03T01:17:34.243243+02:00 backup xo-server[152409]: majorPageFault: 0,
2024-09-03T01:17:34.243266+02:00 backup xo-server[152409]: swappedOut: 0,
2024-09-03T01:17:34.243286+02:00 backup xo-server[152409]: fsRead: 4144,
2024-09-03T01:17:34.243308+02:00 backup xo-server[152409]: fsWrite: 3906893520,
2024-09-03T01:17:34.243335+02:00 backup xo-server[152409]: ipcSent: 0,
2024-09-03T01:17:34.243357+02:00 backup xo-server[152409]: ipcReceived: 0,
2024-09-03T01:17:34.243377+02:00 backup xo-server[152409]: signalsCount: 0,
2024-09-03T01:17:34.243396+02:00 backup xo-server[152409]: voluntaryContextSwitches: 113572473,
2024-09-03T01:17:34.243424+02:00 backup xo-server[152409]: involuntaryContextSwitches: 80368
2024-09-03T01:17:34.243451+02:00 backup xo-server[152409]: },
2024-09-03T01:17:34.243477+02:00 backup xo-server[152409]: summary: { duration: '11h', cpuUsage: '99%', memoryUsage: '73.44 MiB' }
2024-09-03T01:17:34.243501+02:00 backup xo-server[152409]: }
@olivierlambert yes, it is a full backup using zstd. Not using compression is problematic because it's a lot of data, but I will test it.
@rtjdamen There were no VDIs attached to the control domain.
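(Checked on the Health page, and also from the host CLI with something like this, which should list anything still plugged into dom0 — xe syntax from memory, so double-check it:)

# list VBDs currently attached to the control domain
# (on a pool, vm-list returns one dom0 UUID per host; pick the right one)
xe vbd-list vm-uuid=$(xe vm-list is-control-domain=true --minimal) currently-attached=true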
The difference between XO before the upgrade and the latest XO is the error message in the web interface:
Now: "Body Timeout Error"
Before: "VDI must be free or attached to exactly one VM"
Before the previous upgrade it was: "Body Timeout Error"
But the syslog errors reference the same lines.
PS: This VM is neither the largest nor the smallest, but it's the only one that fails; the rest are OK.
-
Also check your Dom0 memory; I have the feeling the zstd compression is failing and leaving the export open until XO times out.
-
@olivierlambert Dom0 memory is set to 12 GiB of the 256 GiB available on the host.
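To keep an eye on actual memory pressure in dom0 during the next run, I'll sample it from the host itself, roughly like this (plain dom0 tools, nothing XO-specific):

# run in dom0 on the host doing the export
while true; do date; free -m; sleep 60; done >> /tmp/dom0-mem.log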
It failed and then succeeded again yesterday:
2024-09-08T12:15:00.729026+02:00 backup xo-server[186436]: 2024-09-08T10:15:00.726Z xo:backups:worker INFO starting backup
2024-09-08T13:24:53.065460+02:00 backup xo-server[186436]: 2024-09-08T11:24:53.061Z xo:backups:worker WARN possibly unhandled rejection {
2024-09-08T13:24:53.065709+02:00 backup xo-server[186436]: error: BodyTimeoutError: Body Timeout Error
2024-09-08T13:24:53.065746+02:00 backup xo-server[186436]: at Timeout.onParserTimeout [as callback] (/opt/xen-orchestra/node_modules/undici/lib/dispatcher/client-h1.js:626:28)
2024-09-08T13:24:53.065775+02:00 backup xo-server[186436]: at Timeout.onTimeout [as _onTimeout] (/opt/xen-orchestra/node_modules/undici/lib/util/timers.js:22:13)
2024-09-08T13:24:53.065799+02:00 backup xo-server[186436]: at listOnTimeout (node:internal/timers:581:17)
2024-09-08T13:24:53.065822+02:00 backup xo-server[186436]: at process.processTimers (node:internal/timers:519:7) {
2024-09-08T13:24:53.065843+02:00 backup xo-server[186436]: code: 'UND_ERR_BODY_TIMEOUT'
2024-09-08T13:24:53.065864+02:00 backup xo-server[186436]: }
2024-09-08T13:24:53.065894+02:00 backup xo-server[186436]: }
2024-09-08T13:24:53.612370+02:00 backup xo-server[186436]: 2024-09-08T11:24:53.609Z xo:backups:AbstractVmRunner WARN writer step failed {
2024-09-08T13:24:53.612468+02:00 backup xo-server[186436]: error: BodyTimeoutError: Body Timeout Error
2024-09-08T13:24:53.612494+02:00 backup xo-server[186436]: at Timeout.onParserTimeout [as callback] (/opt/xen-orchestra/node_modules/undici/lib/dispatcher/client-h1.js:626:28)
2024-09-08T13:24:53.612516+02:00 backup xo-server[186436]: at Timeout.onTimeout [as _onTimeout] (/opt/xen-orchestra/node_modules/undici/lib/util/timers.js:22:13)
2024-09-08T13:24:53.612538+02:00 backup xo-server[186436]: at listOnTimeout (node:internal/timers:581:17)
2024-09-08T13:24:53.612558+02:00 backup xo-server[186436]: at process.processTimers (node:internal/timers:519:7) {
2024-09-08T13:24:53.612578+02:00 backup xo-server[186436]: code: 'UND_ERR_BODY_TIMEOUT'
2024-09-08T13:24:53.612599+02:00 backup xo-server[186436]: },
2024-09-08T13:24:53.612630+02:00 backup xo-server[186436]: step: 'writer.run()',
2024-09-08T13:24:53.612658+02:00 backup xo-server[186436]: writer: 'FullRemoteWriter'
2024-09-08T13:24:53.612681+02:00 backup xo-server[186436]: }
2024-09-08T22:53:59.126305+02:00 backup xo-server[186436]: 2024-09-08T20:53:59.125Z xo:backups:worker INFO backup has ended
2024-09-08T22:53:59.272277+02:00 backup xo-server[186436]: 2024-09-08T20:53:59.270Z xo:backups:worker INFO process will exit {
2024-09-08T22:53:59.272333+02:00 backup xo-server[186436]: duration: 38338542920,
2024-09-08T22:53:59.272372+02:00 backup xo-server[186436]: exitCode: 0,
2024-09-08T22:53:59.272403+02:00 backup xo-server[186436]: resourceUsage: {
2024-09-08T22:53:59.272424+02:00 backup xo-server[186436]: userCPUTime: 34156788569,
2024-09-08T22:53:59.272449+02:00 backup xo-server[186436]: systemCPUTime: 4578502065,
2024-09-08T22:53:59.272476+02:00 backup xo-server[186436]: maxRSS: 73808,
2024-09-08T22:53:59.272500+02:00 backup xo-server[186436]: sharedMemorySize: 0,
2024-09-08T22:53:59.272524+02:00 backup xo-server[186436]: unsharedDataSize: 0,
2024-09-08T22:53:59.272545+02:00 backup xo-server[186436]: unsharedStackSize: 0,
2024-09-08T22:53:59.272565+02:00 backup xo-server[186436]: minorPageFault: 2821906,
2024-09-08T22:53:59.272595+02:00 backup xo-server[186436]: majorPageFault: 0,
2024-09-08T22:53:59.272622+02:00 backup xo-server[186436]: swappedOut: 0,
2024-09-08T22:53:59.272643+02:00 backup xo-server[186436]: fsRead: 21304,
2024-09-08T22:53:59.272674+02:00 backup xo-server[186436]: fsWrite: 3902768784,
2024-09-08T22:53:59.272702+02:00 backup xo-server[186436]: ipcSent: 0,
2024-09-08T22:53:59.272723+02:00 backup xo-server[186436]: ipcReceived: 0,
2024-09-08T22:53:59.272743+02:00 backup xo-server[186436]: signalsCount: 0,
2024-09-08T22:53:59.272773+02:00 backup xo-server[186436]: voluntaryContextSwitches: 111664073,
2024-09-08T22:53:59.272802+02:00 backup xo-server[186436]: involuntaryContextSwitches: 65186
2024-09-08T22:53:59.272829+02:00 backup xo-server[186436]: },
2024-09-08T22:53:59.272859+02:00 backup xo-server[186436]: summary: { duration: '11h', cpuUsage: '101%', memoryUsage: '72.08 MiB' }
2024-09-08T22:53:59.272884+02:00 backup xo-server[186436]: }
I'll now try with concurrency 3, to see how it acts with less load.
-
@hbogner are you connecting to vCenter or directly to a VMware ESXi host? We haven't had a great experience connecting to vCenter; maybe there is an issue there?
-
@rtjdamen Only XCP-ng 8.2
-
@hbogner sorry, I'm still half asleep, Monday morning ;-). I mixed it up with a different thread.
-
@olivierlambert said in Full backup - VM failed in first attempt but successful in second attempt:
Also check your Dom0 memory; I have the feeling the zstd compression is failing and leaving the export open until XO times out.
When I disable compression, the backup is successful on the first run, but it takes longer. This VM has finished, but the larger ones are still at it; XO estimates in the Tasks menu that they'll finish tomorrow.
-
With zstd compression the backup fails on the first run, lasts 11 hours, Transfer: 1.81 TiB.
With no compression the backup runs OK on the first run, lasts 15 hours, Transfer: 5.12 TiB.
I'd like to keep it compressed in the long run.
Upgraded to latest: "Xen Orchestra(commit 74e6f), Master(commit 74e6f), Your Xen Orchestra is up to date"
Started a new compressed backup and it failed again:
root@backup:~# grep "xo-server" /var/log/syslog
2024-09-13T09:24:17.518727+02:00 backup xo-server[243615]: 2024-09-13T07:24:17.516Z xo:backups:worker INFO starting backup
2024-09-13T11:11:56.199073+02:00 backup xo-server[243615]: 2024-09-13T09:11:56.195Z xo:backups:worker WARN possibly unhandled rejection {
2024-09-13T11:11:56.199261+02:00 backup xo-server[243615]: error: BodyTimeoutError: Body Timeout Error
2024-09-13T11:11:56.199310+02:00 backup xo-server[243615]: at Timeout.onParserTimeout [as callback] (/opt/xen-orchestra/node_modules/undici/lib/dispatcher/client-h1.js:626:28)
2024-09-13T11:11:56.199337+02:00 backup xo-server[243615]: at Timeout.onTimeout [as _onTimeout] (/opt/xen-orchestra/node_modules/undici/lib/util/timers.js:22:13)
2024-09-13T11:11:56.199362+02:00 backup xo-server[243615]: at listOnTimeout (node:internal/timers:581:17)
2024-09-13T11:11:56.199387+02:00 backup xo-server[243615]: at process.processTimers (node:internal/timers:519:7) {
2024-09-13T11:11:56.199408+02:00 backup xo-server[243615]: code: 'UND_ERR_BODY_TIMEOUT'
2024-09-13T11:11:56.199428+02:00 backup xo-server[243615]: }
2024-09-13T11:11:56.199450+02:00 backup xo-server[243615]: }
2024-09-13T11:11:57.748715+02:00 backup xo-server[243615]: 2024-09-13T09:11:57.745Z xo:backups:AbstractVmRunner WARN writer step failed {
2024-09-13T11:11:57.748812+02:00 backup xo-server[243615]: error: BodyTimeoutError: Body Timeout Error
2024-09-13T11:11:57.748837+02:00 backup xo-server[243615]: at Timeout.onParserTimeout [as callback] (/opt/xen-orchestra/node_modules/undici/lib/dispatcher/client-h1.js:626:28)
2024-09-13T11:11:57.748858+02:00 backup xo-server[243615]: at Timeout.onTimeout [as _onTimeout] (/opt/xen-orchestra/node_modules/undici/lib/util/timers.js:22:13)
2024-09-13T11:11:57.748878+02:00 backup xo-server[243615]: at listOnTimeout (node:internal/timers:581:17)
2024-09-13T11:11:57.748898+02:00 backup xo-server[243615]: at process.processTimers (node:internal/timers:519:7) {
2024-09-13T11:11:57.748917+02:00 backup xo-server[243615]: code: 'UND_ERR_BODY_TIMEOUT'
2024-09-13T11:11:57.748937+02:00 backup xo-server[243615]: },
2024-09-13T11:11:57.748961+02:00 backup xo-server[243615]: step: 'writer.run()',
2024-09-13T11:11:57.748981+02:00 backup xo-server[243615]: writer: 'FullRemoteWriter'
2024-09-13T11:11:57.749000+02:00 backup xo-server[243615]: }
-
I think the issue might not be XO but XCP-ng with zstd.
-
@olivierlambert you may be right; while digging through the logs on the XCP-ng host I found the following in xensource.log:
Sep 13 11:18:22 clientxcpng xapi: [ warn||4289975 HTTPS <--IP ADDRESS OF XO SERVER->:::80|[XO] VM export R:cd1a9a8d151f|pervasiveext] finally: Error while running cleanup after failure of main function: (Failure "Compression via zcat failed: exit code 25")
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] [XO] VM export R:cd1a9a8d151f failed with exception Server_error(CLIENT_ERROR, [ INTERNAL_ERROR: [ Unix.Unix_error(Unix.EPIPE, "write", "") ] ])
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] Raised Server_error(CLIENT_ERROR, [ INTERNAL_ERROR: [ Unix.Unix_error(Unix.EPIPE, "write", "") ] ])
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 1/18 xapi Raised at file ocaml/xapi/stream_vdi.ml, line 129
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 2/18 xapi Called from file ocaml/xapi/stream_vdi.ml, line 263
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 3/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 4/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 5/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 6/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 7/18 xapi Called from file ocaml/xapi/stream_vdi.ml, line 240
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 8/18 xapi Called from file list.ml, line 110
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 9/18 xapi Called from file ocaml/xapi/export.ml, line 653
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 10/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 11/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 12/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 13/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 14/18 xapi Called from file ocaml/xapi/server_helpers.ml, line 92
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 15/18 xapi Called from file ocaml/xapi/server_helpers.ml, line 113
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 16/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 17/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 18/18 xapi Called from file lib/backtrace.ml, line 177
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace]
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] VM.export D:1c0a6106f079 failed with exception Server_error(CLIENT_ERROR, [ INTERNAL_ERROR: [ Unix.Unix_error(Unix.EPIPE, "write", "") ] ])
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] Raised Server_error(CLIENT_ERROR, [ INTERNAL_ERROR: [ Unix.Unix_error(Unix.EPIPE, "write", "") ] ])
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 1/1 xapi Raised at file (Thread 4289975 has no backtrace table. Was with_backtraces called?, line 0
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace]
Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||xapi_http] Unhandled Api_errors.Server_error(CLIENT_ERROR, [ INTERNAL_ERROR: [ Unix.Unix_error(Unix.EPIPE, "write", "") ] ])
Is this any help?
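If it helps narrow things down, I can also try to reproduce the export outside XO. As far as I understand, XO asks XAPI for a zstd-compressed XVA over the HTTP export endpoint, so something like this (untested sketch; <host> and <vm-uuid> are placeholders, and I'm assuming use_compression=zstd is accepted on 8.2) should exercise the same code path:

# pull a zstd-compressed XVA straight from XAPI, bypassing XO,
# then watch xensource.log for the same "Compression via zcat failed" error
# -u root prompts for the root password of the host
curl -k -u root "https://<host>/export?uuid=<vm-uuid>&use_compression=zstd" -o /dev/null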
-
@hbogner said in Full backup - VM failed in first attempt but successful in second attempt:
Compression via zcat failed: exit code 25
Yes, it is useful. I wonder if you have enough RAM or disk space in your Dom0.
-
@hbogner said in Full backup - VM failed in first attempt but successful in second attempt:
@olivierlambert Dom0 memory is set to 12GiB of 256GiB available on the host.
12 GiB should be enough for 9 VMs.
Disk should also be ok:
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        5.9G   32K  5.9G   1% /dev
tmpfs           5.9G  532K  5.9G   1% /dev/shm
tmpfs           5.9G   11M  5.9G   1% /run
tmpfs           5.9G     0  5.9G   0% /sys/fs/cgroup
/dev/md127p1     18G  2.1G   15G  13% /
xenstore        5.9G     0  5.9G   0% /var/lib/xenstored
/dev/md127p5    3.9G  337M  3.3G  10% /var/log
tmpfs           1.2G     0  1.2G   0% /run/user/0
This is a Windows VM with a separate 100 GB disk just for the pagefile; could that be an issue while compressing?
-
I don't know, but I'm suspecting something going bad with ZSTD compression. Can you monitor your disk space during the export?
-
@olivierlambert Tested it just now: no disk usage change on the host (sample interval was 1 minute).
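(Sampled on the host with a simple loop, roughly:)

# run on the xcp-ng host while the export is in progress
while true; do date; df -h /; sleep 60; done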
8 VMs finish OK; just this 1 VM fails on the first run.
-
@olivierlambert for now I "fixed" the problem by isolating this VM into a separate backup job that doesn't overlap with any other backups, and it finishes without error.
We'll try it again (smart mode VM group selection) in a few months to see whether new updates change anything.