Backup fails with "Body Timeout Error", "all targets have failed, step: writer.run()"
-
For the last week or so, the backup on one particular VM has been giving this error:
VM1 (xcp-ng-1)
Snapshot: Start: 2024-05-12 04:00, End: 2024-05-12 04:00
NFS2 transfer: Start: 2024-05-12 04:00, End: 2024-05-12 05:21, Duration: an hour, Error: Body Timeout Error
Start: 2024-05-12 04:00, End: 2024-05-12 05:23, Duration: an hour, Error: all targets have failed, step: writer.run(), Type: full
The host is running XCP-ng beta 3 (patched through 5-12-24), and XO is at "Xen Orchestra, commit 9b9c7".
This VM and numerous other VMs have been backing up the same way (to two different NFS targets) for the last five months.
I've tried running it in the middle of the day (when other backups are not running) but it didn't help.
"Number of retries if VM backup fails"=0 "Timeout " = <blank> "Compression"="zstd"
Any ideas?
-
@archw
FWIW... it happened again last night.
Start: 2024-05-15 04:00
End: 2024-05-15 07:07
Duration: 3 hours
Error: Body Timeout Error
Duration: 3 hours
Error: all targets have failed, step: writer.run()
Type: full
-
@archw did you find a solution for this? We've been experiencing this error on all of our backup jobs for the last two days.
Start: 2024-06-28 14:07
End: 2024-06-28 14:28
Duration: 21 minutes
Error: Body Timeout Error
Type: delta
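Since every job pointed at the same repositories started failing at the same time, the first thing we're checking is whether the backup remote itself has gotten slow or is stalling. Here is a rough sketch of the timing test (the path is a placeholder and an assumption, based on xo-server normally mounting remotes under /run/xo-server/mounts/ on the XO VM; adjust it to wherever your remote is actually mounted):

```ts
// Rough sanity check, not an XO tool: time a sequential write to the backup
// remote's mount point to see whether the NFS target has become slow enough
// to stall a transfer. The mount path below is an assumption; replace
// REMOTE-UUID (or the whole path) with your actual mount point.
import { createWriteStream } from "node:fs";
import { randomBytes } from "node:crypto";

const target = "/run/xo-server/mounts/REMOTE-UUID/speedtest.bin"; // hypothetical path
const chunk = randomBytes(4 * 1024 * 1024); // 4 MiB per write
const totalMiB = 1024;                      // write 1 GiB in total

const out = createWriteStream(target);
out.on("error", (err) => console.error("write failed:", err));

const start = Date.now();
let written = 0;

const writeNext = (): void => {
  while (written < totalMiB) {
    written += 4;
    if (!out.write(chunk)) {
      out.once("drain", writeNext); // respect back-pressure, resume on drain
      return;
    }
  }
  out.end(() => {
    const seconds = (Date.now() - start) / 1000;
    console.log(
      `${totalMiB} MiB in ${seconds.toFixed(1)} s = ${(totalMiB / seconds).toFixed(1)} MiB/s`
    );
  });
};

writeNext();
```

If the sustained rate is far below what the remote used to deliver (or the write stalls outright), the timeout is probably a symptom of the repository or the network path to it, not of the VMs being backed up.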
-
In the words of Ronald Reagan, "I don't recall the answer to that question."
If I remember correctly (subject to the after-effects of many happy hours since 5-12-24), I ended up rebooting the host that had that VM, and it hasn't done it since.
Arch
-
We realize this is an older issue, but we're experiencing something similar. Last week, I performed a rolling pool update, which involved rebooting all nodes and migrating VMs as part of the process.
Interestingly, the issue consistently affects the same VMs each time. These VMs have the necessary tools installed, and I can't pinpoint why only they are impacted.
We're encountering the same error across multiple pools. All pools use the same backup repositories, but out of approximately 100 VMs, only 3-4 are affected.
I know, even more happy hours since the last post, haha.
I could clone the VM, etc., but that seems a bit drastic.
-
Many, many happy hours have since transpired.
I ended up wiping out the XO VM that was running the process and making a new one. That seems to have fixed it.
With all that said, I got one again last night when backing up the same VM that has caused issues in the past. I just told the backup to restart, so let's see what happens.