Potential bug with Windows VM backup: "Body Timeout Error"
-
@MajorP93 said in Potential bug with Windows VM backup: "Body Timeout Error":
Why would there be long “no data” gaps? With full exports, XAPI compresses on the host side. When it encounters very large runs of zeros / sparse/unused regions, compression may yield almost nothing for long stretches. If the stream goes quiet longer than Undici’s bodyTimeout (default ~5 minutes), XO aborts. This explains why it only hits some big VMs and why delta (NBD) backups are fine.
I think that's exactly the issue here (if I remember our internal discussion correctly).
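
To make the "quiet stream" explanation concrete, here is a minimal sketch of how little output a streaming compressor produces for long runs of zeros. It uses Python's standard `zlib` purely as a stand-in; the compressor XAPI actually uses for exports, its settings, and its chunk sizes are not reproduced here, and the function name is made up for illustration.

```python
import os
import zlib

CHUNK = 1024 * 1024  # 1 MiB of uncompressed input per step (arbitrary choice)


def compressed_sizes(chunks):
    """Feed chunks through one streaming compressor.

    Returns the number of output bytes emitted after each input chunk,
    with the size of the final flush appended as the last element.
    """
    comp = zlib.compressobj(level=6)
    sizes = [len(comp.compress(chunk)) for chunk in chunks]
    sizes.append(len(comp.flush()))
    return sizes


# Eight chunks of zeros (like unused disk space) versus eight chunks of
# random, incompressible data.
zero_sizes = compressed_sizes([bytes(CHUNK)] * 8)
random_sizes = compressed_sizes([os.urandom(CHUNK) for _ in range(8)])

print("output bytes for 8 MiB of zeros: ", sum(zero_sizes))
print("output bytes for 8 MiB of random:", sum(random_sizes))
```

The zero-filled input yields only a tiny fraction of the output that the random input does, so the reader on the other end of the connection sees very few bytes while the host works through a large empty region. Whether the resulting gap exceeds the timeout depends on how fast the host reads the disk, which this sketch does not model.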
-
@olivierlambert said in Potential bug with Windows VM backup: "Body Timeout Error":
I think that's exactly the issue here (if I remember our internal discussion correctly).
Do you think it could be a good idea to make the timeout configurable via Xen Orchestra?
-
I can't answer for my devs. Let's just let them come here to provide an answer.

-
Going back up to the top of the thread: removing compression was what fixed this for me, and my backups are still reported as successful (maybe I'd better check).
I wish there was an easier way to "right size" the storage on these Windows VMs. The suggestion is to copy the VM to a new, smaller "disk", but I haven't looked into what is involved in getting this done.
-
We will fix it for exports with compression enabled; we are now pretty sure we know where the issue is.

-
Happy to hear that there's a potential lead. I'm also happy I found this thread, so I can kick back and wait for Vates to fix it.

-
I can imagine that a fix could be to send "keepalive" packets alongside the XCP-ng VM export data stream so that the timeout on the XO side does not occur.
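
As a rough illustration of that idea, here is a toy model in Python. It does not touch XAPI, XO, or Undici: it only treats the stream as a list of chunk arrival times (in seconds) and checks whether any gap exceeds an idle timeout. The function names, the 300 s value (taken from the "~5 minutes" mentioned above), and the 60 s keepalive interval are all assumptions for the sketch.

```python
def first_stall(arrival_times, body_timeout):
    """Return the index of the first chunk that arrives more than
    body_timeout seconds after the previous one, or None if none does."""
    previous = 0.0
    for index, arrival in enumerate(arrival_times):
        if arrival - previous > body_timeout:
            return index
        previous = arrival
    return None


def with_keepalives(arrival_times, interval):
    """Insert synthetic keepalive arrivals so that no gap exceeds interval."""
    result = []
    previous = 0.0
    for arrival in arrival_times:
        while arrival - previous > interval:
            previous += interval
            result.append(previous)  # hypothetical keepalive packet
        result.append(arrival)
        previous = arrival
    return result


BODY_TIMEOUT = 300.0  # assumed idle timeout, in seconds

# Three chunks arrive quickly, then the sender goes quiet for 397 seconds.
data_arrivals = [1.0, 2.0, 3.0, 400.0, 401.0]

stalled_at = first_stall(data_arrivals, BODY_TIMEOUT)
kept_alive = first_stall(with_keepalives(data_arrivals, 60.0), BODY_TIMEOUT)
print(stalled_at, kept_alive)
```

In this model the plain stream trips the timeout at the fourth chunk, while the stream with keepalives never does. Whether the real export format can carry such filler without confusing the receiver is a separate question that the model says nothing about.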

-
I worked around this issue by changing my full backup job to "delta backup" and enabling "force full backup" in the schedule options.
Delta backup seems more reliable as of now.
Looking forward to a fix as Zstd compression is an appealing feature of the full backup method.
-
I've got a fix for this issue in the pipeline and would really appreciate it if anyone here could test it to confirm whether it resolves the problem for you.
You'd need to install a custom build (not production-ready, be careful, all the usual warnings apply here) according to these steps:
-
Before updating, preferably check that exporting a particular VM with compression on results in the timeout error consistently.
-
Write the following file to /etc/yum.repos.d/xvatest.repo:
[xva-test-repo]
name=xva-test-repo
baseurl=https://koji.xcp-ng.org/repos/user/8/8.3/asultanov1/x86_64/
enabled=0
metadata_expire=1m
gpgcheck=1
repo_gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-xcpng
-
Make sure you're up to date on the rest of your packages (reboot if necessary) with:
yum upgrade --enablerepo=testing
-
Install the fix (this should install *.xvafix.1 packages):
yum upgrade xapi-core qcow-stream-tool vhd-tool --enablerepo=xva-test-repo
-
Restart the toolstack:
xe-toolstack-restart
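
Since a repo file is easy to mangle when copying it from a forum post, here is an optional sanity check, sketched in Python with the standard `configparser` module. The expected values are copied from the steps above; the helper name `repo_problems` is made up for this example, and the check only compares text. It does not contact the repository or verify the GPG key.

```python
import configparser

# Expected keys and values, copied from the repo file in the steps above.
EXPECTED = {
    "name": "xva-test-repo",
    "baseurl": "https://koji.xcp-ng.org/repos/user/8/8.3/asultanov1/x86_64/",
    "enabled": "0",
    "metadata_expire": "1m",
    "gpgcheck": "1",
    "repo_gpgcheck": "1",
    "gpgkey": "file:///etc/pki/rpm-gpg/RPM-GPG-KEY-xcpng",
}


def repo_problems(text, section="xva-test-repo"):
    """Return a list of human-readable problems found in the repo file text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section(section):
        return [f"missing section [{section}]"]
    problems = []
    for key, want in EXPECTED.items():
        got = parser.get(section, key, fallback=None)
        if got != want:
            problems.append(f"{key}: expected {want!r}, found {got!r}")
    return problems


# Demo on an in-memory copy; on a host you would pass the file's contents.
good_text = "[xva-test-repo]\n" + "".join(
    f"{key}={value}\n" for key, value in EXPECTED.items()
)
bad_text = good_text.replace("enabled=0", "enabled=1")

print(repo_problems(good_text))
print(repo_problems(bad_text))
```

On a host, the same function could be fed the output of `open("/etc/yum.repos.d/xvatest.repo").read()`. An empty list means the file matches what was posted here.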
-
@andriy.sultanov Thank you very much for working on this!
I can test your proposed fix in our lab during next week.
Best regards