Backup restore fails with message VDI_IO_ERROR(Device I/O errors) on 8.2
-
Hello,
When I try to restore a backup through Xen Orchestra, it always fails once the restore has been running for 5 minutes. If the restore takes less than 5 minutes, it completes successfully. I have two XCP-ng installations which are exactly the same version and are managed by the same Xen Orchestra installation. They are installed on different hardware, though. Only one of the two installs has this issue. Attached you may find relevant logs.
The host machine on which I'm unable to restore backups is rather new and there is plenty of disk space available. I'm also able to create new VMs, migrate VMs to it, etc.
XCP version: 8.2 (latest patches installed)
Xen Orchestra community (5.100.0) commit f01a8 (latest commit)
LOGS for failed backup at July 29, 2022, 04:37:15 PM
Xen Orchestra log:
https://pastebin.com/W7kgX9qc
xensource.log:
https://pastebin.com/scPP2uQx
SMLog:
https://pastebin.com/XJUjxGRL
Anybody run into a similar issue?
-
Dear @stenet75,
While waiting for a more detailed answer from the community, you can find some information about this error message on this page: https://xen-orchestra.com/docs/backup_troubleshooting.html#error-vdi-io-error
-
It's a failure in vhd-tool, from the XenSource log: `vhd-tool failed, returning VDI_IO_ERROR`
But why it's failing, well… Do you already have disks on the SR you are trying to import to?
-
Hi @olivierlambert ,
This server has only one SR and all the VMs that are currently running are located on this same SR. The SR has over 1.4T free space, while the image I'm trying to restore is around 40GB.
It's very strange to me that the restore runs fine until it reaches exactly 5 minutes. It has reached around 70% by that time, but after that I see "vbd.unplug" tasks in the task list in Orchestra, and then the restore fails with the message attached.
If I restore smaller images, which take less than 5 minutes to complete, it works perfectly fine.
The parameters of the SR are shown below:
uuid ( RO): d1b230d9-cf5d-1653-a6f8-8945d50c9e7e
name-label ( RW): RAID10-HDD
name-description ( RW): HDD RAID 10 controller 10 x HDD Enterprise disks
host ( RO): ng02
allowed-operations (SRO): VDI.enable_cbt; VDI.list_changed_blocks; unplug; plug; PBD.create; VDI.disable_cbt; update; PBD.destroy; VDI.resize; VDI.clone; VDI.data_destroy; scan; VDI.snapshot; VDI.mirror; VDI.create; VDI.destroy; VDI.set_on_boot
current-operations (SRO):
VDIs (SRO): 535f83b4-4441-4f2e-a6f5-b948c84495de; 67e47176-8de5-4633-9c93-bc2e63028904; eccb43d1-c73f-4272-b06f-97c0951f0123; 7ca1fffc-cf39-499d-9875-14f9fc22d362; ab63129c-e5fb-45ae-aeec-f89b18907f8b; 51bdc194-ccab-486a-aff8-f69ee882692e; 0748ecc2-da83-45dd-9060-87ef30cf9230; 04126c41-647e-48d2-98b8-088c38b47b34; 674c2ab4-4ef4-433b-837c-852840a1c460; 81169362-73b1-4783-bb55-e583fd119594; 4af08658-97a5-42c3-9af2-69552dd66bb0; 6796057d-1dfb-4d0f-911d-c8a584a86fe4; d3d3f3c5-c10e-4247-b659-03247e3e171a; 444ccb51-aeac-458b-bf4a-ac7cc164c642; 8d7fadbe-01a8-4b5e-9d24-ea5542a1f036; 249734b0-b73b-4fc8-b553-e68162316be5; c585b7bd-1524-4c7c-8c71-1ecc4773fa6d; 7edcce2e-5500-4a82-a31b-108024031e91; 81c56b31-b697-48ad-848c-5d8e3edadbe0; c1d780af-d2d5-408a-bc72-3483efd47c38; 7853dc82-a48e-4e6f-a1cf-16750d7a623d; fb0cd83f-d74e-42d3-bd9c-a34a9160c1e1; bd8009fe-8d74-4a30-826f-c921da24fa6b; 43d27901-4ba4-4e77-bbc3-9d1277845794; 049c56b3-efd2-47c0-a127-19c2979d13cc; 98834565-a072-4095-abea-d59cf9c1b4c2; 28739520-3fd5-47fc-847f-030d2d7c869a; 6963bc09-8978-4f93-902e-e49f8fde7149; 4edcbce3-b719-43d5-a426-a78245060adb; 1dd76e14-b327-4d9b-9a8e-8d59ddf5b041; c738eed7-56c8-45ef-8346-7cd2a61c635e; a1c49204-dd7f-491f-bd73-d16cf63767d0; 80ee180f-1449-4e2b-99da-f7b5c4a70090; 35a83b93-edc6-48b4-99c7-1064b89624f7; d6c90116-3673-47ca-8ea5-31d7892ef615; 10cf5c84-4543-420a-ae9d-77b2eece3d77; 69fd7ede-8317-43a7-b60d-75448a2e8120; 4d755d7e-1288-429f-8dda-5f7e3322a847; 76ef754f-9b73-4be4-8fe1-a85b209d6211; 5514327f-0041-4b77-9d72-0b72f355b19c; c39b38b6-92d7-4d88-8c32-aa5ab48f19aa; 92e0f1e2-5649-41dd-b2db-76562e1fca76; 9b198b58-987e-4e4b-863f-6afb55262c39; 91cf4731-57b0-4654-a028-8e08f2895631; e7a26aa1-2123-4603-a39a-efd20db3d90c; f295aa10-9bea-4ed9-a51a-0b99dc544a62; 03d83e78-6311-454e-bae6-90a43aa9b559
PBDs (SRO): 2d610609-0a55-130f-2638-7778330bf96a
virtual-allocation ( RO): 1525644525568
physical-utilisation ( RO): 1398284484608
physical-size ( RO): 2953175891968
type ( RO): lvm
content-type ( RO): user
shared ( RW): false
introduced-by ( RO): <not in database>
is-tools-sr ( RO): false
other-config (MRW): i18n-original-value-name_label: Local storage; i18n-key: local-storage
sm-config (MRO): allocation: thick; use_vhd: true; devserial: scsi-36848f690ee146f002a32301c2897bf15
blobs ( RO):
local-cache-enabled ( RO): false
tags (SRW):
clustered ( RO): false
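As a quick sanity check on the "over 1.4T free" figure, the free space can be worked out from the physical-size and physical-utilisation values listed above. This is only a minimal sketch: the two numbers are copied from the SR parameters, while the function name and the reading of "40GB" as 40 GiB are assumptions made for illustration.

```python
# Values copied from the SR parameters in this post (both in bytes).
PHYSICAL_SIZE = 2953175891968
PHYSICAL_UTILISATION = 1398284484608

# Assumption: the "around 40GB" image is treated as 40 GiB for the comparison.
IMAGE_SIZE = 40 * 2**30


def free_bytes(size: int, used: int) -> int:
    """Return the unallocated space on the SR in bytes."""
    return size - used


free = free_bytes(PHYSICAL_SIZE, PHYSICAL_UTILISATION)
free_tib = free / 2**40

print(f"free: {free} bytes ({free_tib:.2f} TiB)")
print("image fits:", IMAGE_SIZE < free)
```

With these numbers the SR has roughly 1.41 TiB unallocated, so lack of space is unlikely to explain the failure.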
-
It might be that a certain amount of data gets written before a problem shows up with the disk or the SR.
Would it be possible to test an import on another SR on the same problematic host?
-
Unfortunately I have only 1 SR available on this host... But I just did a test, and I think the results are interesting.
The backup is stored on an NFS remote. I mounted the same NFS share, with exactly the same mount options, directly on the Xen host and tried to import the image through the CLI by specifying the filename and sr-uuid parameters. It completed successfully after 8 minutes. However, I'm unable to restore the same image through Xen Orchestra, no matter where it is located (NFS, Samba, locally). The Orchestra VM is located on the same Xen host.
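For anyone wanting to repeat this CLI test and record how long it runs relative to the 5-minute mark, here is a rough sketch. It assumes the import was done with `xe vm-import` using the filename and sr-uuid parameters mentioned above; the helper names and the 300-second constant are illustrative only, and the path in the usage comment is a placeholder.

```python
import subprocess
import time

# The restores through XO fail at about 5 minutes, so use that as the mark.
THRESHOLD_SECONDS = 5 * 60


def build_import_cmd(filename, sr_uuid):
    """Build the argument list for an `xe vm-import` call (assumed command)."""
    return ["xe", "vm-import", f"filename={filename}", f"sr-uuid={sr_uuid}"]


def timed_run(cmd):
    """Run a command and return (exit code, elapsed seconds)."""
    start = time.monotonic()
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, time.monotonic() - start


def crossed_threshold(elapsed_seconds):
    """True if the run lasted longer than the 5-minute mark."""
    return elapsed_seconds > THRESHOLD_SECONDS


# Example usage on the host (placeholder path, SR uuid from the post above):
# cmd = build_import_cmd("/mnt/backup/vm.xva", "d1b230d9-cf5d-1653-a6f8-8945d50c9e7e")
# code, elapsed = timed_run(cmd)
# print(code, round(elapsed), crossed_threshold(elapsed))
```

A run that exits with code 0 after crossing the threshold, like the 8-minute import here, suggests the storage side copes with long imports and points toward the path through XO.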
Another thing I'd like to point out is that I get the VDI_IO_ERROR when trying to restore a delta backup. When trying to restore another image, which is a full backup, I get the error IMPORT_ERROR_PREMATURE_EOF(), again after exactly 5 minutes.
-
So there is something interrupting the connection between your XO and XCP-ng after 5 minutes.
-
That's also something I considered, but I can't explain how something like this could happen. XO runs on the same host which has the issue, while I'm able to restore backups fine on another host which is on the same LAN and has exactly the same XCP-ng version.
Many thanks for your help @olivierlambert. I will concentrate my investigation on the connection between the Xen host and XO, and will come back if I find anything.
-
I suppose you are in HTTPS between XO and XCP-ng?
Can you reproduce the issue with XOA in the latest channel?
-
I updated XO to the latest commit (a2267), but still had the same issue. In the end I gave up and reinstalled XO completely from scratch, following the exact same steps as last time, in order to see where it breaks. OS and packages are the same, plugins and configuration are the same, everything. Only this time I'm able to restore backups successfully, even though the two installs are identical. I suspect some type of file corruption, but I'm not sure. I already checked the whole xen-orchestra directory for any differences but could only find this one:
./node_modules/.yarn-integrity
4268a4269,4274
"argon2@0.28.5": [
"lib",
"lib/binding",
"lib/binding/napi-v3",
"lib/binding/napi-v3/argon2.node"
],
I'm trying to compare the two installations further when I have time in order to see why the one works while the other doesn't. For now I'm able to restore backups at least.
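A further comparison of the two install directories could be scripted along these lines. This is only a sketch using Python's standard `filecmp` module; the function names and the throwaway directory layout in `demo()` (including the `config.toml` file name) are made up for illustration and are not taken from either install.

```python
import filecmp
import os
import tempfile


def diff_trees(dcmp, prefix=""):
    """Yield (kind, relative path) for every entry that differs.

    Note: dircmp skips the content check when two files have the same
    type, size and mtime, so it can miss same-size corruption.
    """
    for name in dcmp.left_only:
        yield ("only_left", os.path.join(prefix, name))
    for name in dcmp.right_only:
        yield ("only_right", os.path.join(prefix, name))
    for name in dcmp.diff_files:
        yield ("differs", os.path.join(prefix, name))
    for name, sub in dcmp.subdirs.items():
        yield from diff_trees(sub, os.path.join(prefix, name))


def demo():
    """Build two small throwaway trees and report how they differ."""
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        for root in (a, b):
            os.makedirs(os.path.join(root, "node_modules"))
            with open(os.path.join(root, "config.toml"), "w") as f:
                f.write("same\n")
        with open(os.path.join(a, "node_modules", ".yarn-integrity"), "w") as f:
            f.write("one\n")
        with open(os.path.join(b, "node_modules", ".yarn-integrity"), "w") as f:
            f.write("one, plus an extra entry\n")
        return sorted(diff_trees(filecmp.dircmp(a, b)))


# For real use, point dircmp at the two xen-orchestra directories instead:
# for kind, path in diff_trees(filecmp.dircmp(old_install, new_install)):
#     print(kind, path)
```

Because of the shallow comparison noted in the docstring, a checksum-based pass over the files would be a stricter follow-up if corruption is still suspected.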
Many thanks for your help @olivierlambert.
-
You are welcome. Weird issue indeed