"Task has already ended" errors during backup
-
Morning team, we have a backup job that does delta exports to a remote NFS share and also a Continuous Replication (CR) to a failover server. Out of the 16 VMs (most of them small), every now and then one fails at random (normally a large one), which then triggers a full backup on the next run, as expected. Over the last couple of days, however, it has kept failing, and I cannot determine which of the three hosts involved (hypervisor, NFS server or failover server) is at fault.
Here is the output of the most recent failure (I only changed the VM name):
vmname.masked (jhb06)
  Snapshot
    Start: Jan 9, 2022, 04:25:59 AM  End: Jan 9, 2022, 04:26:03 AM
  Frank
    transfer  Start: Jan 9, 2022, 04:26:04 AM  End: Jan 9, 2022, 07:05:53 AM  Duration: 3 hours  Error: aborted
    Start: Jan 9, 2022, 04:26:03 AM  End: Jan 9, 2022, 07:05:53 AM  Duration: 3 hours  Error: aborted
  Local storage (14.3 TiB free - thin) - failover
    transfer  Start: Jan 9, 2022, 04:26:04 AM  End: Jan 9, 2022, 06:10:00 AM  Duration: 2 hours  Error: VDI_IO_ERROR(Device I/O errors)
    Start: Jan 9, 2022, 04:26:03 AM  End: Jan 9, 2022, 06:10:00 AM  Duration: 2 hours  Error: VDI_IO_ERROR(Device I/O errors)
  Start: Jan 9, 2022, 04:25:40 AM  End: Jan 9, 2022, 07:05:53 AM  Duration: 3 hours  Error: task has already ended
  Type: full
From what I can gather, the Device I/O errors could be the cause; however, I also know this error can appear when the task ends and the host stops receiving data. All the other exports complete (so out of the 16 VMs, only one fails).
Any ideas where I can start fiddling? All hosts run XCP-ng.
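One way to narrow down which host is involved is to line up the timestamps in the log above, since the rounded "Duration" values hide the ordering. Below is a small Python sketch; the start/end times and errors are copied from the log, while the task labels ("failover transfer", "whole VM job") are my own shorthand. It shows the failover (CR) transfer hit VDI_IO_ERROR at 06:10:00 after 1:43:56, while the NFS ("Frank") transfer kept running until it was aborted at 07:05:53, at the same moment the job reported "task has already ended". That ordering is a clue, not proof: it does not by itself show which host raised the I/O error.

```python
from datetime import datetime

# Timestamp format used in the Xen Orchestra backup log above
FMT = "%b %d, %Y, %I:%M:%S %p"

# (label, start, end, error) copied from the log; labels are shorthand
TASKS = [
    ("Frank transfer", "Jan 9, 2022, 04:26:04 AM", "Jan 9, 2022, 07:05:53 AM", "aborted"),
    ("failover transfer", "Jan 9, 2022, 04:26:04 AM", "Jan 9, 2022, 06:10:00 AM",
     "VDI_IO_ERROR(Device I/O errors)"),
    ("whole VM job", "Jan 9, 2022, 04:25:40 AM", "Jan 9, 2022, 07:05:53 AM",
     "task has already ended"),
]


def timeline(tasks):
    """Return (end, label, exact duration, error) tuples sorted by end time."""
    rows = []
    for label, start, end, error in tasks:
        s = datetime.strptime(start, FMT)
        e = datetime.strptime(end, FMT)
        rows.append((e, label, e - s, error))
    # Stable sort: tasks with equal end times keep their input order
    rows.sort(key=lambda row: row[0])
    return rows


if __name__ == "__main__":
    for end, label, duration, error in timeline(TASKS):
        print(f"{end:%H:%M:%S}  {label:<18} ran {duration}  -> {error}")
```

The same approach works for the earlier failures: if the CR transfer is always the first to fail, the failover host's storage is a reasonable place to start looking.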
-
Have you modified the disks on those VMs? (growing the VDI)
-
@olivierlambert all the best for 2022!
Not at all. The disks are thin-provisioned, so their actual size on disk may vary, but the configured size of each disk has not changed. Last week another VM did the same thing and then fixed itself, but this week it seems to be this VM causing the above issue.
-
Weird… Can you remove the snapshot of the failing VM and try again to see if you still have the error?
To me it looks like the delta capability on the XCP-ng side is failing on a specific disk.
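If you would rather script the snapshot removal than click through Xen Orchestra, here is a rough Python sketch that wraps the `xe` CLI on the pool master. It assumes `xe snapshot-list snapshot-of=<vm-uuid> --minimal` prints a comma-separated list of snapshot UUIDs and that `xe snapshot-uninstall uuid=<uuid> force=true` removes a snapshot together with its disks; the helper names are hypothetical. It targets every snapshot of the VM, not only the one the backup job created, so it defaults to a dry run that just prints the commands for you to review.

```python
import subprocess
from typing import Callable, List

# A runner takes an xe argument list and returns its stdout
Runner = Callable[[List[str]], str]


def run_xe(args: List[str]) -> str:
    """Run an xe command (on the pool master) and return its stdout."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout


def find_snapshots(vm_uuid: str, run: Runner = run_xe) -> List[str]:
    """List snapshot UUIDs of a VM (assumes --minimal gives a comma-separated list)."""
    out = run(["xe", "snapshot-list", f"snapshot-of={vm_uuid}", "--minimal"])
    return [uuid for uuid in out.strip().split(",") if uuid]


def remove_snapshots(vm_uuid: str, run: Runner = run_xe, dry_run: bool = True) -> List[str]:
    """Uninstall every snapshot of the VM; with dry_run=True, only print the commands."""
    handled = []
    for snap in find_snapshots(vm_uuid, run):
        cmd = ["xe", "snapshot-uninstall", f"uuid={snap}", "force=true"]
        if dry_run:
            print(" ".join(cmd))
        else:
            run(cmd)
        handled.append(snap)
    return handled
```

As noted earlier in the thread, losing the delta base means the next run will be a full backup, so expect that job to take longer.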