Hi,
I have a really strange behaviour on one of our XCP-ng hosts.
We have several XCP-ng pools and two identical standalone hosts.
We run nightly delta backups from a Xen Orchestra VM built from sources.
A few weeks ago, delta backups suddenly stopped working on one of the two standalone hosts, while they keep working without any problems on all other hosts and pools.
The two identical standalone hosts are:
- Lenovo SR655
- AMD EPYC 7282 16-Core Processor
- 512GB RAM
- Local ext4 storage on SAS RAID (around 3.5 TB of 17.3 TB used, 9-disk RAID 5)
- 2x 10Gbit as bond0
Both standalone hosts are absolutely identical (firmware up to date, latest XCP-ng patch level, and rebooted in the last few days to see whether that would fix the problem).
When the error first appeared, the backup logs reported "stream has ended with not enough data" at the transfer stage of the delta backups.
I then cleaned up snapshots and old backups on some VMs.
After that, the first (full) backup of those VMs worked fine, but the second (delta) backup showed the same error.
To dig deeper, I set up a completely new Ubuntu 22 VM, installed Xen Orchestra from sources on it again, and connected the two standalone hosts to this new XOfs VM, using an NFS backup remote.
Same result: the initial full backup works fine, but the first delta fails on that one host only, while it works without problems on the other host.
This time, however, the job stays in "transfer" forever. It remains in that state even for days and never finishes, so the next day's run fails with "Error: the job (x) is already running".
Today I restarted the XOfs VM, updated to commit "afadc", and tried to reproduce the issue with a new backup job containing just a single VM.
It seems to be XCP-ng related, because the other, identical host works perfectly.
On that one host I see the same behaviour: the initial full backup works, but the delta never completes and stays stuck at the "transfer" stage.
When I watch xe task-list while the backup is running, the export task for the delta seems to work fine and new data appears on the NFS remote. At 100% the task disappears, but the backup job stays in "transfer" and never completes.
To rule out anything related to my "from sources" installation (even though the error occurs only on this one host and all others work fine), I deployed an XOA VM, but I can't start a free trial ("you already consumed ...") and so I cannot test delta backups with it.
Do you have any ideas, or has anyone had a similar issue in the past?
Kind regards
Alex