Updated and tested:
- one single host
- lab pool (3 hosts) - NFS storage
- lab pool (4 hosts) - iSCSI storage
Everything seems to work as expected.
(new VM, live migration, snapshots, clone VMs, import/export VMs)
Hi @all,
Sorry for my late reply, but I was on site at a customer yesterday.
I can confirm it's working again as expected.
Not only that, it also cleaned up all the additional replicas now.
Great work and thanks for all your efforts.
Kind regards
Alex
@probain
I have a similar, though not identical, problem right now.
Maybe it has the same root cause, so I'm linking it here.
https://xcp-ng.org/forum/topic/11540/continuous-replication-isnt-deleting-old-replikas-anymore-since-update
Kind Regards
Alex
Hi @all,
I'm running XO from source (commit fa020) on a fully updated XCP-ng 8.2.
I have 2 CR jobs on 2 XCP-ng hosts, replicating all running VMs to each other three times a day with a retention of 2.
This worked flawlessly for months, so I always had 2 replicas on srv01 of each VM running on srv02, and vice versa.
Since Friday (I updated Xen Orchestra that day), replication itself still runs fine, but the old replicas are no longer deleted, resulting in 12 existing replicas today.
I've tried deleting the old schedule and re-creating it.
I've tried deleting the replication job and re-creating it.
I've tried adjusting the retention.
I deleted all replicas and started the jobs from scratch.
None of the above has had any discernible effect.
Any idea where to start investigating?
I searched the forum but could not find an existing topic like this, except for one that is similar but not the same problem:
https://xcp-ng.org/forum/topic/11539/snapshots-are-no-longer-being-pruned-commit-58f02
Maybe it has the same root cause, so I am linking it here.
Thanks in advance for any suggestions!
Alex
@stormi
Did some testing over the weekend too.
Setup: 2 hosts in a pool with shared iSCSI LVM storage and multipath, 8 paths per LUN.
Everything seems to work fine (migrate/import/cross-pool migrate/snapshots/backups).
Even our long-standing problem (snapshots taking far too long) has gotten much better (still not good, but much better).
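In case it helps anyone reproduce the setup above: the "8 paths per LUN" part can be double-checked in dom0 with the standard multipath tooling (the WWID below is just a placeholder, not a real value from my hosts):

```
# in dom0: show the multipath topology for all LUNs
multipath -ll

# count the active paths behind a single LUN (replace <wwid> with the LUN's WWID)
multipath -ll <wwid> | grep -c "active ready"
```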
Cross-tested with XOA.
Everything works fine, as expected.
So it seems that the first host has some problem and is not providing any usable data when it comes to exporting the first delta snapshot.
Unfortunately I have to switch to customer support right now and have to stop my testing for today.
I will keep going tomorrow.
Thx for your support so far.
OK, so I tested with XOA.
A completely new (empty) NFS export mounted as a remote in XOA.
I will now cross-test with a VM on the working host.
It's a 12-disk Synology NAS with btrfs.
Multiple folders on that NFS storage are exported as remotes in my XOfs installations.
As I said, all other hosts/XOfs installations work fine on that NFS storage.
Only this one specific XCP-ng host has these problems, so I don't think it's Xen Orchestra related. It seems to be a problem with XCP-ng on this specific host, even though the identical second host does not have this problem.
Thx @olivierlambert for extending the trial.
I will run new tests with the XOA and report my findings.
Meanwhile I found out why the backup job hangs forever on my new XOfs installation.
When I mounted the NFS remote, I had checked the option "Store backup as multiple data".
I removed the NFS remote and reconnected it without that option.
Now the full backup works as expected, and the first delta fails with "Error: stream has ended with not enough data (actual: 430, expected: 512)".
So maybe the error handling differs when this option is active and the exception is not handled correctly there.
Just for your information.
I will come back once I have tested with XOA.
@olivierlambert
Many thanks for your help.
I wrote you a PM with my e-mail address.
Hi,
I'm seeing really strange behaviour on one of our XCP-ng hosts.
We have several XCP-ng pools and 2 identical standalone hosts.
We run nightly delta backups from a Xen Orchestra VM built from sources.
A few weeks ago, delta backups suddenly stopped working on just one of the two standalone hosts, while they keep working without any problems on all other hosts/pools.
The two identical Stand-Alone Hosts are:
Both standalone hosts are absolutely identical (firmware up to date, latest XCP-ng patch level, and rebooted in the last few days to see if that would fix the problem).
When the error first appeared, the backup logs started saying "stream has ended with not enough data" at the transfer stage of the delta backups.
I then started cleaning up snapshots and old backups on some VMs.
After that, the first full backup of those VMs worked fine, but the following delta backup showed the same error.
To dig deeper, I installed a completely new Ubuntu 22 VM, installed Xen Orchestra from sources again, and connected the 2 standalone hosts to that new XOfs VM, together with an NFS backup remote.
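(For reference, the new XOfs VM was set up with the usual from-sources steps from the XO documentation; the commands below are just the generic defaults, not a transcript of exactly what I typed:)

```
# assumes Node.js, yarn and the usual build tools are already installed
git clone -b master https://github.com/vatesfr/xen-orchestra
cd xen-orchestra
yarn && yarn build
cd packages/xo-server
yarn start
```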
Same again: the initial full backup works fine, and the first delta fails on that one host only, while working without problems on the other host.
But this time it stays in "transfer" forever. That status persists even for days and the backup job never finishes, so the next day's job fails with "Error: the job (x) is already running".
Today I restarted the XOfs VM, updated to commit "afadc", and tried to reproduce the issue with a new backup job containing just one single VM.
It seems to be an XCP-ng related issue, because the other identical host works perfectly.
On that one host I see the same thing: the initial full works, and the delta never comes back, stuck at the "transfer" stage.
When I watch the xe task-list while the backup is running, the export task for the delta seems to run fine and new data arrives on the NFS remote. Then, at 100%, the task disappears, but the backup job stays in "transfer" and never comes back.
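(This is roughly how I watch it on the host; plain xe CLI, nothing special:)

```
# in dom0: refresh the task list every few seconds while the delta export runs
watch -n 5 'xe task-list params=uuid,name-label,status,progress'
```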
To rule out anything related to my "from sources" installation (even though the error only occurs on this one host and all others work fine), I deployed an XOA VM, but I can't start a free trial ("you already consumed ...") and therefore cannot test delta backup.
Do you have any ideas, or have you maybe seen a similar issue in the past?
Kind regards
Alex
@stormi
Thx for your reply. 
I'll try to wait so I can do it all in one task.
Kind Regards
Alex