@olivierlambert
I'm on it! However, after searching the XCP-ng docs as well as the XenServer docs, I can't seem to find how to specify a migration network using xe from the CLI. Are you able to provide the flag I need to use?
Posts
-
RE: CBT: the thread to centralize your feedback
-
RE: CBT: the thread to centralize your feedback
@olivierlambert Glad we're getting to the bottom of this!
Out of curiosity, is having an isolated migration network, available only to the XCP-ng hosts, considered best practice with XCP-ng? It was with VMware, to keep vMotion traffic on its own subnet, and since the VLAN was already created on our switches I decided to keep that setup. Ideally we can get this fixed either way; I'm just curious whether I'm doing something considered strange.
-
RE: CBT: the thread to centralize your feedback
@olivierlambert
We're making progress, I think! Correct: letting the migration run with those settings results in all zeros when running the cbt-check command.
I tried removing the migration network and ran a migration with the following settings:
Before migration:
[14:27 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 7560326c-8b15-4c58-841f-6a8f962a7d28.cbtlog
fe6e3edd-4d63-4005-b0f3-932f5f34e036
And after migration:
[14:27 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 7560326c-8b15-4c58-841f-6a8f962a7d28.cbtlog
fe6e3edd-4d63-4005-b0f3-932f5f34e036
If I select a default migration network and run the same migration:
[14:31 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 7560326c-8b15-4c58-841f-6a8f962a7d28.cbtlog
00000000-0000-0000-0000-000000000000
I think we're getting somewhere now! I have the migration network on both the test and DR pools. This used to be our "vMotion" network back when we ran vSphere, and I decided to continue using it to keep migration traffic on an isolated, secure VLAN.
In fact, these Veeam VMs are not even in use anymore; they exist in our test lab as VMs to mess around with for things like this.
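For anyone who wants to script the before/after comparison above, here is a minimal Python sketch. It assumes `cbt-util get -c -n <file>` prints only the child UUID, as in the output shown in this post; the function names `child_uuid` and `was_reset` are made up for illustration and are not part of any XCP-ng tooling.

```python
import subprocess
import uuid

# The all-zero UUID that cbt-util reports after the problematic migrations.
NULL_UUID = uuid.UUID(int=0)


def child_uuid(cbtlog_path, run=subprocess.run):
    """Read the child UUID from a .cbtlog file.

    Uses the same `cbt-util get -c -n <file>` call shown in the post.
    `run` can be swapped out so the parsing is testable off-host.
    """
    result = run(
        ["cbt-util", "get", "-c", "-n", cbtlog_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return uuid.UUID(result.stdout.strip())


def was_reset(before, after):
    """True when a previously set child UUID has become all zeros."""
    return before != NULL_UUID and after == NULL_UUID
```

On a host you would call `child_uuid()` once before the migration and once after, from the SR directory, then pass both values to `was_reset()`.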
-
RE: CBT: the thread to centralize your feedback
Here is a screenshot of how I am doing the migration in XOA: moving from host 2 to host 1, leaving the SR drop-down empty.
-
RE: CBT: the thread to centralize your feedback
For sure, I ran:
xe vm-migrate uuid=a14f0ad0-854f-b7a8-de5c-88056100b6c6 host-uuid=c354a202-3b30-486b-9645-2fd713dee85f
to move the VM from host 1 to host 2.
Doing it this way, I noticed that checking the CBT log file does not result in all zeros being output:
[10:00 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 087ad136-f31b-4d7c-9271-7c926fd51089.cbtlog
fe6e3edd-4d63-4005-b0f3-932f5f34e036
For fun I then moved the VM back from host 2 to host 1, and again the cbtlog file seems to be intact:
[10:02 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 087ad136-f31b-4d7c-9271-7c926fd51089.cbtlog
fe6e3edd-4d63-4005-b0f3-932f5f34e036
After all this migrating I ran a backup job, which ran fine and without any errors about not being able to do a delta.
So it seems like it works fine via the xe CLI.
Update:
After the backup ran properly and generated a new CBT log file, I moved the VM back and forth between hosts again using the CLI, and the cbtlog file again stays intact when checked with cbt-util. When I do this with XOA, the result from cbt-util is all zeros.
-
RE: CBT: the thread to centralize your feedback
In XOA I browse to the VM inventory list, search for the VM I want to migrate, check the box beside it, and click the migrate button at the top right of the page. The "Migrate VM" popup appears; I select the second host, which is in the same pool, and click "Ok".
We have 2 pools i can reproduce this on:
The "Test Environment pool" with 2 HP DL325 Gen 10 servers backed by a TrueNAS MINI R running NFS 4.1
Our Production pool running 5 HP DL320 Gen 11 servers backed by a Pure //20R4 running NFS 3.
On the networking side:
Both pools are connected to 2 Aruba CX 10G switches (VSX stack); each host has 4 physical connections:
2 x 10G Bond0: Storage/Management/Backup, MTU 1500, VLANs for VM Traffic/Management/Backup
2 x 10G Bond1: Dedicated storage, MTU 9000, ONLY used for NFS storage traffic on an isolated storage VLAN.
Both the TrueNAS and the Pure use MTU 9000 on their "Storage" ports as well. I know Vates steers people away from jumbo frames as a rule, and I agree, but Pure engineering was pretty adamant about using them, so they are only present on these dedicated storage ports.
I will soon have a 3rd pool to test on as our DR site comes online next month; it will also be backed by Pure Storage.
Looking at some more recent posts on this thread, I see others are experiencing this issue as well now.
It should be noted that regular backups with "NBD and CBT" enabled, but with the snapshot deletion button turned off, run without issue and have proven themselves reliable for months now. It would just be nice not to have to keep that snapshot daily.
-
RE: CBT: the thread to centralize your feedback
Another interesting development. In our test environment this week I installed the latest HP Service Pack for ProLiant. Doing so required a server reboot, so I ran a rolling pool reboot from XOA. Later, when the test environment backup job kicked off, I noticed it was running a regular delta despite the migrations that must have occurred during the rolling pool reboot.
SSHing onto a host and checking, I see that sure enough the cbtlog is reporting all zeros:
[17:27 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 73877c18-a5bf-43bb-aaf5-299f46710d7e.cbtlog
00000000-0000-0000-0000-000000000000
However, the backup ran as a delta. Running the backup again manually, it once again runs as a delta.
Checking after the manual backup, the result is not all zeros anymore:
[17:28 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 65d8656e-93e8-4e81-b1a8-0b0462f6fbb8.cbtlog
1950d6a3-c6a9-4b0c-b79f-068dd44479cc
Now, just for fun, I decided to manually migrate a small VM to another host and then back to see what happens.
After the migration, back to all zeros:
[17:32 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 65d8656e-93e8-4e81-b1a8-0b0462f6fbb8.cbtlog
00000000-0000-0000-0000-000000000000
And running a backup manually resulted in the usual error:
Can't do delta with this vdi, transfer will be a full
Can't do delta, will try to get a full stream
So this just makes the issue even more confusing: why does a rolling pool reboot not cause this behaviour when a manual migration does? Does the ID being all zeros not actually matter? I seem to be able to reproduce this consistently, too. I'll be curious to test whether a "rolling pool update" causes this behaviour the next time a batch of updates is released.
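To keep track of which step zeroes the ID, the observations above can be written down as a small timeline and scanned for the point where a set value turns into all zeros. This is only a rough Python sketch: the event labels and the `find_resets` helper are made up for illustration, and the UUIDs are the ones from the cbt-util output in this post.

```python
NULL_UUID = "00000000-0000-0000-0000-000000000000"


def find_resets(observations):
    """Return the labels of events where a set ID became all zeros.

    `observations` is an ordered list of (label, child_uuid) pairs.
    An all-zero first entry is not counted, since nothing is known
    about the value before it.
    """
    resets = []
    previous = None
    for label, child in observations:
        if child == NULL_UUID and previous not in (None, NULL_UUID):
            resets.append(label)
        previous = child
    return resets


# Values copied from the cbt-util output in this post.
observations = [
    ("after rolling pool reboot", NULL_UUID),
    ("after manual backup", "1950d6a3-c6a9-4b0c-b79f-068dd44479cc"),
    ("after manual migration", NULL_UUID),
]

print(find_resets(observations))
```

This only flags where the value changed; it says nothing about whether an all-zero ID is what actually forces the full transfer, which is still the open question.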
-
RE: Windows Server 2025 on XCP-ng
I have some more to add to this after playing with in-place upgrades:
Any VM that seems to have had Xen Tools 9.3.3 installed at some point (even if upgraded to 9.4 since) will fail to upgrade, hanging on the first reboot during the process. If the VM was created once 9.4 was the current stable release, this is not an issue and the upgrade will run.
To fix this we need to totally remove all traces of the Xen Tools drivers from the system.
Before upgrading:
Open a command prompt as Administrator
cd "C:\Program Files\XenServer\XenTools"
Run: uninstall.exe purge verbose
Reboot
Confirm the Xen management agent is not running via the Xen Orchestra console
Proceed with the Server 2025 in-place upgrade.
Once into Server 2025 you will need to run the Xen Tools MSI to reinstall the tools. It will detect that the management agent is still present even though it is not running and the drivers have been removed. Run through the uninstall process, reboot, then install a clean copy of the latest available version of the tools.
-
RE: XCP-NG 9, Dom0 considerations
@olivierlambert That's totally understandable, as Xen is the core of the appliance!
The only reason I mention NFS as an example is that, as you probably remember, I had a lot of issues with NFSv4 dropping out under load, which switching to NFSv3 totally resolved. When I look at an 8.3 host I see:
[15:03 xcp-ng-01 ~]# rpm -q nfs-utils
nfs-utils-1.3.0-0.54.el7.x86_64
compared to the version in AlmaLinux 10:
[root@localhost ~]$ rpm -q nfs-utils
nfs-utils-2.7.1-1.el10.x86_64
I'm hopeful that some of the improvements and fixes from jumping up to a much newer version like this will help with this sort of thing, and other examples like this likely exist as well.
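If you want to compare package versions across several hosts, the `rpm -q` strings above can be split into their parts with a few lines of Python. This is a simplified sketch: `parse_nvra` and `version_tuple` are hypothetical helper names, and the comparison assumes purely numeric dotted versions, so it is not a substitute for RPM's own version comparison rules.

```python
def parse_nvra(nvra):
    """Split 'name-version-release.arch' as printed by `rpm -q`."""
    rest, arch = nvra.rsplit(".", 1)
    name, version, release = rest.rsplit("-", 2)
    return name, version, release, arch


def version_tuple(version):
    """Turn '1.3.0' into (1, 3, 0); assumes numeric components only."""
    return tuple(int(part) for part in version.split("."))


# The two strings from the rpm -q output in this post.
old = parse_nvra("nfs-utils-1.3.0-0.54.el7.x86_64")
new = parse_nvra("nfs-utils-2.7.1-1.el10.x86_64")

is_newer = version_tuple(new[1]) > version_tuple(old[1])
print(old[1], "->", new[1], "newer:", is_newer)
```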
-
RE: XCP-NG 9, Dom0 considerations
Very much looking forward to a more up-to-date Dom0 and the kernel/package improvements that come along with that. I'm sure it will help iron out many oddities like NFSv4 drops, as well as, hopefully, the AMD EPYC VM networking speed issues. I would love to go back to buying AMD systems instead of Intel.
-
RE: Backup from replicas possible?
@olivierlambert We have our refreshed DR storage on order (replacing block-only storage with another NFS-capable array)! As we close in on rebuilding our DR site, I wonder if there has been any discussion on adding this feature. Being able to append a custom tag to a replication job would allow us to back up said replicas with the new GFS functionality to cheaper long-term storage. In practice this would be pretty similar to what Veeam does with backing up from replicated storage snapshots, but totally hardware agnostic.
-
RE: Windows Server 2025 on XCP-ng
@Greg_E Good to know! We are mostly on Server 2022 currently when it comes to Windows, but anything new would probably be deployed on 2025, so I was wondering what the status was.
-
RE: Windows Server 2025 on XCP-ng
So what's the overall consensus on Server 2025 with XCP-ng? I see XenServer is claiming it's fully supported now. Is delayed start on the management agent still required? Or only if using it as an AD domain controller?
-
RE: XCP-ng 8.3 updates announcements and testing
@gduperrey installed on 2 test machines
Machine 1:
Intel Xeon E-2336
SuperMicro board.
Machine 2:
Minisforum MS-01
i9-13900H
32 GB RAM
Using Intel X710 onboard NIC
Both machines installed fine and all VMs came up without issue after.
I ran a backup job after to test snapshot coalesce, no issues there.
-
RE: CBT: the thread to centralize your feedback
@dthenot @olivierlambert Thanks guys, I'll hold off on submitting a ticket for now to keep the conversation centralized here. If you need any more info, would like me to try anything, or would like a remote support tunnel opened, just let me know!
-
RE: CBT: the thread to centralize your feedback
@olivierlambert Hmm, I'm really not sure what's unique about my two pools. One is AMD + TrueNAS, the other Intel + Pure Storage. If this is actually unique to me, perhaps I would be better off submitting a ticket to help get to the bottom of this?
-
RE: CBT: the thread to centralize your feedback
Sadly the latest XOA release from today does not resolve my strange CBT issue:
[08:32 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 4d7f0341-bbce-4957-a4c4-d603725a807a.cbtlog
1950d6a3-c6a9-4b0c-b79f-068dd44479cc
After Migration from Host 01 to Host 02 (Shared NFS SR):
[08:33 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 4d7f0341-bbce-4957-a4c4-d603725a807a.cbtlog
00000000-0000-0000-0000-000000000000
-
RE: CBT: the thread to centralize your feedback
@Rhodderz I agree. We are using NFS, so snapshots are thin at least, but we would love to be able to delete the snapshots after a backup run as well. Hopefully in time we can get this working!
-
RE: CBT: the thread to centralize your feedback
For our production pool I have CBT + NBD enabled, but I have "Purge snapshot data when using CBT" disabled. This results in successful backups, but the snapshot is retained. I assume it then ends up using that snapshot for the following delta backups.
-
RE: CBT: the thread to centralize your feedback
@florent Testing a storage migration, I do see CBT get disabled and reset during the process, which is expected! I do notice it leaves the .cbtlog file on the old SR after the storage migration is complete, but that's easy enough to clean up manually.
The issue I posted above, however, is just a VM migration from host to host on a shared NFS SR; the SR the VM is on is not changing.