CBT: the thread to centralize your feedback

flakpyro

@Rhodderz I agree. We are using NFS, so snapshots are thin at least, but we would love to be able to delete the snapshots after a backup run as well. Hopefully in time we can get this working!

Rhodderz

To add an update and not leave this on a cliffhanger:
We have since updated our XOA to the latest channel to attempt to fix an NBD issue.
This move broke a proxy of ours, and all the backups are now going through the XOA; since then the backups have not had an issue.
So either the new NBD fixes, running only on the XOA, or something somewhere else resolved this problem for now.

We will be enabling the same in our other pool soon, so I will update if we have the same issues there.

flakpyro

Sadly, the latest XOA release from today does not resolve my strange CBT issue. Before migration:

[08:32 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]#  cbt-util get -c -n 4d7f0341-bbce-4957-a4c4-d603725a807a.cbtlog 
1950d6a3-c6a9-4b0c-b79f-068dd44479cc
After Migration from Host 01 to Host 02 (Shared NFS SR):
[08:33 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]#  cbt-util get -c -n 4d7f0341-bbce-4957-a4c4-d603725a807a.cbtlog 
00000000-0000-0000-0000-000000000000
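
For anyone who wants to script this check, here is a minimal sketch. It assumes only what the output above shows: `cbt-util get -c -n <file>.cbtlog` prints the child UUID, and an all-zero UUID means the link is gone. The function name `cbt_child_state` and the labels it prints are hypothetical, not part of any XCP-ng tooling.

```shell
#!/bin/sh
# Hypothetical helper: classify the child UUID reported for a .cbtlog file.
NULL_UUID="00000000-0000-0000-0000-000000000000"

# cbt_child_state UUID: prints "reset" for the all-zero UUID, "linked" otherwise
cbt_child_state() {
    if [ "$1" = "$NULL_UUID" ]; then
        echo "reset"
    else
        echo "linked"
    fi
}

# Usage on a host, run from the SR directory as in the prompts above
# (<vdi-uuid> is a placeholder):
#   child=$(cbt-util get -c -n <vdi-uuid>.cbtlog)
#   cbt_child_state "$child"
```

Running it before and after a migration gives a quick yes/no answer without comparing UUIDs by eye.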

olivierlambert

I don't think that's an XO issue, but more something weird in your XCP-ng setup that nobody can reproduce (but it doesn't mean we couldn't solve it).

flakpyro

@olivierlambert Hmm, I'm really not sure what's unique about my two pools. One is AMD + TrueNAS, the other Intel + Pure Storage. If this is actually unique to me, perhaps I would be better off submitting a ticket to help get to the bottom of this?

olivierlambert

You managed to find a CBT issue without using any XO command, which is great because we now know it's not XO. I think @dthenot is already taking a look internally.

dthenot

@olivierlambert I am

flakpyro

@dthenot @olivierlambert Thanks guys, I'll hold off on submitting a ticket for now to keep the conversation centralized here, but if you need any more info, would like me to try anything, or would like a remote support tunnel opened, just let me know!

Tristis Oris

I can't run a live migration to another pool because of VDI_CBT_ENABLED. Is this intended?

Tristis Oris

@Tristis-Oris Even halted VMs can't be migrated while they have a snapshot. It needs to be removed first.

olivierlambert

That's weird, ping @florent

Forza

@Tristis-Oris We've had the same problem, so are not using CBT for now.

rtjdamen

@Tristis-Oris Migration of a VDI between SRs is not supported with CBT enabled. You need to disable CBT first, which XOA does for you. Live migration of a VM between hosts is supported as long as the SR stays the same. This is by design in Xen.
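
If you ever need to do the disable step by hand instead of through XOA, a rough sketch could look like this. It is a dry run that only prints the commands. I am assuming the standard `xe vdi-disable-cbt uuid=<vdi-uuid>` command here, and that `cbt-enabled=true` works as a filter on `xe vdi-list`; the function name `print_disable_cbt_cmds` is made up for illustration.

```shell
#!/bin/sh
# Dry-run sketch: turn a comma-separated list of VDI UUIDs into disable commands.
# Nothing is executed; review the output before running any of it.

# print_disable_cbt_cmds "uuid1,uuid2": prints one xe command per UUID
print_disable_cbt_cmds() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r uuid; do
        [ -n "$uuid" ] || continue     # skip empty entries (e.g. empty input)
        echo "xe vdi-disable-cbt uuid=$uuid"
    done
}

# Usage on a pool member (assumed filter; --minimal returns a comma-separated list):
#   print_disable_cbt_cmds "$(xe vdi-list cbt-enabled=true params=uuid --minimal)"
```

Printing instead of executing is deliberate: the list can include VDIs you do not intend to touch, so it is worth reading before anything is run.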

Tristis Oris

@rtjdamen But I can't disable CBT globally? It was automatically applied to every VDI when it was implemented.
Disabling CBT for each VDI is not required, because that happens automatically during migration. I only need to remove all snapshots.

rtjdamen

@Tristis-Oris Indeed, it seems like a bug in XOA that it does not delete the snapshots.

jon02

@flakpyro
I have the same problem.
I'm on 8.2 as well and have a local ZFS SR.
I'm going to upgrade to 8.3 and see if it helps.

flakpyro

Another interesting development. In our test environment this week I installed the latest HP Service Pack for ProLiant. Doing so required a server reboot, so I ran a rolling pool reboot from XOA. Later, when the test environment backup job kicked off, I noticed it was running a regular delta despite the migrations that must have occurred during the rolling pool reboot.

SSHing onto a host and checking, I see that sure enough the cbtlog is reporting all zeros...

[17:27 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]#  cbt-util get -c -n 73877c18-a5bf-43bb-aaf5-299f46710d7e.cbtlog 
00000000-0000-0000-0000-000000000000

However, the backup ran as a delta. Running the backup again manually, it once again runs as a delta.

Checking after the manual backup, the result is not all zeros anymore:

[17:28 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]#  cbt-util get -c -n 65d8656e-93e8-4e81-b1a8-0b0462f6fbb8.cbtlog 
1950d6a3-c6a9-4b0c-b79f-068dd44479cc

Now, just for fun, I decided to manually migrate a small VM to another host and then back to see what happens:

After the migration, back to all zeros:

[17:32 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]#  cbt-util get -c -n 65d8656e-93e8-4e81-b1a8-0b0462f6fbb8.cbtlog 
00000000-0000-0000-0000-000000000000

And running a backup manually resulted in the usual error:

Can't do delta with this vdi, transfer will be a full
Can't do delta, will try to get a full stream

So... this just makes the issue even more confusing. Why does a rolling pool reboot not cause this behaviour but a manual migration does? Does the ID being all zeros not actually matter? I seem to be able to consistently reproduce this too. I'll be curious to test whether a "rolling pool update" causes this behaviour the next time a batch of updates is released.
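
To make the comparison repeatable, something like the following could record the child UUID of every cbtlog on the SR before and after a migration, so the two runs can be diffed. It is only a sketch: it assumes `cbt-util get -c -n <file>` prints the child UUID as in the output above, and `dump_cbt_children` and `SR_DIR` are hypothetical names.

```shell
#!/bin/sh
# Sketch: list "<cbtlog file name> <child UUID>" for every .cbtlog in a directory.

# dump_cbt_children DIR: one line per .cbtlog file, sorted by file name
dump_cbt_children() {
    for f in "$1"/*.cbtlog; do
        [ -e "$f" ] || continue        # no .cbtlog files: the glob stays literal
        child=$(cbt-util get -c -n "$f")
        echo "${f##*/} $child"
    done | sort
}

# Usage on a host (SR_DIR is a placeholder for the SR mount directory):
#   dump_cbt_children "$SR_DIR" > /tmp/cbt-before.txt
#   ... migrate the VM, or run the rolling pool reboot ...
#   dump_cbt_children "$SR_DIR" > /tmp/cbt-after.txt
#   diff /tmp/cbt-before.txt /tmp/cbt-after.txt
```

Any line in the diff that flips to all zeros is a VDI whose next backup would be expected to fall back to a full.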

rtjdamen

@flakpyro Very strange issue indeed, we can’t reproduce the problem on our end. So, as @olivierlambert mentioned before, it has to be something specific. The new developments do make it more difficult to understand, but I believe there is a logical explanation for this.

olivierlambert

@flakpyro How exactly do you migrate the VM? I mean the exact steps you take.

flakpyro

@olivierlambert

In XOA I browse to the VM inventory list, search for the VM I want to migrate, check the box beside it, and click the migrate button located at the top right of the page. The "Migrate VM" popup appears, I select the second host, which is in the same pool, and click "Ok".

We have 2 pools I can reproduce this on:

The "Test Environment pool" with 2 HP DL325 Gen 10 servers backed by a TrueNAS MINI R running NFS 4.1

Our Production pool running 5 HP DL320 Gen 11 servers backed by a Pure //20R4 running NFS 3.

On the networking side:

Both pools are connected to 2 Aruba CX 10G switches (VSX Stack), each host has 4 physical connections:

2 x 10G Bond0: Storage/Management/Backup, MTU 1500, VLANs for VM Traffic/Management/Backup

2 x 10G Bond1: Dedicated storage: MTU 9000, ONLY used for NFS storage traffic on an isolated storage VLAN.

Both the TrueNAS and Pure use MTU 9000 on their "Storage" ports as well. I know Vates steers people away from jumbo frames as a rule, and I agree, but Pure engineering was pretty adamant about using them, so they are only present on these dedicated storage ports.

I will soon have a 3rd pool to test on as our DR site comes online next month, it will also be backed by Pure Storage.

Looking at some more recent posts on this thread, I see others are now experiencing this issue as well.

It should be noted that regular backups with "NBD and CBT" enabled but with the snapshot deletion button turned off run without issue and have proven themselves reliable for months now. It would just be nice not to have to keep that daily snapshot.