CBT: the thread to centralize your feedback
-
I think I may have a bit of a similar problem here. About a week ago, I did an update to the broken version of XO and it threw the same error as is in the subject line here. I reverted and everything was OK, but then I started to get unhealthy VDI warnings on my backups.
I tried to rescan the SR and I would see in the SMLog that it believed another GC was running, so it would abort. Rebooting the host was the only way to force the coalesce to complete; however, as soon as the next incremental backup ran, it would hit the same problem (the GC thinking another one was running, so it would not do any work).
I then did a full power off of the host, rebooted, and let all the VMs sit in a "powered off" state, rescanned the SR and let it coalesce. Once everything was idle, I deleted all snapshots and waited for the coalesce to finish. Only then did I restart the VMs. Now a few VMs have immediately come up as 'unhealthy' and once again the GC will not run, thinking there is another GC working.
I'm kind of running out of ideas 8-) Does anyone know what might be stuck or what I need to look for to find out?
Just a side note here. I noticed that all the VMs that I am having problems with have CBT enabled.
I have a VM that is a snapshot-only VM, and even when the coalesce is stuck, I can delete snapshots off this non-CBT VM and the coalesce process runs (then gives an exception when it gets to the VMs that have CBT enabled).
Is there a way to disable CBT?
-
Hello everybody,
Thanks for your feedback. Here is a work branch with CBT enabled: https://github.com/vatesfr/xen-orchestra/pull/7792 . The branch name is fix_cbt
It fixes:
- snapshot retention with full backups
- off-by-one error for retention length
- parent locator error
- "can't destructure undefined" error
- it doesn't leak VDIs attached in the dom0 in our lab
- progress is back on the export task
Please test it if you can, and don't hesitate to provide feedback.
Regards,
Florent
-
For those that may be stuck, like I was, I have finally undone the coalesce nightmare the previous CBT caused.
For the record: I am using XCP-ng 8.3 Beta, fully patched.
What I had to do:
1. Shut down every VM and delete every snapshot.
2. Find every VDI that has CBT enabled and disable it. I did this with a simple bash loop (not the best, I know):
   for i in `xe vdi-list cbt-enabled=true | grep "^uuid ( RO)" | cut -d " " -f 20`
   do
     echo $i
     xe vdi-disable-cbt uuid=$i
   done
3. Reboot the server.
4. Create a snapshot on any VM and immediately delete it. (If you just do a rescan, it says that the GC is running when it is not, but for whatever reason deleting a snapshot seems to kick in the GC regardless.)
5. Keep an eye on the SMLog and look for exceptions. I tend to do something like this (it will sleep for 5 minutes, so don't get anxious):
   tail -f /var/log/SMLog | grep SMGC
6. When it finishes, check XO to see if there are any remaining uncoalesced disks and repeat from step 4.
It took about 5 iterations of the above to finally clean up all the stuck uncoalesced leaves, but it eventually did it. The key, for me, was making sure the VMs were not running and turning CBT off.
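The "disable CBT on every VDI" step above can also be sketched with xe's `--minimal` output, which prints the matching UUIDs as one comma-separated line and so avoids counting `cut` fields. This is only a minimal sketch, not a tested procedure: `split_uuids` and `disable_all_cbt` are hypothetical helper names, and it assumes it is run in dom0 with the VMs shut down and the snapshots removed, as described above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative, not part of xe or XO.

# Turn xe's comma-separated --minimal output into one UUID per line,
# dropping empty lines (for example when no VDI matches).
split_uuids() {
  tr ',' '\n' | sed '/^[[:space:]]*$/d'
}

# Disable CBT on every VDI that currently reports cbt-enabled=true.
disable_all_cbt() {
  xe vdi-list cbt-enabled=true --minimal | split_uuids | while read -r uuid; do
    echo "disabling CBT on $uuid"
    # stdin is redirected so the command cannot consume the UUID list.
    xe vdi-disable-cbt uuid="$uuid" </dev/null
  done
}

# Only act when explicitly asked to, and only where xe exists (dom0).
if [ "${1:-}" = "--run" ] && command -v xe >/dev/null 2>&1; then
  disable_all_cbt
fi
```

Saved as a script, it does nothing unless called with `--run`, which leaves room to review the output of `xe vdi-list cbt-enabled=true` first.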
-
@florent Hi Florent, I would love to help you test this in our lab. I have XO from sources running there, but I have no CBT options. Do I need to download it in a specific way?
-
@florent I'll be more than happy to help. I will get my homelab instance upgraded to that branch and report back with any issues.
-
@rtjdamen You need to switch to the fix_cbt branch, like git checkout fix_cbt, and rebuild.
-
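The branch switch mentioned above could look roughly like this for an XO-from-sources install. It is a sketch under assumptions: the checkout path in `XO_DIR` and the helper names `run` and `switch_xo_branch` are hypothetical, and the `yarn` / `yarn build` steps follow the usual from-sources update routine. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (or runs) the commands that move an
# XO-from-sources checkout onto a given branch and rebuild it.
XO_DIR="${XO_DIR:-$HOME/xen-orchestra}"   # assumed checkout location
DRY_RUN="${DRY_RUN:-1}"                   # 1 = only print the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Stop at the first failing step so a failed checkout is never rebuilt.
switch_xo_branch() {
  local branch="$1"
  run cd "$XO_DIR" &&
    run git fetch origin &&
    run git checkout "$branch" &&
    run git pull --ff-only &&
    run yarn &&
    run yarn build
}

switch_xo_branch fix_cbt
```

Setting `DRY_RUN=0` runs the commands for real; xo-server then needs a restart, and how to do that depends on how it was installed.
-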
@olivierlambert Thank you, found it. I will run some backups with one or two VMs to start with and will report the results.
-
This seems to be working fine. Once the backup is complete, we'll execute the vdi_data_destroy command, right? Currently, it doesn't appear obvious that this is a CBT metadata-only snapshot. Is there a way to make this more visible?
-
You mean in the VM view/snapshot tab? You are seeing the VM snapshot, not the VDI snapshot, so I wonder if this VM snapshot can be reverted while being CBT metadata only, and if not, we must make it clear in the UI, yes!
-
I enabled CBT on the disks and NBD + CBT in my delta backup, and so far so good. I plan on letting another backup run overnight. I also ran a full backup and it removed the snapshot like it's supposed to.
-
@olivierlambert Yes indeed, this is currently visible like a normal snapshot. I think it should be visible as a metadata-only snapshot.
-
@florent I have been watching the backup process, and in the end I only see vdi.destroy happening, not vdi.data_destroy. Is this correct? Are we handling this last step correctly, or does data remain on the snapshot at this time?
-
dataDestroy will be enable-able (not sure if it's really a word) today. In the meantime, the latest commits in the fix_cbt branch add an additional check on dom0 connect and more error handling.
Please note that the metadata snapshot won't be visible in the UI, since it's not a VM snapshot but only the metadata of the VDI snapshots.
-
@florent OK, so currently the data remains? When do you think this addition will be ready for testing? I am interested, as we saw some issues with this on NFS and I am curious whether it will make a difference with this code.
@olivierlambert I now understand there is in general no difference in coalesce as long as the data destroy is not done. So you were right on that part, and it's safe pushing it this way!
-
Yes, that's why we'll be able to offer a safe route for people not using data destroy, while letting people who want to explore it do so as an opt-in.
-
@rtjdamen It's still fresh, but on the other hand, the worst that can happen is falling back to a full backup. So for now I would not use it on the bigger VMs (multiple terabytes).
We are sure that it will be a game changer on thick provisioning (because a snapshot costs the full virtual size) or on fast-changing VMs, where coalescing an older snapshot is a major hurdle.
If everything goes well, it will be on stable by the end of July, and we'll probably enable it by default on new backups in the near future.
-
Can't commit, too small for a ticket.
Typos:
preferNbdInformation: 'A network accessible by XO or the proxy must have NBD enabled,. Storage must support Change Block Tracking (CBT) to ue it in a backup',
enabled,.
to ue
-
Updated to the fix_cbt branch.
CR NBD backup works.
Delta NBD backup works.
Just once, so we can't be sure yet.
No broken tasks are generated.
Still confused why the CBT toggle is enabled on some VMs.
Two similar VMs on the same pool, same storage, same Ubuntu version: one is enabled automatically, the other is not.
-
@florent I did some testing with the data_destroy branch in my lab. It seems to work as required; indeed, the snapshot is hidden when it is CBT only.
What I am not sure about: when the data destroy action is done, I would expect a snapshot to show up for coalesce, but it does not. Is it too small and removed so quickly that it is not visible in XOA? On larger VMs in our production I can see these snapshots showing up for coalesce. Or when you do vdi.data_destroy, will it try to coalesce directly, without garbage collection afterwards?