CBT: the thread to centralize your feedback
-
Hmm indeed, so it could be that the CBT reset code is catching far too many cases where it shouldn't
(so it might be still related to the initial problem)
-
@olivierlambert that could for sure be the case. We are using NBD without "Purge snapshot data" enabled for now and it's been very reliable, but we're hoping to keep chipping away at these issues so we can one day enable this on our production VMs.
If there is any testing you need me to do, just let me know, as we have a decent test environment set up where we prototype these things before deploying for real.
-
Good morning all! Hope the Xen Winter Meetup has been great for those who joined!
We migrated from XCP-NG 8.2.1 to 8.3 this week. On 8.2.1 we had everything working like a charm, with no more CBT issues. On 8.3 however we seem to run into some frustrating things whose root cause I can't really put a finger on. So I think it's important to share them with you all so we can see where it goes wrong and how we could fix them.
The first one is related to NBD/xapi. What I found is that in some rare cases one of the VDIs of a VM gets stuck on the NBD host. I had 3 overnight after restarting the dedicated hosts, but now I see new ones on different servers. We have set the NBD limit to 1 connection, but the strange thing was that in one case I could see there were 4 NBD connections, or 4 VDIs connected. I'm not sure, but 3 were disconnected and 1 was left. Could it be that the software is sometimes not respecting the limit and then setting up multiple connections?
The second issue I found is that sometimes CBT is broken on a VM. In that case you can run a full backup over and over again, but that will not work until you disable CBT and enable it again, forcing it in some way to set up a new CBT chain. Otherwise, for some reason, it remains broken. Would it be an option to let the software handle this automatically? Be aware that I found some cases where disabling CBT on a running VM caused a map_duplicate_key error. This only happened when I disabled CBT manually on that VDI; when I did an online storage migration (which does the same thing, if I am correct) it worked without this issue. I'm not sure what the difference is, but if the same error occurs when the software does it automatically, you could end up with a destroyed VM ;-).
Hope someone can point me in the right direction on these subjects! I will also open a ticket with support to investigate, but I wanted to share my new experiences here. Is anyone seeing the same issues?
-
Hi All,
Are there other users running 8.3 combined with CBT backups? We keep running into VDIs that stay connected to the control domain. We are investigating these with Vates, as it looks like the problem is within xapi or tapdisk. It would be helpful to understand if there are others running into the same issue.
As far as I understand from Vates, it looks like the problem we face is somewhere within xapi, but we need more logging to understand it better. If anyone has this running, with or without the same issues, please let us know so we can compare.
Cheers!
-
@rtjdamen I can confirm that in our environment, with XCP-NG 8.3 and XO from sources (from January's end of month release), we have this exact issue. What happens is that if you define a default migration network during a VM migration, CBT gets disabled and the next backup displays the "unable to do CBT on this VDI" error. This happens on a shared NFS SR in the same pool.
-
@Andrw0830 Can you also confirm whether taking a regular snapshot from XO, then deleting it sometime later, also causes CBT to reset, as I have run into as well? (Above)
-
@Andrw0830 so just to confirm, you also have random VDIs that stay connected to the control domain?
The issue with the migration network is a known one, as I understand from Vates, and is currently being worked on.
-
@flakpyro I usually don't keep snapshots on my VMs and just have it create snapshots as part of the NBD/CBT backups and purge them afterwards. I tested creating a standard snapshot (without memory), deleted it and did a backup, and it was able to use the old NBD and CBT chain without error.
@rtjdamen I just have the migration network issue so thought you were referring to that so not sure about the control domain part. Is there a way to check or know if we are affected?
-
@Andrw0830 yeah sure, from the dashboard you have the health tab, where you will find the VDIs attached to the control domain. We have random VDIs that stay attached. It seems to be random, though, so we are investigating whether this is an infrastructure thing or a bug in xapi. We did not face this on 8.2.
We do not have issues with CBT after creating and then removing snapshots.
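For anyone who would rather check this from the CLI than from the health tab, here is a minimal Python sketch. It assumes it is run on a pool host where the `xe` CLI and a Python 3 interpreter are available; the helper names (`parse_minimal`, `xe_minimal`, `vdis_attached_to_control_domains`) are made up for illustration and are not part of XO or xapi.

```python
import subprocess
from typing import Dict, List


def parse_minimal(output: str) -> List[str]:
    """Split the comma-separated list printed by `xe ... --minimal`."""
    return [item.strip() for item in output.strip().split(",") if item.strip()]


def xe_minimal(*args: str) -> List[str]:
    """Run an xe command with --minimal and return its values as a list."""
    output = subprocess.check_output(
        ["xe"] + list(args) + ["--minimal"], universal_newlines=True
    )
    return parse_minimal(output)


def vdis_attached_to_control_domains() -> Dict[str, List[str]]:
    """Map each control domain UUID to the VDI UUIDs currently plugged into it."""
    attached = {}
    for dom0 in xe_minimal("vm-list", "is-control-domain=true", "params=uuid"):
        vdis = xe_minimal(
            "vbd-list",
            "vm-uuid=" + dom0,
            "currently-attached=true",
            "params=vdi-uuid",
        )
        if vdis:
            attached[dom0] = vdis
    return attached
```

Calling `vdis_attached_to_control_domains()` and printing the result should give roughly the same picture as the health tab. VDIs are expected to show up here while a backup is actually running, so the check is only meaningful when no job is active.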
-
@rtjdamen As of writing this, I don't see any VM VDIs listed as being attached to the control domain. I have 10+ VMs on 3 XCP-NG 8.3 servers in one pool, so I'll let you know if we come across that.
-
So, testing after a few more rounds of updates, it appears I'm still having the same issue. If I have CBT with snapshot removal enabled and I take a manual snapshot of a VM (say for running maintenance), then remove it after some time, CBT will be reset and a full backup will run during the next backup schedule. This is fine for local backups, where I have 10GbE between the ZFS backup server and the pool, but not ideal for replication offsite. I see there are some big changes coming with the backup code, which is great news, but I'd REALLY like to be able to use CBT with snapshot deletion enabled!
-
Hi everyone,
I have been using Xen-NG for a couple of years but I am new to Xen Orchestra. First of all, thank you for this great software.
I am considering moving partly away from my own backup scripts, so I am currently playing around with the XOA backup routines and am particularly interested in the CBT based delta backups. But first the official facts about my setup:
Server: HP ProLiant DL380p Gen8
Xen-NG: 8.3 - fully patched
Xen Orchestra: self build - commit 9ed55 (as of today)
SR: 2 / both local ext / both RAID 10, disk based, on the HPE Smart Array Gen8 controllers
Remote: NFS mounted (Synology - fully patched as of today)
Everything I tested so far works great. However I cannot get CBT based delta backups to run. I always get full backups and the error message is:
Can't do delta with this vdi, transfer will be a full
Can't do delta, will try to get a full stream
As said, I am using an NFS based remote and all "normal" backups work like a charm. The dummy VM I am testing with is a newly set up Debian 12 (management tools installed) on the 2nd, non-default SR. CBT is enabled on the disk as well as in the backup job ("Use NBD + CBT to transfer disk if available" is enabled, as well as "Purge snapshot data when using CBT"). I also enabled "Merge backups synchronously".
Unfortunately I cannot really get a grip on this problem, because I do not find any particular error messages and cannot find good docs on CBT. There are 2 observations I made:
- There are 2 *.cbtlog files in the SR directory of the VM. One shows 00000000-0000-0000-0000-000000000000, the other e011e9dd-e14b-4f0a-b143-092ea8f1b6a3. Is that normal?
- If I enable a snapshot mode that includes memory, one snapshot is created in the default SR of the host for each backup run. These are not removed after the backup and remain orphaned. It looks to me as if these snapshots contain the memory (maybe this is total BS, then please excuse it, but the observation might be helpful). However this problem is gone if I switch to "offline snapshots".
Maybe I am just missing some stupid setting. Does anyone have a suggestion on where to start troubleshooting?
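One possible place to start is checking what xapi itself reports for the disks, independent of the XO job settings. The sketch below is only a starting point: it assumes it runs on the host with the `xe` CLI available and that VDIs expose a `cbt-enabled` field in `xe vdi-list`. The function names are hypothetical, and the parser is a tolerant guess at the usual `key ( RO): value` record layout.

```python
import re
import subprocess
from typing import Dict, List

# Matches lines such as "uuid ( RO)    : 1234" or "    cbt-enabled ( RO): true".
FIELD = re.compile(r"^\s*([\w-]+)\s*\(\s*\w+\s*\)\s*:\s?(.*)$")


def parse_records(output: str) -> List[Dict[str, str]]:
    """Turn multi-record xe output (records separated by blank lines) into dicts."""
    records = []
    current = {}
    for line in output.splitlines():
        match = FIELD.match(line)
        if match:
            current[match.group(1)] = match.group(2).strip()
        elif not line.strip() and current:
            # A blank line closes the record collected so far.
            records.append(current)
            current = {}
    if current:
        records.append(current)
    return records


def vdis_without_cbt(sr_uuid: str) -> List[Dict[str, str]]:
    """List the VDIs on one SR for which xapi does not report cbt-enabled as true."""
    output = subprocess.check_output(
        ["xe", "vdi-list", "sr-uuid=" + sr_uuid,
         "params=uuid,name-label,cbt-enabled"],
        universal_newlines=True,
    )
    return [r for r in parse_records(output) if r.get("cbt-enabled") != "true"]
```

If the test VM's disk shows up in `vdis_without_cbt(<SR UUID>)` right after a backup run, that would suggest CBT is being switched off somewhere along the way rather than never having been enabled. Snapshots and other VDIs on the same SR will be listed too, so match on the UUID of the disk in question.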