CBT: the thread to centralize your feedback
-
@florent I have been watching the backup process, and at the end I only see vdi.destroy happening, not vdi.data_destroy. Is this correct? Are we handling this last step correctly, or does data remain on the snapshot at this point?
-
dataDestroy will be enable-able (not sure if it's really a word) today; in the meantime, the latest commits in the fix_cbt branch add an additional check on dom0 connect and more error handling.
Please note that the metadata snapshot won't be visible in the UI, since it's not a VM snapshot, only the metadata of the VDI snapshots.
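For anyone mapping this to the underlying calls, here is a minimal sketch of the difference between the two XAPI methods discussed in this thread. VDI.destroy and VDI.data_destroy are the real XAPI calls; the callXapi helper below is hypothetical and only stands in for a proper XAPI JSON-RPC client, it is not XO's actual code.

// Hypothetical helper: stands in for a real XAPI JSON-RPC client.
async function callXapi(method: string, ...args: unknown[]): Promise<any> {
  throw new Error(`XAPI client not implemented: ${method}`);
}

// VDI.destroy removes the snapshot entirely, data and metadata,
// so the next backup has no CBT reference point left on it.
async function destroySnapshot(snapshotVdiRef: string): Promise<void> {
  await callXapi('VDI.destroy', snapshotVdiRef);
}

// VDI.data_destroy frees the snapshot's data blocks but keeps a small
// metadata-only VDI (type "cbt_metadata"): this is the hidden "metadata
// snapshot" mentioned above, enough to compute the next delta but no
// longer holding any disk data.
async function dataDestroySnapshot(snapshotVdiRef: string): Promise<void> {
  await callXapi('VDI.data_destroy', snapshotVdiRef);
}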
-
@florent OK, so currently the data remains? When do you think this addition will be ready for testing? I am interested, as we saw some issues with this on NFS, and I am curious if it will make a difference with this code.
@olivierlambert I now understand there is in general no difference in coalescing as long as the data destroy is not done. So you were right on that part, and it's safe to push it this way!
-
Yes, that's why we'll be able to offer a safe route for people not using the data destroy, while leaving people who want to explore it the option to do so as an opt-in.
-
@rtjdamen It's still fresh, but on the other hand, the worst that can happen is falling back to a full backup. So for now I would not use it on the bigger VMs (multiple terabytes).
We are sure that it will be a game changer on thick provisioning (because a snapshot costs the full virtual size) or on fast-changing VMs, where coalescing an older snapshot is a major hurdle. If everything goes well it will be on stable by the end of July, and we'll probably enable it by default on new backups in the near future.
-
Can't commit, too small for a ticket. Typo:
preferNbdInformation: 'A network accessible by XO or the proxy must have NBD enabled,. Storage must support Change Block Tracking (CBT) to ue it in a backup',
"enabled,." should be "enabled." and "to ue" should be "to use".
-
Updated to the fix_cbt branch.
CR NBD backup works.
Delta NBD backup works, but only run once so far, so we can't be sure yet.
No broken tasks are generated.
Still confused why the CBT toggle is enabled on some VMs: 2 similar VMs on the same pool, same storage, same Ubuntu version. One is enabled automatically, the other is not.
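If anyone wants to check that toggle outside the UI, here is a rough way to read the per-disk CBT flag straight from XAPI. VM.get_VBDs, VBD.get_VDI, VDI.get_uuid and VDI.get_cbt_enabled are standard XAPI calls; callXapi is again a hypothetical helper, not XO's actual client.

// Hypothetical helper: stands in for a real XAPI JSON-RPC client.
async function callXapi(method: string, ...args: unknown[]): Promise<any> {
  throw new Error(`XAPI client not implemented: ${method}`);
}

// Print the cbt_enabled flag for every disk of a VM, e.g. to compare two
// "identical" VMs and see which one actually has CBT turned on.
async function printCbtFlags(vmRef: string): Promise<void> {
  const vbdRefs: string[] = await callXapi('VM.get_VBDs', vmRef);
  for (const vbdRef of vbdRefs) {
    const vdiRef: string = await callXapi('VBD.get_VDI', vbdRef);
    if (vdiRef === 'OpaqueRef:NULL') continue; // skip empty CD drives
    const uuid: string = await callXapi('VDI.get_uuid', vdiRef);
    const cbtEnabled: boolean = await callXapi('VDI.get_cbt_enabled', vdiRef);
    console.log(`${uuid}: cbt_enabled=${cbtEnabled}`);
  }
}
-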
@florent I did some testing with the data_destroy branch in my lab; it seems to work as required, and indeed the snapshot is hidden when it is CBT-only.
What I am not sure is correct: when the data destroy action is done, I would expect a snapshot to show up for coalesce, but it does not. Is it too small and removed so quickly that it is not visible in XOA? On larger VMs in our production I can see these snapshots showing up for coalesce. Or when you do vdi.data_destroy, will it try to coalesce directly, without garbage collection afterwards?
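Not an answer to the coalesce question, but for reference, a sketch of what the retained metadata-only snapshot is still useful for. VDI.get_record and VDI.list_changed_blocks are standard XAPI calls; callXapi is a hypothetical helper, not XO's actual client.

// Hypothetical helper: stands in for a real XAPI JSON-RPC client.
async function callXapi(method: string, ...args: unknown[]): Promise<any> {
  throw new Error(`XAPI client not implemented: ${method}`);
}

// After vdi.data_destroy, the snapshot VDI is reported with type
// "cbt_metadata": its data is gone from the SR, but it still works as
// the base for listing changed blocks.
async function isMetadataOnly(snapshotVdiRef: string): Promise<boolean> {
  const record = await callXapi('VDI.get_record', snapshotVdiRef);
  return record.type === 'cbt_metadata';
}

// Returns the base64-encoded bitmap of blocks that changed between the
// retained snapshot and the current one.
async function listChangedBlocks(baseRef: string, currentRef: string): Promise<string> {
  return callXapi('VDI.list_changed_blocks', baseRef, currentRef);
}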
-
@florent what happens when we upgrade to this version by the end of July? We currently use NBD without CBT on most backups. Will everything need to run a full, or does it 'convert' the method to the CBT situation? I assume that, as the checkbox for data destroy will be disabled in general, it will not change much for the backups on day one, as long as you don't switch to the data destroy option?
-
The transition to CBT should be done smoothly and without any manual intervention. @florent will provide more details on how
-
All tests with 2 VMs were successful so far, no issues found in our lab. Good job, guys!
-
@olivierlambert how long until it's available for us on the precompiled XOA?
-
Tomorrow
-
@olivierlambert sounds good!
-
Things are looking good on my end as well.
-
@olivierlambert Looks like it's back to single threaded bottlenecks...
I see a lot of single core 100% utilization on the XO VM.
-
@Andrew Hi Andrew, I can't reproduce on my end; all cores are utilized at the same time, around 30 to 40%, for 2 simultaneous backups.
-
@rtjdamen It happens when Continuous Replication is running. The source, destination, and network can all do 10Gb/sec.
I'll have to work on a better set of conditions and tests to replicate the issue.
I know it's slower because the hourly replication was taking 5-6 minutes and now takes 9-10 minutes. It's more of an issue when the transfer is >300GB.
Just feedback....
-
@Andrew understood! We do not use that at this time.