CBT: the thread to centralize your feedback
-
@flakpyro moving to another SR or just live migrating?
-
@olivierlambert Just live migrating.
So as a test today, after the failure last night, I ran a full backup with CBT enabled on 2 VMs, then migrated both from Host 1 to Host 2 (both hosts use the same shared SR, mounted via NFS 4.1 from TrueNAS). Tonight the backup failed with the same error:
Error: stream has ended with not enough data (actual: 449, expected: 512)
If it would help at all, I could open a ticket referencing this thread and enable the remote support tunnel. We can always go back to the old backup method, but if I can help make CBT rock solid, I'm always willing to help!
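In case it helps with debugging, here is roughly how I'd check whether the VDIs still report CBT as enabled after the live migration (a quick sketch using the XenAPI Python bindings; the host URL, credentials and VM name below are placeholders, not our real setup):

```python
# Rough sketch: verify a VM's disks still report cbt_enabled after migration.
# Assumes the XenAPI Python bindings; host, credentials and VM name are placeholders.
import XenAPI

session = XenAPI.Session("https://host1.example.lan")
session.xenapi.login_with_password("root", "password")
try:
    vm_ref = session.xenapi.VM.get_by_name_label("test-vm-1")[0]
    for vbd_ref in session.xenapi.VM.get_VBDs(vm_ref):
        if session.xenapi.VBD.get_type(vbd_ref) != "Disk":
            continue  # skip CD drives
        vdi_ref = session.xenapi.VBD.get_VDI(vbd_ref)
        print(
            session.xenapi.VDI.get_name_label(vdi_ref),
            "cbt_enabled =",
            session.xenapi.VDI.get_cbt_enabled(vdi_ref),
        )
finally:
    session.xenapi.session.logout()
```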
-
@flakpyro I did the same test at our end: live migration of a VM, and no issue with the backup afterwards. I can't reproduce it. We use iSCSI, so maybe the difference is there; I have seen CBT issues on NFS as well with our 3rd-party tooling. Will do some more testing tonight.
-
@flakpyro Did you move the VM with the snapshots? Or did you delete them before (or had the CBT option on to autodelete them)?
I'd like to test this also.
-
@manilx I have the option "Purge snapshot data when using CBT" enabled in the backup job.
The VMs themselves stayed on the same NFS SR, shared between the two hosts in the same pool. I simply migrated each VM from host 1 to host 2, leaving the storage location unchanged.
-
Is your XO able to reach both hosts directly?
-
@olivierlambert Yes, the two hosts and XO are all on the same subnet.
-
I've seen some changes/updates on GH by @florent, but I can't tell much more on my side. We can try to test this very scenario though.
-
@olivierlambert Using current XO (07024). I'm having problems with one VM and hourly Continuous Replication (Delta, CBT/NBD). The VM has three VDIs attached. Sometimes it works just fine, but it errors out at least once a day, and its retries transfer hundreds of gigs of data.
stream has ended with not enough data (actual: 446, expected: 512)
Couldn't deleted snapshot data
Disk is still attached to DOM0 VM
On the next CR job it works (and deletes the snapshots), but it usually leaves the export task from the failed run hanging on the pool master (doing nothing, stuck at 0%). A toolstack restart is required to clear it.
It's odd that it complains about the disk being attached to Dom0 on the run that leaves the stuck export task, but then on the next run it works without error (with the VDI still attached to Dom0).
I can't force the problem to happen, but it keeps happening.
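In case it's useful for debugging, the leftover export disks should show up as VBDs still attached to the control domain; something along these lines (a rough sketch with the XenAPI Python bindings, connection details are placeholders) lists them:

```python
# Rough sketch: list disks still attached to the control domain (Dom0),
# which is where a leftover NBD export VBD should show up.
# Assumes the XenAPI Python bindings; connection details are placeholders.
import XenAPI

session = XenAPI.Session("https://pool-master.example.lan")
session.xenapi.login_with_password("root", "password")
try:
    for vm_ref, vm in session.xenapi.VM.get_all_records().items():
        if not vm["is_control_domain"]:
            continue
        for vbd_ref in vm["VBDs"]:
            if not session.xenapi.VBD.get_currently_attached(vbd_ref):
                continue
            vdi_ref = session.xenapi.VBD.get_VDI(vbd_ref)
            if vdi_ref == "OpaqueRef:NULL":
                continue  # empty VBD (e.g. CD drive)
            print(vm["name_label"], "->",
                  session.xenapi.VDI.get_name_label(vdi_ref))
finally:
    session.xenapi.session.logout()
```

I don't know whether unplugging those VBDs manually would also clear the stuck export task; so far only a toolstack restart has done that here.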
-
@olivierlambert Last week @florent deployed some changed code at our end, but it did not seem to resolve the issue. We are using a proxy for backups, and I am not sure whether the changed code went to our XOA or to the XOA proxy. Since we are still encountering the issue, I have now changed the settings to not use the proxy, and so far no stream errors yet. I still see the "data destroy" VDI in use, but that could be a different problem. I will keep an eye on it to check whether @florent indeed did not deploy the upgrade to the proxy, which would explain why we saw no change.
Is anyone else who hits the stream error using a proxy?
-
@flakpyro I did the same test with an NFS-based VM, but also can't reproduce the problem here.
-
@olivierlambert Unfortunately it did not resolve the stream problem; I just got another one.
-
@olivierlambert I just got this error too on our production pools!
-
@manilx ticket opened.
-
@manilx The "incorrect backup size" warning is a known issue that we will investigate soon.
-
@julien-f said in CBT: the thread to centralize your feedback:
"@manilx The 'incorrect backup size' warning is a known issue that we will investigate soon."
@julien-f and the stream error?
-
@rtjdamen We are investigating it as well.
For the XO backend team, the next few months are focused on stabilization, bug fixes and maintenance.
-
@julien-f @florent @olivierlambert I'm using XO from source, master (commit c5f6b), running Continuous Replication with NBD+CBT and purge snapshot data. With a single NBD connection on the backup job, things work correctly. With two NBD connections, I see some orphan VDIs left behind almost every time the backup job runs, and they are different ones each time.
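For the record, here is roughly how I'm spotting them (a sketch with the XenAPI Python bindings, assuming "orphan" here means a snapshot VDI whose original VDI is gone; connection details are placeholders):

```python
# Rough sketch: list snapshot VDIs whose original VDI no longer exists
# (one possible definition of an "orphan" VDI).
# Assumes the XenAPI Python bindings; connection details are placeholders.
import XenAPI

session = XenAPI.Session("https://pool-master.example.lan")
session.xenapi.login_with_password("root", "password")
try:
    vdis = session.xenapi.VDI.get_all_records()
    for ref, vdi in vdis.items():
        if vdi["is_a_snapshot"] and vdi["snapshot_of"] not in vdis:
            print("orphan snapshot VDI:", vdi["name_label"] or vdi["uuid"])
finally:
    session.xenapi.session.logout()
```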
-
I noticed that when migrating some VDIs with CBT enabled from one NFS SR to another, the CBT-only snapshots are left behind. Perhaps this is why a full backup seems to be required after a storage migration?
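If those "CBT-only" snapshots are the metadata-only VDIs (type cbt_metadata), something like this sketch (XenAPI Python bindings, connection details are placeholders) shows which SR they are sitting on, which should make the leftovers on the source SR easy to spot:

```python
# Rough sketch: list metadata-only (cbt_metadata) VDIs and the SR they sit on,
# to spot CBT snapshots left behind on the source SR after a migration.
# Assumes the XenAPI Python bindings; connection details are placeholders.
import XenAPI

session = XenAPI.Session("https://pool-master.example.lan")
session.xenapi.login_with_password("root", "password")
try:
    for ref, vdi in session.xenapi.VDI.get_all_records().items():
        if vdi["type"] != "cbt_metadata":
            continue
        sr_name = session.xenapi.SR.get_name_label(vdi["SR"])
        print(vdi["name_label"] or vdi["uuid"], "on SR:", sr_name)
finally:
    session.xenapi.session.logout()
```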