CBT: the thread to centralize your feedback

rtjdamen

The current CBT backup works, but it has several bugs that need to be resolved. @florent is working on fixes, but from what I understand they are difficult to fix. Hope there will be progress soon!

flakpyro

@rtjdamen I noticed this today as well. A VM within a pool migrated from one host to another, after which I received:

Error: stream has ended with not enough data (actual: 449, expected: 512)

Retrying the backup resulted in it running a full backup. Hopefully XO can start to handle this; otherwise I'm not sure CBT backups are worth it over the old method for pools with multiple hosts. Would running a Rolling Pool Update or Rolling Pool Reboot guarantee that every VM fails and requires a full backup? Or is this not expected behaviour?
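For anyone unfamiliar with this class of error: it is the kind of message a reader produces when it asks a stream for a fixed-size block and the stream closes early. The sketch below is only an illustration under that assumption. It is not XO's code, and `read_exactly` and `ShortReadError` are hypothetical names. It reproduces the message by requesting 512 bytes from a stream that only holds 449.

```python
import io


class ShortReadError(Exception):
    """Raised when a stream ends before the requested number of bytes arrives."""


def read_exactly(stream, size):
    """Read exactly `size` bytes from `stream`, or raise ShortReadError.

    Hypothetical helper for illustration only; it is not part of XO.
    """
    chunks = []
    received = 0
    while received < size:
        chunk = stream.read(size - received)
        if not chunk:
            # The stream closed before delivering the full block.
            raise ShortReadError(
                "stream has ended with not enough data "
                f"(actual: {received}, expected: {size})"
            )
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks)


# A complete 512-byte block reads without error.
full = read_exactly(io.BytesIO(b"\x00" * 512), 512)

# A stream cut off after 449 bytes triggers the error.
try:
    read_exactly(io.BytesIO(b"\x00" * 449), 512)
    message = None
except ShortReadError as err:
    message = str(err)

print(message)
```

If this reading is right, the error says the export stream was cut short on the sending side. It does not say why, which fits with the cause being hard to pin down.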

rtjdamen

@flakpyro No, this is a bug; we also see it on VMs that have not been migrated. Our other backup software also uses CBT, and we do not see this issue there.

olivierlambert

I cannot reproduce this in my own production. The only problem I see is with a job that spans 2 different pools: the pool that doesn't have NBD enabled keeps re-doing a full backup. The NBD-enabled pool works perfectly, though.

rtjdamen

@olivierlambert From what I understand, this issue occurs when a snapshot is deleted at the wrong time; at least, that is what florent told me. He created a fix for it, but it does not resolve the issue.

olivierlambert

I cannot reproduce that problem here, so it's clearly something subtle and/or configuration-dependent. Any other feedback from the community on CBT? More feedback will help pinpoint the remaining issues.

flakpyro

@olivierlambert Maybe it was just a coincidence, then, that this happened after a migration. Backups of this VM had been working well for about a week until I migrated it to a different host yesterday.

rtjdamen

@flakpyro We see this happening to a few VMs (2 or 3) at random on a large pool of 400+, so I think it is a coincidence.

olivierlambert

@flakpyro moving to another SR or just live migrating?

flakpyro

@olivierlambert Just live migrating.

So as a test today, after the failure last night, I ran a full backup with CBT enabled on 2 VMs. I then migrated both from Host 1 to Host 2 (using the same shared SR, mounted on each host via NFS 4.1 from TrueNAS). Tonight the backup failed with the same error:

Error: stream has ended with not enough data (actual: 449, expected: 512)

If it would help at all, I could open a ticket referencing this thread and enable the remote support tunnel. We can always go back to the old backup method, but if I can help make CBT rock solid, I'm always willing to help!

rtjdamen

@flakpyro I did the same test at our end: live migration of a VM, and no issue with the backup afterwards. I can’t reproduce it. We use iSCSI, so maybe the difference is there; I have seen CBT issues on NFS as well with our third-party tooling. I will do some more testing tonight.

manilx

@flakpyro Did you move the VM with its snapshots? Or did you delete them beforehand (or have the CBT option enabled to auto-delete them)?

I'd like to test this also.

flakpyro

@manilx I have the option "Purge snapshot data when using CBT" enabled in the backup job.

The VMs themselves are staying on the same NFS SR shared between the two hosts in the same pool. I simply migrated the VM from host 1 to host 2 leaving the storage location itself unchanged.

olivierlambert

Is your XO able to reach both hosts directly?

flakpyro

@olivierlambert Yes, the two hosts and XO are all on the same subnet.

olivierlambert

I've seen some changes/updates on GH by @florent, but I can't tell much more on my side. We can try to test this very scenario, though.

Andrew

@olivierlambert Using current XO (07024). I'm having problems with one VM and hourly Continuous Replication (Delta, CBT/NBD). The VM has three VDIs attached. Sometimes it works just fine, but it errors out at least once a day and transfers hundreds of gigs of data on its retries.

stream has ended with not enough data (actual: 446, expected: 512)
Couldn't deleted snapshot data
Disk is still attached to DOM0 VM

On the next CR job it works (and deletes the snapshots), but it normally leaves an export task from the earlier run on the pool master (doing nothing, stuck at 0%). A toolstack restart is required to free it.

It's odd that it complains about the disk being attached to Dom0 on a run where it leaves a Dom0 export task behind, but then works without error on the next run (with the VDI task still attached to Dom0).

I can't force the problem to happen, but it keeps happening.

rtjdamen

@olivierlambert Last week @florent deployed some changed code at our end, but it did not seem to resolve the issue. We are using a proxy for backups, and I am not sure whether the changed code was deployed to our XOA or to our XOA proxy. As we are still encountering the issue, I have now changed the settings to not use the proxy; so far, no stream errors. I still see the "VDI in use" error on data destroy, but that could be a different problem. I will keep an eye on it to check whether florent indeed did not deploy the upgrade to the proxy, which would explain why we did not see the changes.

Is anyone using a proxy in situations where the stream error occurs?

rtjdamen

@flakpyro I did the same test with an NFS-based VM, but I also can’t reproduce the problem here.

rtjdamen

@olivierlambert Unfortunately, it did not resolve the stream problem; we just got another one.