Backup Job HTTP connection abruptly closed

Forza

@_danielgurgel said in Backup Job HTTP connection abruptly closed:

Error: all targets have failed, step: writer.run()

I had a similar issue today too, but restarting the backup worked. Weird. I had another similar case a little while ago that I also opened a ticket for.

olivierlambert

@_danielgurgel if you mean basic backup, it's an XVA export in both cases. The only difference is that in the backup case, you are writing the file to a remote instead of sending it to your browser.

lavamind

We've been having the same problem with our Delta backups for several weeks now. The job runs every day, and on about one day in three we get failures like this. It seems to affect random VMs, but one or two seem to be affected more often.

We tried increasing the ring buffers on the physical network interfaces but it didn't help. Now we're going to try to pause GC during the backups to see if it helps.

We looked at SMlog and daemon.log and could not find any obvious problems on the host occurring at the time of the error. If it's a networking problem, how could we verify this?

olivierlambert

@lavamind please triple-check that you are using XOA on the latest version or, if you run XO from the sources, on master.

lavamind

@olivierlambert Yeah, that's definitely the next thing we'll try. For now we're running from the sources on release 5.59. If the problem persists, we'll upgrade to 5.63 next week.

We're not too keen on following master, since we have had issues with it in the past (including bad backups)...

lavamind

FYI, we do our best to ensure master is not broken but we only do the complete QA process just before an XOA release

Is that still the case?

From https://github.com/vatesfr/xen-orchestra/issues/3784#issuecomment-447797895

olivierlambert

It's always the case

_danielgurgel

@olivierlambert Even after updating the host from 8.0 to 8.2 (at the latest update level) and migrating the cluster and the NFS storage, the problem persists.

We updated the virtualization agent on the virtual server to the latest version available from Citrix, and we were able to back it up for a few weeks... but the problem recurred, again only for the same server.

Are there any logs I can paste to help identify this failure?

olivierlambert

This is not an easy question. It would require investigation on the host, I'm afraid.

lavamind

For the record, since upgrading to 5.63 the issue hasn't recurred at all.