Backup Job HTTP connection abruptly closed
-
We are getting a backup error on only one server in our pool. We've already swapped the NFS storage and done a FULL CLONE of the VM for testing, but it still fails (all other servers back up fine to the same NFS server).
I have not found anything related to this error, and snapshot operations are working correctly. Any tips to solve this problem?
transfer
Start: Jul 27, 2021, 08:50:02 AM
End: Jul 27, 2021, 09:39:49 AM
Duration: an hour
Error: HTTP connection abruptly closed

Start: Jul 27, 2021, 08:50:02 AM
End: Jul 27, 2021, 09:39:49 AM
Duration: an hour
Error: HTTP connection abruptly closed

Start: Jul 27, 2021, 08:49:33 AM
End: Jul 27, 2021, 09:44:45 AM
Duration: an hour
Error: all targets have failed, step: writer.run()
Type: full
-
@_danielgurgel Here is the complete log of the operation.
vm.copy
{
  "vm": "54676579-2328-d137-1002-0f32920eab23",
  "sr": "50c59b18-5b5c-2eed-8c82-b8f7fdc8e9b5",
  "name": "VM_NAME"
}
{
  "call": {
    "method": "VM.destroy",
    "params": ["OpaqueRef:fc032b38-d8d7-43ab-983c-f54bc9dc6f85"]
  },
  "message": "operation timed out",
  "name": "TimeoutError",
  "stack": "TimeoutError: operation timed out
    at Promise.call (/opt/xen-orchestra/node_modules/promise-toolbox/timeout.js:13:16)
    at Xapi._call (/opt/xen-orchestra/packages/xen-api/src/index.js:644:37)
    at /opt/xen-orchestra/packages/xen-api/src/index.js:722:21
    at loopResolver (/opt/xen-orchestra/node_modules/promise-toolbox/retry.js:94:23)
    at Promise._execute (/opt/xen-orchestra/node_modules/bluebird/js/release/debuggability.js:384:9)
    at Promise._resolveFromExecutor (/opt/xen-orchestra/node_modules/bluebird/js/release/promise.js:518:18)
    at new Promise (/opt/xen-orchestra/node_modules/bluebird/js/release/promise.js:103:10)
    at loop (/opt/xen-orchestra/node_modules/promise-toolbox/retry.js:98:12)
    at retry (/opt/xen-orchestra/node_modules/promise-toolbox/retry.js:101:10)
    at Xapi._sessionCall (/opt/xen-orchestra/packages/xen-api/src/index.js:713:20)
    at Xapi.call (/opt/xen-orchestra/packages/xen-api/src/index.js:247:14)
    at loopResolver (/opt/xen-orchestra/node_modules/promise-toolbox/retry.js:94:23)
    at Promise._execute (/opt/xen-orchestra/node_modules/bluebird/js/release/debuggability.js:384:9)
    at Promise._resolveFromExecutor (/opt/xen-orchestra/node_modules/bluebird/js/release/promise.js:518:18)
    at new Promise (/opt/xen-orchestra/node_modules/bluebird/js/release/promise.js:103:10)
    at loop (/opt/xen-orchestra/node_modules/promise-toolbox/retry.js:98:12)
    at Xapi.retry (/opt/xen-orchestra/node_modules/promise-toolbox/retry.js:101:10)
    at Xapi.call (/opt/xen-orchestra/node_modules/promise-toolbox/retry.js:119:18)
    at Xapi.destroy (/opt/xen-orchestra/@xen-orchestra/xapi/src/vm.js:324:16)
    at Xapi._copyVm (file:///opt/xen-orchestra/packages/xo-server/src/xapi/index.mjs:322:9)
    at Xapi.copyVm (file:///opt/xen-orchestra/packages/xo-server/src/xapi/index.mjs:337:7)
    at Api.callApiMethod (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/api.mjs:304:20)"
}
-
It means XO sent a command to the XAPI of your pool and it never answered, at least not before the timeout.
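The `TimeoutError` in the stack above comes from a timeout wrapped around the XAPI call. A minimal sketch of that pattern (generic `Promise.race`, not the actual promise-toolbox implementation; the hanging call is a hypothetical stand-in for an unresponsive host):

```javascript
// Sketch: race the RPC promise against a timer. If the host never
// answers, the timer wins and a TimeoutError surfaces in the log.
class TimeoutError extends Error {
  constructor() {
    super('operation timed out');
    this.name = 'TimeoutError';
  }
}

function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError()), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical stand-in for a XAPI call that never answers.
const hangingCall = new Promise(() => {});

withTimeout(hangingCall, 100).catch(err => {
  console.log(`${err.name}: ${err.message}`);
  // prints "TimeoutError: operation timed out"
});
```

Note that the timeout only bounds how long XO waits; the underlying operation may still be running (or stuck) on the host, which is why the investigation has to happen host-side.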
-
@olivierlambert But is there any reason why this issue only occurs for this VM? Even after cloning the VM, the problem happens with the clone... even after changing the NFS server, the problem happens... Let's try moving it to a new cluster.
-
I can't guess without taking more time to investigate, ideally on the host directly.
My guess is the issue is related to the host/pool connection with XO, not the storage.
-
@olivierlambert Is there any difference between a "traditional" backup and the Export VM operation performed by Xen Orchestra?
Even after changing the cluster's virtual server, the problem still occurs. However, the Export operation works normally.
-
@_danielgurgel said in Backup Job HTTP connection abruptly closed:
Error: all targets have failed, step: writer.run()
I had a similar issue today too, but restarting the backup worked. Weird. I had another similar case a little while ago that I opened a ticket for as well.
-
@_danielgurgel if you mean basic backup, it's an XVA export in both cases. The only difference is that in the backup case, you are writing the file to a remote instead of sending it to your browser.
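For reference, a sketch of the request both paths boil down to, assuming the standard XAPI HTTP `/export` handler (HOST, VM_UUID and SESSION below are placeholders, not values from this thread):

```shell
# Both "backup" and "Export VM" fetch an XVA stream from the pool master;
# the only difference is where XO writes the resulting bytes.
HOST="xcp-host.example"
VM_UUID="00000000-0000-0000-0000-000000000000"
SESSION="OpaqueRef:placeholder"
URL="https://${HOST}/export?uuid=${VM_UUID}&session_id=${SESSION}"
echo "$URL"
# To actually stream the XVA (requires a real host and a valid session):
# curl --insecure "$URL" -o vm.xva
```

Since the same handler serves both paths, a failure that hits only backups points at the remote-writing side or at timing (backups run longer and concurrently), not at the export itself.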
-
We've been having the same problem with our Delta backups for several weeks now. The job runs every day and fails like this roughly one day in three. It seems to affect random VMs, but one or two seem to be affected more often than the rest.
We tried increasing the ring buffers on the physical network interfaces, but it didn't help. Next we're going to try pausing GC during the backups to see if that helps.
We looked at SMlog and daemon.log and could not find any obvious problems on the host occurring at the time of the error. If it's a problem with networking, how could we verify this?
-
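One low-level way to check the networking hypothesis is to sample the kernel's TCP counters across a backup window and look for a high ratio of retransmitted segments. A minimal sketch (Linux-only, reads `/proc/net/snmp`; nothing here is specific to XCP-ng or Xen Orchestra):

```python
#!/usr/bin/env python3
import time

def tcp_counters(path="/proc/net/snmp"):
    """Return the kernel's TCP counters as a {name: value} dict."""
    with open(path) as f:
        # /proc/net/snmp holds a "Tcp:" header line followed by a
        # "Tcp:" value line; strip the prefix and pair them up.
        tcp = [line.split()[1:] for line in f if line.startswith("Tcp:")]
    names, values = tcp[0], tcp[1]
    return dict(zip(names, map(int, values)))

if __name__ == "__main__":
    before = tcp_counters()
    time.sleep(1)  # in practice, sample across the whole backup window
    after = tcp_counters()
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    print(f"{retrans} retransmissions out of {sent} segments sent")
```

A retransmission rate well above a fraction of a percent while the backup streams would support a flaky network path; a clean counter points back at the software stack.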
@lavamind please triple check you are using XOA on latest or, if XO from the sources, on master.
-
@olivierlambert Yeah, that's definitely the next thing we'll try. For now we're using sources on release 5.59. If the problem persists we'll upgrade to 5.63 next week. Not too keen on following master, since we've had issues with it in the past (including bad backups)...
-
"FYI, we do our best to ensure master is not broken but we only do the complete QA process just before an XOA release"
Is that still the case?
(From https://github.com/vatesfr/xen-orchestra/issues/3784#issuecomment-447797895)
-
It's always the case
-
@olivierlambert Even after updating the host from 8.0 to 8.2 (with the latest update level), and after the cluster and NFS migration, the problem persists.
We updated the virtualization agent on the virtual server to the latest version available from Citrix and were able to back it up for a few weeks... but the problem reoccurred, again only for the same server.
Are there any logs I can paste to help identify this failure?
-
This is not an easy question. It would require investigation on the host, I'm afraid.
-
For the record, since upgrading to 5.63 the issue hasn't re-occurred at all.