XO task watcher issue/CR broken
-
@julien-f Using XO source 817911a..., Backup CR still gets stuck. The job starts, the snapshot works, the transfer kicks off, but the job never sees the transfer finish. The import VM sits there and XO keeps waiting.
-
@Andrew I don't reproduce the issue on my end.
-
@julien-f It happens to me every time with the new XO source. When I use the older version it works correctly.
XO just does not see the transfer task complete. I don't see any errors or timeouts.
When I reload the XO server process I get:
Feb 08 12:39:52 xo1 systemd[1]: Reloading.
Feb 08 12:39:52 xo1 xo-server[389]: 2023-02-08T17:39:52.416Z xo:main INFO SIGTERM caught, closing…
Feb 08 12:39:52 xo1 systemd[1]: Stopping XO Server...
Feb 08 12:39:52 xo1 xo-server[389]: 2023-02-08T17:39:52.427Z xo:api WARN admin | backupNg.runJob(...) [28m] =!> Error: worker exited with code null
Feb 08 12:39:52 xo1 xo-server[389]: 2023-02-08T17:39:52.442Z xo:main WARN WebSocket send: {
Feb 08 12:39:52 xo1 xo-server[389]: error: Error: The socket was closed while data was being compressed
Feb 08 12:39:52 xo1 xo-server[389]: at /opt/xo/xo-builds/xen-orchestra-202302081151/node_modules/ws/lib/sender.js:410:21
Feb 08 12:39:52 xo1 xo-server[389]: at /opt/xo/xo-builds/xen-orchestra-202302081151/node_modules/ws/lib/permessage-deflate.js:326:9
Feb 08 12:39:52 xo1 xo-server[389]: at PerMessageDeflate.cleanup (/opt/xo/xo-builds/xen-orchestra-202302081151/node_modules/ws/lib/permessage-deflate.js:143:9)
Feb 08 12:39:52 xo1 xo-server[389]: at WebSocket.emitClose (/opt/xo/xo-builds/xen-orchestra-202302081151/node_modules/ws/lib/websocket.js:253:57)
Feb 08 12:39:52 xo1 xo-server[389]: at TLSSocket.socketOnClose (/opt/xo/xo-builds/xen-orchestra-202302081151/node_modules/ws/lib/websocket.js:1260:15)
Feb 08 12:39:52 xo1 xo-server[389]: at TLSSocket.emit (node:events:525:35)
Feb 08 12:39:52 xo1 xo-server[389]: at TLSSocket.patchedEmit [as emit] (/opt/xo/xo-builds/xen-orchestra-202302081151/@xen-orchestra/log/configure.js:52:17)
Feb 08 12:39:52 xo1 xo-server[389]: at node:net:322:12
Feb 08 12:39:52 xo1 xo-server[389]: at Socket.done (node:_tls_wrap:588:7)
Feb 08 12:39:52 xo1 xo-server[389]: at Object.onceWrapper (node:events:628:26)
Feb 08 12:39:52 xo1 xo-server[389]: at Socket.emit (node:events:525:35)
Feb 08 12:39:52 xo1 xo-server[389]: at Socket.patchedEmit [as emit] (/opt/xo/xo-builds/xen-orchestra-202302081151/@xen-orchestra/log/configure.js:52:17)
Feb 08 12:39:52 xo1 xo-server[389]: at TCP.<anonymous> (node:net:322:12)
Feb 08 12:39:52 xo1 xo-server[389]: at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
Feb 08 12:39:52 xo1 xo-server[389]: }
Feb 08 12:39:52 xo1 xo-server[389]: 2023-02-08T17:39:52.908Z xo:main INFO bye :-)
Feb 08 12:39:52 xo1 systemd[1]: xo-server.service: Succeeded.
Feb 08 12:39:52 xo1 systemd[1]: Stopped XO Server.
Feb 08 12:39:52 xo1 systemd[1]: xo-server.service: Consumed 5min 16.239s CPU time.
-
@Andrew Just tried with a fresh build of XO and I don't see any problems.
What's your job config?
-
@julien-f Here's the backup info for the CR job. It works correctly on XO commit bf51b.
-
@Andrew What's your Node version?
-
@julien-f node.js v18.14.0 on Debian 11.6
-
@Andrew What kind of remote are you using? And with which configuration (encryption, multiple data blocks, etc.)?
-
My bad, there is no remote with CR
I have no idea what's going on though…
-
@julien-f I was just going to say it's another host's local storage, not a remote. The destination host is not in the same pool but is on the same 10GB LAN. All hosts are 8.2.1.
-
@Andrew I'm unable to reproduce on my end
If you can reproduce with an official XOA, open a support tunnel and I'll investigate further.
-
@julien-f I loaded XOA (5.109.0) but it's not new enough to include the problematic code that causes problems in XO source.
-
@Andrew If you open a support tunnel, I can deploy XO from sources in your appliance.
-
@julien-f ok.
-
@julien-f cr-issue branch commit 27d81 resolved this new CR problem for me. Thanks!
-
@Andrew I need to understand why it's working now
Which version of XCP-ng/XenServer are you using as source and as the target of the CR?
-
This commit 27d81 is working for me too.
XCP-ng is 8.2.1, up to date.
-
@Andrew & @Gheppy: I've just pushed a new commit to the cr-issue branch which adds some debug logs. The output looks something like this:
putResource(b1imtu1j0y): taskRef: OpaqueRef:10eb89fd-d31e-495b-955d-672c61a4ea48
putResource(b1imtu1j0y): useHack: false
putResource(b1imtu1j0y): body#end
putResource(b1imtu1j0y): request#unpipe
putResource(b1imtu1j0y): request#prefinish
putResource(b1imtu1j0y): request#finish
putResource(b1imtu1j0y): response#resume
putResource(b1imtu1j0y): body#close
putResource(b1imtu1j0y): response#readable
putResource(b1imtu1j0y): response#end
putResource(b1imtu1j0y): response#close
putResource(b1imtu1j0y): request#close
If you could try it and show me the output, that would help me understand what's going on
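For anyone collecting that output, here is a minimal Python sketch that groups the debug lines by the ID in parentheses and lists which events from the sample above never showed up for each call. The helper name `missing_events` and the idea that the sample's event list is the "complete" sequence are assumptions for illustration only; nothing here comes from the XO code base.

```python
import re
from collections import defaultdict

# Event names copied from the sample output in this post. Treating them as
# the full expected sequence is an assumption, not documented XO behaviour.
EXPECTED_EVENTS = [
    "body#end", "request#unpipe", "request#prefinish", "request#finish",
    "response#resume", "body#close", "response#readable", "response#end",
    "response#close", "request#close",
]

# Matches e.g. "putResource(b1imtu1j0y): request#finish"
LINE_RE = re.compile(r"putResource\((?P<id>\w+)\): (?P<msg>\S+)")


def missing_events(log_text):
    """Return {call id: [expected events that were never logged]}."""
    seen = defaultdict(set)
    for match in LINE_RE.finditer(log_text):
        seen[match.group("id")].add(match.group("msg"))
    return {
        call_id: [event for event in EXPECTED_EVENTS if event not in events]
        for call_id, events in seen.items()
    }
```

Pasting the captured log text into `missing_events(...)` gives one entry per `putResource` call. An empty list means every event from the sample was seen; a non-empty list shows where that call's output stops matching it.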
-
Eager to read the output, it's weird we can't reproduce here, there's something