S3 backup broken
-
@odeawan Thanks for the idea, but I know that's not the problem in this case. The test setup that fails is between an XO server and an S3 server on the same LAN, so no firewall... and it only fails during the delta merge phase. My main (older version) XO server that runs backups to off-site S3 storage works fine (and also to the same local S3 server).
-
@florent I did a git update again (to commit aa261) and it works! Maybe I missed an update?
{ "data": { "mode": "delta", "reportWhen": "never" }, "id": "1661987889299", "jobId": "d6c0a656-62c5-4c39-a57a-f246b39f1cef", "jobName": "minio-test", "message": "backup", "scheduleId": "bd4ef436-fd85-4f16-bf9e-71d1d0c8586f", "start": 1661987889299, "status": "success", "infos": [ { "data": { "vms": [ "c45dd52b-fa92-df6f-800a-10853c183c23" ] }, "message": "vms" } ], "tasks": [ { "data": { "type": "VM", "id": "c45dd52b-fa92-df6f-800a-10853c183c23" }, "id": "1661987890202", "message": "backup VM", "start": 1661987890202, "status": "success", "tasks": [ { "id": "1661987890615", "message": "clean-vm", "start": 1661987890615, "status": "success", "end": 1661987891797, "result": { "merge": false } }, { "id": "1661987891992", "message": "snapshot", "start": 1661987891992, "status": "success", "end": 1661987893470, "result": "ec858b82-4f64-c5fe-a258-887ba57c7458" }, { "data": { "id": "9890e0c4-ba3a-4810-8245-a49fdf16b16e", "isFull": false, "type": "remote" }, "id": "1661987893471", "message": "export", "start": 1661987893471, "status": "success", "tasks": [ { "id": "1661987893513", "message": "transfer", "start": 1661987893513, "status": "success", "end": 1661987897713, "result": { "size": 75549184 } }, { "id": "1661987898186", "message": "clean-vm", "start": 1661987898186, "status": "success", "tasks": [ { "id": "1661987899091", "message": "merge", "start": 1661987899091, "status": "success", "end": 1661987902420 } ], "end": 1661987902474, "result": { "merge": true } } ], "end": 1661987902480 } ], "end": 1661987902480 } ], "end": 1661987902480 }
-
@Andrew The problem occurs only when resuming a merge.
Does your S3 provider apply rate limits to queries? We do a lot of copy/delete queries during a merge.
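For context on why a merge is so chatty against S3: the object API has no native rename, so every move is a server-side copy followed by a delete. Below is a minimal sketch of that pattern, assuming the AWS SDK v3 client and a hypothetical MinIO endpoint; it is an illustration only, not the xen-orchestra implementation.

```js
// Minimal sketch (not the xen-orchestra code): S3 has no native rename,
// so "moving" an object is a server-side copy followed by a delete.
// The endpoint, bucket, and key names here are hypothetical.
import {
  S3Client,
  CopyObjectCommand,
  DeleteObjectCommand,
} from '@aws-sdk/client-s3'

const s3 = new S3Client({
  region: 'us-east-1',
  endpoint: 'http://minio.local:9000',
  forcePathStyle: true,
})

async function rename(bucket, oldKey, newKey) {
  // Server-side copy: the data never leaves the S3 server.
  await s3.send(
    new CopyObjectCommand({
      Bucket: bucket,
      CopySource: `${bucket}/${oldKey}`,
      Key: newKey,
    })
  )
  // Then drop the old key; a merge issues many of these copy/delete pairs.
  await s3.send(new DeleteObjectCommand({ Bucket: bucket, Key: oldKey }))
}
```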
-
@florent For local testing I'm using MinIO and there is no limit or throttling enabled.
-
@Andrew I am testing with another user using lower concurrency during the merge (today, 16 blocks are merged in parallel), and it seems to solve the problem.
I will make it configurable soon (today it's a parameter in the code). A sketch of what that parameter controls follows below.
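As an illustration of what such a concurrency parameter governs, here is a generic sketch of a bounded worker pool over the blocks to merge; `mergeBlocks`, `mergeBlock`, and `blockIds` are hypothetical names, not the actual xen-orchestra code.

```js
// Generic sketch of bounding merge parallelism (not the xen-orchestra code).
// Lowering `concurrency` is analogous to reducing the hard-coded
// "16 blocks merged in parallel" mentioned above.
async function mergeBlocks(blockIds, mergeBlock, concurrency = 2) {
  const queue = [...blockIds]

  async function worker() {
    let id
    while ((id = queue.shift()) !== undefined) {
      await mergeBlock(id) // one copy/delete round-trip per block on S3
    }
  }

  // Start `concurrency` workers and wait for them to drain the queue.
  await Promise.all(Array.from({ length: concurrency }, worker))
}
```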
-
@florent I see the fix-s3-merge branch is gone. I updated to current master (commit d8e01) and delta backup to S3 is not working (same problem as before).
-
@olivierlambert @florent The s3.js file has different code on the fix-s3-merge branch than on master.
-
@Andrew Back to the NoSuchKey error?
-
@florent 10-minute timeout...
{ "data": { "mode": "delta", "reportWhen": "never" }, "id": "1662038805540", "jobId": "d6c0a656-62c5-4c39-a57a-f246b39f1cef", "jobName": "minio-test", "message": "backup", "scheduleId": "bd4ef436-fd85-4f16-bf9e-71d1d0c8586f", "start": 1662038805540, "status": "failure", "infos": [ { "data": { "vms": [ "c45dd52b-fa92-df6f-800a-10853c183c23" ] }, "message": "vms" } ], "tasks": [ { "data": { "type": "VM", "id": "c45dd52b-fa92-df6f-800a-10853c183c23" }, "id": "1662038806415", "message": "backup VM", "start": 1662038806415, "status": "failure", "tasks": [ { "id": "1662038806845", "message": "clean-vm", "start": 1662038806845, "status": "success", "end": 1662038807088, "result": { "merge": false } }, { "id": "1662038807286", "message": "snapshot", "start": 1662038807286, "status": "success", "end": 1662038808752, "result": "03c16279-9af3-e35a-a7a0-7724e62c9cc2" }, { "data": { "id": "9890e0c4-ba3a-4810-8245-a49fdf16b16e", "isFull": false, "type": "remote" }, "id": "1662038808752:0", "message": "export", "start": 1662038808752, "status": "failure", "tasks": [ { "id": "1662038808802", "message": "transfer", "start": 1662038808802, "status": "success", "end": 1662038832144, "result": { "size": 1558597632 } }, { "id": "1662038832622", "message": "clean-vm", "start": 1662038832622, "status": "failure", "tasks": [ { "id": "1662038832809", "message": "merge", "start": 1662038832809, "status": "failure", "end": 1662039432907, "result": { "chain": [ "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T124634Z.alias.vhd", "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T130323Z.alias.vhd" ], "message": "operation timed out", "name": "TimeoutError", "stack": "TimeoutError: operation timed out\n at Promise.timeout (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/promise-toolbox/timeout.js:11:16)\n at S3Handler.rename (/opt/xo/xo-builds/xen-orchestra-202209010921/@xen-orchestra/fs/dist/abstract.js:338:37)\n at Queue.next (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/limit-concurrency-decorator/dist/index.js:21:22)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)" } } ], "end": 1662039432908, "result": { "chain": [ "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T124634Z.alias.vhd", "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T130323Z.alias.vhd" ], "message": "operation timed out", "name": "TimeoutError", "stack": "TimeoutError: operation timed out\n at Promise.timeout (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/promise-toolbox/timeout.js:11:16)\n at S3Handler.rename (/opt/xo/xo-builds/xen-orchestra-202209010921/@xen-orchestra/fs/dist/abstract.js:338:37)\n at Queue.next (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/limit-concurrency-decorator/dist/index.js:21:22)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)" } } ], "end": 1662039433046 } ], "end": 1662039433046 } ], "end": 1662039433047 }
-
@Andrew Yes, the timeouts are not fixed for now, only the NoSuchKey error.
The fix will at least let you set a custom concurrency limit, or maybe compute a value in a smarter way.
-
@florent Sure... but the delta backup merge to S3 no longer works in master.
-
@Andrew Yes, we will reduce the default concurrency while waiting for the parameterized version.
Here is the branch if you want to test it: https://github.com/vatesfr/xen-orchestra/pull/6400
Also, can you monitor the MinIO resource usage? I'm curious where the bottleneck is during a rename (CPU, RAM, or disk usage).
-
@florent Limiting concurrency did not fix my S3 backup problem, but it's working again after updating the build. So I guess it's resolved.