XCP-ng

    S3 backup broken

      florent Vates 🪐 XO Team @Andrew

      @Andrew The problem occurs only when resuming a merge.

      Does your S3 provider apply rate limits to requests? We issue a lot of copy/delete requests during a merge.

        Andrew Top contributor @florent

        @florent For local testing I'm using MinIO and there is no limit or throttling enabled.

          florent Vates 🪐 XO Team @Andrew

          @Andrew I am testing with another user using lower concurrency during the merge (today, 16 blocks are merged in parallel), and it seems to solve the problem.

          I will make it configurable soon (today it's a parameter in the code 😰).
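
To illustrate what "less concurrency during merge" means in practice, here is a minimal sketch of a concurrency cap. It is not the vhd-lib code: `mapWithConcurrency`, `demo`, and the stand-in `mergeBlock` are hypothetical names, and the 5 ms delay only simulates the copy/delete requests a real block merge would send to the S3 backend.

```javascript
'use strict'

// Hypothetical helper (not the vhd-lib implementation): run `worker` over
// `items` with at most `concurrency` calls in flight at any time.
async function mapWithConcurrency(items, concurrency, worker) {
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new RangeError('concurrency must be a positive integer')
  }
  const results = new Array(items.length)
  let nextIndex = 0

  // Each runner pulls the next unprocessed index until none are left.
  async function runner() {
    while (nextIndex < items.length) {
      const index = nextIndex++
      results[index] = await worker(items[index], index)
    }
  }

  const runnerCount = Math.min(concurrency, items.length)
  await Promise.all(Array.from({ length: runnerCount }, () => runner()))
  return results
}

// Demo: process `blockCount` fake blocks and record the highest number of
// simultaneous calls that was observed.
async function demo(concurrency, blockCount = 64) {
  let inFlight = 0
  let peak = 0

  // Stand-in for merging one block (assumption: a real merge would issue
  // S3 copy/delete requests here).
  async function mergeBlock(blockId) {
    inFlight++
    peak = Math.max(peak, inFlight)
    await new Promise(resolve => setTimeout(resolve, 5))
    inFlight--
    return blockId
  }

  const blocks = Array.from({ length: blockCount }, (_, i) => i)
  const merged = await mapWithConcurrency(blocks, concurrency, mergeBlock)
  return { merged: merged.length, peak }
}
```

Calling `demo(16)` and then `demo(4)` processes the same 64 stand-in blocks with different caps; `peak` reports how many calls were actually in flight at once, which is the load a slower backend would see.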

            Andrew Top contributor @florent

            @florent I see the fix-s3-merge branch is gone. I updated to current master (commit d8e01) and delta backup to S3 is not working (same problem as before).

              Andrew Top contributor @florent

              @olivierlambert @florent The s3.js program has different code between the fix-s3-merge branch and master.

                florent Vates 🪐 XO Team @Andrew

                @Andrew Back to the NoSuchKey error?

                  Andrew Top contributor @florent

                  @florent 10 minute timeout...

                  {
                    "data": {
                      "mode": "delta",
                      "reportWhen": "never"
                    },
                    "id": "1662038805540",
                    "jobId": "d6c0a656-62c5-4c39-a57a-f246b39f1cef",
                    "jobName": "minio-test",
                    "message": "backup",
                    "scheduleId": "bd4ef436-fd85-4f16-bf9e-71d1d0c8586f",
                    "start": 1662038805540,
                    "status": "failure",
                    "infos": [
                      {
                        "data": {
                          "vms": [
                            "c45dd52b-fa92-df6f-800a-10853c183c23"
                          ]
                        },
                        "message": "vms"
                      }
                    ],
                    "tasks": [
                      {
                        "data": {
                          "type": "VM",
                          "id": "c45dd52b-fa92-df6f-800a-10853c183c23"
                        },
                        "id": "1662038806415",
                        "message": "backup VM",
                        "start": 1662038806415,
                        "status": "failure",
                        "tasks": [
                          {
                            "id": "1662038806845",
                            "message": "clean-vm",
                            "start": 1662038806845,
                            "status": "success",
                            "end": 1662038807088,
                            "result": {
                              "merge": false
                            }
                          },
                          {
                            "id": "1662038807286",
                            "message": "snapshot",
                            "start": 1662038807286,
                            "status": "success",
                            "end": 1662038808752,
                            "result": "03c16279-9af3-e35a-a7a0-7724e62c9cc2"
                          },
                          {
                            "data": {
                              "id": "9890e0c4-ba3a-4810-8245-a49fdf16b16e",
                              "isFull": false,
                              "type": "remote"
                            },
                            "id": "1662038808752:0",
                            "message": "export",
                            "start": 1662038808752,
                            "status": "failure",
                            "tasks": [
                              {
                                "id": "1662038808802",
                                "message": "transfer",
                                "start": 1662038808802,
                                "status": "success",
                                "end": 1662038832144,
                                "result": {
                                  "size": 1558597632
                                }
                              },
                              {
                                "id": "1662038832622",
                                "message": "clean-vm",
                                "start": 1662038832622,
                                "status": "failure",
                                "tasks": [
                                  {
                                    "id": "1662038832809",
                                    "message": "merge",
                                    "start": 1662038832809,
                                    "status": "failure",
                                    "end": 1662039432907,
                                    "result": {
                                      "chain": [
                                        "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T124634Z.alias.vhd",
                                        "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T130323Z.alias.vhd"
                                      ],
                                      "message": "operation timed out",
                                      "name": "TimeoutError",
                                      "stack": "TimeoutError: operation timed out\n    at Promise.timeout (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/promise-toolbox/timeout.js:11:16)\n    at S3Handler.rename (/opt/xo/xo-builds/xen-orchestra-202209010921/@xen-orchestra/fs/dist/abstract.js:338:37)\n    at Queue.next (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/limit-concurrency-decorator/dist/index.js:21:22)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)"
                                    }
                                  }
                                ],
                                "end": 1662039432908,
                                "result": {
                                  "chain": [
                                    "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T124634Z.alias.vhd",
                                    "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T130323Z.alias.vhd"
                                  ],
                                  "message": "operation timed out",
                                  "name": "TimeoutError",
                                  "stack": "TimeoutError: operation timed out\n    at Promise.timeout (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/promise-toolbox/timeout.js:11:16)\n    at S3Handler.rename (/opt/xo/xo-builds/xen-orchestra-202209010921/@xen-orchestra/fs/dist/abstract.js:338:37)\n    at Queue.next (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/limit-concurrency-decorator/dist/index.js:21:22)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)"
                                }
                              }
                            ],
                            "end": 1662039433046
                          }
                        ],
                        "end": 1662039433046
                      }
                    ],
                    "end": 1662039433047
                  }
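
The 10 minute figure can be read directly from the log: the failed `merge` task has `start` 1662038832809 and `end` 1662039432907, a difference of 600098 ms. The sketch below walks the nested `tasks` arrays and lists every failed task with its duration. `collectFailures` is a hypothetical helper, and `log` is a trimmed copy of the log above that keeps only the fields the helper reads.

```javascript
'use strict'

// Walk a backup log's nested `tasks` arrays and collect every failed task
// with its duration. The field names (`tasks`, `status`, `start`, `end`,
// `message`, `result`) are the ones visible in the log above.
function collectFailures(task, path = []) {
  const here = [...path, task.message]
  const failures = []
  if (task.status === 'failure' && task.start !== undefined && task.end !== undefined) {
    failures.push({
      path: here.join(' > '),
      durationMs: task.end - task.start,
      error: task.result?.name, // undefined when the task carries no result
    })
  }
  for (const child of task.tasks ?? []) {
    failures.push(...collectFailures(child, here))
  }
  return failures
}

// Trimmed copy of the log above (timestamps and statuses unchanged).
const log = {
  message: 'backup', start: 1662038805540, end: 1662039433047, status: 'failure',
  tasks: [{
    message: 'backup VM', start: 1662038806415, end: 1662039433046, status: 'failure',
    tasks: [
      { message: 'clean-vm', start: 1662038806845, end: 1662038807088, status: 'success' },
      { message: 'snapshot', start: 1662038807286, end: 1662038808752, status: 'success' },
      {
        message: 'export', start: 1662038808752, end: 1662039433046, status: 'failure',
        tasks: [
          { message: 'transfer', start: 1662038808802, end: 1662038832144, status: 'success' },
          {
            message: 'clean-vm', start: 1662038832622, end: 1662039432908, status: 'failure',
            result: { name: 'TimeoutError' },
            tasks: [{
              message: 'merge', start: 1662038832809, end: 1662039432907, status: 'failure',
              result: { name: 'TimeoutError' },
            }],
          },
        ],
      },
    ],
  }],
}

for (const failure of collectFailures(log)) {
  console.log(`${failure.path}: ${failure.durationMs} ms (${failure.error ?? 'no error name'})`)
}
```

The deepest entry in the output is the `merge` task itself; the parent tasks fail only because that merge timed out, which is why their durations are nearly identical.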
                  
                    florent Vates 🪐 XO Team @Andrew

                    @Andrew Yes, the timeouts are not fixed for now, only the NoSuchKey error.

                    The fix will at least let you set a custom concurrency limit, or maybe calculate a value in a smarter way.

                      Andrew Top contributor @florent

                      @florent Sure... but the delta backup merge on S3 no longer works in master.

                        florent Vates 🪐 XO Team @Andrew

                        @Andrew Yes, we will reduce the default concurrency while waiting for the parameterized version.

                        Here is the branch if you want to test it: https://github.com/vatesfr/xen-orchestra/pull/6400

                        Also, can you monitor the MinIO resource usage? I'm curious where the bottleneck is during a rename (CPU, RAM, or disk usage).

                        Pull request #6400 in vatesfr/xen-orchestra, opened by fbeauchamp (closed): fix(vhd-lib/merge): reduce concurrency to protect slower backends

                          Andrew Top contributor @florent

                          @florent Limiting concurrency did not fix my S3 backup problem, but it's working again after updating the build. So I guess it's resolved.
