XCP-ng

    S3 backup broken

    Xen Orchestra
      Andrew Top contributor @odeawan

      @odeawan Thanks for the idea, but I know that's not the problem in this case. The test setup that fails is between an XO server and an S3 server on the same LAN, so there is no firewall, and it only fails during the delta merge phase. My main (older version) XO server, which runs backups to off-site S3 storage, works fine (also to the same local S3 server).

        Andrew Top contributor @florent

        @florent I did a git update again (to commit aa261) and it works! Maybe I missed an update?

        {
          "data": {
            "mode": "delta",
            "reportWhen": "never"
          },
          "id": "1661987889299",
          "jobId": "d6c0a656-62c5-4c39-a57a-f246b39f1cef",
          "jobName": "minio-test",
          "message": "backup",
          "scheduleId": "bd4ef436-fd85-4f16-bf9e-71d1d0c8586f",
          "start": 1661987889299,
          "status": "success",
          "infos": [
            {
              "data": {
                "vms": [
                  "c45dd52b-fa92-df6f-800a-10853c183c23"
                ]
              },
              "message": "vms"
            }
          ],
          "tasks": [
            {
              "data": {
                "type": "VM",
                "id": "c45dd52b-fa92-df6f-800a-10853c183c23"
              },
              "id": "1661987890202",
              "message": "backup VM",
              "start": 1661987890202,
              "status": "success",
              "tasks": [
                {
                  "id": "1661987890615",
                  "message": "clean-vm",
                  "start": 1661987890615,
                  "status": "success",
                  "end": 1661987891797,
                  "result": {
                    "merge": false
                  }
                },
                {
                  "id": "1661987891992",
                  "message": "snapshot",
                  "start": 1661987891992,
                  "status": "success",
                  "end": 1661987893470,
                  "result": "ec858b82-4f64-c5fe-a258-887ba57c7458"
                },
                {
                  "data": {
                    "id": "9890e0c4-ba3a-4810-8245-a49fdf16b16e",
                    "isFull": false,
                    "type": "remote"
                  },
                  "id": "1661987893471",
                  "message": "export",
                  "start": 1661987893471,
                  "status": "success",
                  "tasks": [
                    {
                      "id": "1661987893513",
                      "message": "transfer",
                      "start": 1661987893513,
                      "status": "success",
                      "end": 1661987897713,
                      "result": {
                        "size": 75549184
                      }
                    },
                    {
                      "id": "1661987898186",
                      "message": "clean-vm",
                      "start": 1661987898186,
                      "status": "success",
                      "tasks": [
                        {
                          "id": "1661987899091",
                          "message": "merge",
                          "start": 1661987899091,
                          "status": "success",
                          "end": 1661987902420
                        }
                      ],
                      "end": 1661987902474,
                      "result": {
                        "merge": true
                      }
                    }
                  ],
                  "end": 1661987902480
                }
              ],
              "end": 1661987902480
            }
          ],
          "end": 1661987902480
        }
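The `start` and `end` fields in the log above look like epoch timestamps in milliseconds, so per-task durations can be read off directly. A minimal Python sketch, assuming that interpretation and using a trimmed copy of the log (only `message`, `start`, `end`, and the transfer `size` are kept; the `walk` helper is hypothetical, not part of XO):

```python
import json

# Trimmed copy of the log above: only the fields needed for timing.
LOG = json.loads("""
{
  "message": "backup", "start": 1661987889299, "end": 1661987902480,
  "tasks": [{
    "message": "backup VM", "start": 1661987890202, "end": 1661987902480,
    "tasks": [
      {"message": "clean-vm", "start": 1661987890615, "end": 1661987891797},
      {"message": "snapshot", "start": 1661987891992, "end": 1661987893470},
      {"message": "export", "start": 1661987893471, "end": 1661987902480,
       "tasks": [
         {"message": "transfer", "start": 1661987893513, "end": 1661987897713,
          "result": {"size": 75549184}},
         {"message": "clean-vm", "start": 1661987898186, "end": 1661987902474,
          "tasks": [
            {"message": "merge", "start": 1661987899091, "end": 1661987902420}
          ]}
       ]}
    ]
  }]
}
""")


def walk(task, depth=0):
    """Yield (depth, task) for a task and all of its nested subtasks."""
    yield depth, task
    for child in task.get("tasks", []):
        yield from walk(child, depth + 1)


rows = list(walk(LOG))

# Print an indented tree of task names with their durations in seconds.
for depth, task in rows:
    duration_ms = task["end"] - task["start"]
    print(f"{'  ' * depth}{task['message']}: {duration_ms / 1000:.1f} s")

# Pull out the two phases discussed in this thread.
merge = next(t for _, t in rows if t["message"] == "merge")
transfer = next(t for _, t in rows if t["message"] == "transfer")
merge_ms = merge["end"] - merge["start"]
transfer_ms = transfer["end"] - transfer["start"]
total_ms = LOG["end"] - LOG["start"]

# Transfer throughput in bytes per second, assuming "size" is in bytes.
throughput = transfer["result"]["size"] / (transfer_ms / 1000)
print(f"transfer throughput: {throughput / 1e6:.1f} MB/s")
```

In this successful run the merge phase is a small part of the whole job, which is a useful baseline when comparing against a run where the merge fails.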
        
          florent Vates 🪐 XO Team @Andrew

          @Andrew The problem occurs only when resuming a merge.

          Does your S3 provider apply rate limits to requests? We do a lot of copy/delete requests during a merge.
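For context on why a merge produces so many copy/delete requests: S3 has no native rename operation, so a rename is normally done as a copy followed by a delete. The sketch below illustrates that pattern in Python against an in-memory stand-in. `FakeObjectStore` and `rename` are hypothetical names for illustration only and are not XO's `S3Handler` code.

```python
class FakeObjectStore:
    """In-memory stand-in for an S3 bucket (hypothetical, for illustration)."""

    def __init__(self):
        self.objects = {}
        self.requests = []  # log of (operation, key) for copy and delete calls

    def put(self, key, body):
        self.objects[key] = body

    def copy(self, src, dst):
        self.requests.append(("copy", src))
        # A missing source raises KeyError here, loosely analogous to an
        # S3 NoSuchKey response.
        self.objects[dst] = self.objects[src]

    def delete(self, key):
        self.requests.append(("delete", key))
        # Deleting a missing key is not an error, as with S3 DeleteObject.
        self.objects.pop(key, None)


def rename(store, src, dst):
    """Emulate a rename as copy + delete: two requests per object."""
    store.copy(src, dst)
    store.delete(src)


store = FakeObjectStore()
for i in range(16):
    store.put(f"child/blocks/{i}", b"data")

# Moving 16 blocks from the child into the parent costs 32 requests.
for i in range(16):
    rename(store, f"child/blocks/{i}", f"parent/blocks/{i}")

print(len(store.requests), "requests for 16 renames")
```

Every renamed block costs at least two round trips to the backend, so the request rate scales with both the number of blocks and the number of renames allowed to run in parallel.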

            Andrew Top contributor @florent

            @florent For local testing I'm using MinIO and there is no limit or throttling enabled.

              florent Vates 🪐 XO Team @Andrew

              @Andrew I am testing with another user using lower concurrency during the merge (currently 16 blocks are merged in parallel), and it seems to solve the problem.

              I will make it configurable soon (currently it's a hard-coded parameter 😰).
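The idea of capping how many blocks are merged in parallel can be sketched with a semaphore. This is a generic Python illustration of the technique, not XO's implementation (the stack trace later in this thread suggests XO uses the `limit-concurrency-decorator` package in Node.js). `merge_blocks`, `run_demo`, and `fake_merge_one` are hypothetical names, and the default of 16 is taken from the figure mentioned above.

```python
import asyncio


async def merge_blocks(block_ids, merge_one, concurrency=16):
    """Run merge_one(block_id) for every block, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(block_id):
        async with semaphore:  # wait here while `concurrency` merges are running
            return await merge_one(block_id)

    return await asyncio.gather(*(guarded(b) for b in block_ids))


def run_demo(concurrency, n_blocks=64):
    """Merge n_blocks fake blocks and report the peak number running at once."""
    in_flight = 0
    peak = 0

    async def fake_merge_one(block_id):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.001)  # stands in for the copy/delete round trips
        in_flight -= 1
        return block_id

    results = asyncio.run(merge_blocks(range(n_blocks), fake_merge_one, concurrency))
    return peak, results


peak_default, _ = run_demo(16)
peak_reduced, merged = run_demo(4)
print("peak with limit 16:", peak_default)
print("peak with limit 4:", peak_reduced)
```

Exposing `concurrency` as a parameter is what makes it tunable per backend: a slower S3 server can be given a lower value without touching the merge logic.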

                Andrew Top contributor @florent

                @florent I see the fix-s3-merge branch is gone. I updated to current master (commit d8e01) and delta backup to S3 is not working (same problem as before).

                  Andrew Top contributor @florent

                  @olivierlambert @florent The s3.js program has different code between the fix-s3-merge branch and master.

                    florent Vates 🪐 XO Team @Andrew

                    @Andrew Back to the NoSuchKey error?

                      Andrew Top contributor @florent

                      @florent 10 minute timeout...

                      {
                        "data": {
                          "mode": "delta",
                          "reportWhen": "never"
                        },
                        "id": "1662038805540",
                        "jobId": "d6c0a656-62c5-4c39-a57a-f246b39f1cef",
                        "jobName": "minio-test",
                        "message": "backup",
                        "scheduleId": "bd4ef436-fd85-4f16-bf9e-71d1d0c8586f",
                        "start": 1662038805540,
                        "status": "failure",
                        "infos": [
                          {
                            "data": {
                              "vms": [
                                "c45dd52b-fa92-df6f-800a-10853c183c23"
                              ]
                            },
                            "message": "vms"
                          }
                        ],
                        "tasks": [
                          {
                            "data": {
                              "type": "VM",
                              "id": "c45dd52b-fa92-df6f-800a-10853c183c23"
                            },
                            "id": "1662038806415",
                            "message": "backup VM",
                            "start": 1662038806415,
                            "status": "failure",
                            "tasks": [
                              {
                                "id": "1662038806845",
                                "message": "clean-vm",
                                "start": 1662038806845,
                                "status": "success",
                                "end": 1662038807088,
                                "result": {
                                  "merge": false
                                }
                              },
                              {
                                "id": "1662038807286",
                                "message": "snapshot",
                                "start": 1662038807286,
                                "status": "success",
                                "end": 1662038808752,
                                "result": "03c16279-9af3-e35a-a7a0-7724e62c9cc2"
                              },
                              {
                                "data": {
                                  "id": "9890e0c4-ba3a-4810-8245-a49fdf16b16e",
                                  "isFull": false,
                                  "type": "remote"
                                },
                                "id": "1662038808752:0",
                                "message": "export",
                                "start": 1662038808752,
                                "status": "failure",
                                "tasks": [
                                  {
                                    "id": "1662038808802",
                                    "message": "transfer",
                                    "start": 1662038808802,
                                    "status": "success",
                                    "end": 1662038832144,
                                    "result": {
                                      "size": 1558597632
                                    }
                                  },
                                  {
                                    "id": "1662038832622",
                                    "message": "clean-vm",
                                    "start": 1662038832622,
                                    "status": "failure",
                                    "tasks": [
                                      {
                                        "id": "1662038832809",
                                        "message": "merge",
                                        "start": 1662038832809,
                                        "status": "failure",
                                        "end": 1662039432907,
                                        "result": {
                                          "chain": [
                                            "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T124634Z.alias.vhd",
                                            "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T130323Z.alias.vhd"
                                          ],
                                          "message": "operation timed out",
                                          "name": "TimeoutError",
                                          "stack": "TimeoutError: operation timed out\n    at Promise.timeout (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/promise-toolbox/timeout.js:11:16)\n    at S3Handler.rename (/opt/xo/xo-builds/xen-orchestra-202209010921/@xen-orchestra/fs/dist/abstract.js:338:37)\n    at Queue.next (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/limit-concurrency-decorator/dist/index.js:21:22)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)"
                                        }
                                      }
                                    ],
                                    "end": 1662039432908,
                                    "result": {
                                      "chain": [
                                        "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T124634Z.alias.vhd",
                                        "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T130323Z.alias.vhd"
                                      ],
                                      "message": "operation timed out",
                                      "name": "TimeoutError",
                                      "stack": "TimeoutError: operation timed out\n    at Promise.timeout (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/promise-toolbox/timeout.js:11:16)\n    at S3Handler.rename (/opt/xo/xo-builds/xen-orchestra-202209010921/@xen-orchestra/fs/dist/abstract.js:338:37)\n    at Queue.next (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/limit-concurrency-decorator/dist/index.js:21:22)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)"
                                    }
                                  }
                                ],
                                "end": 1662039433046
                              }
                            ],
                            "end": 1662039433046
                          }
                        ],
                        "end": 1662039433047
                      }
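In a nested log like the one above, the failure status propagates up to every parent task, so the useful entry is the deepest failed task. A minimal Python sketch that finds it, using a trimmed copy of the log (stack traces and IDs omitted; `root_failures` is a hypothetical helper, and the timestamps are assumed to be epoch milliseconds):

```python
import json

# Trimmed copy of the failing log above: statuses, merge timing, and error.
LOG = json.loads("""
{
  "message": "backup", "status": "failure",
  "tasks": [{
    "message": "backup VM", "status": "failure",
    "tasks": [
      {"message": "clean-vm", "status": "success"},
      {"message": "snapshot", "status": "success"},
      {"message": "export", "status": "failure",
       "tasks": [
         {"message": "transfer", "status": "success"},
         {"message": "clean-vm", "status": "failure",
          "tasks": [
            {"message": "merge", "status": "failure",
             "start": 1662038832809, "end": 1662039432907,
             "result": {"name": "TimeoutError",
                        "message": "operation timed out"}}
          ]}
       ]}
    ]
  }]
}
""")


def root_failures(task, path=()):
    """Yield (path, task) for failed tasks that have no failed subtask."""
    path = path + (task["message"],)
    failed_children = [
        child for child in task.get("tasks", [])
        if child.get("status") == "failure"
    ]
    if task.get("status") == "failure" and not failed_children:
        yield path, task
    for child in failed_children:
        yield from root_failures(child, path)


failures = list(root_failures(LOG))
for path, task in failures:
    duration_s = (task["end"] - task["start"]) / 1000
    error = task["result"]
    print(" > ".join(path))
    print(f"  {error['name']}: {error['message']} after {duration_s:.1f} s")
```

Here the only root failure is the merge task, and its duration of roughly 600 seconds matches the 10 minute timeout mentioned above.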
                      
                        florent Vates 🪐 XO Team @Andrew

                        @Andrew Yes, the timeouts are not fixed for now, only the NoSuchKey error.

                        The fix will at least let you set a custom concurrency limit, or maybe calculate a value in a smarter way.
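As a rough illustration of how a per-operation timeout turns a slow backend call into an error like the one in the log, here is a small Python sketch. It is not XO's code (the stack trace points at `promise-toolbox` in Node.js); `slow_rename` and `rename_with_timeout` are hypothetical, and the delays are shortened from minutes to fractions of a second.

```python
import asyncio


async def slow_rename(delay):
    """Stand-in for a rename on the backend that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return "renamed"


async def rename_with_timeout(delay, timeout):
    """Give up on the rename if it does not finish within `timeout` seconds."""
    try:
        return await asyncio.wait_for(slow_rename(delay), timeout=timeout)
    except asyncio.TimeoutError:
        return "operation timed out"


# A fast backend finishes in time; a slow one hits the timeout.
fast = asyncio.run(rename_with_timeout(0.01, 1.0))
slow = asyncio.run(rename_with_timeout(0.5, 0.05))
print(fast)
print(slow)
```

A timeout like this does not make the backend faster. It only bounds how long one operation may take, which is why lowering concurrency and fixing timeouts are treated as separate problems in this thread.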

                          Andrew Top contributor @florent

                          @florent Sure... but the delta backup merge to S3 no longer works in master.

                            florent Vates 🪐 XO Team @Andrew

                            @Andrew Yes, we will reduce the default concurrency while waiting for the parametrized version.

                            Here is the branch if you want to test it: https://github.com/vatesfr/xen-orchestra/pull/6400

                            Also, can you monitor the MinIO resource usage? I'm curious where the bottleneck is during a rename (CPU, RAM, or disk usage).

                            fbeauchamp opened this pull request in vatesfr/xen-orchestra: fix(vhd-lib/merge): reduce concurrency to protect slower backends #6400 (closed)

                              Andrew Top contributor @florent

                              @florent Limiting concurrency did not fix my S3 backup problem, but it's working again after updating the build. So I guess it's resolved.
