XCP-ng

    S3 backup broken

    Xen Orchestra
    31 Posts 5 Posters 4.9k Views 6 Watching
    • odeawan @Andrew

      @Andrew I had a very similar issue over the last couple of weeks producing the same error.
      My coworker had been running similar S3 backups with success on the same ISP from a different site and then again at a site with Starlink. The only differences between the sites were the circuit and edge router.
      I dove a little deeper and found that I had misconfigured a new VLAN/subnet and had IDS inspecting xen-orchestra traffic, flagging the outbound port 80 traffic to S3 as malicious. I could see the firewall actively dropping the outbound packets in the packet filter. This made the S3 backup stop after about 30 minutes, and subsequent backups would fail.
      It may not be your exact issue, but this post struck a chord with me.

    • Andrew Top contributor @odeawan

      @odeawan Thanks for the idea, but I know that's not the problem in this case. The test setup that fails is between an XO server and an S3 server on the same LAN, so no firewall... and it only fails during the delta merge phase. My main (older version) XO server that runs backups to off-site S3 storage works fine (also to the same local S3 server).

    • Andrew Top contributor @florent

      @florent I did a git update again (to commit aa261) and it works! Maybe I missed an update?

          {
            "data": {
              "mode": "delta",
              "reportWhen": "never"
            },
            "id": "1661987889299",
            "jobId": "d6c0a656-62c5-4c39-a57a-f246b39f1cef",
            "jobName": "minio-test",
            "message": "backup",
            "scheduleId": "bd4ef436-fd85-4f16-bf9e-71d1d0c8586f",
            "start": 1661987889299,
            "status": "success",
            "infos": [
              {
                "data": {
                  "vms": [
                    "c45dd52b-fa92-df6f-800a-10853c183c23"
                  ]
                },
                "message": "vms"
              }
            ],
            "tasks": [
              {
                "data": {
                  "type": "VM",
                  "id": "c45dd52b-fa92-df6f-800a-10853c183c23"
                },
                "id": "1661987890202",
                "message": "backup VM",
                "start": 1661987890202,
                "status": "success",
                "tasks": [
                  {
                    "id": "1661987890615",
                    "message": "clean-vm",
                    "start": 1661987890615,
                    "status": "success",
                    "end": 1661987891797,
                    "result": {
                      "merge": false
                    }
                  },
                  {
                    "id": "1661987891992",
                    "message": "snapshot",
                    "start": 1661987891992,
                    "status": "success",
                    "end": 1661987893470,
                    "result": "ec858b82-4f64-c5fe-a258-887ba57c7458"
                  },
                  {
                    "data": {
                      "id": "9890e0c4-ba3a-4810-8245-a49fdf16b16e",
                      "isFull": false,
                      "type": "remote"
                    },
                    "id": "1661987893471",
                    "message": "export",
                    "start": 1661987893471,
                    "status": "success",
                    "tasks": [
                      {
                        "id": "1661987893513",
                        "message": "transfer",
                        "start": 1661987893513,
                        "status": "success",
                        "end": 1661987897713,
                        "result": {
                          "size": 75549184
                        }
                      },
                      {
                        "id": "1661987898186",
                        "message": "clean-vm",
                        "start": 1661987898186,
                        "status": "success",
                        "tasks": [
                          {
                            "id": "1661987899091",
                            "message": "merge",
                            "start": 1661987899091,
                            "status": "success",
                            "end": 1661987902420
                          }
                        ],
                        "end": 1661987902474,
                        "result": {
                          "merge": true
                        }
                      }
                    ],
                    "end": 1661987902480
                  }
                ],
                "end": 1661987902480
              }
            ],
            "end": 1661987902480
          }
          
    • florent Vates 🪐 XO Team @Andrew

      @Andrew The problem occurs only when resuming a merge.

      Does your S3 provider apply rate limits to queries? We do a lot of copy/delete queries during a merge.
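
For context (a hypothetical sketch, not the actual XO code): S3 has no native rename operation, so a rename is typically emulated as a server-side copy followed by a delete — two requests per object, which multiplies quickly when a merge moves many blocks. Assuming a generic client exposing `copyObject`/`deleteObject` (the AWS SDK shape; the in-memory mock below is only for illustration):

```javascript
// Hypothetical sketch: S3 has no rename, so it is emulated as copy + delete.
// `client` is any object exposing copyObject/deleteObject (e.g. an AWS SDK
// wrapper, or the in-memory mock below).
async function s3Rename(client, bucket, oldKey, newKey) {
  await client.copyObject({ Bucket: bucket, CopySource: `${bucket}/${oldKey}`, Key: newKey })
  await client.deleteObject({ Bucket: bucket, Key: oldKey })
}

// Minimal in-memory mock illustrating the two-request cost of one rename.
function makeMockClient(store) {
  return {
    calls: [],
    async copyObject({ CopySource, Key }) {
      this.calls.push('copy')
      const srcKey = CopySource.split('/').slice(1).join('/')
      store.set(Key, store.get(srcKey))
    },
    async deleteObject({ Key }) {
      this.calls.push('delete')
      store.delete(Key)
    },
  }
}
```

So a merge touching many blocks issues a copy + delete pair for every object it moves, and a backend that is slow on server-side copies pays that cost twice per block.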

    • Andrew Top contributor @florent

      @florent For local testing I'm using MinIO, and there are no limits or throttling enabled.

    • florent Vates 🪐 XO Team @Andrew

      @Andrew I am testing with another user using lower concurrency during merge (today, 16 blocks are merged in parallel), and it seems to solve the problem.

      I will make it configurable soon (today it's a hard-coded parameter in the code 😰).
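
The limit described here can be sketched with a simple promise semaphore (a hypothetical illustration only; the stack traces in this thread show XO actually uses the limit-concurrency-decorator package):

```javascript
// Hypothetical sketch of a concurrency limit: at most `limit` calls of the
// wrapped async function run at once; the rest wait in a FIFO queue.
function limitConcurrency(limit, fn) {
  let running = 0
  const queue = []
  const next = () => {
    if (running < limit && queue.length > 0) {
      running++
      const { args, resolve, reject } = queue.shift()
      fn(...args).then(resolve, reject).finally(() => {
        running--
        next() // start the next queued call, if any
      })
    }
  }
  return (...args) =>
    new Promise((resolve, reject) => {
      queue.push({ args, resolve, reject })
      next()
    })
}
```

With a limit of 2 instead of 16, only two block merges would be in flight at a time — less throughput, but much less pressure on a slow S3 backend.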

    • Andrew Top contributor @florent

      @florent I see the fix-s3-merge branch is gone. I updated to current master (commit d8e01) and delta backup to S3 is not working (same problem as before).

    • Andrew Top contributor @florent

                    @olivierlambert @florent The s3.js program has different code between the fix-s3-merge branch and master.

    • florent Vates 🪐 XO Team @Andrew

      @Andrew Back to the NoSuchKey error?

    • Andrew Top contributor @florent

      @florent 10-minute timeout...

                        {
                          "data": {
                            "mode": "delta",
                            "reportWhen": "never"
                          },
                          "id": "1662038805540",
                          "jobId": "d6c0a656-62c5-4c39-a57a-f246b39f1cef",
                          "jobName": "minio-test",
                          "message": "backup",
                          "scheduleId": "bd4ef436-fd85-4f16-bf9e-71d1d0c8586f",
                          "start": 1662038805540,
                          "status": "failure",
                          "infos": [
                            {
                              "data": {
                                "vms": [
                                  "c45dd52b-fa92-df6f-800a-10853c183c23"
                                ]
                              },
                              "message": "vms"
                            }
                          ],
                          "tasks": [
                            {
                              "data": {
                                "type": "VM",
                                "id": "c45dd52b-fa92-df6f-800a-10853c183c23"
                              },
                              "id": "1662038806415",
                              "message": "backup VM",
                              "start": 1662038806415,
                              "status": "failure",
                              "tasks": [
                                {
                                  "id": "1662038806845",
                                  "message": "clean-vm",
                                  "start": 1662038806845,
                                  "status": "success",
                                  "end": 1662038807088,
                                  "result": {
                                    "merge": false
                                  }
                                },
                                {
                                  "id": "1662038807286",
                                  "message": "snapshot",
                                  "start": 1662038807286,
                                  "status": "success",
                                  "end": 1662038808752,
                                  "result": "03c16279-9af3-e35a-a7a0-7724e62c9cc2"
                                },
                                {
                                  "data": {
                                    "id": "9890e0c4-ba3a-4810-8245-a49fdf16b16e",
                                    "isFull": false,
                                    "type": "remote"
                                  },
                                  "id": "1662038808752:0",
                                  "message": "export",
                                  "start": 1662038808752,
                                  "status": "failure",
                                  "tasks": [
                                    {
                                      "id": "1662038808802",
                                      "message": "transfer",
                                      "start": 1662038808802,
                                      "status": "success",
                                      "end": 1662038832144,
                                      "result": {
                                        "size": 1558597632
                                      }
                                    },
                                    {
                                      "id": "1662038832622",
                                      "message": "clean-vm",
                                      "start": 1662038832622,
                                      "status": "failure",
                                      "tasks": [
                                        {
                                          "id": "1662038832809",
                                          "message": "merge",
                                          "start": 1662038832809,
                                          "status": "failure",
                                          "end": 1662039432907,
                                          "result": {
                                            "chain": [
                                              "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T124634Z.alias.vhd",
                                              "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T130323Z.alias.vhd"
                                            ],
                                            "message": "operation timed out",
                                            "name": "TimeoutError",
                                            "stack": "TimeoutError: operation timed out\n    at Promise.timeout (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/promise-toolbox/timeout.js:11:16)\n    at S3Handler.rename (/opt/xo/xo-builds/xen-orchestra-202209010921/@xen-orchestra/fs/dist/abstract.js:338:37)\n    at Queue.next (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/limit-concurrency-decorator/dist/index.js:21:22)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)"
                                          }
                                        }
                                      ],
                                      "end": 1662039432908,
                                      "result": {
                                        "chain": [
                                          "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T124634Z.alias.vhd",
                                          "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T130323Z.alias.vhd"
                                        ],
                                        "message": "operation timed out",
                                        "name": "TimeoutError",
                                        "stack": "TimeoutError: operation timed out\n    at Promise.timeout (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/promise-toolbox/timeout.js:11:16)\n    at S3Handler.rename (/opt/xo/xo-builds/xen-orchestra-202209010921/@xen-orchestra/fs/dist/abstract.js:338:37)\n    at Queue.next (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/limit-concurrency-decorator/dist/index.js:21:22)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)"
                                      }
                                    }
                                  ],
                                  "end": 1662039433046
                                }
                              ],
                              "end": 1662039433046
                            }
                          ],
                          "end": 1662039433047
                        }
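
The stack trace in this log points at `promise-toolbox/timeout.js` wrapping `S3Handler.rename`: each rename is raced against a fixed deadline (10 minutes here) and fails with a `TimeoutError` if the deadline wins. A minimal sketch of that pattern, assuming nothing about the real implementation:

```javascript
// Hypothetical sketch: race an operation against a timer and fail with a
// TimeoutError if the timer wins, as the promise-toolbox timeout wrapper
// seen in the stack trace does.
class TimeoutError extends Error {
  constructor() {
    super('operation timed out')
    this.name = 'TimeoutError'
  }
}

function withTimeout(promise, ms) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => reject(new TimeoutError()), ms)
    promise.then(
      value => { clearTimeout(timer); resolve(value) },
      error => { clearTimeout(timer); reject(error) }
    )
  })
}
```

If a single copy + delete on the backend takes longer than the deadline, the whole merge step fails with exactly this `TimeoutError` — even though the backend may eventually have completed the operation.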
                        
    • florent Vates 🪐 XO Team @Andrew

      @Andrew Yes, the timeouts are not fixed for now, only the NoSuchKey error.

      The fix will at least let you set a custom concurrency limit, or maybe calculate a value in a smarter way.

    • Andrew Top contributor @florent

      @florent Sure... but the delta backup merge to S3 no longer works in master.

    • florent Vates 🪐 XO Team @Andrew

      @Andrew Yes, we will reduce the default concurrency while waiting for the parametrized version.

                              here is the branch if you want to test it https://github.com/vatesfr/xen-orchestra/pull/6400

      Also, can you monitor the MinIO resource usage? I'm curious where the bottleneck is during a rename (CPU/RAM or disk usage).

      fbeauchamp opened pull request #6400 in vatesfr/xen-orchestra (closed): fix(vhd-lib/merge): reduce concurrency to protect slower backends

    • Andrew Top contributor @florent

                                @florent Limiting concurrency did not fix my S3 backup problem, but it's working again after updating the build. So I guess it's resolved.
