XCP-ng

    S3 backup broken

    Xen Orchestra
      Andrew Top contributor @florent

      @florent Still fails.

      {
        "data": {
          "mode": "delta",
          "reportWhen": "never"
        },
        "id": "1661943144343",
        "jobId": "d6c0a656-62c5-4c39-a57a-f246b39f1cef",
        "jobName": "minio-test",
        "message": "backup",
        "scheduleId": "bd4ef436-fd85-4f16-bf9e-71d1d0c8586f",
        "start": 1661943144343,
        "status": "failure",
        "infos": [
          {
            "data": {
              "vms": [
                "c45dd52b-fa92-df6f-800a-10853c183c23"
              ]
            },
            "message": "vms"
          }
        ],
        "tasks": [
          {
            "data": {
              "type": "VM",
              "id": "c45dd52b-fa92-df6f-800a-10853c183c23"
            },
            "id": "1661943145228:0",
            "message": "backup VM",
            "start": 1661943145228,
            "status": "failure",
            "tasks": [
              {
                "id": "1661943145611",
                "message": "clean-vm",
                "start": 1661943145611,
                "status": "failure",
                "tasks": [
                  {
                    "id": "1661943145847",
                    "message": "merge",
                    "start": 1661943145847,
                    "status": "failure",
                    "end": 1661943745931,
                    "result": {
                      "chain": [
                        "xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220831T025925Z.alias.vhd",
                        "xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220831T030535Z.alias.vhd"
                      ],
                      "message": "operation timed out",
                      "name": "TimeoutError",
                      "stack": "TimeoutError: operation timed out\n    at Promise.timeout (/opt/xo/xo-builds/xen-orchestra-202208310649/node_modules/promise-toolbox/timeout.js:11:16)\n    at S3Handler.rename (/opt/xo/xo-builds/xen-orchestra-202208310649/@xen-orchestra/fs/dist/abstract.js:338:37)\n    at Queue.next (/opt/xo/xo-builds/xen-orchestra-202208310649/node_modules/limit-concurrency-decorator/dist/index.js:21:22)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)"
                    }
                  }
                ],
                "end": 1661943745932,
                "result": {
                  "chain": [
                    "xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220831T025925Z.alias.vhd",
                    "xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220831T030535Z.alias.vhd"
                  ],
                  "message": "operation timed out",
                  "name": "TimeoutError",
                  "stack": "TimeoutError: operation timed out\n    at Promise.timeout (/opt/xo/xo-builds/xen-orchestra-202208310649/node_modules/promise-toolbox/timeout.js:11:16)\n    at S3Handler.rename (/opt/xo/xo-builds/xen-orchestra-202208310649/@xen-orchestra/fs/dist/abstract.js:338:37)\n    at Queue.next (/opt/xo/xo-builds/xen-orchestra-202208310649/node_modules/limit-concurrency-decorator/dist/index.js:21:22)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)"
                }
              },
              {
                "id": "1661943746504",
                "message": "snapshot",
                "start": 1661943746504,
                "status": "success",
                "end": 1661943748063,
                "result": "af87938d-8f55-e1e9-cb12-6d0954c1bb89"
              },
              {
                "data": {
                  "id": "9890e0c4-ba3a-4810-8245-a49fdf16b16e",
                  "isFull": false,
                  "type": "remote"
                },
                "id": "1661943748064",
                "message": "export",
                "start": 1661943748064,
                "status": "failure",
                "tasks": [
                  {
                    "id": "1661943748108",
                    "message": "transfer",
                    "start": 1661943748108,
                    "status": "success",
                    "end": 1661943774148,
                    "result": {
                      "size": 1600550912
                    }
                  },
                  {
                    "id": "1661943774635",
                    "message": "clean-vm",
                    "start": 1661943774635,
                    "status": "failure",
                    "tasks": [
                      {
                        "id": "1661943774868",
                        "message": "merge",
                        "start": 1661943774868,
                        "status": "failure",
                        "end": 1661944374948,
                        "result": {
                          "chain": [
                            "xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220831T025925Z.alias.vhd",
                            "xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220831T030535Z.alias.vhd"
                          ],
                          "message": "operation timed out",
                          "name": "TimeoutError",
                          "stack": "TimeoutError: operation timed out\n    at Promise.timeout (/opt/xo/xo-builds/xen-orchestra-202208310649/node_modules/promise-toolbox/timeout.js:11:16)\n    at S3Handler.rename (/opt/xo/xo-builds/xen-orchestra-202208310649/@xen-orchestra/fs/dist/abstract.js:338:37)\n    at Queue.next (/opt/xo/xo-builds/xen-orchestra-202208310649/node_modules/limit-concurrency-decorator/dist/index.js:21:22)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)"
                        }
                      }
                    ],
                    "end": 1661944374950,
                    "result": {
                      "chain": [
                        "xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220831T025925Z.alias.vhd",
                        "xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220831T030535Z.alias.vhd"
                      ],
                      "message": "operation timed out",
                      "name": "TimeoutError",
                      "stack": "TimeoutError: operation timed out\n    at Promise.timeout (/opt/xo/xo-builds/xen-orchestra-202208310649/node_modules/promise-toolbox/timeout.js:11:16)\n    at S3Handler.rename (/opt/xo/xo-builds/xen-orchestra-202208310649/@xen-orchestra/fs/dist/abstract.js:338:37)\n    at Queue.next (/opt/xo/xo-builds/xen-orchestra-202208310649/node_modules/limit-concurrency-decorator/dist/index.js:21:22)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)"
                    }
                  }
                ],
                "end": 1661944375065
              }
            ],
            "end": 1661944375065
          }
        ],
        "end": 1661944375066
      }
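A quick way to find where a run like this actually failed is to walk the nested `tasks` tree of the log and keep only the failed tasks that have no failed sub-task. The sketch below is a hypothetical helper (`failed_leaves` and `report` are not part of Xen Orchestra) and assumes only the fields visible in the log above: `message`, `status`, `start`, `end`, `tasks` and `result`.

```python
import json
from typing import Any, Iterator


def failed_leaves(
    task: dict[str, Any], path: tuple[str, ...] = ()
) -> Iterator[tuple[str, float, str]]:
    """Yield (task path, duration in seconds, error) for every failed task
    that has no failed sub-task, i.e. the place where the failure started."""
    here = path + (task.get("message", "?"),)
    failed_children = [t for t in task.get("tasks", []) if t.get("status") == "failure"]
    if task.get("status") == "failure" and not failed_children:
        result = task.get("result")
        if not isinstance(result, dict):
            result = {}
        # Timestamps in these backup logs are milliseconds since the epoch.
        duration = (task.get("end", task["start"]) - task["start"]) / 1000
        error = f'{result.get("name", "?")}: {result.get("message", "?")}'
        yield " > ".join(here), duration, error
    for child in failed_children:
        yield from failed_leaves(child, here)


def report(path: str) -> None:
    """Print the failing leaf tasks of a backup log saved as a JSON file."""
    with open(path, encoding="utf-8") as f:
        log = json.load(f)
    for where, seconds, error in failed_leaves(log):
        print(f"{where}: {error} after {seconds:.0f} s")
```

Run on this log (saved to a file and passed to `report`), it should list the two `merge` tasks, each failing with `TimeoutError: operation timed out` after roughly 600 s (1661943745931 - 1661943145847 = 600,084 ms).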
      
        odeawan @Andrew

        @Andrew I had a very similar issue over the last couple of weeks producing the same error.
        My coworker had been running similar S3 backups with success on the same ISP from a different site and then again at a site with Starlink. The only differences between the sites were the circuit and edge router.
        I dug a little deeper and found that I had misconfigured a new VLAN/subnet, so IDS was inspecting the Xen Orchestra traffic and flagging the outbound port 80 traffic to S3 as malicious. I could see the firewall actively dropping the outbound packets in the packet filter. This made the S3 backup stop after about 30 minutes, and subsequent backups would fail.
        It may not be your exact issue, but this post struck a chord with me.

          Andrew Top contributor @odeawan

          @odeawan Thanks for the idea, but I know that's not the problem in this case. The failing test setup is between an XO server and an S3 server on the same LAN, so there is no firewall involved... and it only fails during the delta merge phase. My main XO server (an older version), which runs backups to off-site S3 storage, works fine (including backups to the same local S3 server).

            Andrew Top contributor @florent

            @florent I did a git update again (to commit aa261) and it works! Maybe I missed an update?

            {
              "data": {
                "mode": "delta",
                "reportWhen": "never"
              },
              "id": "1661987889299",
              "jobId": "d6c0a656-62c5-4c39-a57a-f246b39f1cef",
              "jobName": "minio-test",
              "message": "backup",
              "scheduleId": "bd4ef436-fd85-4f16-bf9e-71d1d0c8586f",
              "start": 1661987889299,
              "status": "success",
              "infos": [
                {
                  "data": {
                    "vms": [
                      "c45dd52b-fa92-df6f-800a-10853c183c23"
                    ]
                  },
                  "message": "vms"
                }
              ],
              "tasks": [
                {
                  "data": {
                    "type": "VM",
                    "id": "c45dd52b-fa92-df6f-800a-10853c183c23"
                  },
                  "id": "1661987890202",
                  "message": "backup VM",
                  "start": 1661987890202,
                  "status": "success",
                  "tasks": [
                    {
                      "id": "1661987890615",
                      "message": "clean-vm",
                      "start": 1661987890615,
                      "status": "success",
                      "end": 1661987891797,
                      "result": {
                        "merge": false
                      }
                    },
                    {
                      "id": "1661987891992",
                      "message": "snapshot",
                      "start": 1661987891992,
                      "status": "success",
                      "end": 1661987893470,
                      "result": "ec858b82-4f64-c5fe-a258-887ba57c7458"
                    },
                    {
                      "data": {
                        "id": "9890e0c4-ba3a-4810-8245-a49fdf16b16e",
                        "isFull": false,
                        "type": "remote"
                      },
                      "id": "1661987893471",
                      "message": "export",
                      "start": 1661987893471,
                      "status": "success",
                      "tasks": [
                        {
                          "id": "1661987893513",
                          "message": "transfer",
                          "start": 1661987893513,
                          "status": "success",
                          "end": 1661987897713,
                          "result": {
                            "size": 75549184
                          }
                        },
                        {
                          "id": "1661987898186",
                          "message": "clean-vm",
                          "start": 1661987898186,
                          "status": "success",
                          "tasks": [
                            {
                              "id": "1661987899091",
                              "message": "merge",
                              "start": 1661987899091,
                              "status": "success",
                              "end": 1661987902420
                            }
                          ],
                          "end": 1661987902474,
                          "result": {
                            "merge": true
                          }
                        }
                      ],
                      "end": 1661987902480
                    }
                  ],
                  "end": 1661987902480
                }
              ],
              "end": 1661987902480
            }
            
              florent Vates 🪐 XO Team @Andrew

              @Andrew The problem only occurs when resuming a merge.

              Does your S3 provider apply rate limits to queries? We do a lot of copy/delete queries during a merge.
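S3 has no native rename operation, so a rename on an S3 remote has to be emulated with a server-side copy followed by a delete, which is why a merge issues so many copy/delete queries. Below is a minimal Python sketch of that pattern, assuming a boto3-style client (`copy_object` and `delete_object` are real boto3 S3 client methods); `s3_rename` is a hypothetical name, and this is not XO's actual `S3Handler.rename` code.

```python
from typing import Any, Protocol


class S3Client(Protocol):
    """The two boto3-style S3 client calls this sketch relies on."""

    def copy_object(self, **kwargs: Any) -> Any: ...

    def delete_object(self, **kwargs: Any) -> Any: ...


def s3_rename(client: S3Client, bucket: str, old_key: str, new_key: str) -> None:
    """Emulate a rename: copy the object to the new key, then delete the old key.

    Every rename therefore costs two requests, and the server-side copy takes
    longer as the object grows.
    """
    client.copy_object(
        Bucket=bucket,
        Key=new_key,
        CopySource={"Bucket": bucket, "Key": old_key},
    )
    client.delete_object(Bucket=bucket, Key=old_key)
```

With boto3, `client` would be something like `boto3.client("s3", endpoint_url=...)` pointed at the MinIO server. Note that a single `copy_object` call only handles objects up to 5 GB; larger objects need a multipart copy.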

                Andrew Top contributor @florent

                @florent For local testing I'm using MinIO and there is no limit or throttling enabled.

                  florent Vates 🪐 XO Team @Andrew

                  @Andrew I am testing on another user's setup with less concurrency during the merge (currently 16 blocks are merged in parallel), and it seems to solve the problem.

                  I will make it configurable soon (for now it's hard-coded in the source 😰).
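Limiting how many blocks are merged at once is a bounded-concurrency pattern. Here is a minimal asyncio sketch with the limit exposed as a parameter; `merge_blocks`, `merge_block` and `concurrency` are hypothetical names, the default of 16 only mirrors the value mentioned above, and XO's real merge code (JavaScript, in vhd-lib) is structured differently.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def merge_blocks(
    block_ids: Iterable[int],
    merge_block: Callable[[int], Awaitable[None]],
    concurrency: int = 16,
) -> None:
    """Merge every block, never running more than `concurrency` merges at once."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(block_id: int) -> None:
        # Waits here while `concurrency` merges are already in flight.
        async with semaphore:
            await merge_block(block_id)

    await asyncio.gather(*(bounded(b) for b in block_ids))
```

For example, `asyncio.run(merge_blocks(range(1000), my_merge, concurrency=4))`, with `my_merge` being a hypothetical per-block coroutine, would keep at most four block merges in flight against the S3 server at any time.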

                    Andrew Top contributor @florent

                    @florent I see the fix-s3-merge branch is gone. I updated to current master (commit d8e01) and delta backup to S3 is not working (same problem as before).

                      Andrew Top contributor @florent

                      @olivierlambert @florent The s3.js file differs between the fix-s3-merge branch and master.

                        florent Vates 🪐 XO Team @Andrew

                        @Andrew Back to the NoSuchKey error?

                          Andrew Top contributor @florent

                          @florent 10-minute timeout...

                          {
                            "data": {
                              "mode": "delta",
                              "reportWhen": "never"
                            },
                            "id": "1662038805540",
                            "jobId": "d6c0a656-62c5-4c39-a57a-f246b39f1cef",
                            "jobName": "minio-test",
                            "message": "backup",
                            "scheduleId": "bd4ef436-fd85-4f16-bf9e-71d1d0c8586f",
                            "start": 1662038805540,
                            "status": "failure",
                            "infos": [
                              {
                                "data": {
                                  "vms": [
                                    "c45dd52b-fa92-df6f-800a-10853c183c23"
                                  ]
                                },
                                "message": "vms"
                              }
                            ],
                            "tasks": [
                              {
                                "data": {
                                  "type": "VM",
                                  "id": "c45dd52b-fa92-df6f-800a-10853c183c23"
                                },
                                "id": "1662038806415",
                                "message": "backup VM",
                                "start": 1662038806415,
                                "status": "failure",
                                "tasks": [
                                  {
                                    "id": "1662038806845",
                                    "message": "clean-vm",
                                    "start": 1662038806845,
                                    "status": "success",
                                    "end": 1662038807088,
                                    "result": {
                                      "merge": false
                                    }
                                  },
                                  {
                                    "id": "1662038807286",
                                    "message": "snapshot",
                                    "start": 1662038807286,
                                    "status": "success",
                                    "end": 1662038808752,
                                    "result": "03c16279-9af3-e35a-a7a0-7724e62c9cc2"
                                  },
                                  {
                                    "data": {
                                      "id": "9890e0c4-ba3a-4810-8245-a49fdf16b16e",
                                      "isFull": false,
                                      "type": "remote"
                                    },
                                    "id": "1662038808752:0",
                                    "message": "export",
                                    "start": 1662038808752,
                                    "status": "failure",
                                    "tasks": [
                                      {
                                        "id": "1662038808802",
                                        "message": "transfer",
                                        "start": 1662038808802,
                                        "status": "success",
                                        "end": 1662038832144,
                                        "result": {
                                          "size": 1558597632
                                        }
                                      },
                                      {
                                        "id": "1662038832622",
                                        "message": "clean-vm",
                                        "start": 1662038832622,
                                        "status": "failure",
                                        "tasks": [
                                          {
                                            "id": "1662038832809",
                                            "message": "merge",
                                            "start": 1662038832809,
                                            "status": "failure",
                                            "end": 1662039432907,
                                            "result": {
                                              "chain": [
                                                "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T124634Z.alias.vhd",
                                                "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T130323Z.alias.vhd"
                                              ],
                                              "message": "operation timed out",
                                              "name": "TimeoutError",
                                              "stack": "TimeoutError: operation timed out\n    at Promise.timeout (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/promise-toolbox/timeout.js:11:16)\n    at S3Handler.rename (/opt/xo/xo-builds/xen-orchestra-202209010921/@xen-orchestra/fs/dist/abstract.js:338:37)\n    at Queue.next (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/limit-concurrency-decorator/dist/index.js:21:22)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)"
                                            }
                                          }
                                        ],
                                        "end": 1662039432908,
                                        "result": {
                                          "chain": [
                                            "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T124634Z.alias.vhd",
                                            "/xo-vm-backups/c45dd52b-fa92-df6f-800a-10853c183c23/vdis/d6c0a656-62c5-4c39-a57a-f246b39f1cef/ae8fffde-b2bd-4205-a596-9139ef59193f/20220901T130323Z.alias.vhd"
                                          ],
                                          "message": "operation timed out",
                                          "name": "TimeoutError",
                                          "stack": "TimeoutError: operation timed out\n    at Promise.timeout (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/promise-toolbox/timeout.js:11:16)\n    at S3Handler.rename (/opt/xo/xo-builds/xen-orchestra-202209010921/@xen-orchestra/fs/dist/abstract.js:338:37)\n    at Queue.next (/opt/xo/xo-builds/xen-orchestra-202209010921/node_modules/limit-concurrency-decorator/dist/index.js:21:22)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)"
                                        }
                                      }
                                    ],
                                    "end": 1662039433046
                                  }
                                ],
                                "end": 1662039433046
                              }
                            ],
                            "end": 1662039433047
                          }
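The durations in this log line up with the 10-minute figure: the `merge` task ran from 1662038832809 to 1662039432907, i.e. 600,098 ms, and the stack trace shows `Promise.timeout` wrapping `S3Handler.rename`. The sketch below shows the same idea in Python, assuming each rename is wrapped in a fixed timeout; `with_timeout` and `RENAME_TIMEOUT_S` are hypothetical names. Note that `asyncio.wait_for` also cancels the wrapped operation when the limit is hit, which may not match what the JavaScript implementation does.

```python
import asyncio
from typing import Awaitable, TypeVar

T = TypeVar("T")

# Hypothetical per-rename limit; 10 minutes matches the ~600 s merge failures above.
RENAME_TIMEOUT_S = 10 * 60


async def with_timeout(operation: Awaitable[T], seconds: float) -> T:
    """Await `operation`, failing with TimeoutError after `seconds` seconds."""
    try:
        return await asyncio.wait_for(operation, timeout=seconds)
    except asyncio.TimeoutError:
        # Same error name and message as in the XO backup log.
        raise TimeoutError("operation timed out") from None
```

A slow backend then surfaces as `TimeoutError: operation timed out` exactly once the limit elapses, regardless of how close the rename was to finishing.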
                          
                            florent Vates 🪐 XO Team @Andrew

                            @Andrew Yes, the timeouts are not fixed for now, only the NoSuchKey error.

                            The fix will at least let you set a custom concurrency limit, or maybe calculate a value in a smarter way.
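As for calculating the value in a smarter way, one common approach (shown purely as an illustration, not as what XO implements) is additive-increase/multiplicative-decrease (AIMD), the idea TCP congestion control is built on: grow the limit by one after each successful operation and halve it after a timeout. `AdaptiveLimit` is a hypothetical class and its bounds are placeholder values.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveLimit:
    """AIMD concurrency limit: +1 on success, halved on timeout."""

    current: int = 16
    minimum: int = 1
    maximum: int = 16

    def on_success(self) -> None:
        # Probe for more throughput one slot at a time.
        self.current = min(self.maximum, self.current + 1)

    def on_timeout(self) -> None:
        # Back off quickly when the backend looks overloaded.
        self.current = max(self.minimum, self.current // 2)
```

A merge loop would read `current` before starting new block merges and call `on_success()` or `on_timeout()` as each one finishes, so a slow backend settles at a lower concurrency on its own.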

                              Andrew Top contributor @florent

                              @florent Sure... but delta backup merge to S3 no longer works in master.

                                florent Vates 🪐 XO Team @Andrew

                                @Andrew Yes, we will reduce the default concurrency while waiting for the parameterized version.

                                Here is the branch if you want to test it: https://github.com/vatesfr/xen-orchestra/pull/6400

                                Also, can you monitor the MinIO resource usage? I'm curious where the bottleneck is during a rename (CPU, RAM, or disk usage).

                                Pull request #6400 in vatesfr/xen-orchestra (opened by fbeauchamp, closed): fix(vhd-lib/merge): reduce concurrency to protect slower backends

                                  Andrew Top contributor @florent

                                  @florent Limiting concurrency did not fix my S3 backup problem, but it's working again after updating the build. So I guess it's resolved.
