    Backblaze B2 / Amazon S3 as remote in XOA

    • shorian @DustinB

      @dustinb Concur 100%; my current focus is on confirming the error doesn't recur, and on understanding how what I'm seeing differs from previous backups. Before this goes into production I fully agree it should be tested for restores, and I shall do so myself once I'm confident the symptom has been resolved.

      • shorian @nraynaud

        @nraynaud Unfortunately, after a good start, I'm now seeing AWS.S3.upload "socket hang up" errors:

                        "message": "Error calling AWS.S3.upload: socket hang up",
                        "name": "Error",
                        "stack": "Error: Error calling AWS.S3.upload: socket hang up\n    at rethrow (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/@sullux/aws-sdk/webpack:/lib/proxy.js:114:1)\n    at tryCatcher (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/bluebird/js/release/util.js:16:23)\n    at Promise._settlePromiseFromHandler (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/bluebird/js/release/promise.js:547:31)\n    at Promise._settlePromise (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/bluebird/js/release/promise.js:604:18)\n    at Promise._settlePromise0 (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/bluebird/js/release/promise.js:649:10)\n    at Promise._settlePromises (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/bluebird/js/release/promise.js:725:18)\n    at _drainQueueStep (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/bluebird/js/release/async.js:93:12)\n    at _drainQueue (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/bluebird/js/release/async.js:86:9)\n    at Async._drainQueues (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/bluebird/js/release/async.js:102:5)\n    at Immediate.Async.drainQueues [as _onImmediate] (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/bluebird/js/release/async.js:15:14)\n    at processImmediate (internal/timers.js:461:21)"
                      }
        

        This is not occurring for all of the VMs being uploaded; usually it hits only one of the three. The task then stays open and runs until the timeout after 3 hours, despite this particular batch normally taking about 30 minutes.

        After 3 successful runs, this has now occurred on each of the following 3 runs. I am going to clear out the target completely and see if that makes any difference. (Note that I am using Backblaze B2, not AWS.) Let me know if you want me to send you the full log, an extract of SMlog, or anything else.

        To @DustinB's point above, I have tried one restore and it came up fine, but please don't consider this a full and comprehensive test.

        • nraynaud (XCP-ng Team)

          Thank you all. I would never have guessed that uploading a file over HTTP would be this hard. I'll dig deeper.

          • shorian @nraynaud

            @nraynaud Bizarre, isn't it? I'm so very grateful for your efforts.

            Some more news: it seems that one big challenge is concurrency - things improve dramatically if concurrency is set to 1. As soon as something else is running in parallel, we run into the socket failures. I'm expanding things to try your change on another box to see if the outcomes differ, but in summary, this is what I'm seeing so far:

            • Concurrency = 1 - works fine the first time, fails occasionally (20% of the time?) thereafter
            • Concurrency > 1 - almost impossible to get it to run; sometimes one or two VMs back up OK, but never predictably and never the entire group
            • After any failure - impossible to get a clean run again until the S3 target has been cleaned out entirely

            So it appears there may be a lock occurring somewhere when more than one stream is running, and additionally some kind of conflict when a run has terminated prematurely and left the target in an unexpected state for the next run.
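
            For illustration, here is roughly what I mean by serialising the streams - a minimal sketch of a promise limiter, not XO's actual scheduler (uploadVm is a hypothetical stand-in for whatever performs one VM upload):

                // Allow at most `limit` uploads in flight; queue the rest.
                function createLimiter(limit) {
                  let active = 0;
                  const queue = [];
                  const next = () => {
                    if (active >= limit || queue.length === 0) return;
                    active++;
                    const { fn, resolve, reject } = queue.shift();
                    Promise.resolve()
                      .then(fn)
                      .then(resolve, reject)
                      .finally(() => {
                        active--;
                        next();
                      });
                  };
                  return fn =>
                    new Promise((resolve, reject) => {
                      queue.push({ fn, resolve, reject });
                      next();
                    });
                }

                // Hypothetical stand-in for one VM backup upload.
                const uploadVm = vm => new Promise(resolve => setTimeout(() => resolve(vm), 100));

                // With limit = 1 the transfers never overlap - the only mode that
                // currently works reliably for me.
                const limited = createLimiter(1);
                Promise.all(['vm-a', 'vm-b', 'vm-c'].map(vm => limited(() => uploadVm(vm))))
                  .then(() => console.log('all uploads done'));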

            • nraynaud (XCP-ng Team) @shorian

              @shorian Thanks. I'm a bit lost; I'll read up on the Node.js Agent class.
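
              For reference, these are the knobs I mean - a minimal sketch of what tuning the Agent could look like, not a fix (the values are guesses):

                  const https = require('https');

                  // maxSockets caps concurrent connections per origin; keepAlive
                  // reuses sockets between requests instead of opening fresh ones.
                  const agent = new https.Agent({
                    keepAlive: true,
                    maxSockets: 1, // mirrors "concurrency = 1" at the socket level
                    timeout: 60000, // socket idle timeout, in milliseconds
                  });

                  // The aws-sdk accepts a custom agent through httpOptions, so it
                  // could be wired in like this:
                  // const s3 = new AWS.S3({ httpOptions: { agent } });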

              • shorian @nraynaud

                @nraynaud Removing any concurrency seems to be effective; certainly a substantial improvement over the original backups prior to your amendments.

                We have managed to get things to run pretty much every time now, by running with concurrency set to '1' and being careful on the timing to ensure no other backups accidentally run in parallel.

                I have checked a couple of restores and they seem to be OK too.

                The only thing I would highlight is that, now that I am not getting the failures, I cannot tell whether the issue on the remote when recovering from a partial/failed backup is resolved. I guess this needs me to pull the plug on the network whilst a backup is running, but I would need to test that on a different machine in the lab rather than where we're running at the moment.

                • shorian @shorian

                  OK - I spent the weekend with backups running continuously across a number of boxes.

                  Good news - the fix seems to have solved things, provided one only ever uses concurrency "1" and there are no conflicting or overlapping backups.

                  Restores are working fine for me too.

                  In short @nraynaud - it's a substantial improvement and for me makes this now usable. A huge thank you.

                  • olivierlambert (Vates 🪐 Co-Founder & CEO)

                    Thanks for the feedback, @shorian!

                    • nraynaud (XCP-ng Team) @shorian

                      @shorian I would like to abuse your patience again by asking you to test this branch: https://github.com/vatesfr/xen-orchestra/tree/nr-s3-fix-big-backups2

                      The concept is that the backup upload will happen without any sort of smart upload system or queue.

                      Thank you, Nico

                      • shorian @nraynaud

                        @nraynaud Building XO now; will run a few backups overnight.

                        Anything in particular you'd like me to look for / test?

                        • nraynaud (XCP-ng Team) @shorian

                          @shorian Thanks. If you could first try with no concurrency, as proof that the code works, and then with your normal concurrency, to check whether the issue is fixed, that would be great.

                          This code is, by design, more vulnerable to losing the connection to S3, but I think it's a low risk. It's now using a straight HTTP POST.
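
                          To illustrate what "no smart upload system" means - a single streamed request, no multipart bookkeeping, no retry queue (this sketch omits the auth signing a real S3/B2 request needs, and the URL is a placeholder):

                              const fs = require('fs');
                              const https = require('https');

                              // Stream the whole file in one request; a dropped socket
                              // surfaces as an error on `req` instead of being retried.
                              function plainUpload(filePath, url) {
                                return new Promise((resolve, reject) => {
                                  const req = https.request(url, { method: 'POST' }, res => {
                                    res.resume(); // drain the response body
                                    if (res.statusCode < 300) resolve(res.statusCode);
                                    else reject(new Error(`HTTP ${res.statusCode}`));
                                  });
                                  req.on('error', reject);
                                  fs.createReadStream(filePath).pipe(req);
                                });
                              }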

                          • shorian @nraynaud

                            @nraynaud Bit of a hiccup, I'm afraid; it errors out with "Not implemented":

                            {
                              "data": {
                                "mode": "full",
                                "reportWhen": "failure"
                              },
                              "id": "1615570162008",
                              "jobId": "b947a31c-35f7-45d9-af88-628ec027c71e",
                              "jobName": "B2",
                              "message": "backup",
                              "scheduleId": "8481e1b8-3d22-4502-abb7-9f0413aca0a3",
                              "start": 1615570162008,
                              "status": "pending",
                              "infos": [
                                {
                                  "data": {
                                    "vms": [
                                      "25a4e2a6-b116-7cee-b94b-f3553fe001f2",
                                      "24103ce1-e47b-fe12-4029-d643e0382f08",
                                      "60ac886d-8adf-9e21-4baa-b14e0fc1b2bb"
                                    ]
                                  },
                                  "message": "vms"
                                }
                              ],
                              "tasks": [
                                {
                                  "data": {
                                    "type": "VM",
                                    "id": "25a4e2a6-b116-7cee-b94b-f3553fe001f2"
                                  },
                                  "id": "1615570162050",
                                  "message": "backup VM",
                                  "start": 1615570162050,
                                  "status": "pending",
                                  "tasks": [
                                    {
                                      "id": "1615570162064",
                                      "message": "snapshot",
                                      "start": 1615570162064,
                                      "status": "success",
                                      "end": 1615570163265,
                                      "result": "294d65fc-ddc7-b8ad-68b8-d14e36b6838e"
                                    },
                                    {
                                      "data": {
                                        "id": "df07dd6b-753e-40f2-89be-75404bac2c1e",
                                        "type": "remote",
                                        "isFull": true
                                      },
                                      "id": "1615570163286",
                                      "message": "export",
                                      "start": 1615570163286,
                                      "status": "failure",
                                      "tasks": [
                                        {
                                          "id": "1615570164234",
                                          "message": "transfer",
                                          "start": 1615570164234,
                                          "status": "failure",
                                          "end": 1615570164257,
                                          "result": {
                                            "message": "Not implemented",
                                            "name": "Error",
                                            "stack": "Error: Not implemented\n    at S3Handler._createWriteStream (/opt/xo/xo-builds/xen-orchestra-202103121606/@xen-orchestra/fs/src/abstract.js:428:11)\n    at S3Handler._createOutputStream (/opt/xo/xo-builds/xen-orchestra-202103121606/@xen-orchestra/fs/src/abstract.js:412:25)\n    at S3Handler.createOutputStream (/opt/xo/xo-builds/xen-orchestra-202103121606/@xen-orchestra/fs/src/abstract.js:129:12)\n    at RemoteAdapter.outputStream (/opt/xo/xo-builds/xen-orchestra-202103121606/@xen-orchestra/backups/RemoteAdapter.js:513:34)\n    at /opt/xo/xo-builds/xen-orchestra-202103121606/@xen-orchestra/backups/_FullBackupWriter.js:70:7\n    at FullBackupWriter.run (/opt/xo/xo-builds/xen-orchestra-202103121606/@xen-orchestra/backups/_FullBackupWriter.js:69:5)\n    at Array.<anonymous> (/opt/xo/xo-builds/xen-orchestra-202103121606/@xen-orchestra/backups/_VmBackup.js:209:9)"
                                          }
                                        }
                                      ],
                                      "end": 1615570164258,
                                      "result": {
                                        "message": "Not implemented",
                                        "name": "Error",
                                        "stack": "Error: Not implemented\n    at S3Handler._createWriteStream (/opt/xo/xo-builds/xen-orchestra-202103121606/@xen-orchestra/fs/src/abstract.js:428:11)\n    at S3Handler._createOutputStream (/opt/xo/xo-builds/xen-orchestra-202103121606/@xen-orchestra/fs/src/abstract.js:412:25)\n    at S3Handler.createOutputStream (/opt/xo/xo-builds/xen-orchestra-202103121606/@xen-orchestra/fs/src/abstract.js:129:12)\n    at RemoteAdapter.outputStream (/opt/xo/xo-builds/xen-orchestra-202103121606/@xen-orchestra/backups/RemoteAdapter.js:513:34)\n    at /opt/xo/xo-builds/xen-orchestra-202103121606/@xen-orchestra/backups/_FullBackupWriter.js:70:7\n    at FullBackupWriter.run (/opt/xo/xo-builds/xen-orchestra-202103121606/@xen-orchestra/backups/_FullBackupWriter.js:69:5)\n    at Array.<anonymous> (/opt/xo/xo-builds/xen-orchestra-202103121606/@xen-orchestra/backups/_VmBackup.js:209:9)"
                                      }
                                    }
                                  ]
                                }
                              ]
                            }
                            
                            

                            I'll try with a fresh install over the weekend. I'm backing up to Backblaze B2 rather than S3 - is that likely to be the issue?
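
                            For what it's worth, the trace itself looks like the usual abstract-method guard in @xen-orchestra/fs rather than anything B2-specific. A minimal sketch of the pattern, with names simplified from the stack trace (so partly an assumption on my part):

                                // The base handler declares the method and throws; a concrete
                                // handler is expected to override it. The error fires because
                                // the S3 handler on this branch doesn't override it yet.
                                class AbstractHandler {
                                  createOutputStream(path) {
                                    return this._createWriteStream(path);
                                  }
                                  _createWriteStream() {
                                    throw new Error('Not implemented');
                                  }
                                }

                                class S3Handler extends AbstractHandler {
                                  // no _createWriteStream override here
                                }

                                new S3Handler().createOutputStream('backup.xva'); // Error: Not implemented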

                            • nraynaud (XCP-ng Team) @shorian

                              @shorian Ah sorry, I think S3 is completely broken at the moment.

                              • shorian @nraynaud

                                @nraynaud Getting the "Not implemented" error on all branches at the moment - is there a functioning, albeit not perfect, branch that we can run with as a temporary measure? Thanks, chap.

                                • yomono

                                  Is this still broken on all branches?
                                  It's a shame; it was working relatively well when used carefully.

                                  • olivierlambert (Vates 🪐 Co-Founder & CEO)

                                    @julien-f and @nraynaud are working on it. That's why it's still called beta 😉 But we should soon have something on track.

                                    • shorian

                                      The new S3 function (albeit used with Backblaze) seems to be working really well for small backups; I'm now seeing almost 100% success.

                                      However, larger backups are consistently failing with:

                                      "message": "Error calling AWS.S3.upload: aborted",
                                                  "name": "Error",
                                                  "stack": "Error: Error calling AWS.S3.upload: aborted\n    at rethrow (/opt/xo/xo-builds/xen-orchestra-202105081131/node_modules/@sullux/aws-sdk/index.js:254:24)\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (internal/process/task_queues.js:93:5)\n    at async S3Handler._outputStream (/opt/xo/xo-builds/xen-orchestra-202105081131/@xen-orchestra/fs/dist/s3.js:100:5)\n    at async S3Handler.outputStream (/opt/xo/xo-builds/xen-orchestra-202105081131/@xen-orchestra/fs/dist/abstract.js:250:5)\n    at async RemoteAdapter.outputStream (/opt/xo/xo-builds/xen-orchestra-202105081131/@xen-orchestra/backups/RemoteAdapter.js:509:5)\n    at async /opt/xo/xo-builds/xen-orchestra-202105081131/@xen-orchestra/backups/writers/FullBackupWriter.js:69:7"
                                      

                                      Running with xo-server 5.79.3, xo-web 5.81.0.

                                      I'm having the same issue across different buckets in B2 and from different servers.
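
                                      For context, my understanding is that the failing layer is the SDK's managed upload (the AWS.S3.upload in the trace), which splits large objects into parts. A minimal sketch of that call and its tuning knobs - the endpoint, bucket, key, and values below are placeholders, not what XO actually sets:

                                          const fs = require('fs');
                                          const AWS = require('aws-sdk');

                                          // B2 exposes an S3-compatible endpoint; the region and
                                          // credentials here are placeholders.
                                          const s3 = new AWS.S3({
                                            endpoint: 'https://s3.us-west-000.backblazeb2.com',
                                            accessKeyId: process.env.B2_KEY_ID,
                                            secretAccessKey: process.env.B2_APP_KEY,
                                          });

                                          // upload() manages a multipart transfer for large bodies;
                                          // partSize and queueSize control the part size and how
                                          // many parts are uploaded in parallel.
                                          s3.upload(
                                            { Bucket: 'my-backups', Key: 'vm.xva', Body: fs.createReadStream('vm.xva') },
                                            { partSize: 64 * 1024 * 1024, queueSize: 1 }
                                          )
                                            .promise()
                                            .then(() => console.log('done'))
                                            .catch(err => console.error('upload failed:', err.message));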

                                      Any ideas?

                                      • shorian

                                        As an aside, using the same build I am also getting problems when running CR (continuous replication) over a slower WAN, with the following error. Just mentioning it here in case it's related.

                                        "message": "VDI_IO_ERROR(Device I/O errors)",
                                                    "name": "XapiError",
                                                    "stack": "XapiError: VDI_IO_ERROR(Device I/O errors)\n    at Function.wrap (/opt/xo/xo-builds/xen-orchestra-202105081131/packages/xen-api/dist/_XapiError.js:26:12)\n    at _default (/opt/xo/xo-builds/xen-orchestra-202105081131/packages/xen-api/dist/_getTaskResult.js:24:38)\n    at Xapi._addRecordToCache (/opt/xo/xo-builds/xen-orchestra-202105081131/packages/xen-api/dist/index.js:761:51)\n    at /opt/xo/xo-builds/xen-orchestra-202105081131/packages/xen-api/dist/index.js:789:14\n    at Array.forEach (<anonymous>)\n    at Xapi._processEvents (/opt/xo/xo-builds/xen-orchestra-202105081131/packages/xen-api/dist/index.js:774:12)\n    at Xapi._watchEvents (/opt/xo/xo-builds/xen-orchestra-202105081131/packages/xen-api/dist/index.js:931:14)\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (internal/process/task_queues.js:93:5)"
                                                  }
                                                }
                                              ],
                                              "end": 1620864312824,
                                              "result": {
                                                "message": "all targets have failed, step: writer.transfer()",
                                                "name": "Error",
                                                "stack": "Error: all targets have failed, step: writer.transfer()\n    at VmBackup._callWriters (/opt/xo/xo-builds/xen-orchestra-202105081131/@xen-orchestra/backups/_VmBackup.js:118:13)\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (internal/process/task_queues.js:93:5)\n    at async VmBackup._copyDelta (/opt/xo/xo-builds/xen-orchestra-202105081131/@xen-orchestra/backups/_VmBackup.js:190:5)\n    at async VmBackup.run (/opt/xo/xo-builds/xen-orchestra-202105081131/@xen-orchestra/backups/_VmBackup.js:371:9)"
                                        

                                        CR to a local machine is working fine; the remote one is failing on every iteration.

                                        • olivierlambert (Vates 🪐 Co-Founder & CEO)

                                          Thanks for your feedback 👍

                                          @julien-f is working on that 🙂

                                          • shorian

                                            👍 👍 👍
