    shorian

    @shorian


    Best posts made by shorian

    • RE: backblaze b2 / amazon s3 as remote in xoa

      @nraynaud Bizarre isn't it; I'm so very grateful for your efforts.

      Some more news - it seems that one big challenge is around concurrency - things improve dramatically if concurrency is set to 1. As soon as something else is running in parallel, we run into the socket failures. I'm expanding things to try your change on another box to see if the outcomes are different - but in summary what I'm seeing so far:

      • Concurrency = 1 - works fine first time, fails occasionally (20% of the time?) thereafter
      • Concurrency > 1 - almost impossible to get it to run; sometimes one or two VMs back up OK, but not often enough to be predictable and never the entire group
      • Anything fails - impossible to get a clean run again until the S3 target has been cleaned entirely

      So it appears that there is perhaps a lock occurring somewhere when more than one stream is running, and additionally some kind of conflict when things have terminated prematurely and the target is therefore not in its expected state on the next run.
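
What "concurrency set to 1" means in practice can be sketched as follows. This is a minimal illustration only, not how Xen Orchestra implements its concurrency setting: `run_backups` and the nested `backup` coroutine are hypothetical names, and the `sleep` stands in for a real upload to the S3 target.

```python
import asyncio


async def run_backups(vm_names, concurrency=1):
    """Run one hypothetical backup per VM, at most `concurrency` at a time."""
    limit = asyncio.Semaphore(concurrency)
    active = 0      # backups currently in flight
    peak = 0        # highest number of simultaneous backups observed
    finished = []

    async def backup(name):
        nonlocal active, peak
        async with limit:              # wait until a slot is free
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for the real upload
            active -= 1
            finished.append(name)

    await asyncio.gather(*(backup(name) for name in vm_names))
    return finished, peak


finished, peak = asyncio.run(
    run_backups(["vm-a", "vm-b", "vm-c"], concurrency=1)
)
```

With `concurrency=1` the uploads never overlap, which is the condition under which the runs described above succeed.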

      posted in Xen Orchestra
      shorian
    • RE: Hosted and turnkey XCP-ng!

      Always 😄

      posted in News
      shorian
    • RE: backblaze b2 / amazon s3 as remote in xoa

      Ok, spent weekend having backups running continuously across a number of boxes.

      Good news - the fix seems to have solved things, providing one only ever uses concurrency “1” and there are no conflicting or overlapping backups.

      Restores are working fine for me too.

      In short @nraynaud - it’s a substantial improvement and for me makes this now usable. A huge thank you.

      posted in Xen Orchestra
      shorian
    • RE: backblaze b2 / amazon s3 as remote in xoa

      @dustinb Concur 100%; my current focus is on confirming the error doesn't reoccur and understanding the change in what I'm seeing compared with previous backups. Before this goes into production, I 100% agree it should be tested for restores. I shall do so myself once I've got confidence that the symptom has been resolved.

      posted in Xen Orchestra
      shorian
    • RE: VDI_IO_ERROR(Device I/O errors) when you run scheduled backup

      I'll try a fresh install over the next couple of days and see if it reoccurs. Looking at the other boxes, I have the same error on one of the other hosts, but it's not across all of them despite identical installs and hardware.

      Thanks for your efforts and help; shame there wasn't an easy answer, but let's see if it reoccurs after a completely fresh install.

      posted in Xen Orchestra
      shorian
    • RE: VDI_IO_ERROR(Device I/O errors) when you run scheduled backup

      @stormi I confess we're now encountering the same error message on nearly all our backups, including CR to a local host. We started from a fresh install and cleaned SRs; to rule out memory as the culprit we have upped the memory for Dom0 to 16 GB (128 GB machine), and XO is running with 16 GB of memory, of which 12 GB is allocated to Node.

      We've got the same problem occurring across all our hosts. Over 90% of backups error out with the VDI_IO_ERROR, however (weirdly) looking at the target end, I'd say that 75% of the backups 'seem' to complete successfully. Need to restore a couple to find out for sure but confess I've been concentrating on finding out what triggers the error rather than whether it is misleading.

      I've gone through the logs in detail and unfortunately nothing jumps out. I'm going to take time to extract the relevant sections from them all to see if you can see something that I can't, but apart from lots of memory messages from squeezed there aren't any obvious errors.

      Bizarrely SMlog is pretty clean - it's almost like it receives a termination signal from somewhere rather than erroring out of its own accord - for example tapdisk shuts down with "Closing after 0 errors" and no further explanation. I have found some talk that tapdisk can trigger a target reset after excessive i/o activity but I've not managed to prove that yet.

      I'll keep digging into things; in short, it's not something experienced only by @fachex, but I haven't yet recreated it in XOA - it's on my todo list.

      (If you want me to download all the logs and send them across directly, or to do anything under the covers of XOA, please do let me know. I'm dipping into this when I get time, so it's not a continuous effort I'm afraid.)

      posted in Xen Orchestra
      shorian
    • RE: backblaze b2 / amazon s3 as remote in xoa

      Update to my earlier post - We found the connection timeout issue was solved by allocating more memory to XO. Even though above I said that the memory didn't appear to be a problem, it turned out that Debian was swapping out so as to keep a chunk of free memory available, so we mistakenly assumed that not using all the memory meant we had sufficient. However being memory restricted combined with a slow disk meant that the swap was growing faster than it was being processed.

      Substantially increasing the XO VM memory (from 4 GB to 16 GB) seems to have solved the timeout issues (so yes, root cause was user error), and we're now finding that the S3 API to B2 (a lot cheaper than Amazon) is working really well for us.

      Well done to the XO dev team - the S3 API has completely changed how we use backups and freed up a lot of infrastructure that we previously had to dedicate to this; thank you 🙂
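
The trap described above (free memory looks healthy while swap quietly fills up) can be checked for with a small sketch along these lines. It assumes a Linux guest exposing the usual `/proc/meminfo` fields; the sample figures and the 25% threshold are invented for illustration, and `parse_meminfo` and `swap_pressure` are hypothetical helper names.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_pressure(info, threshold=0.25):
    """Return (fraction of swap in use, True if that exceeds the threshold)."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0, False
    used = total - info.get("SwapFree", 0)
    fraction = used / total
    return fraction, fraction > threshold


# Invented sample: plenty of MemFree, yet three quarters of swap is in use.
SAMPLE = """\
MemTotal:        4039012 kB
MemFree:          512340 kB
MemAvailable:     498120 kB
SwapTotal:       2097148 kB
SwapFree:         524287 kB
"""

info = parse_meminfo(SAMPLE)
fraction, under_pressure = swap_pressure(info)
```

On a real machine the text would come from reading `/proc/meminfo` rather than from `SAMPLE`.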

      posted in Xen Orchestra
      shorian
    • RE: DR backup - Error: IMPORT_ERROR_PREMATURE_EOF()

      For anyone else who comes across the same issue: we had this occur with XCP-ng 8.2 and XO-server 5.71.2, and isolated it to the use of Zstd compression. It was solved by reducing concurrency and assigning a couple of extra vCPUs.

      In response to @olivierlambert's point above - it is 100% reproducible if you have an under-resourced XO and a fast target remote, so that the bottleneck becomes the CPU rather than iowait or memory acting as a throttle. In our instance it only affected VMs of a reasonable size (>50 GB) containing complex databases with lots of incompressible data.
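
The point about incompressible data can be illustrated with a minimal sketch. It uses Python's standard `zlib` module as a stand-in for Zstd (an assumption made only so the example needs no extra packages), and both sample payloads are invented. It shows that random-looking data still passes through the compressor but comes out essentially no smaller.

```python
import os
import zlib


def compression_ratio(data, level=6):
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(data, level)) / len(data)


# Highly repetitive payload, e.g. a dump full of near-identical rows.
repetitive = b"INSERT INTO t VALUES (1);\n" * 40_000

# Random bytes as a stand-in for already-compressed or encrypted data.
random_like = os.urandom(len(repetitive))

ratio_repetitive = compression_ratio(repetitive)
ratio_random = compression_ratio(random_like)
```

The repetitive payload shrinks to a small fraction of its size, while the random one stays at roughly its original size despite the CPU time spent on it.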

      posted in Xen Orchestra
      shorian

    Latest posts made by shorian

    • RE: Move VM to a host containing a CR vm

      @danp Sorry, since it seems to be a design item rather than a bug I didn't think to put the logs in. Here you go:

      vm.migrate
      {
        "vm": "24103ce1-e47b-fe12-4029-d643e0382f08",
        "mapVifsNetworks": {
          "7457d175-8d01-613e-7b47-fb1714693074": "b62c7a9a-222a-e8e9-754e-982839e00d0e"
        },
        "migrationNetwork": "ef24440c-fda5-d88b-ce4a-fd12b7ad1d4d",
        "sr": "cf2dbaa3-21f3-903b-0fd1-fbe68539f897",
        "targetHost": "98da99c3-4ec2-4db8-ab1b-a1cb6ffd329a"
      }
      {
        "code": 21,
        "data": {
          "objectId": "24103ce1-e47b-fe12-4029-d643e0382f08",
          "code": "DUPLICATE_VM"
        },
        "message": "operation failed",
        "name": "XoError",
        "stack": "XoError: operation failed
          at factory (/opt/xo/xo-builds/xen-orchestra-202102171611/packages/xo-common/src/api-errors.js:21:32)
          at /opt/xo/xo-builds/xen-orchestra-202102171611/packages/xo-server/src/api/vm.js:487:15
          at runMicrotasks (<anonymous>)
          at processTicksAndRejections (internal/process/task_queues.js:97:5)
          at runNextTicks (internal/process/task_queues.js:66:3)
          at processImmediate (internal/timers.js:434:9)
          at Object.migrate (/opt/xo/xo-builds/xen-orchestra-202102171611/packages/xo-server/src/api/vm.js:474:3)
          at Api.callApiMethod (/opt/xo/xo-builds/xen-orchestra-202102171611/packages/xo-server/src/xo-mixins/api.js:304:20)"
      } 
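
For scripting around this failure, the error code can be pulled out of a payload shaped like the one above. This is a minimal sketch: `classify_migrate_error` is a hypothetical helper, and the `stack` field is left out of the sample because its raw line breaks are not valid inside a JSON string.

```python
import json

# Error payload copied from the log above, minus the "stack" field.
ERROR_PAYLOAD = """
{
  "code": 21,
  "data": {
    "objectId": "24103ce1-e47b-fe12-4029-d643e0382f08",
    "code": "DUPLICATE_VM"
  },
  "message": "operation failed",
  "name": "XoError"
}
"""


def classify_migrate_error(payload):
    """Return (error code string, affected object id) from an error payload."""
    err = json.loads(payload)
    data = err.get("data") or {}
    return data.get("code"), data.get("objectId")


code, vm_uuid = classify_migrate_error(ERROR_PAYLOAD)
```

Here `vm_uuid` is the same UUID that was passed as `vm` in the `vm.migrate` call.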
      
      posted in Xen Orchestra
      shorian
    • RE: Move VM to a host containing a CR vm

      @jedimarcus That is correct; issue ceases to be a problem with shared storage.

      posted in Xen Orchestra
      shorian
    • Move VM to a host containing a CR vm

      I have come across a few occasions where a VM needs to migrate to a different host that already holds a CR replica, but the VM cannot be moved until the (one or more) CR replicas have been removed. The replica is given a different name as part of the CR process (i.e. suffixed with the date and name of the CR process), yet the move always fails with a 'DUPLICATE_VM' error.

      I presume the failure is due to the UUIDs matching, except that the conflict between redundant VMs is not universal, since I can have two or more replicas of the same VM on the same secondary server.

      A common use case would be to have two operational hosts, one as primary and one as secondary, where the primary has a continuous replication of its VMs to the secondary in case of outage or corruption. When the user needs to update (patch) the primary, the normal approach would be to migrate the live VMs to the secondary host, upgrade the primary, then move the VMs back.

      However in the current scenario this fails with the above error, so instead one has to delete the replicas on the secondary before the VM migration can be undertaken (thus removing the safety net and the redundancy until the process is completed and the restarted CR process has successfully run again).

      Why can't I suspend the CR process and migrate my production VM across to the secondary host without deleting all the VM's replicas first?

      (If I have misunderstood something or made some stupid error, I beg forgiveness in advance 🙂 )

      posted in Xen Orchestra
      shorian
    • RE: backblaze b2 / amazon s3 as remote in xoa

      @nraynaud Removing any concurrency seems to be effective; certainly a substantial improvement upon the original backup prior to your amendments.

      We have managed to get things to run pretty much every time now, by running with concurrency set to '1' and being careful on the timing to ensure no other backups accidentally run in parallel.

      Have checked a couple of restores and they seem to be ok too.

      The only thing I would highlight is that, now I am not getting the failures, I cannot tell whether the issue on the remote when recovering from a partial/failed backup is resolved. I guess this needs me to pull a plug on the network while a backup is running, but I would need to test this on a different machine in the lab rather than where we're running at the moment.

      posted in Xen Orchestra
      shorian
    • RE: backblaze b2 / amazon s3 as remote in xoa

      @nraynaud Unfortunately after a good start I'm now seeing AWS.S3.upload socket hang up errors:

                      "message": "Error calling AWS.S3.upload: socket hang up",
                      "name": "Error",
                      "stack": "Error: Error calling AWS.S3.upload: socket hang up\n    at rethrow (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/@sullux/aws-sdk/webpack:/lib/proxy.js:114:1)\n    at tryCatcher (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/bluebird/js/release/util.js:16:23)\n    at Promise._settlePromiseFromHandler (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/bluebird/js/release/promise.js:547:31)\n    at Promise._settlePromise (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/bluebird/js/release/promise.js:604:18)\n    at Promise._settlePromise0 (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/bluebird/js/release/promise.js:649:10)\n    at Promise._settlePromises (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/bluebird/js/release/promise.js:725:18)\n    at _drainQueueStep (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/bluebird/js/release/async.js:93:12)\n    at _drainQueue (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/bluebird/js/release/async.js:86:9)\n    at Async._drainQueues (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/bluebird/js/release/async.js:102:5)\n    at Immediate.Async.drainQueues [as _onImmediate] (/opt/xo/xo-builds/xen-orchestra-202102171611/node_modules/bluebird/js/release/async.js:15:14)\n    at processImmediate (internal/timers.js:461:21)"
                    }
      

      This is not occurring for all the VMs being uploaded, usually only for one out of the three. The task then stays open and runs until the timeout after 3 hours, despite this particular batch normally taking about 30 minutes.

      After 3 successful runs, this has now occurred on each of the following 3 runs. I am going to clear out the target completely and see if that makes any difference. (Note that I am using Backblaze B2, not AWS.) Let me know if you want me to send you the full log or an extract of SMlog or anything else.

      To @DustinB's point above, I have tried one restore and it came up fine, but please don't consider this a full and comprehensive test.

      posted in Xen Orchestra
      shorian
    • RE: backblaze b2 / amazon s3 as remote in xoa

      @nraynaud Ok, so far so good; haven't had a timeout yet. However, the backups are reported as being much, much smaller; the overall cumulative size has dropped from over 20 GB to under 5 GB, which would avoid the problem in any case.

      There are no changes to the settings (zstd, normal snapshot without memory), all I can think of is that maybe there were a lot of resends resulting in the large data size being reported within XO, but unless I've picked up a much improved algorithm by building that commit compared with the released branch, I'm a little confused.

      I will take a look at the size of the actual backups as held on the remote (Backblaze B2) compared with the reported size in XO to see if I can substantiate the above paragraph.

      Meanwhile, I'll keep running backups to soak test it but so far we're looking good!

      posted in Xen Orchestra
      shorian
    • RE: backblaze b2 / amazon s3 as remote in xoa

      @nraynaud superb! I can’t get into this until tomorrow evening but will be on it as soon as I can. Bear with....

      posted in Xen Orchestra
      shorian