XCP-ng

    VDI Chain on Deltas

    • nvoss

      Hi All!

      We were having problems with our backup remotes (on-site Synology, off-site Wasabi) failing with VDI chain issues. We checked the logs per the referenced article and didn't find anything obvious. We noticed we were using an older encryption method with the encrypted remotes, so we decided to purge the remotes, set them up fresh, and off we went.

      Fast forward a week. Full backups went OK on a forced full; we'll see this weekend whether the scheduled run goes well. However, deltas continue to fail when run on schedule with the new remotes. I figured for sure that new backups, new snapshots, etc. wouldn't have a coalescing issue. What we've found so far, though, is that a "force run" results in a successful backup for the deltas too.

      At a bit of a loss on troubleshooting. Anyone else seeing this? Both remotes are encrypted.

      Thanks!
      Nick

      • olivierlambert, Vates 🪐 Co-Founder CEO

        Hi,

        Can you be more specific on what's going on exactly?

        • nvoss @olivierlambert

          @olivierlambert Sure I can try.

          I can now confirm that both my full and my delta jobs fail on every single VM with the "Job canceled to protect the VDI chain" error.

          [Screenshot of the error attached]

          If we do a standard restart then it fails the same way. If we use the "force restart" option then it does work properly and backups seem to finish without issue.

          The remote configuration is brand new, using encrypted remotes with the multiple data block option selected. The backup job itself is not new; it's been in place for about a year. The job uses VM tags to determine which VMs to back up. The full is a weekly run with 6 retained backups, and it writes to both the external and local remotes. The delta only goes to the local Synology and is set with 14 retained backups.

          The storage for the VMs is on a Synology NAS. The VMs live on one of 3 hosts with similar vintage hardware.

          Per the backup troubleshooting article:
          cat /var/log/SMlog | grep -i exception : no results
          cat /var/log/SMlog | grep -i error : no results
          grep -i coales /var/log/SMlog : lots of messages that say "UNDO LEAF-COALESCE"

          [Screenshot of the grep output attached]

          The host I ran those commands on is the one which houses the Xen Orchestra VM (whose backup also fails).

          The Synology backup remote has 10TB assigned to it with 8.7TB free. The VDI disk volume has 5.4TB of 10TB free.

          Status on the hosts patch-wise shows 6 patches are needed currently, though they were up-to-date last week.

          XO is on commit 9ed55.

          Other specifics I can provide?

          Thanks!
          Nick

          • dthenot, Vates 🪐 XCP-ng Team @nvoss

            @nvoss Hello, the UNDO LEAF-COALESCE usually has a cause that is listed in the error above it. Could you share that part, please? 🙂
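A minimal sketch of one way to pull the lines that precede each undo message, so the cause is visible alongside it. It assumes the default log path used earlier in the thread; the helper name `coalesce_context` and the 40-line window are hypothetical choices for illustration, not part of XCP-ng.

```shell
# Print every "UNDO LEAF-COALESCE" line together with the lines before it.
# coalesce_context is a hypothetical helper name, not an XCP-ng tool.
coalesce_context() {
    log_file="${1:-/var/log/SMlog}"   # log path taken from the thread
    before="${2:-40}"                 # assumed context size; widen if the cause is cut off
    # -i: case-insensitive match, -B N: also print N lines before each match
    grep -i -B "$before" 'undo leaf-coalesce' "$log_file"
}

# Example, run on the host that owns the log:
#   coalesce_context /var/log/SMlog 60 | less
```

If the window is too small to reach the actual exception, raising the second argument is usually simpler than re-grepping for other keywords.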

            • nvoss @dthenot

              @dthenot when I grep looking for coalesce I don't see any errors. Everything is the undo message.

              Looking at the lines labeled 3680769, which in this case correspond to one of those undos, I see lock opens, a variety of what look like successful mounts, and subsequent snapshot activity, then the undo at the end. After the undo message I see something not super helpful.

              Attached is that entire region; below is an excerpt.

              [Screenshot of the log excerpt attached]

              It's definitely confusing why forcing the job works when the regular run doesn't.

              [Attachment: Errored Coalesce.txt]

              • dthenot, Vates 🪐 XCP-ng Team @nvoss

                @nvoss Could you try running this?
                vhd-util check -n /var/run/sr-mount/f23aacc2-d566-7dc6-c9b0-bc56c749e056/3a3e915f-c903-4434-a2f0-cfc89bbe96bf.vhd
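If more than one VDI in the same SR turns out to be suspect, the same check can be looped over every VHD file in the SR mount directory. This is only a sketch: `check_all_vhds` and the `VHD_CHECK_CMD` override are hypothetical names added for illustration (the override exists so the loop can be dry-run without `vhd-util` installed), and the only real command used is the `vhd-util check -n` invocation from the post.

```shell
# Run "vhd-util check -n" on every .vhd file in a file-based SR mount directory.
# check_all_vhds and VHD_CHECK_CMD are hypothetical names for this sketch.
check_all_vhds() {
    sr_dir="$1"
    failed=0
    for vhd in "$sr_dir"/*.vhd; do
        [ -e "$vhd" ] || continue    # the glob matched nothing
        # Left unquoted on purpose so the default splits into command + arguments.
        if ! ${VHD_CHECK_CMD:-vhd-util check -n} "$vhd"; then
            echo "FAILED: $vhd"
            failed=1
        fi
    done
    return "$failed"
}

# Example, using the SR UUID from the post:
#   check_all_vhds /var/run/sr-mount/f23aacc2-d566-7dc6-c9b0-bc56c749e056
```

The function returns non-zero if any file fails the check, so it can be chained in a script.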

                • nvoss @dthenot

                  @dthenot sure, here you go!

                  [Screenshot of the vhd-util check output attached]
