XCP-ng

    VDI Won't Coalesce (shows orphaned but isn't)

    Xen Orchestra
    14 Posts 3 Posters 4.6k Views
    • planedropP Offline
      planedrop Top contributor
      last edited by olivierlambert

      So this is what I'm seeing related to this VDI:

      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466] coalesce: EXCEPTION <class 'util.CommandException'>, Invalid argument
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]   File "/opt/xensource/sm/cleanup.py", line 1753, in coalesce
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]     self._coalesce(vdi)
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]   File "/opt/xensource/sm/cleanup.py", line 1942, in _coalesce
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]     vdi._doCoalesce()
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]   File "/opt/xensource/sm/cleanup.py", line 766, in _doCoalesce
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]     self.parent._increaseSizeVirt(self.sizeVirt)
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]   File "/opt/xensource/sm/cleanup.py", line 969, in _increaseSizeVirt
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]     self._setSizeVirt(size)
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]   File "/opt/xensource/sm/cleanup.py", line 984, in _setSizeVirt
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]     vhdutil.setSizeVirt(self.path, size, jFile)
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]   File "/opt/xensource/sm/vhdutil.py", line 237, in setSizeVirt
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]     ioretry(cmd)
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]   File "/opt/xensource/sm/vhdutil.py", line 102, in ioretry
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]     errlist = [errno.EIO, errno.EAGAIN])
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]   File "/opt/xensource/sm/util.py", line 330, in ioretry
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]     return f()
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]   File "/opt/xensource/sm/vhdutil.py", line 101, in <lambda>
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]     return util.ioretry(lambda: util.pread2(cmd),
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]   File "/opt/xensource/sm/util.py", line 227, in pread2
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]     return pread(cmdlist, quiet = quiet)
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]   File "/opt/xensource/sm/util.py", line 190, in pread
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]     raise CommandException(rc, str(cmdlist), stderr.strip())
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466]
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
      Aug 27 14:14:29 xcp-ng-1 SMGC: [17466] Coalesce failed, skipping
      

      Could this be an issue with the resize I attempted on this VDI when I did the migration (it appeared successful at the time)?
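      Side note, in case it helps anyone else watching for this: the failures stand out if you grep SMlog for the EXCEPTION marker. A minimal sketch, demonstrated on a hypothetical sample file so it runs anywhere (on a real XCP-ng host you'd point it at /var/log/SMlog instead):

```shell
# Hypothetical sample in the SMlog format (illustrative lines only)
cat > /tmp/smlog_sample <<'EOF'
Aug 27 14:14:29 xcp-ng-1 SMGC: [17466] coalesce: EXCEPTION <class 'util.CommandException'>, Invalid argument
Aug 27 14:14:29 xcp-ng-1 SMGC: [17466] Coalesce failed, skipping
EOF

# On a real host: grep -c 'EXCEPTION' /var/log/SMlog
grep -c 'EXCEPTION' /tmp/smlog_sample   # prints 1
```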

      • planedropP Offline
        planedrop Top contributor
        last edited by

        To add some additional detail:

        Once XOA stops showing the VDI as coalescing, the cycle starts again: first it shows this VDI with a depth of 1 that needs to be coalesced, then that changes to 2 a few minutes later, and then it fails again and the loop restarts.

        • olivierlambertO Online
          olivierlambert Vates 🪐 Co-Founder CEO
          last edited by

          So you have a coalesce problem on that host 🙂

          • planedropP Offline
            planedrop Top contributor @olivierlambert
            last edited by

            @olivierlambert Any idea what that problem is? Coalesce works perfectly for every other VM on this host, never a single issue, so I'm not sure why it's happening with this one.

            Or does this look like it should be a host-wide issue?

            • DanpD Offline
              Danp Pro Support Team @olivierlambert
              last edited by

              @olivierlambert You don't say!

              d9761359-5b50-492a-b881-d068cef5cf43-image.png

              😝

              • planedropP Offline
                planedrop Top contributor
                last edited by

                So I was able to confirm that other VMs are definitely coalescing just fine.

                While I was digging through the logs for that, I noticed something; a little background might help.

                Originally this VDI was 160GB when it was transferred as a VHD from Hyper-V. I then migrated it to another SR and back to the local one, then resized it to 180GB.

                The SR VDI Chain shows this:
                Aug 27 15:00:51 xcp-ng-1 SMGC: [14113] *2d12ea03(160.000G/142.467G)
                Aug 27 15:00:51 xcp-ng-1 SMGC: [14113] *156a132d(180.000G/41.425G)
                Aug 27 15:00:51 xcp-ng-1 SMGC: [14113] fb2d9abb(180.000G/10.383M)

                Which leads me to wonder if the resize caused some issues. Additionally, I had a 160GB orphaned VDI from the other SR, which I deleted from the Health page.

                • planedropP Offline
                  planedrop Top contributor @planedrop
                  last edited by

                  @planedrop said in VDI Won't Coalesce (shows orphaned but isn't):

                  So I was able to confirm that other VMs are definitely coalescing just fine.

                  While I was digging through the logs for that, I noticed something; a little background might help.

                  Originally this VDI was 160GB when it was transferred as a VHD from Hyper-V. I then migrated it to another SR and back to the local one, then resized it to 180GB.

                  The SR VDI Chain shows this:
                  Aug 27 15:00:51 xcp-ng-1 SMGC: [14113] *2d12ea03(160.000G/142.467G)
                  Aug 27 15:00:51 xcp-ng-1 SMGC: [14113] *156a132d(180.000G/41.425G)
                  Aug 27 15:00:51 xcp-ng-1 SMGC: [14113] fb2d9abb(180.000G/10.383M)

                  Which leads me to wonder if the resize caused some issues. Additionally, I had a 160GB orphaned VDI from the other SR, which I deleted from the Health page.

                  If someone can clarify what the above means for the VDI chain, that'd be awesome. The 41.425G entry in particular: I'm not sure what it represents. Is it indicating that the original size used was 142.467G (which is correct) and that, after the increase to 180GB, the child is only using 41.425G? Or is that 41.425G a reference of some sort?

                  Thanks again for any help; this one is really tripping me up.
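                  In case it helps decode the chain: each SMGC entry appears to read uuid(virtual size/allocated size), i.e. the second figure is how much data that particular VHD in the chain actually holds, not a running total. That's my reading of the format, not an official reference. A self-contained sed sketch over the pasted lines:

```shell
# Extract "uuid virtual=... allocated=..." from SMGC chain lines
# (field meanings assumed from the log format above)
sed -n 's/.*[* ]\([0-9a-f]\{8\}\)(\([^/]*\)\/\([^)]*\)).*/\1 virtual=\2 allocated=\3/p' <<'EOF'
Aug 27 15:00:51 xcp-ng-1 SMGC: [14113] *2d12ea03(160.000G/142.467G)
Aug 27 15:00:51 xcp-ng-1 SMGC: [14113] *156a132d(180.000G/41.425G)
Aug 27 15:00:51 xcp-ng-1 SMGC: [14113] fb2d9abb(180.000G/10.383M)
EOF
# prints:
# 2d12ea03 virtual=160.000G allocated=142.467G
# 156a132d virtual=180.000G allocated=41.425G
# fb2d9abb virtual=180.000G allocated=10.383M
```

                  Read that way, the 41.425G would be the data written into the middle VHD since its snapshot, not a new total for the disk.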

                  • olivierlambertO Online
                    olivierlambert Vates 🪐 Co-Founder CEO
                    last edited by

                    As long as you have "exception" displayed in the SMlog, you have coalesce issues on that SR. It could be the SR itself or a broken VDI.

                    You could check the problematic VHD and its parents with vhd-util to see if there are header or footer issues. Alternatively, you can migrate it to another SR, check if coalesce is back on track, then migrate it back.
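                    A sketch of those vhd-util checks, to be run on the host (the VHD path is a placeholder to substitute with the real file on your SR; this assumes a file-based SR):

```shell
#!/bin/sh
# Placeholder path -- substitute the actual VDI file on your SR
VHD_PATH="${1:-/path/to/vdi-uuid.vhd}"

if command -v vhd-util >/dev/null 2>&1; then
    vhd-util check -n "$VHD_PATH"     # validates header/footer integrity
    vhd-util query -n "$VHD_PATH" -v  # virtual size (MB)
    vhd-util query -n "$VHD_PATH" -p  # parent VHD, if any
else
    echo "vhd-util not found (run this on the XCP-ng host)"
fi
```

                    Repeating the check on each parent reported by `query -p` covers the whole chain.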

                    • planedropP Offline
                      planedrop Top contributor @olivierlambert
                      last edited by

                      @olivierlambert I'm trying the migration option right now; if that doesn't work I'll do some digging with vhd-util. I don't think there is an issue with the SR as a whole, though, as I've had a large number of successful snapshots and coalesces on other VDIs on this SR, and none of them ever came back with exceptions or anything like that, so I'd guess a broken VDI.

                      I'll report back my findings and go from there, if I don't have it figured out this weekend I'll submit an official support ticket about it.

                      Thanks for the help here!!

                      • planedropP Offline
                        planedrop Top contributor @olivierlambert
                        last edited by

                        @olivierlambert So I may have figured out what happened, wanted to see if this sounds possible.

                        I think I mistakenly snapshotted this VDI after moving it to another SR, then moved it back without first deleting that snapshot, THEN resized the VDI.

                        So I don't think it was able to merge the snapshots.

                        After moving it to that other SR and then back to our main SR, it hasn't tried to coalesce at all and I'm not seeing any exceptions in the SMlog.

                        Going to boot this VM back up and see whether the issue comes back, but it's been an entire day now with no exceptions, and it was hitting them about every 30 minutes before.

                        • planedropP Offline
                          planedrop Top contributor
                          last edited by

                          So another odd thing I'm seeing with this VDI: it's showing the size incorrectly. It shows 180GB of 180GB used (on a thin-provisioned SR; both the old and new SRs are thin), yet the VM is only using 140GB of that 180GB.

                          Something definitely went wrong with this VDI during transfer, just not sure what.

                          I will say that I increased the VDI size again and now it displays more accurately, showing 180GB of 185GB used (both in XOA and with vhd-util). It's almost behaving as if it was at one point on a thick-provisioned SR or something.

                          Just to avoid further issues, I'm tempted to create a fresh VHD, copy the data to that, then delete this one.
