    [Solved] VM Disk Missing

stevezemlicka:

I was doing some maintenance and noticed that the disk section of a running VM was empty. I have verified that somehow the primary VHD for this VM has the "is-a-snapshot" attribute set to true with no parent. It also appears to be listed as a snapshot of itself:

      xe vdi-param-list uuid=766d1995-19ba-420f-95e4-30e42dcbc698
      uuid ( RO)                    : 766d1995-19ba-420f-95e4-30e42dcbc698
                    name-label ( RW): rss01 0
              name-description ( RW): 
                 is-a-snapshot ( RO): true
                   snapshot-of ( RO): 766d1995-19ba-420f-95e4-30e42dcbc698
                     snapshots ( RO): 6714273a-444b-4a21-ad58-b72abb85d6a7; 766d1995-19ba-420f-95e4-30e42dcbc698; 7b3e6fe8-a9f5-4cc1-8f13-d52f474bf7ab
                 snapshot-time ( RO): 20241228T06:08:27Z
            allowed-operations (SRO): snapshot; clone
            current-operations (SRO): 
                       sr-uuid ( RO): e5eda81e-540b-029b-f180-20124f81163e
                 sr-name-label ( RO): HDD RAID1
                     vbd-uuids (SRO): 47e78473-706f-8e36-a017-c0983fdf2560
               crashdump-uuids (SRO): 
                  virtual-size ( RO): 214748364800
          physical-utilisation ( RO): 426496
                      location ( RO): 766d1995-19ba-420f-95e4-30e42dcbc698
                          type ( RO): System
                      sharable ( RO): false
                     read-only ( RO): false
                  storage-lock ( RO): false
                       managed ( RO): true
           parent ( RO) [DEPRECATED]: <not in database>
                       missing ( RO): false
                  is-tools-iso ( RO): false
                  other-config (MRW): 
                 xenstore-data (MRO): 
                     sm-config (MRO): vhd-parent: 03b82421-c7a0-4c13-8d02-52aae2831674; read-caching-enabled-on-0250c976-1a99-4ee3-8b4b-27840941d478: true; host_OpaqueRef:24473335-4516-4675-aba9-ece2b4a46fef: RW
                       on-boot ( RW): persist
                 allow-caching ( RW): false
               metadata-latest ( RO): false
              metadata-of-pool ( RO): <not in database>
                          tags (SRW): 
                   cbt-enabled ( RO): false
      
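A quick sweep for any other VDIs in the same self-referencing state (a minimal sketch; the awk filter simply matches each VDI's uuid against its own snapshot-of field):

    # List every VDI flagged as a snapshot together with the VDI it
    # claims to be a snapshot of; print any UUID that references itself.
    xe vdi-list is-a-snapshot=true params=uuid,snapshot-of | \
        awk '/^uuid/ {u=$NF} /snapshot-of/ {if ($NF == u) print u}'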

I found a couple of related posts. The first seems to have no resolution, and in the second the solution was to export and import the VM:
      https://xcp-ng.org/forum/topic/6981/vmguest-disk-is-missing-in-xen-orchestra/11
      https://xcp-ng.org/forum/topic/6336/vm-missing-disk/26

I checked all SRs and none seem to have any coalesce locks, so I don't believe I currently have any coalesce-related problems.
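
(For reference, coalesce and garbage-collection activity ends up in the storage manager log on the host, so a rough way to look for recent activity or errors, assuming the default XCP-ng log location, is:)

    # Scan the storage manager log for recent coalesce entries
    # (default location on an XCP-ng host).
    grep -i coalesce /var/log/SMlog | tail -n 20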

I'd like to resolve this in place if possible. Is there a way to set the "is-a-snapshot" flag back to false? Is there something else I should check? Or is it best to export and import?
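
(For what it's worth, the ( RO) marker on that field in the output above suggests the CLI won't flip it directly; a command along these lines would be expected to fail with a read-only error rather than clear the flag:)

    # is-a-snapshot is marked read-only, so xe should reject this
    # instead of clearing the flag:
    xe vdi-param-set uuid=766d1995-19ba-420f-95e4-30e42dcbc698 is-a-snapshot=false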

kagbasi-ngc @stevezemlicka:

@stevezemlicka I don't necessarily have a solution; however, I was experiencing similar symptoms in a lab environment (running 3 hosts at v8.3.0), where the primary disks of several VMs were disappearing at a fairly regular interval. Working with the Vates Support team, we narrowed it down to the Garbage Collection task's interval. Essentially, what I was seeing when looking at the Disks tab of the SR in question was a set of VDIs (with no name or description) disappearing and then re-appearing. Strangely enough, when a disk disappeared from a VM, the VM continued to run as if nothing had happened. Even stranger, while the VDI was missing, I could still access the filesystem and manipulate files and folders from inside the VM (which I thought was absolutely magical).

Anyway, while troubleshooting another issue with migrating VDIs to another SR, @olivierlambert asked me to list the contents of the SR in question with the following command: ls -latrh /run/sr-mount/<UUID-of-SR>. When I submitted the output, he noticed a VHD with the text "OLD_" prepended to its name. He explained that this could be an indication of a failed coalesce, and asked me to move (not delete) that file and rescan the SR. I did that, but it didn't resolve my issue, so I went ahead and restarted the Xen toolstack on the master host. As soon as I did, my issue was resolved, and as a bonus, the "magical" disappearing of the VDIs stopped. It's been several days now, and I haven't seen any VDIs disappear from any VMs.
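
For reference, the rough sequence was (a sketch; the UUIDs and file name are placeholders):

    # List the SR's contents, oldest first, looking for a VHD whose
    # name starts with "OLD_" (a leftover from a failed coalesce).
    ls -latrh /run/sr-mount/<UUID-of-SR>

    # Move (don't delete!) the suspect file out of the SR, then rescan.
    mv /run/sr-mount/<UUID-of-SR>/OLD_<uuid>.vhd /root/
    xe sr-scan uuid=<UUID-of-SR>

    # If symptoms persist, restart the toolstack on the pool master.
    xe-toolstack-restart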

So, like I said, I'm not sure this is a solution for you, but perhaps it will point you in the right direction.

olivierlambert (Vates 🪐 Co-Founder & CEO):

Adding @ronan-a and @dthenot, as this may be related to an issue we are fixing in a future patch, available upstream in a few days.

stevezemlicka:

Awesome, great info! I don't recall seeing any atypical VDI names, but I wasn't looking specifically for that. I will re-check the SRs and also restart the toolstack. The toolstack restart definitely holds some promise, since this is a single server and, as a result, doesn't get restarted/patched often.

I should get a second server so I can move critical workloads off (without downtime) and develop a healthier patch/restart practice, even if this is just a homelab. Thanks for that info!

@olivierlambert, if this sounds like it may be related to the issue being worked on, I'd be happy to gather anything you think might be relevant.

stevezemlicka:

              No "OLD_" VHDs in any SR found. Toolstack restart also did not resolve the issue. I will fully patch the server as the next step (probably should have done this earlier).

stevezemlicka:

No change after a full XCP-ng (8.2.1) patch run and reboot. The VM in question also started without issue.

For reference, I'm using XO commit d044d (currently 11 commits behind) on master commit 6d34c.

Since everything is functional, I will leave things as they are and see whether the upcoming fix helps. I should be able to revisit at the end of the week to rebuild on the latest XO commit and test.

stevezemlicka:

Just rebuilt XO up to commit a4986. The disk seems to be in the same state. Of course, if this commit includes the fix but the fix only prevents the issue from occurring (rather than repairing disks already in this state), then I'll likely need to take additional steps to get back into a good state. Or maybe the fix isn't in this commit yet, or my situation isn't one the fix addresses.

With regards to correcting the issue, I think there are a few options:

1. Export and import the VM (see the sketch after this list).
2. Create a second disk, clone the contents of the bad disk onto it from within the VM, and boot from the good disk.
3. Rebuild the VM.
4. Manually flip the disk's "is-a-snapshot" flag to false. I'm not sure of the implications of doing this.
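
A minimal sketch of option 1 using the xe CLI (the file path and UUIDs are placeholders; the VM should be halted before the export):

    # Shut the VM down cleanly, export it to an XVA file, then re-import.
    xe vm-shutdown uuid=<vm-uuid>
    xe vm-export vm=<vm-uuid> filename=/mnt/backup/rss01.xva
    xe vm-import filename=/mnt/backup/rss01.xva sr-uuid=<target-sr-uuid>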
stevezemlicka:

I rebuilt XO to commit 7579b with no change, so I performed an export and import of the VM. The imported VM functioned normally and displayed the disk in the UI appropriately.
