VM Disk Missing
-
I was doing some maintenance and noticed that the disk section of a running VM was empty. I have verified that somehow the primary VHD for this VM has the "is-a-snapshot" attribute set to true with no parent. It also seems to be listed as a snapshot of itself:
xe vdi-param-list uuid=766d1995-19ba-420f-95e4-30e42dcbc698
uuid ( RO): 766d1995-19ba-420f-95e4-30e42dcbc698
name-label ( RW): rss01 0
name-description ( RW):
is-a-snapshot ( RO): true
snapshot-of ( RO): 766d1995-19ba-420f-95e4-30e42dcbc698
snapshots ( RO): 6714273a-444b-4a21-ad58-b72abb85d6a7; 766d1995-19ba-420f-95e4-30e42dcbc698; 7b3e6fe8-a9f5-4cc1-8f13-d52f474bf7ab
snapshot-time ( RO): 20241228T06:08:27Z
allowed-operations (SRO): snapshot; clone
current-operations (SRO):
sr-uuid ( RO): e5eda81e-540b-029b-f180-20124f81163e
sr-name-label ( RO): HDD RAID1
vbd-uuids (SRO): 47e78473-706f-8e36-a017-c0983fdf2560
crashdump-uuids (SRO):
virtual-size ( RO): 214748364800
physical-utilisation ( RO): 426496
location ( RO): 766d1995-19ba-420f-95e4-30e42dcbc698
type ( RO): System
sharable ( RO): false
read-only ( RO): false
storage-lock ( RO): false
managed ( RO): true
parent ( RO) [DEPRECATED]: <not in database>
missing ( RO): false
is-tools-iso ( RO): false
other-config (MRW):
xenstore-data (MRO):
sm-config (MRO): vhd-parent: 03b82421-c7a0-4c13-8d02-52aae2831674; read-caching-enabled-on-0250c976-1a99-4ee3-8b4b-27840941d478: true; host_OpaqueRef:24473335-4516-4675-aba9-ece2b4a46fef: RW
on-boot ( RW): persist
allow-caching ( RW): false
metadata-latest ( RO): false
metadata-of-pool ( RO): false
tags (SRW):
cbt-enabled ( RO): false
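(Side note: the sm-config above lists a vhd-parent, so the on-disk chain can also be cross-checked directly against what XAPI thinks. This is only a sketch and assumes a file-based SR mounted under /run/sr-mount, using the UUIDs from the output above; adjust paths for your layout.)

SR=e5eda81e-540b-029b-f180-20124f81163e
VDI=766d1995-19ba-420f-95e4-30e42dcbc698
vhd-util query -n /run/sr-mount/$SR/$VDI.vhd -p   # print the on-disk parent of the VHD, if any
vhd-util query -n /run/sr-mount/$SR/$VDI.vhd -v   # print its virtual size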
I found a couple of related posts. The first seems to have no resolution, and in the second the solution appears to have been to export and import the VM:
https://xcp-ng.org/forum/topic/6981/vmguest-disk-is-missing-in-xen-orchestra/11
https://xcp-ng.org/forum/topic/6336/vm-missing-disk/26
I checked all SRs and none seem to have any coalesce locks, so I'm not sure I have any current problems related to coalescing.
I'd like to resolve this in place if possible. Is there a way to set the "is-a-snapshot" flag to false? Is there something else I should check? Or is it best to export and import?
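For anyone wanting to dig into the same thing, something along these lines should show which VDIs still claim to be snapshots of this one and whether coalesce activity is actually happening on the host (just a sketch; swap in your own SR UUID):

# List every VDI on this SR that is flagged as a snapshot, and what it points at
xe vdi-list sr-uuid=e5eda81e-540b-029b-f180-20124f81163e is-a-snapshot=true \
  params=uuid,name-label,snapshot-of,managed

# Check recent coalesce/GC activity in the storage manager log
grep -i coalesc /var/log/SMlog | tail -n 50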
-
@stevezemlicka I don't necessarily have a solution; however, I was experiencing similar symptoms in a lab environment (running 3 hosts at v8.3.0), where the primary disks of several VMs were disappearing at a fairly regular interval. Working with the Vates Support team, we narrowed it down to the Garbage Collection task's interval. Essentially, what I was seeing when looking at the Disks tab of the SR in question was a set of VDIs (with no name or description) disappearing and then re-appearing. Strangely enough, when a disk disappeared from a VM, the VM continued to run as if nothing had happened. Even stranger, while the VDI was missing I could still access the filesystem and manipulate files & folders from inside the VM (which I thought was absolutely magical).
Anyway, while troubleshooting another issue with migrating VDIs to another SR, @olivierlambert asked me to list the contents of the SR in question with the following command:
ls -latrh /run/sr-mount/<UUID-of-SR>
When I submitted the output, he noticed a VDI with the text "OLD_" prepended to its name. He explained that this might be an indication of a failed coalesce, and asked me to move (not delete) that file and rescan the SR. I did that, but when I didn't see my issue get resolved, I went ahead and restarted the Xen toolstack on the master host. As soon as I did this, my issue was resolved, and as a plus, the "magical" disappearing of the VDIs stopped happening. It's been several days now, and I haven't seen any VDIs disappear from any VMs. So, like I said, I'm not sure this is a solution for you, but perhaps it could point you in the right direction.
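If it helps, the rough sequence was something like the following (the destination directory and the placeholder names are illustrative, not the exact paths from my lab):

cd /run/sr-mount/<UUID-of-SR>
mv OLD_<VDI-UUID>.vhd /root/          # move the leftover file aside; do NOT delete it
xe sr-scan uuid=<UUID-of-SR>          # rescan the SR so XAPI picks up the change
xe-toolstack-restart                  # run on the pool master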
-
Awesome, great info! I don't recall seeing any atypical VDI names, but I don't think I was looking specifically for that. I will re-check the SR and also restart the toolstack. The toolstack restart definitely holds some promise, since this is a single server and, as a result, does not get restarted/patched often.
I should get a second server so I can move critical workloads to it (without downtime) and develop a healthier patch/restart practice, even if this is just a homelab. Thanks for that info!
@olivierlambert, if it sounds like this may be related to an issue being worked on and I can provide any helpful info, I'd be happy to gather anything that you think might be relevant.
-
No "OLD_" VHDs in any SR found. Toolstack restart also did not resolve the issue. I will fully patch the server as the next step (probably should have done this earlier).
-
No change after a full XCP-ng (8.2.1) patch and reboot. The VM in question started without issue as well.
For reference, I'm using XO commit d044d (currently 11 commits behind) on Master commit 6d34c.
Since everything is functional, I will leave it as is to see if a future commit helps. I should be able to revisit at the end of the week to rebuild on the latest XO commit and test.