Imported VM Starts but Does Not Initialize the Display

kagbasi-ngc

@olivierlambert So I managed to whip up a v8.2.1 instance and attempted to import the VM, but the import wouldn't start. I simply got the "Starting import" message in the top-right, and then it went away.

I couldn't find any errors on the host logs, but I found this error within XO itself (i.e., Settings > Logs):

[Attachment: Screenshot 2024-12-11 025730.png]

My takeaway from this is that I cannot export a VM from a newer host and import it into an older one.

olivierlambert

Note: you cannot import a VM from a more recent XCP-ng to an older one (the opposite works though)

kagbasi-ngc

@olivierlambert Hmm, so why would you encourage me to try it, if you knew it wouldn't work....lol.

olivierlambert

I encouraged you to test if nested XCP-ng 8.2 worked better than 8.3, that's it.

kagbasi-ngc

@olivierlambert Okay, so I'm at fault for reading too much into your guidance, eh? Okay.

In that case then, I think we'd already established that nested virtualization works - because I have been able to add an SR, create VMs, add VDIs, etc., in the nested virtualization environment on 8.3. The only thing that was failing for me, which is the primary reason for standing it up, is the import of a VM.

So is it safe to say this is not likely an issue with nested virtualization but with something else in the codebase that's preventing the imported VM from successfully initializing the display when in UEFI mode?

stormi

@kagbasi-ngc I'd check the output of xl dmesg and the contents of /var/log/daemon.log and /var/log/xensource.log after trying to start the VM.

kagbasi-ngc

@stormi @olivierlambert

As requested, please find the outputs of the logs:

Output of xl dmesg - https://gist.github.com/kismetgerald/403edf28d5fd358722d2bc36b52f38f1
Output of /var/log/daemon.log - https://gist.github.com/kismetgerald/8965863047eee26b815dcfcfe4faabae
Output of /var/log/xensource.log - https://gist.github.com/kismetgerald/4ed62e999d9920d697f06c8e42de9873

For time reference, I started the VM at approximately Dec 12 05:08 - don't recall the precise second. If I can provide other logs, please let me know.
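
For anyone following along, one way to narrow each log to the minute the VM was started is to filter on the syslog timestamp prefix. This is only a minimal sketch, not an official procedure: log_window is a made-up helper name, and it assumes the logs use the "Mon DD HH:MM:SS" prefix seen in this thread.

```shell
#!/bin/sh
# log_window PREFIX FILE...
# Print only the lines that begin with PREFIX (for example "Dec 12 05:08").
# log_window is a hypothetical helper name, not an XCP-ng tool.
log_window() {
    prefix=$1
    shift
    # index(...) == 1 means the line starts with the prefix (literal match, no regex).
    awk -v p="$prefix" 'index($0, p) == 1' "$@"
}

# Example (paths are the ones mentioned in this thread):
# log_window "Dec 12 05:08" /var/log/daemon.log /var/log/xensource.log
```

Matching on the minute rather than the exact second keeps the filter useful when the precise start time is not known.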

stormi

@kagbasi-ngc I was inviting you to have a look first, but I'll try to give it a look. Still, try to find something relevant in them, it's a good troubleshooting exercise.

stormi

Relevant logs, from daemon.log:

Dec 12 05:08:18 VMH01 qemu-dm-1[8691]: SyncPcrAllocationsAndPcrMask!
Dec 12 05:08:18 VMH01 qemu-dm-1[8691]: Set PcdTpm2Hash Mask to 0x0000000F
Dec 12 05:08:19 VMH01 qemu-dm-1[8691]: AllocatePages failed: No 0x8400 Pages is available.
Dec 12 05:08:19 VMH01 qemu-dm-1[8691]: There is only left 0x3AA8 pages memory resource to be allocated.
Dec 12 05:08:19 VMH01 qemu-dm-1[8691]: ERROR: Out of aligned pages
Dec 12 05:08:19 VMH01 qemu-dm-1[8691]: ASSERT /builddir/build/BUILD/edk2-20220801/MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c(814): BigPageAddress != 0

"ERROR: Out of aligned pages" does not look good to me.
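
To put the two page counts from the log in perspective, they can be converted to sizes. This is a rough sketch that assumes the counts are standard 4 KiB UEFI pages; pages_to_mib is a made-up helper name, and the result is rounded down to whole MiB.

```shell
#!/bin/sh
# pages_to_mib HEX_PAGE_COUNT
# Convert a page count (hex literal such as 0x8400) to whole MiB,
# assuming 4096-byte pages. Hypothetical helper, for illustration only.
pages_to_mib() {
    echo $(( $1 * 4096 / 1024 / 1024 ))
}

pages_to_mib 0x8400   # requested allocation: prints 132
pages_to_mib 0x3AA8   # pages still available: prints 58
```

Under that assumption, the firmware asked for about 132 MiB in one allocation while only about 58 MiB remained, which is consistent with the "Out of aligned pages" error that follows.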

kagbasi-ngc

@stormi My apologies. Normally I relish the opportunity to dive deep into logs. However, in this case I am also the treasurer of my church and I have year-end accounting tasks to accomplish, so I haven't had much time to dedicate to this.

Additionally, since I don't really know what I'm looking for in the logs, I figured it'd be best to share them on here for the community to take a look...lol.

kagbasi-ngc

@stormi Yikes! If you're spooked by this error, then what does this mean for me?

By the way, this very VM imported and started up without any issues on a physical XCP-ng host. This issue seems to be happening in a nested virtualization setting - where XCP-ng is the nested hypervisor (a guest of VMware Workstation Pro).

manilx

@kagbasi-ngc Don't do nesting! It's not to be trusted even when it seems to work.

kagbasi-ngc

@manilx Yeah, not doing it in production, just in a test environment. Plus I only did it because I actually found a write-up by the Vates team on how to get it done, which confirmed to me that while they don't recommend it for production, they understand that it has some merit in a testing environment and want to see it working.

That said, I definitely hear your admonition and would never do it in a production setting.

stormi

I'm not spooked, I just think it's the relevant error message we need to understand :).

Since this is a test pool, could you make it use the debug version of OVMF?

ln -sf OVMF-debug.fd /usr/share/edk2/OVMF.fd

Then attempt to start the VM again, and get /var/log/daemon.log and /var/log/xensource.log once again.

kagbasi-ngc

@stormi Glad you're not spooked....

I'll enable the debug mode as you'd instructed and grab the logs for you shortly.

kagbasi-ngc

@stormi Hi there, as requested, here are the two logs:

I started the VM at timestamp Dec 13 13:54:59 and stopped it at Dec 13 13:55:59.

Now, is deleting the symbolic link created above all that is necessary to reverse the debug mode?

Mefosheez

@stormi @kagbasi-ngc I have this same error in a separate thread; however, there is no nesting involved. From my limited knowledge, it seems like XCP-ng is not seeing the available memory resources during boot and fails. If I migrate a running VM to the same host that gives me this error, it will operate as expected. But after a reboot... failure to launch.

https://xcp-ng.org/forum/topic/10083/uefi-guests-not-loading-console?_=1734014269146

@olivierlambert Thanks for your help so far.

stormi

@kagbasi-ngc Looks like we missed something, as the log indicates it loaded OVMF-release.fd:

Dec 13 13:54:59 VMH01 xenguest-2-build[8345]: Loaded OVMF from /usr/share/edk2/OVMF-release.fd

We'll come back to you with a better test procedure.

For now I just have a quick and dirty one, which consists of overwriting the release file with the debug one:

cp /usr/share/edk2/OVMF-debug.fd /usr/share/edk2/OVMF-release.fd

You can revert this change with yum reinstall edk2.
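
Before collecting logs again, it may be worth confirming which firmware file was loaded and whether the release file really holds the debug build now. The following is a minimal sketch under assumptions: both helper names are made up, and it relies only on the "Loaded OVMF from" line format and the file paths quoted in this thread.

```shell
#!/bin/sh
# Both helper names below are hypothetical, for illustration only.

# loaded_ovmf [LOGFILE]
# Print the firmware path from the last "Loaded OVMF from" line in the log
# (defaults to /var/log/daemon.log).
loaded_ovmf() {
    grep -F 'Loaded OVMF from ' "${1:-/var/log/daemon.log}" \
        | tail -n 1 \
        | sed 's/.*Loaded OVMF from //'
}

# same_firmware FILE_A FILE_B
# Succeed (exit status 0) if the two files are byte-for-byte identical.
same_firmware() {
    cmp -s "$1" "$2"
}

# Example, run on the host after the cp above:
# loaded_ovmf
# same_firmware /usr/share/edk2/OVMF-debug.fd /usr/share/edk2/OVMF-release.fd \
#     && echo "OVMF-release.fd now holds the debug build"
```

Since the log will still report the OVMF-release.fd path after the overwrite, the byte comparison is what actually shows that the debug build is in place.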

kagbasi-ngc

@stormi Sorry for the delayed reply. I did as you requested, and below are the resulting log files. I started the VM at timestamp Dec 17 14:43:34 and stopped it about 2 minutes later.

If I can be of assistance, don't hesitate to ask please. Thank you.

stormi

@kagbasi-ngc @Mefosheez We're trying to find developer time to diagnose the cause, but it's not the best time of the year, so I can't promise anything about timing.