Nested Virtualization of Windows Hyper-V on XCP-ng
I'm unsure about the Xen ML, because previous tests in this thread have shown that it does work on vanilla Xen, if I followed things correctly. So, if I'm not mistaken, we have a Citrix Hypervisor / XCP-ng issue at hand. Maybe we'll still get some guidance from the Xen developers, though.
In any case, a nice post by @XCP-ng-JustGreat to Xen's user mailing list (rather than the development mailing list, which would be for issues that we can reproduce in latest vanilla Xen), that summarizes the situation and tests done would probably be useful to help them help us.
I missed the last few messages, so as it looks like Vanilla Xen unpatched doesn't work, that's a case for the xen-devel mailing list
@stormi @olivierlambert All, something stormi mentioned yesterday made me double-check the version of vanilla Xen that was packaged with the Debian 10 test distro. Turns out, it's older than the version used in XCP-ng 8.2. We really do need to see whether or not nested Hyper-V works in Xen 4.15 (the latest) before bringing it to the attention of the Xen dev ML. Toward that end, I astonished myself last night by compiling Xen 4.15 from the source code! Most of the time was spent identifying and installing the many prerequisites (now documented), so subsequent builds will be quite fast. One packaging issue remains: the final make install command installed the Xen hypervisor etc., but did not add a GRUB entry to boot it. What is the proper way to add a GRUB menu entry that boots Debian with Xen? I considered doing it in a hacky fashion using the leftover GRUB menu entry from the packaged version in Debian 10. Can you tell me the right way? Please let me know and I'll give it a try this weekend. Thank you.
@olivierlambert @stormi It was quite an odyssey getting everything to run with pure vanilla Xen 4.15 compiled from source on Debian 10.10, but I finally accomplished it. (Learned a lot in the process too!) The final sticking point was that the Windows VM xl config file, previously built and working on the older version of Xen packaged with Debian 10, wouldn't boot. Something wasn't working with the guest UEFI support, so I switched to BIOS boot and that worked. The net result is that nested Hyper-V installs fine as before, but still won't activate on reboot. I also note that the x2apic CPU capability is now present in the guest, as it is with VMware ESXi. That flag is missing when running nested Windows under XCP-ng 8.2 on my Intel i7-6700 processor-based system. Now that we know for sure it is still not working in the very latest Xen release, what next steps should we take to get this issue to the attention of the Xen developers?
@XCP-ng-JustGreat the next step would be to send a detailed bug report to the xen-devel mailing list (see https://xenproject.org/help/mailing-list/). You don't need to subscribe to it to post, and all answers should put your address in CC.
If you want, you can first write your e-mail here for proofreading.
@stormi OK. I'll put it together here first.
Thanks a lot @XCP-ng-JustGreat !
By working together like that, I'm sure we'll be able to pinpoint the exact issue.
SUBJECT: Nested Virtualization of Hyper-V on Xen Not Working
RATIONALE: Features in recent versions of Windows now REQUIRE Hyper-V support to work. In particular, Windows Containers, Sandbox, Docker Desktop and the Windows Subsystem for Linux version 2 (WSL2). Running Windows in a VM as a development and test platform is currently a common requirement for various user segments and will likely become necessary for production in the future. Nested virtualization of Hyper-V currently works on VMware ESXi, Microsoft Hyper-V and KVM-based hypervisors. This puts Xen and its derivatives at a disadvantage when choosing a hypervisor.
WHAT IS NOT WORKING? Provided the requirements set forth in: https://wiki.xenproject.org/wiki/Nested_Virtualization_in_Xen have been met, an hvm guest running Windows 10 PRO Version 21H1 x64 shows that all four requirements for running Hyper-V are available using the msinfo32.exe or systeminfo.exe commands. More granular knowledge of the CPU capabilities exposed to the guest can be observed using the Sysinternals Coreinfo64.exe command. CPUID flags present appear to mirror those on other working nested hypervisor configurations. Enabling Windows Features for Hyper-V, Virtual Machine Platform, etc. all appear to work without error. However, after the finishing reboot, Hyper-V is simply not active. This--despite the fact that vmcompute.exe (Hyper-V host compute service) is running and there are no errors in the logs. In addition, all four Hyper-V prerequisites continue to show as available.
By contrast, after the finishing reboot of an analogous Windows VM running on ESXi, the four prerequisites are reversed: hypervisor is now active; vmx, ept and urg (unrestricted guest) are all off as viewed with the Coreinfo64.exe -v command. Furthermore, all functions requiring Hyper-V are now active and working as expected.
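For anyone who wants to compare captures from different hosts, here is a rough Python sketch that parses saved Coreinfo64.exe -v output and reports which of the four flags flipped. It is only an illustration: the line layout (flag name, then `*` or `-`, then a description) is my assumption about Coreinfo's output, the function names are made up for this example, and the two sample strings are hand-written placeholders, not real captures from any of the test systems. The "active" versus "not active" classification simply encodes the behavior described above.

```python
import re

# Assumed line shape: FLAG, whitespace, '*' (present) or '-' (absent), description.
FLAG_LINE = re.compile(r"^(?P<name>[A-Z][A-Z0-9_.-]*)\s+(?P<mark>[*-])\s+(?P<desc>.+)$")

# Hand-written placeholder input, NOT real captured output.
SAMPLE_BEFORE = """\
HYPERVISOR      -       Hypervisor is present
VMX             *       Supports Intel hardware-assisted virtualization
EPT             *       Supports Intel extended page tables (SLAT)
URG             *       Supports Intel unrestricted guest
"""

SAMPLE_AFTER = """\
HYPERVISOR      *       Hypervisor is present
VMX             -       Supports Intel hardware-assisted virtualization
EPT             -       Supports Intel extended page tables (SLAT)
URG             -       Supports Intel unrestricted guest
"""


def parse_coreinfo(text):
    """Return {flag_name: True/False} for every line that looks like a flag line."""
    flags = {}
    for line in text.splitlines():
        match = FLAG_LINE.match(line.strip())
        if match:
            flags[match.group("name")] = match.group("mark") == "*"
    return flags


def classify(flags):
    """Classify a capture using the pattern described in the post (an assumption)."""
    hypervisor = flags.get("HYPERVISOR")
    virt = [flags.get(name) for name in ("VMX", "EPT", "URG")]
    if hypervisor and not any(virt):
        return "hyper-v active"
    if not hypervisor and all(virt):
        return "prerequisites available, hyper-v not active"
    return "indeterminate"


def changed_flags(before, after):
    """Sorted names of flags present in both captures whose state differs."""
    shared = before.keys() & after.keys()
    return sorted(name for name in shared if before[name] != after[name])


if __name__ == "__main__":
    before = parse_coreinfo(SAMPLE_BEFORE)
    after = parse_coreinfo(SAMPLE_AFTER)
    print(classify(before))
    print(classify(after))
    print(changed_flags(before, after))
```

To use it on real data, redirect the Coreinfo output to a text file inside the guest before and after the finishing reboot, then feed both files to `parse_coreinfo`. On the Xen setups described here, I would expect `changed_flags` to come back empty, since the prerequisites continue to show as available.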
This deficiency has been observed in two test setups running Xen 4.15 from source and XCP-ng 8.2, both running on Intel with all of the latest, generally available patches. We presume that the same behavior is present on Citrix Hypervisor 8.2 as well.
Clearly, much effort has already been expended to support the Viridian enlightenments that optimize running Windows on Xen. It also looks like a significant amount of effort has been put forth to advance nested virtualization in general.
Therefore, if it would be helpful, I am willing to perform testing and provide feedback and logs as appropriate in order to get this working. While my day job is managing a heterogeneous collection of systems running on various hypervisors, I have learned the rudiments of integrating patches and rebuilding Xen from source so could no doubt be useful in assisting you with this worthwhile endeavor.
While it is widely understood that nested virtualization is officially unsupported in production scenarios
I'm not sure about this. It is clearly unsupported in Citrix Hypervisor, but I'm not sure such a statement is true for the Xen project.
Is the capability to run fully-functional nested Hyper-V on Xen a priority that Xen's developers expect to get working?
I'd change this part, assume there's no need to ask about priorities here, and orient it directly towards troubleshooting. You're talking to developers, mainly:
- you're ready to do any tests and provide any logs to help debugging
- you can rebuild Xen with any additional patches (now that you learned how to do it).
@stormi Yes. That is better. I'll update it.
@stormi @olivierlambert It looks like we now have the attention of Andrew Cooper at Citrix. For anyone interested in following or participating in the Xen developer list nested virtualization thread we originated, it begins here: https://lists.xenproject.org/archives/html/xen-devel/2021-07/msg01269.html (Just click Thread Next to go through it sequentially.) For the purposes of that list, I have become Xentrigued. Cooper readily admits that nested virtualization in Xen is "a disaster" and has suffered from neglect. With the upcoming launch of Windows 11 and Server 2022, nested virtualization of Hyper-V and, likely, vTPM 2.0 support will become "musts" for hypervisor certification by Microsoft so there are some strong tail-winds that may aid in pushing this forward beyond the XCP-ng community. I will try to be of some use toward that end.
So your post was great and its seriousness provided traction to move forward. Congrats!
Andy is one of the best (likely the best) Xen experts I know. He's also a bit "pessimistic" sometimes, or realistic, one might say (the word "disaster" is from his point of view; I mean it's far from being great, but it's enough for some basic use cases).
Anyway, with more and more urgency coming to get things sorted in Windows world, I'm pretty sure more resources will be attached to fix some Xen parts on that. As he said, an incremental approach will be the solution, testing some patches for Xen devs will be a tremendous help.
That's a very good example on why cooperation is a powerful way to move forward. Thanks a lot for your efforts @XCP-ng-JustGreat
Everything is on the public mailing list, I suggest you ask there
@alexanderk @olivierlambert Sorry to have not responded sooner to your question. It has been a very long, slow slog so far and I haven't been able to devote as much time as I'd like to working on this. Here's what I've done so far: Based on Andrew Cooper's recommendation, I installed a fully patched Windows Server 2008 R2 VM to Xen. (Hyper-V was initially released with Server 2008 so this is almost as far back as you can go.) Using the current unmodified Xen source code, the VM will permit Hyper-V to be enabled in the Windows Server 2008 R2 guest, but--as with newer versions of Windows--once you perform the finishing reboot, Hyper-V is not actually active. Adding the two recommended source-code patches, recompiling and performing the same test causes the VM to hang following the enablement of Hyper-V. I know that I need to set up a serial console for the VM in order to view any logging that might provide a clue as to what's failing during the boot, but I haven't worked that out just yet.
I've also spent some considerable time reading through the Xen Dev email posts on the history of the development of nested virtualization in Xen. One very significant learning from that reading is that nested virtualization on Xen was initially developed by an AMD developer. Development of the NV feature-set for Intel came later after the AMD-focused design die had been cast. As far as I can tell given that I'm running Server 2008 R2, this never worked on Intel. (Maybe it did on an older Intel processor, but I am currently working with SkyLake i7-6700s so have no way to test older hardware.) Unfortunately, I also don't have appropriate AMD hardware on which to perform the same test to see whether or not it might work on AMD.
On the Microsoft Hyper-V side, it seems as though the opposite evolution happened. Nested virtualization was developed on Intel first, then (very recently) AMD. This makes me suspect that it doesn't work on AMD either. In other words, I don't know that nested virtualization of Windows on Xen ever worked such that Hyper-V was actually active in the guest. I would be delighted to have somebody prove me wrong.