RE: XCP-NG 8.3 PCI Passthrough Trials and Tribulations

@gb.123 The GPU presents two devices on the PCIe bus: a video controller and an audio device (86:00.0 and 86:00.1 in my case). Make sure you have both excluded from dom0 and that you pass both to the VM; I believe you run into issues if you only pass one of the two onboard devices. Also, are you using recent NVIDIA drivers from NVIDIA itself? Older drivers from NVIDIA, and those downloaded from Windows Update, used to detect that they were running in a VM and blocked the use of consumer-grade GPUs in an attempt to force folks to spend extra on a data center GPU if they wanted one in their virtual machine. NVIDIA discontinued that practice roughly four years ago, though. Out of everything I wanted to pass through to my VM, the GPU is the one thing that just worked right out of the box. After installing the drivers I was able to switch to a display connected to the GPU and run some benchmarks without any issues.
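For reference, the sequence from the dom0 shell looks roughly like this; the UUIDs are the ones “xe pci-list” reported for the two GPU functions on my system, so substitute your own:

xe pci-list
xe pci-disable-dom0-access uuid=c11db184-6c96-1a74-ca0e-93588d2e8d12   # 86:00.0 (video)
xe pci-disable-dom0-access uuid=62d8de6b-c17c-ef56-410f-ef627a94dcaa   # 86:00.1 (audio)

Then reboot the host for the change to take effect.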
XCP-NG 8.3 PCI Passthrough Trials and Tribulations
I apologize in advance for the novel. This got a little long...
I recently got my hands on an HPE DL380 Gen10 server with dual Xeon Gold 6246s at no cost (rescued from the annual hardware-refresh recycle pile), and since it came with dual 1600-watt power supplies I decided to not only use it to upgrade my existing home virtual server (XCP-NG 8.2 based), but also to replace my failing 2017 Alienware gaming laptop with a virtualized gaming VM by passing through the necessary hardware. I had no issues installing XCP-NG 8.3 onto the server, and a lot of things just worked as expected, but some things didn’t, so I thought I’d present what I ran across in case someone else goes looking for the same answers I did. For this “gaming” VM I wanted to pass through a Gigabyte RTX 5070, a PCIe USB 3.2 controller card, and a Samsung 990 PRO NVMe SSD plugged into an M.2-to-PCIe adapter card.
The first issue I ran into, which I solved, was removing the PCIe USB controller from dom0. The XCP-NG documentation indicates that the “xe pci-disable-dom0-access” command should be used to remove a device from dom0 on XCP-NG 8.3, but it requires the device’s PCI UUID, and “xe pci-list” simply wouldn’t list the USB controller. I even tried with a second USB 3.0 PCIe card I had lying around, and it wouldn’t show up in the list either. It just wasn’t there, while the SSD and the two devices (video and audio) presented by the GPU were all listed and easily excluded from dom0 via the documented command. The USB card did show up in the “lspci” output, so it was present:
12:00.0 USB controller: ASMedia Technology Inc. ASM3242 USB 3.2 Host Controller
86:00.0 VGA compatible controller: NVIDIA Corporation Device 2f04 (rev a1)
86:00.1 Audio device: NVIDIA Corporation Device 2f80 (rev a1)
d8:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller S4LV008[Pascal]
But for whatever reason, xe could not see it (and by extension, neither could XOA). “xe pci-list” returned only the other three devices:
uuid ( RO)           : a0826aa7-c132-f4f2-f338-b7cdcdd70a8c
    vendor-name ( RO): Samsung Electronics Co Ltd
    device-name ( RO): NVMe SSD Controller S4LV008[Pascal]
         pci-id ( RO): 0000:d8:00.0

uuid ( RO)           : 62d8de6b-c17c-ef56-410f-ef627a94dcaa
    vendor-name ( RO): NVIDIA Corporation
    device-name ( RO): Device 2f80
         pci-id ( RO): 0000:86:00.1

uuid ( RO)           : c11db184-6c96-1a74-ca0e-93588d2e8d12
    vendor-name ( RO): NVIDIA Corporation
    device-name ( RO): Device 2f04
         pci-id ( RO): 0000:86:00.0
This is where I had to blend commands from XCP-NG 8.2 and 8.3 to get things working. I had noticed that the new “xe” command seemed to be populating the “xen-pciback.hide” boot parameter, so after running “xe pci-disable-dom0-access” for the three devices it could see, I used “xen-cmdline” to update “xen-pciback.hide” to include the USB device ID alongside the three it already had:
/opt/xensource/libexec/xen-cmdline --set-dom0 "xen-pciback.hide=(0000:d8:00.0)(0000:86:00.1)(0000:86:00.0)(0000:12:00.0)"
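As a sanity check, you can read the parameter back before rebooting (“--get-dom0” is, as far as I know, the read counterpart of “--set-dom0”), and then confirm after the reboot that the devices are assignable:

/opt/xensource/libexec/xen-cmdline --get-dom0 xen-pciback.hide
xl pci-assignable-list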
After a reboot, dom0 no longer had a stranglehold on the USB card, and “xl pci-assignable-list” now showed all four expected devices. However, XOA itself still could not see the USB controller for assignment to a VM via the GUI, so command line to the rescue! Adding all four devices to my target VM was a snap from the command line by setting the VM’s “other-config:pci” parameter to
0/0000:d8:00.0,0/0000:86:00.0,0/0000:86:00.1,0/0000:12:00.0
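In full, the command looks something like this, where “<vm-uuid>” is a placeholder for the UUID reported by “xe vm-list”:

xe vm-param-set uuid=<vm-uuid> other-config:pci=0/0000:d8:00.0,0/0000:86:00.0,0/0000:86:00.1,0/0000:12:00.0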
This is how I got the USB controller to pass through, and I hope the information helps save someone else some time.

Now for the second issue I ran into, which is still ongoing. Since this is a gaming VM, I’m going with Windows 11, so I originally created a new VM from the Windows 11 template and added the four passthrough devices via the command line as described above. It appears the current guest UEFI firmware doesn’t know how to boot from a passthrough NVMe device. The Windows install media had no trouble finding the SSD and installing to it, but on reboot I’m always dropped into a UEFI shell instead of booting off the SSD. No problem: since the guest can see my hardware, I just added a small virtual disk to the VM for the OS and will use the passthrough NVMe as a data drive on which all the games will be stored. This works, and I was able to get Windows 11 installed and running. However, installing Xen tools (for the paravirtualized drivers) breaks everything, which is where I’m currently stuck. If I install Xen tools, or enable driver updates through Windows Update, I get stuck with an
INACCESSIBLE BOOT DEVICE
crash when trying to start the VM. The only way around it is to remove the passthrough NVMe device, after which the VM boots up just fine. Even if I completely set up Windows 11 without the NVMe passthrough device, tools and all, reboot a few times just to be sure, and only then add the SSD back to the VM, it still hits the same crash until I remove the SSD again. I can boot into safe mode, but that only works because it falls back to the default QEMU drivers instead of the ones provided by Xen tools. I suspect the Xen tools paravirtualized drivers are getting confused between the passthrough SSD and the virtual disk that’s also connected to the VM and just break, but I don’t have any way to troubleshoot further unless someone here has ideas?

And yes, I can give the SSD back to dom0, create an SR on it, create a virtual disk for the VM on it, and sort of use it that way. But there is a sizable IO performance hit using the SSD that way, even with Xen tools running.
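If anyone wants to reproduce the workaround, detaching the NVMe is just a matter of rewriting the VM’s “other-config:pci” parameter without the d8:00.0 entry, something like this (“<vm-uuid>” again being a placeholder for your VM’s UUID):

xe vm-param-set uuid=<vm-uuid> other-config:pci=0/0000:86:00.0,0/0000:86:00.1,0/0000:12:00.0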
I know my use case is fairly remote and obscure, but that just makes it all the more fun to tinker with!
Anyways, I figured I’d throw out what I’ve learned thus far in case anyone else is trying the same thing, and to see if anyone can offer insight on how I might overcome this final sticking point with the passthrough SSD.
Thanks in advance for anything you can offer!