Hiding hypervisor from guest to prevent Nvidia Code 43
-
I'll look for updates on this topic... but for now I'm going to drop XCP-NV and go (back) to KVM.
-
It's XCP-ng, not "XCP-NV". We'll continue to track whether it becomes possible to do this one day.
-
@olivierlambert oops.. typo. Apologies
-
@olivierlambert @imtrobin I worked on CPUID for exposing CPU temperature data to the guest (Dom0). Maybe we can use the same approach.
# cpuid | grep -i hypervisor_id
hypervisor_id = "Microsoft Hv"
Is this the one that should be hidden?
-
I'm not entirely sure about that, I need to ask one Xen dev.
-
@r1 said in Hiding hypervisor from guest to prevent Nvidia Code 43:
@olivierlambert @imtrobin I worked on CPUID for exposing CPU temperature data to the guest (Dom0). Maybe we can use the same approach.
# cpuid | grep -i hypervisor_id
hypervisor_id = "Microsoft Hv"
Is this the one that should be hidden?
I have no idea if that is enough, but maybe. How can we try this? For KVM, they do this:
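If I recall correctly, it's something like hiding the KVM signature and spoofing the Hyper-V vendor ID on the QEMU command line (a rough sketch; the vendor ID is an arbitrary 12-character string):

    # hide the "KVMKVMKVM" hypervisor signature from the guest and present
    # a custom Hyper-V vendor ID so the NVIDIA driver doesn't bail out
    qemu-system-x86_64 ... \
        -cpu host,kvm=off,hv_vendor_id=0123456789ab \
        ...

libvirt exposes the same knobs as <kvm><hidden state='on'/></kvm> and <hyperv><vendor_id state='on' value='...'/></hyperv> in the domain XML.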
-
What is the current status on this? We are hoping for this soonish, and if it isn't coming at all or not for a long time, then we need to plan a migration to another platform.
Our use case is for business, but we are not a very big business and do not wish to spend thousands on Quadro cards when much cheaper GTX and other consumer cards would do the job perfectly if it weren't for Nvidia's artificial limitation.
We run a custom program for Kitchen Design as RemoteApps under an RDS setup. At this point we are running Quadro 4000 cards, as we already had them, but because of this I cannot upgrade to Server 2019, as there are no drivers. To upgrade we have to upgrade cards, and the best bang-for-buck cards for our use are P4000s, which do run under Server 2019 but are $1000+ each. We want to have 2 cards in each server, totalling 4 cards. We can get even better bang for buck on some RTX 2070s or the like for nearly half the cost.
The only thing stopping us is this stupid Code 43 "bug".
We are willing to wait, as we haven't quite hit the limits of our current setup, but we are getting close, and we need to plan our next move. I love XCP-ng and honestly cannot fault it for anything other than lacking the capability to hide the fact that the VM is a VM.
We have already run a test setup on Proxmox, and the Code 43 fix works perfectly.
-
If it's for business, why not try to pour some resources into making this work? It's Open Source backed by pro support, so there are 2 approaches:
- contribute to speed up the process
- pay to get someone (individual or company) to do it
One priority can't be a priority for everyone. But you can have an influence on that with 2 different levers.
-
I've been using XCP-ng and Xen for a while now, and unfortunately I have had to change everything from Xen to ESXi because this feature is not there. hypervisor.cpuid.v0=false is the only line I have to enter to hide the hypervisor from the GPUs in all our servers.
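For reference, that line goes in the VM's .vmx file (or the VM's advanced configuration parameters); the usual .vmx spelling is quoted:

    hypervisor.cpuid.v0 = "FALSE"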
It's a shame; I understand it isn't a priority, but judging from the Google searches I have done, this has been requested for many years. When this feature is available I will come back to Xen; unfortunately, until then I have to use ESXi.
-
Hi!
Thanks for the feedback, my previous message still applies.
-
@olivierlambert Sorry to disturb you with the same problem.
I have an HP DL360 Gen9 with XCP-ng 8.2. I added a Quadro P600 GPU and I get error 43 when I try to pass it through to a Windows 10 VM.
-
Are Quadro cards banned by the NV driver from being used in passthrough?
-
I can pass through an M60 fine, so I don't think Quadro cards can't be passed through. It's likely you installed the wrong driver.
-
@olivierlambert Your points are valid, but I guess, just like many others:
- (contribute to speed up the process) We don't have the skills, time, and/or resources to do it ourselves, and besides, we wouldn't know where to start to code this in.
- (pay to get someone to do it) Again, money. Just because we are a business doesn't mean we have money to spend on things like this. It's not so much that we won't spend money; it's more that there are other priorities, especially considering that we have only just come out of a very severe and long drought, plus COVID-19, both of which have severely impacted our business. So on one hand, having this feature would be a big help, as we could cut some corners on GPUs, but on the other hand, sorting this feature out ourselves takes resources that we simply don't have spare.
In our case, we have plenty of redundant systems, to the point that I can easily and comfortably move everything over to VMware to get this feature. Like others, we simply have to take the path of least resistance.
Personally, I'm not sure if we will move platforms yet, as we can get by on some M4000 cards for now.
-
It's an Open Source platform, so you have the choice. Obviously, if there's not enough traction here (community and/or business), I suppose you understand why it's not a priority for us.
-
@eangulus Sorry for bumping this topic, but I just got my Quadro M4000 card configured and passed through, yet for some reason I still get "No devices were found" when running 'nvidia-smi'. How did you manage to get the M4000 working in XCP-ng?
-
From my experiments with a GTX 1080 Ti, you'll have to follow the instructions for generic PCIe passthrough to the letter, which mostly means that the passthrough needs to be done on both sides: on the Dom0 for relinquishing device control, and on the DomU to pick up the device (section 5). Perhaps now that the restrictions from Nvidia have gone, Vates will kindly include some GUI support for those operations in XOA.
Note that if your dGPU exposes multiple PCI devices (e.g. my GTX 1080 Ti also has a USB-C controller on board), both entries need to be added in a single 'xe vm-param-set' statement; otherwise only the latter device (USB in my case) will wind up in the VM (yeah, at least 30 minutes of puzzling on that one).
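For illustration, roughly what that looks like (a sketch; the 0000:04:00.x addresses are placeholders, take yours from lspci):

    # Dom0: hand both functions over to xen-pciback (host reboot required)
    /opt/xensource/libexec/xen-cmdline --set-dom0 "xen-pciback.hide=(0000:04:00.0)(0000:04:00.1)"

    # DomU: attach both functions in a single statement
    xe vm-param-set uuid=<vm-uuid> other-config:pci=0/0000:04:00.0,0/0000:04:00.1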
Of course, if the dGPU is your console device, it means flying blind afterwards, but I'm getting used to that with all the recent iGPUs as well (then again, I have some DisplayLink hardware that's currently unused and EL7/8 drivers for those have popped up recently...)
Thankfully, the dreaded error 43 issues have gone away with the more recent Nvidia drivers. Sadly, Kepler support has been retired (I've still got a lot of those around), so you may want to preserve the latest CUDA 11.4 release as one that offers both; for Maxwell you should still be fine.
Before trying to diagnose with the Nvidia drivers, you should be able to see the device transition via
lspci
on both sides, Dom0 and DomU.
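For example, something like (the grep pattern is just illustrative):

    # show the devices and which kernel driver currently owns them
    lspci -nnk | grep -A3 -i nvidia

On the Dom0 side the 'Kernel driver in use' line should read pciback once the device is hidden; in the DomU the device should appear and, after driver installation, be bound to nvidia.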