@stormi Thanks for the links; I will start digging into them to learn how it can be done. From a quick review, this may be what I need, since I do not think I will actually need to recompile XCP-ng at this point, just add some extra installation steps via RPMs and yum, plus some configuration.
This is really an experimental endeavor for the moment, and it also lets me learn more about the whole installation and setup of XCP-ng along the way.
In some of my recent reading, it seems that the XenServer folks were also investigating so-called "Sub Domains" or "Driver Domains", using Mini-OS and MirageOS as minimal OSes to disaggregate Dom0: drivers are taken out and made independent, so that if something happens to one of them the rest of the system remains unaffected. This links into the ideas behind unikernels, but I think a lot of this is still in the R&D phase, although there are some great ideas, to be sure.
I could see a whole set of Sub-Domains handling just about all of the functionality that Dom0 currently manages, perhaps in a future evolution.
The GPU device must be isolated on the host with the vfio kernel driver. To ensure this, the vfio driver must load first, prior to any vendor or open source driver.
The GPU must be connected to the guest VM via PCI pass-through. No surprise.
The CPU must not be identified as a virtual one; it must report some other identity when probed. This appears to be the key to preventing the dreaded NVidia Error 43: it suggests the driver is just examining the CPU assigned to it, although some documentation mentions a "vendor" setting. The workaround is to set it to a string the driver doesn't match against, and it just works. Even a setting of "unknown" is shown to work. I don't know if there is a way to tell an XCP guest "please don't identify yourself as virtual".
For cards that are CUDA capable but "unsupported" by NVidia, you install the software in a different sequence (CUDA first, then driver).
Disclaimer: I'm just compiling a list to get an idea about what to do; I haven't done the actual install, nor do I have the hardware. Hopefully this helps.
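To make the first and third points a bit more concrete, here is a rough, untested sketch. It assumes a plain Linux KVM/libvirt host, which is where these notes seem to come from; none of it is XCP-ng-specific, and I don't know the XCP-ng equivalent. The function names are made up for illustration and the PCI IDs are placeholders. The first function renders a modprobe.d fragment (`options` and `softdep` are standard modprobe.d directives) so that vfio-pci loads before the vendor or open source driver; the second builds a libvirt `<features>` element that hides the KVM signature and reports a vendor string of "unknown".

```python
import xml.etree.ElementTree as ET


def vfio_modprobe_conf(pci_ids, later_drivers=("nvidia", "nouveau")):
    """Render a modprobe.d fragment for GPU isolation.

    pci_ids: vendor:device pairs to hand to vfio-pci (placeholders here).
    later_drivers: drivers that must only load after vfio-pci.
    """
    lines = ["options vfio-pci ids=" + ",".join(pci_ids)]
    for driver in later_drivers:
        # "softdep X pre: Y" asks modprobe to load Y before X.
        lines.append(f"softdep {driver} pre: vfio-pci")
    return "\n".join(lines) + "\n"


def hide_hypervisor_features(vendor_id="unknown"):
    """Build a libvirt <features> element that masks the hypervisor identity."""
    # libvirt documents the vendor_id value as a string of up to 12 characters.
    if not 1 <= len(vendor_id) <= 12:
        raise ValueError("vendor_id must be 1 to 12 characters long")
    features = ET.Element("features")
    hyperv = ET.SubElement(features, "hyperv")
    ET.SubElement(hyperv, "vendor_id", state="on", value=vendor_id)
    kvm = ET.SubElement(features, "kvm")
    ET.SubElement(kvm, "hidden", state="on")
    return ET.tostring(features, encoding="unicode")


if __name__ == "__main__":
    # Placeholder IDs: substitute the values reported by lspci -nn on the host.
    print(vfio_modprobe_conf(["10de:1b80", "10de:10f0"]))
    print(hide_hypervisor_features())
```

On such a host, the first output would go into a file under /etc/modprobe.d/ and the second into the guest's domain XML. Again, this is a sketch under those assumptions, not something I have run on real hardware.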
From my admin-view:
I wonder about the benefits of that (and whether the work shouldn't be put into other useful things). Without monitoring attached, or some automatic notification via e-mail or similar, it's rather useless to me. I don't really care to learn that a device broke $somewhen ago; I need a notification ASAP, when it happens. Something like health monitoring for XCP, however it's realized (via XenCenter, a central mail service, SNMP, Nagios/monitoring plugins...).
Don't be sorry: some tools are designed to do one thing, and others to do other things. You don't expect your everyday car to fly. It would be the same thing if XO worked in the browser only. We made that choice (to have a server part) because we knew we'd push for features that require running 24/7.
You can boot a VM on PXE by selecting the right boot order (for example, enabling "Network" in the boot order in XO).
Then you need to set up a PXE server; you can find various guides by googling "PXE server setup" (e.g. this one).
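If you would rather set the boot order from the host's command line than from XO, the same thing can be done with `xe vm-param-set` and the `HVM-boot-params:order` key, where `n` is network, `c` is hard disk and `d` is the DVD drive. A minimal sketch that only builds the command (the helper name and the UUID below are placeholders, not anything from XO):

```python
import shlex


def pxe_boot_order_cmd(vm_uuid, order="ncd"):
    """Return the xe argv that sets an HVM guest's boot order.

    order uses one letter per device: n = network, c = hard disk, d = DVD.
    Putting "n" first makes the guest try PXE before anything else.
    """
    if not order or set(order) - set("ncd") or len(set(order)) != len(order):
        raise ValueError("order must be a non-repeating combination of n, c, d")
    return [
        "xe",
        "vm-param-set",
        f"uuid={vm_uuid}",
        f"HVM-boot-params:order={order}",
    ]


if __name__ == "__main__":
    # Placeholder UUID: use the value from "xe vm-list" on your host.
    cmd = pxe_boot_order_cmd("00000000-0000-0000-0000-000000000000")
    print(shlex.join(cmd))
```

Run the printed command on the XCP-ng host (or pass the list to `subprocess.run`) with the VM halted, then start the VM.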