@l1c May I ask which AMD GPU you are using, and also the make and model of your motherboard and CPU?
MSI X570 Unify
So there is a huge difference between our setups: AMD vs Intel.
AMD-Vi (AMD's IOMMU) vs VT-d.
It may be that my issue lies with the VT-d implementation on my MSI Big Bang-XPower II mainboard. It's quite old, and so is the CPU.
It's basically from the time when VT-d was just starting to make its way to consumer-grade hardware, rather than only 6k USD Xeons with equally expensive Supermicro motherboards and the like.
Not many people needed this on consumer hardware at the time, and the mobo vendors cut many corners. The list of consumer mobos from that era with broken VT-d implementations is quite extensive.
Maybe I should just get a used Supermicro mobo from eBay instead. Or maybe it's the VT-d implementation of the old i7-3930K CPU, and picking up an old Xeon from eBay would work better. I just like the idea of repurposing this now very old gaming rig into something useful. It's still quite beefy tbh.
I haven't found any official documentation, but I've been poking around a bit and so far I have found some information that seems interesting.
It looks like Xen always starts VMs using QEMU. This is the process for one of my running VMs:
65540 23960 1.5 0.2 261356 14740 ? SLl Sep11 11:09 qemu-dm-5 -machine pc-0.10,accel=xen,max-ram-below-4g=4026531840,allow-unassigned=true,trad_compat=true -vnc unix:/var/run/xen/vnc-5,lock-key-sync=off -monitor null -xen-domid 5 -m size=4096 -boot order=cdn -usb -device usb-tablet,port=2 -smp 4,maxcpus=4 -serial pty -display none -nodefaults -trace enable=xen_platform_log -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -S -global PIIX4_PM.revision_id=0x1 -global ide-hd.ver=0.10.2 -global piix3-ide-xen.subvendor_id=0x5853 -global piix3-ide-xen.subsystem_id=0x0001 -global piix3-usb-uhci.subvendor_id=0x5853 -global piix3-usb-uhci.subsystem_id=0x0001 -global rtl8139.subvendor_id=0x5853 -global rtl8139.subsystem_id=0x0001 -parallel null -qmp unix:/var/run/xen/qmp-libxl-5,server,nowait -qmp unix:/var/run/xen/qmp-event-5,server,nowait -device xen-platform,addr=3,device-id=0x0001,revision=0x2,class-id=0x0100,subvendor_id=0x5853,subsystem_id=0x0001 -drive file=,if=ide,index=3,media=cdrom,force-lba=off -drive file=/dev/sm/backend/2c466dab-0424-67c4-3a1b-3257cab0cf54/695c548c-5600-4c9b-b785-5de42747b0a5,if=ide,index=0,media=disk,force-lba=on,format=raw -device rtl8139,netdev=tapnet0,mac=0a:c2:0e:56:dd:ba,addr=4 -netdev tap,id=tapnet0,fd=7 -device VGA,vgamem_mb=8,rombar=1,romfile=,subvendor_id=0x5853,subsystem_id=0x0001,addr=2,qemu-extended-regs=false -vnc-clipboard-socket-fd 4 -xen-domid-restrict -chroot /var/xen/qemu/root-5 -runas 65540.998
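For anyone who wants to check this on their own host without reading a wall of arguments, here is a minimal bash sketch that pulls the machine type out of a QEMU command line like the one above. The function name `machine_of` is made up for illustration, and the `ps` pipeline in the comment assumes the device-model processes are named `qemu-dm-<domid>` as on my host.

```shell
#!/bin/bash
# Print the value of the -machine (or -M) option from a QEMU command line.
# Usage: machine_of "<full command line as one string>"
machine_of() {
    local -a words
    local i
    # Split the command line on whitespace into an array.
    read -r -a words <<< "$1"
    for ((i = 0; i < ${#words[@]} - 1; i++)); do
        case "${words[i]}" in
            -machine|-M)
                # The value can carry extra properties, e.g.
                # "pc-0.10,accel=xen,...": keep only the part before the comma.
                printf '%s\n' "${words[i+1]%%,*}"
                return 0
                ;;
        esac
    done
    # No machine option found.
    return 1
}

# Possible use on a host (assumption: processes are named qemu-dm-<domid>):
#   ps -eo args | grep '[q]emu-dm' | while read -r line; do machine_of "$line"; done
```

Note that `-m` in the same command line is the memory size, not the machine type, so the sketch only looks at `-machine` and `-M`.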
I then looked around inside XCP-ng for different QEMU binaries and config files, and what struck me first is this:
[root@localhost ~]# /usr/lib64/xen/bin/qemu-system-i386 -machine help
Supported machines are:
pc-i440fx-2.9 Standard PC (i440FX + PIIX, 1996)
pc-i440fx-2.8 Standard PC (i440FX + PIIX, 1996)
pc-i440fx-2.7 Standard PC (i440FX + PIIX, 1996)
pc-i440fx-2.6 Standard PC (i440FX + PIIX, 1996)
pc-i440fx-2.5 Standard PC (i440FX + PIIX, 1996)
pc-i440fx-2.4 Standard PC (i440FX + PIIX, 1996)
pc-i440fx-2.3 Standard PC (i440FX + PIIX, 1996)
pc-i440fx-2.2 Standard PC (i440FX + PIIX, 1996)
pc Standard PC (i440FX + PIIX, 1996) (alias of pc-i440fx-2.10)
pc-i440fx-2.10 Standard PC (i440FX + PIIX, 1996) (default)
pc-i440fx-2.1 Standard PC (i440FX + PIIX, 1996)
pc-i440fx-2.0 Standard PC (i440FX + PIIX, 1996)
pc-i440fx-1.7 Standard PC (i440FX + PIIX, 1996)
pc-i440fx-1.6 Standard PC (i440FX + PIIX, 1996)
pc-i440fx-1.5 Standard PC (i440FX + PIIX, 1996)
pc-i440fx-1.4 Standard PC (i440FX + PIIX, 1996)
pc-1.3 Standard PC (i440FX + PIIX, 1996)
pc-1.2 Standard PC (i440FX + PIIX, 1996)
pc-1.1 Standard PC (i440FX + PIIX, 1996)
pc-1.0 Standard PC (i440FX + PIIX, 1996)
pc-0.15 Standard PC (i440FX + PIIX, 1996)
pc-0.14 Standard PC (i440FX + PIIX, 1996)
pc-0.13 Standard PC (i440FX + PIIX, 1996)
pc-0.12 Standard PC (i440FX + PIIX, 1996)
pc-0.11 Standard PC (i440FX + PIIX, 1996)
pc-0.10 Standard PC (i440FX + PIIX, 1996)
pc-q35-2.9 Standard PC (Q35 + ICH9, 2009)
pc-q35-2.8 Standard PC (Q35 + ICH9, 2009)
pc-q35-2.7 Standard PC (Q35 + ICH9, 2009)
pc-q35-2.6 Standard PC (Q35 + ICH9, 2009)
pc-q35-2.5 Standard PC (Q35 + ICH9, 2009)
pc-q35-2.4 Standard PC (Q35 + ICH9, 2009)
q35 Standard PC (Q35 + ICH9, 2009) (alias of pc-q35-2.10)
pc-q35-2.10 Standard PC (Q35 + ICH9, 2009)
isapc ISA-only PC
none empty machine
xenfv Xen Fully-virtualized PC
xenpv Xen Para-virtualized PC
This is the full list of -machine arguments that Xen's QEMU binary claims to support. As I suspected, i440FX is the default. It is passed as -machine pc-0.10 when QEMU is called by Xen.
I haven't found any way, yet, to make Xen switch that -machine parameter. But I tried the most obvious thing, which was to add the machine type as a key to the platform parameter inside a template, like this:
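A quick way to query that list from a script instead of reading it by eye is sketched below. It only parses the text format shown above (name in the first column, `(default)` at the end of the default entry); the helper names `default_machine` and `supports_machine` are hypothetical, and the binary path in the comments is the one from my host.

```shell
#!/bin/bash
# Both helpers read the output of `<qemu binary> -machine help` on stdin.

# Print the name of the machine type marked "(default)".
default_machine() {
    awk '/\(default\)[[:space:]]*$/ { print $1 }'
}

# Succeed (exit status 0) if the given machine name appears in the first column.
supports_machine() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Possible use on a host (path taken from this thread):
#   /usr/lib64/xen/bin/qemu-system-i386 -machine help | default_machine
#   /usr/lib64/xen/bin/qemu-system-i386 -machine help | supports_machine pc-q35-2.10 \
#       && echo "q35 is listed"
```

Being listed only means the binary knows the machine type; it says nothing about whether Xen can actually run a guest on it.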
xe template-param-set uuid=552bce37-51b2-445d-84f2-5f33fa112d7e platform:machine=pc-q35-2.10
I verified that it was added:
[root@localhost ~]# xe template-param-list uuid=552bce37-51b2-445d-84f2-5f33fa112d7e | grep platform
platform (MRW): machine: pc-q35-2.10; hpet: true; nx: true; device-model: qemu-upstream-compat; pae: true; apic: true; viridian: true; acpi: 1
Then I made a VM using that template. It didn't do anything; QEMU was still spawned using -machine pc-0.10.
There is another parameter in the template that could be relevant:
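To compare what the template asks for with what QEMU actually gets, it helps to read a single key out of the platform map. The sketch below parses the `key: value; key: value` text that xe prints, as shown above; `platform_value` is a made-up helper name, and it accepts either the whole output line or just the value part.

```shell
#!/bin/bash
# Print the value of one key from an xe "platform" map.
# Usage: platform_value <key> "<platform line or value>"
platform_value() {
    local key=$1
    local line=$2
    # Drop a leading label such as "platform (MRW): " if the whole line was passed.
    line=${line#*): }
    # Put each "key: value" pair on its own line, then print the matching value.
    printf '%s\n' "$line" | tr ';' '\n' | sed -n "s/^ *${key}: *//p"
}

# Possible use on a host (UUID is the template from this thread):
#   platform_value machine \
#       "$(xe template-param-list uuid=552bce37-51b2-445d-84f2-5f33fa112d7e | grep platform)"
```

If this prints pc-q35-2.10 while the running QEMU process still shows -machine pc-0.10, the key is being stored but ignored by whatever builds the QEMU command line.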
hardware-platform-version ( RO): 0
But it's read-only, so it can't be changed using xe template-param-set.
Anyway, I don't know whether the machine parameter is hardcoded or not. There are also different QEMU binaries available.
The one spawned by Xen when launching a VM is qemu-dm, or maybe the wrapper, and that binary does not have the same machine support list:
[root@localhost ~]# /usr/libexec/xen/bin/qemu-dm -M help
Supported machines are:
xenfv Xen Fully-virtualized PC (default)
xenpv Xen Para-virtualized PC
Anyway, given that there is a QEMU binary claiming to support q35, I am getting my hopes up that it can in fact be done. We already know that regular QEMU has the support, and I believe it may just be a matter of making a template that can choose which binary and parameters to use.
Maybe kernel support needs to be added too; I haven't looked at the XCP-ng kernel source to see which virtualization features are enabled.
If I have time later this weekend, I will download a build-env docker image, start poking around the XCP-ng kernel, and build one with any missing virtualization features enabled.
I was planning to do that anyway, to see if I can figure out what changed, since I have not been able to use any XCP-ng version above 7.6 successfully.
Maybe I will look at some of the other sources as well, to find out whether the QEMU binary used is hardcoded, and whether some of the parameters passed to it, like -machine, are hardcoded as well.
That's it for now. Sorry for the long and messy post.
I haven't spent very much time on this yet.
And I have never looked at the Xen sources or dived into how it works, so please bear with me if my poking around takes some time.
But for now, I don't really see any reason why XCP-ng shouldn't be able to run regular QEMU instead of the Xen-modified QEMU, and connect to XCP-ng Center's console over VNC, as it does with the Xen-modified QEMU.
Correct me if I'm completely wrong, please :)
A couple of ideas to try later. The first is to just modify a template.json and reinstall the templates, changing only this part:
to q35.
However, that alone will not be enough. The next step would be to
modify this script to make it use the qemu-system-i386 binary: /usr/libexec/xenopsd/qemu-dm-wrapper
Maybe make it look for the -machine switch and run the i386 binary only when a q35 machine type is requested, so that everything else keeps running exactly as it does now.
But before I start wasting any time on this, is there any chance this could work?
Or would I first need to enable something virtualization-wise in the dom0 kernel, or maybe even in the Xen hypervisor itself?
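To make the idea concrete, here is a rough bash sketch of just the selection logic. It is not the real wrapper and I have not tried it against xenopsd; the two paths are the ones from earlier in this thread, and the function name `pick_binary` is made up. It checks `-machine` / `-M` only, since plain `-m` is QEMU's memory-size option.

```shell
#!/bin/bash
# Paths as seen on my host earlier in this thread; adjust if yours differ.
UPSTREAM_QEMU=/usr/lib64/xen/bin/qemu-system-i386
DEFAULT_QEMU=/usr/libexec/xen/bin/qemu-dm

# Print the binary that should handle the given QEMU argument list:
# the upstream binary for q35 machine types, the current one otherwise.
pick_binary() {
    local prev=""
    local arg
    for arg in "$@"; do
        if [ "$prev" = "-machine" ] || [ "$prev" = "-M" ]; then
            # Strip properties such as ",accel=xen" before matching the type.
            case "${arg%%,*}" in
                q35|pc-q35-*)
                    echo "$UPSTREAM_QEMU"
                    return 0
                    ;;
            esac
        fi
        prev=$arg
    done
    echo "$DEFAULT_QEMU"
}

# A wrapper built on this would end with something like:
#   exec "$(pick_binary "$@")" "$@"
```

Everything that does not ask for q35 falls through to the existing binary, so current VMs would be unaffected. Whether Xen and the rest of the toolstack accept a q35 guest at all is a separate question.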
I took some time to test Q35 on QEMU+KVM on Ubuntu.
Q35 is amazing. Not only can I now use the newest AMD driver (20.9.2), as opposed to 18.4.1, which was the newest driver I could make work with Xen/i440, but Q35 also seems to handle FLR (function-level reset) perfectly. With i440 I would often have to shut down dom0 to make the GPU reset before I could use it again in a VM after a VM reboot. With Q35 I have not had a single reset issue, and I have tried hard. I have rebooted the VM repeatedly, both soft and hard, and not once has it locked up on reset.
I also ran a test with i440 on KVM to see whether my driver and reset issues were related to the hypervisor (KVM vs Xen) or to the emulated chipset (i440 vs Q35). The issues are the same with i440 on KVM: it would not accept a driver above 18.4.1, and a hard reset of the VM would cause the GPU reset issue, only fixable by power cycling the host (a reboot is not enough; a complete shutdown is needed).
I therefore conclude the chipset is the key factor.
I do have to wonder why Xen is having a hard time making Q35 work though, considering Xen basically is/was using a modified QEMU. I am thinking that making Q35 work for Xen has not been a priority. But it really should be!
This works so well that I've decided to build a new desktop to use it.
It's not that simple to get q35 working in Xen. However, it's on our TODO list for 2021. It's indeed not a priority for the Xen project right now, because security issues are taking up 99% of the main Xen devs' time.
No rush, I am not even expecting it to ever happen.
Xen works well as-is for server usage. And you know, maybe that's great too.
KVM/QEMU for desktop, and Xen for servers.
Better that the devs focus on keeping Xen secure and updated than spread themselves thin adding features that most of Xen's userbase doesn't even need and that the devs can't keep updated.
On the other hand, GPGPU workloads are quickly gaining ground, so I actually believe this feature could be very important for Xen in the future.
Consumer-grade GPUs are super cheap, and having good support within Xen could potentially make it a lot easier for startups and small companies to use GPUs on a larger scale.
I am afraid KVM will "win" users if Xen keeps lagging behind it on features.
I guess we have already seen it happen (Amazon AWS comes to mind...).
I will still be choosing Xen wherever applicable.
Keep up the great and important work you all do. XCP-ng has my love.
No need to convince me. That's why here at Vates we have already planned to work on q35.
I was just talking about the biggest Xen contributors, who might have other priorities.