XCP-ng

    Nvidia Quadro P400 not working on Ubuntu server via GPU/PCIe passthrough

    Category: Compute
    106 Posts 8 Posters 28.7k Views 5 Watching
    • TheFrisianClause @olivierlambert

      @olivierlambert Great, let me know how it goes! I bought a WX2100, but unfortunately it can't be used for transcoding in Plex... So I have to go back to Proxmox with the Quadro P400.

      • TheFrisianClause

        I'm now back on Proxmox. I also had some RMINIT errors here, but those were related to the hypervisor and were resolved pretty quickly.

        On Proxmox I also have to change some GRUB parameters and such. Isn't this something that has to be done on Xen as well, at the hypervisor level?

        • olivierlambert Vates 🪐 Co-Founder CEO

          Which parameters exactly? I think Xen doesn't support hiding the hypervisor information yet.

          • TheFrisianClause @olivierlambert

            @olivierlambert

            Parameters such as these in /etc/default/grub:

            GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on"
            GRUB_CMDLINE_LINUX="textonly video=astdrmfb video=efifb:off"
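To compare what is actually set on each machine, a minimal sketch could pull the kernel parameters out of the two GRUB variables quoted above. It assumes the simple KEY="value" form shown in this post; shell variable expansion and continuation lines are not handled. The helper names (grub_cmdline_params, iommu_requested) are hypothetical and not part of GRUB or any distribution tooling.

```python
from __future__ import annotations

import shlex

# Only the explicit Intel/AMD IOMMU switches; not an exhaustive list.
IOMMU_FLAGS = ("amd_iommu=on", "intel_iommu=on")


def grub_cmdline_params(text: str) -> dict[str, list[str]]:
    """Map each GRUB_CMDLINE_* variable in /etc/default/grub text to its kernel parameters.

    Assumes simple KEY="value" assignments; variable expansion and
    continuation lines are not handled.
    """
    params: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("GRUB_CMDLINE_") or "=" not in line:
            continue  # comments, blank lines and unrelated GRUB_* settings
        key, _, value = line.partition("=")
        # shlex removes the quotes (and any trailing # comment); the quoted
        # string is then split on whitespace into individual parameters.
        params[key] = [p for token in shlex.split(value, comments=True) for p in token.split()]
    return params


def iommu_requested(params: dict[str, list[str]]) -> bool:
    """True if any GRUB_CMDLINE_* variable explicitly enables an IOMMU."""
    return any(p in IOMMU_FLAGS for values in params.values() for p in values)


example = '''GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on"
GRUB_CMDLINE_LINUX="textonly video=astdrmfb video=efifb:off"'''

parsed = grub_cmdline_params(example)
print(parsed["GRUB_CMDLINE_LINUX"])  # ['textonly', 'video=astdrmfb', 'video=efifb:off']
print(iommu_requested(parsed))       # True
```

Reading the real file is just grub_cmdline_params(open("/etc/default/grub").read()); note that this only reflects what GRUB will pass on the next boot, not what the running kernel received.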

            Someone also replied to the topic I created on the NVIDIA forums:
            https://forums.developer.nvidia.com/t/xcp-ng-ubuntu-vm-error-quadro-p400/199084

            • olivierlambert Vates 🪐 Co-Founder CEO

              That answer is incorrect. Passing through an entire PCIe device shouldn't make a difference.

              Maybe it's a problem on the IOMMU side; I don't know. It will be easier to work on it with an actual card.

              • TheFrisianClause @olivierlambert

                @olivierlambert In that case, let's hope you can resolve it once the P400 is delivered. I'm curious what you find and whether there is a way to resolve the issue...

                • olivierlambert Vates 🪐 Co-Founder CEO

                  I have no idea, and not much hope given our other priorities, but we'll see.

                  • TheFrisianClause @olivierlambert

                    @olivierlambert I think the conclusion we can draw is that I need to hide the hypervisor from the NVIDIA driver, which is also mentioned here: https://www.reddit.com/r/XenServer/comments/r12p0q/pci_passthrough_quadro_p400_to_ubuntucentos_vm/

                    So it is an Error 43. I think this is plausible, because on Proxmox I do this as well by adding this line to the VM .conf file:

                    cpu: host,hidden=1,flags=+pcid

                    This hides the KVM hypervisor from the VM.
                    Is there an equivalent for XCP-ng?
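Which signal the NVIDIA driver actually checks is not established in this thread. As a rough way to compare what the Ubuntu guest sees on Proxmox (with hidden=1) versus on XCP-ng, a sketch like the following reads two common Linux indicators: the "hypervisor" flag in /proc/cpuinfo (the CPUID hypervisor-present bit) and /sys/hypervisor/type, which typically reads "xen" inside Xen guests. The function names are hypothetical, and this is not a reimplementation of the driver's own check.

```python
from __future__ import annotations

from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the x86 feature flags listed on the 'flags' lines of /proc/cpuinfo text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def hypervisor_signals(cpuinfo_text: str, sys_hypervisor_type: str | None) -> dict[str, object]:
    """Summarise two guest-visible hints that the VM runs under a hypervisor."""
    return {
        # Linux reports CPUID leaf 1, ECX bit 31 as the 'hypervisor' flag.
        "cpuid_hypervisor_bit": "hypervisor" in cpu_flags(cpuinfo_text),
        # Present on Xen guests with sysfs hypervisor support; None when the file is absent.
        "sys_hypervisor_type": sys_hypervisor_type.strip() if sys_hypervisor_type else None,
    }


def read_local_signals() -> dict[str, object]:
    """Read the real files; meant to be run inside the Ubuntu VM on each host."""
    hv_file = Path("/sys/hypervisor/type")
    hv_type = hv_file.read_text() if hv_file.exists() else None
    return hypervisor_signals(Path("/proc/cpuinfo").read_text(), hv_type)
```

Calling read_local_signals() in the VM on both hosts would show whether either of these values differs; these are only two of the signals a driver could look at.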

                    • olivierlambert Vates 🪐 Co-Founder CEO

                      1. No equivalent in Xen yet.
                      2. NVIDIA recently changed its policy to stop blocking virtualization in its drivers, so the problem should not be there.

                      • TheFrisianClause @olivierlambert

                        @olivierlambert Could it be something with the VFIO modules in KVM? I honestly have no clue anymore... So I think my best option is to wait for the results of your research on this...

                        • olivierlambert Vates 🪐 Co-Founder CEO

                          I just got the card, but my agenda is very, very busy ATM. I'll try the PCI passthrough in my spare time (which I don't have much of either).

                          • TheFrisianClause @olivierlambert

                            @olivierlambert No problem, take your time. I will check in regularly to see if there has been an update or something... 🙂

                            • olivierlambert Vates 🪐 Co-Founder CEO

                              Okay, doing some tests now: I can reproduce the issue. So the questions are:

                              1. Did it work before, on older versions of XCP-ng? (to rule out a regression)
                              2. Is the P400 restricted for PCI passthrough by the NVIDIA driver? It's still not clear. If that's the problem, it requires a code change in upstream Xen to be able to hide the hypervisor.

                              • TheFrisianClause @olivierlambert

                                @olivierlambert

                                I have no idea whether it worked on earlier versions of XCP-ng. I don't think so, as I believe I tested this on XCP-ng 8.0 and 8.1, if I remember correctly (I also created forum posts about this in 2019/2020).

                                I don't think it is limited for PCI passthrough, as I am using an NVIDIA driver in the VM on Proxmox without any issues.

                                Here is a screenshot of the NVIDIA driver I am currently using in the VM on Proxmox:
                                [screenshot: NVIDIA driver in use in the Proxmox VM]

                                • olivierlambert Vates 🪐 Co-Founder CEO

                                  That might be because Proxmox is hiding the hypervisor underneath. Hard to tell because of these fracking drivers 😕

                                  • TheFrisianClause @olivierlambert

                                    @olivierlambert
                                    Hmm yeah, it's quite a hassle with these drivers for some reason... If you need any extra information that could help, let me know. I can send other details from the Proxmox host and the Ubuntu VM.

                                    What about the VFIO modules I mentioned earlier? Is this maybe something that has to be added to XCP-ng? I also have a topic on Reddit, where someone has the same problem but with a T400.
                                    https://www.reddit.com/r/XenServer/comments/r12p0q/pci_passthrough_quadro_p400_to_ubuntucentos_vm/hrlaqxl/?context=3
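VFIO is the KVM-side passthrough mechanism, so on XCP-ng passthrough goes through Xen instead. To at least see which VFIO modules the Proxmox host has loaded, a minimal sketch can read /proc/modules. It assumes the usual module names vfio, vfio_pci and vfio_iommu_type1 (the exact set varies with kernel version), and modules built into the kernel never show up in /proc/modules, so a False result is not proof of absence. The function names are hypothetical.

```python
from __future__ import annotations

# Typical VFIO module names on a KVM/Proxmox host; the set varies by kernel version.
VFIO_MODULES = ("vfio", "vfio_pci", "vfio_iommu_type1")


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Module names from /proc/modules text (first field of each non-empty line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def vfio_status(proc_modules_text: str) -> dict[str, bool]:
    """Report which of the usual VFIO modules are loaded as modules.

    Built-in (non-modular) drivers never appear in /proc/modules, so False
    here does not prove the feature is missing.
    """
    mods = loaded_modules(proc_modules_text)
    return {name: name in mods for name in VFIO_MODULES}


# Usage on the host:
#   with open("/proc/modules") as f:
#       print(vfio_status(f.read()))
```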

                                    • olivierlambert Vates 🪐 Co-Founder CEO

                                      If it's hypervisor detection, the "only" thing needed is a Xen modification, but this is not trivial (if it's really that). I assume it's the case.

                                      In the meantime, can you double-check whether XCP-ng 7.6 is affected too? (last chance to rule out a regression).

                                      • TheFrisianClause @olivierlambert

                                        @olivierlambert I can try that on my spare server; I'll see if I can do it today. I will update this once I am finished.

                                        • TheFrisianClause @TheFrisianClause

                                          I currently have no time to test this, as the machine itself is also heavily used by other users... But I believe version 7.6 has this issue as well, as I remember testing this on version 7.x.

                                          • TheFrisianClause

                                            Alright, tested it with 7.6.
                                            It doesn't seem to work either...

                                            [  165.594038] [drm] [nvidia-drm] [GPU ID 0x00000006] Loading driver
                                            [  165.594040] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:00:06.0 on minor 1
                                            [  171.958377] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x22:0x56:667)
                                            [  171.958424] NVRM: GPU 0000:00:06.0: rm_init_adapter failed, device minor number 0
                                            [  171.963805] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x22:0x56:667)
                                            [  171.963848] NVRM: GPU 0000:00:06.0: rm_init_adapter failed, device minor number 0
                                            

                                            Same error....
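Since the 7.6 versus 8.x comparison rests on "same error", a small sketch like this can extract the RmInitAdapter failures from dmesg output and count them per PCI address and error code, so runs on different versions can be compared directly. The regular expression is written only against the log lines quoted above, the function name is hypothetical, and the three fields inside the parentheses are treated as an opaque string because their meaning isn't documented in this thread.

```python
from __future__ import annotations

import re
from collections import Counter

# Matches lines such as:
#   NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x22:0x56:667)
RMINIT_RE = re.compile(
    r"NVRM: GPU (?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"RmInitAdapter failed! \((?P<code>[^)]*)\)"
)


def rminit_failures(dmesg_text: str) -> Counter[tuple[str, str]]:
    """Count 'RmInitAdapter failed!' lines per (PCI address, error code) pair."""
    return Counter((m["pci"], m["code"]) for m in RMINIT_RE.finditer(dmesg_text))


log = """[  171.958377] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x22:0x56:667)
[  171.958424] NVRM: GPU 0000:00:06.0: rm_init_adapter failed, device minor number 0
[  171.963805] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x22:0x56:667)"""

print(rminit_failures(log))  # Counter({('0000:00:06.0', '0x22:0x56:667'): 2})
```

The lowercase rm_init_adapter lines are deliberately not counted, since they only repeat the failure without the error code.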
