XCP-ng

    Firepro S7150x2 SR-IOV Errors

    • tuxen Top contributor @tbluml

      @tbluml did you try the pci=realloc workaround, as stated in the RHEL link?

      # /opt/xensource/libexec/xen-cmdline --set-dom0 pci=realloc
      

      Edit: reboot the host after applying the change.

      • tbluml @tuxen

        @tuxen I just tried it (from the terminal) and rebooted, but unfortunately the result is the same. Does the command need to be added to a file, or should running it from the terminal be enough?

          • tuxen Top contributor @tbluml

          Running it from the terminal/CLI is enough. Alternatively, you can verify or change the boot options directly in /boot/grub/grub.cfg (for the dom0 kernel, look at the module2 /boot/vmlinuz entries).
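As a sketch of that check, assuming the dom0 kernel entries in grub.cfg are module2 lines that load /boot/vmlinuz (as described above) and that the file is at /boot/grub/grub.cfg (UEFI installs may keep it elsewhere), this POSIX shell function prints every pci= option passed to dom0. The function name dom0_pci_opts is hypothetical, not part of XCP-ng. Because grub.cfg usually has several menu entries, the same option may be printed more than once.

```shell
#!/bin/sh
# Hypothetical helper (not an XCP-ng tool): list the pci= options on the
# dom0 kernel line(s) of a GRUB config.
# Assumption: dom0 entries look like "module2 /boot/vmlinuz-... <options>";
# the path defaults to /boot/grub/grub.cfg and can be overridden with $1.
dom0_pci_opts() {
    cfg="${1:-/boot/grub/grub.cfg}"
    # For each module2 line loading /boot/vmlinuz*, print every pci= token.
    awk '$1 == "module2" && $2 ~ /^\/boot\/vmlinuz/ {
        for (i = 3; i <= NF; i++) if ($i ~ /^pci=/) print $i
    }' "$cfg"
}

# After a reboot, the options the running dom0 kernel actually received:
#   cat /proc/cmdline
```

Calling dom0_pci_opts with no argument reads the default path; pass another file path to inspect a different grub.cfg.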

          I also found this Citrix KB article, which adds one more pci option; take a look:
          https://support.citrix.com/article/CTX250121

            • tbluml

            For now, I took the S7150x2 out of the R720 and put it in a Supermicro X10DRH-CT-O with E5-2620v3 CPUs for testing. Once everything was set up (BIOS, OS, and driver), MxGPU worked. (Good to know that if all else fails, I have a machine that will do what I need!)

            I will take a look at that, @tuxen! Thank you!

              • r1 XCP-ng Team

              @tbluml Would you like to try the open source gim driver on your Dell machine? It may tell us more.

                • tbluml

                Today I had the chance to try the rest of the commands linked by @tuxen, and I can now start a VM with MxGPU enabled! Adding "pci=assign-busses" to the command appears to be what did it:

                /opt/xensource/libexec/xen-cmdline --set-dom0 "pci=realloc pci=assign-busses" 
                

                Thank you all for your assistance!
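A minimal sketch for confirming, after the reboot, that the running dom0 kernel actually received both options. It assumes they appear as separate tokens exactly as in the command above (a comma-combined pci= value would not be matched). The function name has_mxgpu_pci_opts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check that both PCI options from the command above
# are present as whole tokens on a kernel command line.
# Reads /proc/cmdline unless a command line string is passed as $1.
has_mxgpu_pci_opts() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    for opt in pci=realloc pci=assign-busses; do
        # Pad with spaces so only whole tokens match (not e.g. pci=reallocX).
        case " $cmdline " in
            *" $opt "*) ;;
            *) echo "missing: $opt"; return 1 ;;
        esac
    done
    echo "ok"
}
```

Run with no argument, it checks the live dom0 command line and prints ok, or prints the first missing option and returns non-zero.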

                  • olivierlambert Vates 🪐 Co-Founder CEO

                  That's interesting! Maybe you can add this to the documentation?

                    • tbluml @olivierlambert

                    @olivierlambert I'd be happy to. Is there a post or link with contribution guidelines, so I can make sure what I write is consistent with what has already been written?

                      • olivierlambert Vates 🪐 Co-Founder CEO

                      Here: https://xcp-ng.org/docs/compute.html#mxgpu-amd-vgpu

                      There's a link at the bottom of the page (called "Help us to improve this page!") where you can contribute and add what you did 🙂

                        • pigeon @tbluml

                        @tbluml I'm trying to build an MxGPU setup on similar hardware (Dell R720, 2x E5-2650).
                        I got the same SR-IOV errors as you, so I added the pci=realloc pci=assign-busses parameters.
                        Unfortunately, the system fails to boot once pci=assign-busses is added:
                        the root disk is not discovered and the dracut shell starts instead.
                        Did you run into the same issue, and if so, how did you fix it?

                        Edit:
                        In case anyone else stumbles upon this:
                        I reinstalled onto a USB disk that is not connected to the RAID controller, and it seems to work now.
                        My guess is that, because the RAID controller is a PCI device and pci=assign-busses lets the kernel override the PCI bus numbers assigned by the firmware, the RAID device can no longer be found using the predetermined data in the initramfs. But that might be complete nonsense (I'm no expert in these matters).

                          • m6

                          I have the same problem with XCP-ng 8.2. I'm trying to get started with MxGPU on an HPE ML380p Gen8 with E5-2620V2 CPUs. With pci=realloc pci=assign-busses added, the server cannot boot. The attached screenshot (crash-assign-busses.jpg) shows the point in the boot where it crashes.
                          The log in the screenshot seems to point to a known issue: "choose an explicit smt=(bool) setting. See XSA-297".
                          It's pci=assign-busses that prevents booting, but without it "modprobe gim" fails to load the module. Booting from a USB disk to avoid the PCI disk controller doesn't help either: the system still crashes during startup. The BIOS firmware is quite recent (2019), the latest available. Has anyone resolved this issue?
