XCP-ng

    Issue with SR-IOV mxGPU after changing CPU

    Solved Compute
    • Spunky_Surveyor

      I have two XCP-ng servers, each with two AMD FirePro S7150x2 cards (two GPUs per card), and I have been happily using vGPU on them for a few months. I recently changed the CPU in one of them so that I could create a pool for easier management, and now for some reason I can only use one GPU on each of the cards in that server. Whenever I try to boot VMs that would use more than those two GPUs, I get the following error:

      INTERNAL_ERROR(xenopsd internal error: Cannot_add(0000:0d:02.0, Device_common.QMP_Error(22, "{\"error\":{\"class\":\"GenericError\",\"desc\":\"Mapping machine irq 0 to pirq -1 failed: Operation not permitted\",\"data\":{}},\"id\":\"qmp-000019-22\"}")))
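
For anyone hitting a similar message, here is a minimal Python sketch that pulls the PCI address and the QMP error description out of the error text above. It assumes the quoted payload uses JSON-compatible escaping, which holds for this message but is not guaranteed for every xenopsd error. The function name `parse_xenopsd_error` and the returned keys are made up for illustration and are not part of any XCP-ng tool.

```python
import json
import re

# The error text exactly as posted above (raw string keeps the backslashes).
ERROR = r'''INTERNAL_ERROR(xenopsd internal error: Cannot_add(0000:0d:02.0, Device_common.QMP_Error(22, "{\"error\":{\"class\":\"GenericError\",\"desc\":\"Mapping machine irq 0 to pirq -1 failed: Operation not permitted\",\"data\":{}},\"id\":\"qmp-000019-22\"}")))'''

# Hypothetical pattern: PCI address (domain:bus:device.function),
# then the numeric QMP error code and the quoted payload.
PATTERN = re.compile(
    r'Cannot_add\((?P<bdf>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]),'
    r'.*?QMP_Error\((?P<code>\d+), (?P<payload>".*")\)'
)


def parse_xenopsd_error(message):
    """Return the device, code and QMP error fields, or None if no match."""
    match = PATTERN.search(message)
    if match is None:
        return None
    # Assumption: the payload is a quoted string whose escapes are valid JSON.
    # First decode the outer quoted string, then parse the JSON text inside it.
    inner_text = json.loads(match.group("payload"))
    qmp = json.loads(inner_text)
    return {
        "device": match.group("bdf"),
        "code": int(match.group("code")),
        "error_class": qmp["error"]["class"],
        "desc": qmp["error"]["desc"],
    }


if __name__ == "__main__":
    print(parse_xenopsd_error(ERROR))
```

Run against the message above, this reports device `0000:0d:02.0` and the description "Mapping machine irq 0 to pirq -1 failed: Operation not permitted", which at least tells you which PCI device failed to attach.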
      

      So far I have tried unassigning the PCI devices and restarting the toolstack, but it made no difference. I do not think I will be able to restart the hosts until the weekend.

      Has anyone who has worked with vGPU in XCP-ng seen this before? Thanks!

      • TheNorthernLight @Spunky_Surveyor

        @spunky_surveyor Random question: if you put the old CPU back, does it suddenly work?

        • Spunky_Surveyor @TheNorthernLight

          @thenorthernlight I have not tried that yet, but I was able to reboot the host, and now the host doesn't recognize one of the cards. I have a hunch that the GPU, or perhaps its riser card, was not properly seated after the service. I will update again after reseating the risers and GPUs to see if that works. If not, I will try putting the old ones back in.

          • TheNorthernLight @Spunky_Surveyor

            @spunky_surveyor Definitely sounds like an installation issue, since other components are being affected. Don't forget to check your BIOS for voltage settings. Skipping a reset and re-learn in the BIOS after a hardware change causes ALL SORTS of problems. I don't know what hardware you have, but most modern Dell servers have an option to re-run performance testing on boot when hardware changes. This fixes issues like wrong voltage settings.

            • Spunky_Surveyor @TheNorthernLight

              @thenorthernlight So I opened up the host and removed all the PCIe cards and riser cards. At a glance everything appeared to have made proper contact; however, after reseating the cards and booting again, sure enough, it works again. I did not have to run any testing in the BIOS, although the server does a pre-boot inventory check for changes each time. The host is an HPE ProLiant DL380 Gen9.

              • Spunky_Surveyor has marked this topic as solved