XCP-ng

    Passthru GPUs disappearing

    Category: Hardware
    This topic has been deleted. Only users with topic management privileges can see it.
      lightspeed

      I have raised this topic before without reaching a real resolution, so I am hoping to revisit it.

      • We're running XCP-ng 8.2.1 with the latest XOA.
      • In this environment, each server can hold up to nine PCIe x16 cards plus one card running at x8.
      • The physical servers are fully patched, including firmware; everything that can be updated has been updated.

      What we're seeing is that each host server can lose one or two of its GPUs. We're using NVIDIA Quadro T1000 (8GB) cards, with each card assigned to a single VM via passthrough.

      What happens is that a user is working and then, poof, their GPU disappears from Windows and they get an alert. That GPU stays "gone" until I reboot the physical host. After the reboot it comes back and is usable, but within 24 hours of use it disappears again.

      This doesn't happen on every card, just a few. I've looked into whether a physical card fault is likely, but the affected cards still show up in the OS and in lspci. The cards are clearly present, yet they effectively get locked and can no longer be assigned, even after I restart the toolstack.
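
      Since the cards still appear in lspci but can't be assigned, one thing worth checking is which kernel driver (if any) each card is bound to on the host when it gets "stuck". Below is a minimal diagnostic sketch, not something from this thread: it assumes a Linux host with sysfs mounted at /sys/bus/pci/devices and a Python 3 interpreter available, and the function name `nvidia_driver_bindings` is hypothetical. It matches devices by NVIDIA's PCI vendor ID (0x10de) and reads each device's `driver` symlink. We assume, without confirming it for XCP-ng 8.2.1, that passthrough-ready devices show up bound to Xen's `pciback` driver.

```python
import os
from typing import Dict, Optional

# NVIDIA's PCI vendor ID as it appears in sysfs "vendor" files.
NVIDIA_VENDOR_ID = "0x10de"


def nvidia_driver_bindings(
    sysfs_root: str = "/sys/bus/pci/devices",
) -> Dict[str, Optional[str]]:
    """Map each NVIDIA PCI function (by BDF address) to its bound driver name.

    Hypothetical helper. The value is None when no driver is bound.
    Assumption: on a Xen host, devices reserved for passthrough are
    expected to show "pciback" here.
    """
    bindings: Dict[str, Optional[str]] = {}
    for bdf in sorted(os.listdir(sysfs_root)):
        dev_dir = os.path.join(sysfs_root, bdf)
        try:
            with open(os.path.join(dev_dir, "vendor")) as f:
                vendor = f.read().strip()
        except OSError:
            continue  # skip entries without a readable vendor file
        if vendor.lower() != NVIDIA_VENDOR_ID:
            continue
        driver_link = os.path.join(dev_dir, "driver")
        if os.path.islink(driver_link):
            # The "driver" entry is a symlink; its last path component
            # is the name of the bound driver.
            bindings[bdf] = os.path.basename(os.readlink(driver_link))
        else:
            bindings[bdf] = None
    return bindings
```

      On the host, `print(nvidia_driver_bindings())` lists each NVIDIA function (a card may expose more than one, e.g. an audio function) with its driver. Comparing the output for a working card and a "lost" one could show whether the lost card has dropped out of the passthrough driver. `lspci -k` reports the same "Kernel driver in use" information without any scripting.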

      I'm at a loss. It's puzzling and causing a lot of issues, lol.
