XCP-ng

    Using ipmitool locally in a VM?

    Hardware
    • gskgerG Offline
      gskger Top contributor
      last edited by gskger

      I am using a Dell PowerEdge R720 with two Nvidia P40 GPUs to learn about Large Language Models (LLMs) and machine learning in general. XCP-ng is installed and both P40s are passed through to a Debian 12 VM that runs Ollama in Docker. I do power-limit both GPUs, but during inference the passively cooled GPUs sometimes still get hot.

      On a bare-metal Debian 12 install, I can use nvidia-smi to read the maximum GPU temperature and ipmitool to read the maximum inlet/outlet/CPU temperature, and then set the fan speed dynamically, again with ipmitool. That works reliably.
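
A rough sketch of that bare-metal readout, not the poster's actual script. It assumes nvidia-smi is queried with `--query-gpu=temperature.gpu --format=csv,noheader,nounits` and that `ipmitool sdr type temperature` prints pipe-separated rows ending in "NN degrees C" (the exact sensor names and layout vary by server, so treat the parser as an assumption). The function names are made up for illustration.

```python
import subprocess


def parse_max_gpu_temp(smi_output):
    """Return the highest temperature (deg C) from nvidia-smi CSV output,
    which is expected to hold one integer per line, one line per GPU."""
    temps = [int(line.strip()) for line in smi_output.splitlines() if line.strip()]
    if not temps:
        raise ValueError("no GPU temperatures found")
    return max(temps)


def parse_ipmi_temps(sdr_output):
    """Return (sensor_name, deg_C) pairs from `ipmitool sdr type temperature`.
    Assumed row layout: name | id | status | entity | 'NN degrees C'."""
    readings = []
    for line in sdr_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) == 5 and fields[4].endswith("degrees C"):
            readings.append((fields[0], int(fields[4].split()[0])))
    return readings


def run(cmd):
    """Run a command and return its stdout as text."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout


def read_max_temps():
    """Query both tools locally (bare metal only) and return the two maxima."""
    gpu = parse_max_gpu_temp(run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"]))
    host = max(t for _, t in parse_ipmi_temps(
        run(["ipmitool", "sdr", "type", "temperature"])))
    return gpu, host
```

Only `read_max_temps()` needs both tools on the same machine, which is exactly what stops working once the GPUs live in a VM and /dev/ipmi0 stays in Dom0.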

      I don't want to use ipmitool -H {ip_address} -U {username} -P {password} {command} (which seems to be the #1 recommendation) because I don't want to punch a hole in the firewall to give this VM access to the management network. Another control VM that queries both temperature sources (iDRAC via ipmitool, and the Ollama Debian VM with P40 passthrough via SSH and nvidia-smi) would also work, but feels too complicated.

      Any idea if something like running ipmitool locally in a Debian 12 VM can be achieved with XCP-ng?

      • olivierlambertO Online
        olivierlambert Vates 🪐 Co-Founder CEO
        last edited by

        Hi,

        Just to recap: you want some server hardware info on one end, and GPU info on the other, right? And both are isolated in different OSes. So you are looking for a way to consolidate that? Because by design, it's isolated, so you cannot get GPU info from the Dom0 if the GPU is passed through to a VM, and you cannot get host/server info in the Debian VM itself.

        • gskgerG Offline
          gskger Top contributor @olivierlambert
          last edited by gskger

          Yes, that is my challenge.

          In the Debian VM, I can use nvidia-smi to get GPU info from the GPUs that XCP-ng passes through to the VM. I cannot use ipmitool locally in the Debian VM to get host/server info and control the fan speed (most likely because /dev/ipmi0 is visible in Dom0 on my Dell R720 but not in the VM). One option would be to use IPMI over LAN to give the VM access to the iDRAC interface, but that sits in the management VLAN.

          My thought is to dynamically control the fan speeds from within the VM that creates the thermal load, or even turn the VM off when the load exceeds a certain critical threshold.
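
To make that idea concrete, here is a minimal sketch of such a policy: a linear fan ramp plus a shutdown cutoff. All thresholds (50, 80, 90 °C) and percentages are hypothetical placeholders, not tested values for a P40 or an R720, and the code only decides what to do; the command that actually sets the fans is left out.

```python
def fan_percent(temp_c, low=50, high=80, min_pct=20, max_pct=100):
    """Map a temperature to a fan duty cycle with a linear ramp.
    At or below `low` the fans idle at min_pct; at or above `high`
    they run at max_pct. All defaults are illustrative only."""
    if temp_c <= low:
        return min_pct
    if temp_c >= high:
        return max_pct
    fraction = (temp_c - low) / (high - low)
    return round(min_pct + fraction * (max_pct - min_pct))


def decide(temp_c, critical=90):
    """Return ("shutdown", None) above the critical threshold,
    otherwise ("set_fan", percent)."""
    if temp_c >= critical:
        return ("shutdown", None)
    return ("set_fan", fan_percent(temp_c))


if __name__ == "__main__":
    for t in (40, 65, 85, 95):
        print(t, decide(t))
```

Keeping the decision separate from the side effects means the same function can run wherever the temperatures end up being collected.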

          • olivierlambertO Online
            olivierlambert Vates 🪐 Co-Founder CEO
            last edited by olivierlambert

            What you can do: record this info in the XenStore (from inside the Debian 12 VM) and fetch the data in the Dom0 by reading the XenStore. That's the right way to communicate outside the VM 🙂 (and between VMs).

            Then, from the Dom0 (or any XAPI-capable tool, e.g. via HTTP, like XO 😄 ) you can make global decisions.

            • gskgerG Offline
              gskger Top contributor @olivierlambert
              last edited by

              Hm, that sounds interesting, but I have never done this. Can you suggest a starting point, example, or documentation to get started?

              • olivierlambertO Online
                olivierlambert Vates 🪐 Co-Founder CEO
                last edited by olivierlambert

                With the guest tools installed, your Debian VM should have a command called "xenstore" available. You can use xenstore-write.

                For example:

                xenstore-write vm-data/gputemp 82
                

                This writes the value 82 to the key gputemp. The key/value pair then shows up on the VM object, e.g. with:

                xe vm-param-get param-name=xenstore-data param-key=vm-data/gputemp uuid=<VM UUID>
                

                This will return 82. Now you can script whatever you need in your Dom0 (or even from your XOA, since all of this data can be fetched through XAPI).

                As you can see, it's very simple and efficient 🙂
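
For anyone who wants to script the two commands from this post, a minimal sketch follows. It only wraps `xenstore-write` (run inside the VM) and `xe vm-param-get` (run in Dom0) exactly as shown above; the helper names are hypothetical, and it assumes a Python 3 interpreter is available on whichever side runs it.

```python
import subprocess

KEY = "vm-data/gputemp"  # key name taken from the example in this post


def write_cmd(temp_c):
    """Build the in-VM command that publishes a temperature."""
    return ["xenstore-write", KEY, str(int(temp_c))]


def read_cmd(vm_uuid):
    """Build the Dom0 command that reads the value back from the VM object."""
    return ["xe", "vm-param-get", "uuid=" + vm_uuid,
            "param-name=xenstore-data", "param-key=" + KEY]


def publish(temp_c):
    """Run inside the VM: write the temperature to the XenStore."""
    subprocess.run(write_cmd(temp_c), check=True)


def fetch(vm_uuid):
    """Run in Dom0: return the last published temperature as an int."""
    result = subprocess.run(read_cmd(vm_uuid), stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return int(result.stdout.strip())
```

In the VM, `publish()` would be called periodically with the current GPU temperature; in Dom0, `fetch("<VM UUID>")` gives a script the number it needs to decide on fan speed or a VM shutdown.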
