Using ipmitool locally in a VM?
-
I am using a Dell PowerEdge R720 with two Nvidia P40s to learn about Large Language Models (LLMs) and machine learning in general. XCP-ng is installed and both P40s are passed through to a Debian 12 VM that runs Ollama in Docker. I do power-limit both GPUs, but the passively cooled GPUs still sometimes get hot during inference.
On a bare-metal Debian 12 install, I can use `nvidia-smi` to read the max. GPU temperature and `ipmitool` to read the max. inlet/outlet/CPU temperature and set the fan speed dynamically, again with `ipmitool`. That works reliably.
I don't want to use `ipmitool -H {ip_address} -U {username} -P {password} {command}` (which seems to be the #1 recommendation) because I don't want to punch a hole in the firewall to give this VM access to the management network. Another control VM that queries both temperature sources (iDRAC via `ipmitool`, the Ollama Debian VM with P40 passthrough via SSH and `nvidia-smi`) would also work, but feels too complicated.
Any idea whether something like running `ipmitool` locally in a Debian 12 VM can be achieved with XCP-ng?
-
Hi,
Just to recap: you want some server hardware info on one end, and GPU info on the other, right? And each is isolated in a different OS, so you are looking for a way to consolidate them? Because by design it's isolated: you cannot get GPU info from the Dom0 if the GPU is passed through to a VM, and you cannot get host/server info in the Debian VM itself.
-
Yes, that is my challenge.
In the Debian VM, I can use `nvidia-smi` to get GPU info from the GPUs that XCP-ng passes through to the VM. I cannot use `ipmitool` locally in the Debian VM to get host/server info and control the fan speed (most likely because `/dev/ipmi0` is visible in Dom0 on my Dell R720 but not in the VM). One option would be to use IPMI over LAN to give the VM access to the iDRAC interface, but that is in the management VLAN.
My thought is to dynamically control the fan speeds from within the VM that creates the thermal load, or even turn the VM off when the load exceeds a certain critical threshold.
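To make that idea concrete, here is a rough sketch of the decision logic only. The thresholds and the function names (`decide_action`, `max_temp`) are made up for illustration and not taken from any tool; the `nvidia-smi` query in the last comment is the only part that touches the hardware.

```shell
#!/bin/sh
# Hypothetical thresholds in degrees Celsius; tune them for your own hardware.
WARN_TEMP=70
CRIT_TEMP=85

# Map a GPU temperature to an action: "ok", "fans-up" or "shutdown".
decide_action() {
    temp="$1"
    if [ "$temp" -ge "$CRIT_TEMP" ]; then
        echo "shutdown"
    elif [ "$temp" -ge "$WARN_TEMP" ]; then
        echo "fans-up"
    else
        echo "ok"
    fi
}

# Pick the highest value from a list of temperatures (one per line) on stdin.
max_temp() {
    sort -n | tail -n 1
}

# Example wiring inside the VM (requires the NVIDIA driver):
# decide_action "$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits | max_temp)"
```

How "fans-up" and "shutdown" are actually carried out is the open question here, since the fan control lives on the host side.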
-
What you can do: record this info in the XenStore (from inside the Debian 12 VM) and fetch the data in the Dom0 by reading the XenStore. That's the right way to communicate outside the VM (and between VMs).
Then, from the Dom0 (or any XAPI-capable tool, e.g. XO via HTTP) you can make global decisions.
-
Hm, that sounds interesting, but I have never done this. Can you suggest a starting point, an example or documentation to get started?
-
You should have a command available in your Debian VM with tools, called "xenstore". You can use `xenstore-write`. For example:
xenstore-write vm-data/gputemp 82
This will write the value `82` in a key `gputemp`. This key/value can then be seen in the VM object, e.g. with:
xe vm-param-get param-name=xenstore-data param-key=vm-data/gputemp uuid=<VM UUID>
This will return `82`. Now you can run whatever script you like in your Dom0 (or even from your XOA, since you can fetch all data in XAPI).
As you can see, it's very simple and efficient.
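Putting the two commands above together, a starting point might look like the sketch below. It is only an illustration: the function names, the `CRIT_TEMP` threshold and the shutdown policy are assumptions, not part of XCP-ng. The only real commands used are `nvidia-smi`, `xenstore-write`, `xe vm-param-get` and `xe vm-shutdown`.

```shell
#!/bin/sh
# --- Inside the Debian VM: publish the hottest GPU temperature ---
publish_gpu_temp() {
    temp="$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits | sort -n | tail -n 1)"
    xenstore-write vm-data/gputemp "$temp"
}

# --- In Dom0: read the value back through XAPI ---
# $1 is the UUID of the VM.
read_gpu_temp() {
    xe vm-param-get param-name=xenstore-data param-key=vm-data/gputemp uuid="$1"
}

# Succeeds only if $1 is a plain non-negative integer, so a missing or
# malformed key never triggers an action.
is_valid_temp() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# One polling step in Dom0. CRIT_TEMP is a hypothetical threshold.
CRIT_TEMP=85
check_vm() {
    uuid="$1"
    temp="$(read_gpu_temp "$uuid")" || return 1
    is_valid_temp "$temp" || return 1
    if [ "$temp" -ge "$CRIT_TEMP" ]; then
        # Ask XAPI for a clean shutdown of the overheating VM.
        xe vm-shutdown uuid="$uuid"
    fi
}
```

In the VM, `publish_gpu_temp` could be called from a cron job or a small loop; in Dom0, `check_vm <VM UUID>` could run on a similar schedule. Fan control through the local `ipmitool` in Dom0 would hook in at the same place as the shutdown, but the exact commands depend on the hardware, so they are left out here.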