@gduperrey Installed on some R720s with GPUs and everything works as expected. Looks good
Edit: I also installed the update on my playlab hosts and a Dell PowerEdge R730. Again no issues, and all hosts and VMs are back up.
@jasonnix Not an expert on this topic, but my understanding is that 1 vCPU corresponds to 1 CPU core or - when available - 1 CPU thread. Since vCPUs are used for resource allocation by the hypervisor, vCPU overprovisioning can make things more complicated.
VMs or memory-sensitive applications sometimes make memory locality (NUMA) decisions based on the topology of the vCPUs (sockets/cores) presented by the hypervisor, which is why you can choose different topologies. You can read more on NUMA affinity in the XCP-ng documentation. In a homelab, you rarely have to worry about this, and as a rule of thumb you can use virtual sockets with 1 core each for the number of vCPUs you need (which is the default for XCP-ng and VMware ESXi).
Sometimes it still makes sense to keep the number of sockets low and the number of cores high, as sockets or cores can be a licensing metric that determines license costs. However, most vendors already take this into account in their terms and conditions.
In a somewhat (over-) simplified form: socket/core topologies can be used in some special scenarios to optimize memory efficiency and performance or to optimize licensing costs.
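If you want to play with the topology, it can be set from the CLI as well as from Xen Orchestra (whose cores-per-socket setting maps to the same parameter). A minimal sketch, assuming a halted VM that should get 8 vCPUs presented as 2 sockets with 4 cores each; the VM name is a placeholder:

```bash
# Look up the VM UUID (placeholder VM name)
VM_UUID=$(xe vm-list name-label="my-vm" --minimal)

# Give the VM 8 vCPUs in total (change VCPUs-max only while the VM is halted)
xe vm-param-set uuid="$VM_UUID" VCPUs-max=8
xe vm-param-set uuid="$VM_UUID" VCPUs-at-startup=8

# Present them as 2 sockets x 4 cores instead of the default 8 sockets x 1 core
xe vm-param-set uuid="$VM_UUID" platform:cores-per-socket=4
```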
@danieeljosee If your XCP-ng pool is set up correctly, live migration with Xen Orchestra works straight away. Compatibility is key in a pool (CPU manufacturer, CPU architecture, XCP-ng version, network cards/ports). There is a lot of leeway regarding the CPU architecture, but I would recommend not mixing newer and older CPU generations.
For the network settings, it depends on your network architecture. You can run all network traffic over one NIC, or you can separate management, VM migration, VM and/or storage/backup traffic onto additional NICs. In a homelab, one NIC is usually sufficient.
While shared storage is very helpful for VM live migration, you can also live-migrate a VM whose disks are on local storage. Just select the destination host and destination storage repository. With local storage, the VM and its attached disks must of course end up on the same destination host.
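For reference, a plain live migration within a pool can also be triggered from the CLI; a minimal sketch with placeholder names (Xen Orchestra does the same thing and gives you a nicer storage-selection dialog):

```bash
# Live-migrate a running VM to another host in the same pool
xe vm-migrate vm="my-vm" host="xcp-host-02" live=true
```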
Check out Tom's YT channel as @ph7 suggested.
@Vinylrider Cool modification, but definitely not for the faint-hearted. Is it difficult to cut through the metal? Interestingly, in your setup, the end with the 4x black wires connects to the riser card. With my setup/cable, it's the 5x black wire end that plugs into the riser card. Mh, I saw that on YouTube but never gave it much thought until now.
Since my R720s can't natively monitor the temperatures of these GPUs to adjust fan speeds, I'm planning to create a script that reads both CPU and GPU temperatures and dynamically controls the fan speed through iDRAC. My R720s, equipped with two Intel Xeon E5-2640 v2 processors (TDP of 95W), don't typically run too hot though, even under load.
Here's a quick temperature chart I've put together for my setup, showing CPU and GPU temperatures at various fan speeds (adjusted via iDRAC) - both at idle and under load:
With the automatic fan control at 45%, both CPU and GPU temperatures remain comfortably below 60°C, even under load.
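Roughly what I have in mind for the script - an untested sketch, to be run where both `nvidia-smi` and `ipmitool` can see the hardware (bare metal or dom0 with local IPMI access). The thresholds and the 30-second interval are placeholders, and the raw IPMI codes are the ones commonly reported for 12th/13th-gen PowerEdge iDRACs, so verify them on your own machine:

```bash
#!/bin/bash
# Naive fan-control loop: read GPU and CPU temperatures, pick a fan duty cycle.
set -u

while true; do
    # Hottest GPU temperature in °C (one line per GPU, take the maximum)
    gpu_temp=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits | sort -n | tail -1)

    # Hottest temperature reported by the BMC (inlet/outlet/CPU sensors)
    cpu_temp=$(ipmitool sdr type temperature | grep -o '[0-9]\+ degrees' | awk '{print $1}' | sort -n | tail -1)

    gpu_temp=${gpu_temp:-0}
    cpu_temp=${cpu_temp:-0}
    temp=$(( gpu_temp > cpu_temp ? gpu_temp : cpu_temp ))

    # Map temperature to a fan duty cycle (placeholder thresholds)
    if [ "$temp" -ge 75 ]; then duty=100
    elif [ "$temp" -ge 65 ]; then duty=60
    elif [ "$temp" -ge 55 ]; then duty=45
    else duty=30
    fi

    # Dell-specific raw commands: disable automatic control, then set duty cycle (hex)
    ipmitool raw 0x30 0x30 0x01 0x00
    ipmitool raw 0x30 0x30 0x02 0xff "0x$(printf '%02x' "$duty")"

    sleep 30
done
```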
@CodeMercenary Did some tests with the A2000 and, as expected, the 12GB VRAM is the biggest limitation. I used vanilla installations of Ollama and ComfyUI with no tweaking or optimization. Especially in Stable Diffusion, the A2000 is about three times faster than the P40, but that is to be expected. I have added some results below.
Stable Diffusion tests (1024x1024, 30 iterations, cfg 4.0, euler)

| GPU | Batch 1 | Batch 4 |
| --- | --- | --- |
| A2000 | 1.4 s/it | 2.5 s/it (≈0.6 s/it per image) |
| P40 | 2.8 s/it | 12.1 s/it (≈3 s/it per image) |

Inference tests

| Model | A2000 | P40 |
| --- | --- | --- |
| qwen2.5:14b | 21 token/sec | 17 token/sec |
| qwen2.5-coder:14b | 21 token/sec | 17 token/sec |
| llama3.2:3b-Q8 | 50 token/sec | 40 token/sec |
| llama3.2:3b-Q4 | 60 token/sec | 48 token/sec |
During heavy testing, the A2000 reached 70°C and the P40 reached 60°C, both with the Dell R720 set to automatic fan control.
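For anyone who wants to reproduce the token rates: Ollama prints them itself when run with the verbose flag, for example:

```bash
# After the response, --verbose prints statistics including the "eval rate" in tokens/s
ollama run --verbose qwen2.5:14b "Explain NUMA in two sentences."
```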
@gduperrey Updated some Dell R720s with GPUs and a Dell R730. The update worked without any problems and the VMs operate as expected. Will update this post if that changes during day-to-day operation. Great work!
@CodeMercenary I got my hands on an Nvidia RTX A2000 12GB (around 310€ used on eBay) which might be an option, depending on what you want to do. It is a dual-slot, low-profile GPU with 12GB VRAM, a maximum power consumption of 70W and active cooling. With a compute capability of 8.6 (P40: 6.1, M40: 5.2) it is fully supported by Ollama. While 12GB is only half of one P40 with 24GB VRAM, it runs small LLMs nicely and at a high tokens-per-second rate. It can almost run the Llama 3.2 11b vision model (`11b-instruct-q4_K_M`) entirely on the GPU, with only 3% offloaded to the CPU. I will start testing this card during the weekend and can share some results if that would help.
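Side note: the CPU/GPU split is easy to check while a model is loaded, e.g.:

```bash
# With the model loaded (e.g. via "ollama run"), the PROCESSOR column of
# "ollama ps" shows the split, e.g. "3%/97% CPU/GPU"
ollama ps
```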
@CodeMercenary I didn't know there was a midboard hard drive option for the R730xd - cool. You could always install a couple of Nvidia T4s, but they only come with 16GB of VRAM and are much more expensive than the P40s. For reference, I've added a top view of the R720 with two P40s installed.
@manilx Maybe this post on Intel iGPU passthrough gives some ideas? You probably lose the video output on the Protectli when you assign the iGPU to a VM.
@CodeMercenary The M40 is a server card and (physically) compatible with the R730, so no extra cooling is required (nor possible). The downside is that the R730 will most likely still go full blast on all fans regardless of the actual power consumption, since the server cannot read the GPU's temperature. But there are scripts to manage the fan speeds based on server or GPU temperature. And once you have Ollama installed, you can ask it how to write that code.
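For reference, those scripts usually boil down to a couple of raw IPMI commands; the codes below are the ones commonly reported for 12th/13th-gen PowerEdge iDRACs, so verify them for your exact model before relying on them:

```bash
# Disable the automatic (thermal-profile based) fan control
ipmitool raw 0x30 0x30 0x01 0x00

# Set a static fan duty cycle, here 30% (0x1e)
ipmitool raw 0x30 0x30 0x02 0xff 0x1e

# Hand control back to the iDRAC
ipmitool raw 0x30 0x30 0x01 0x01
```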
@CodeMercenary Probably not insane if you want to learn using Ollama or other LLM frameworks for inference. But the M40 is an ageing GPU with a low compute capability (v5.2), so over time it might no longer be supported by platforms like Ollama, vLLM, llama.cpp or Aphrodite (I did not check whether they all actually support that GPU, but Ollama does support the M40). I doubt you will get acceptable performance for Stable Diffusion (image generation) or training/fine-tuning. But what can you expect for $90?
The card has a power consumption (TDP) of 250W, which is compatible with the x16 PCIe slot on riser #2. You have to be extra careful with the power cable as it is not a standard cable. While most would suggest 1100W power supplies for the Dell R730 to be on the safe side, I run two P40s with 750W power supplies in a Dell R720. But I also power-limit the cards to 140W with little effect on performance, and I have light workloads and no batch processing.
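For anyone wanting to do the same, the power limit is set with nvidia-smi; a minimal sketch (the value does not survive a reboot, so reapply it at boot, e.g. via a systemd unit):

```bash
# Enable persistence mode so the driver keeps settings between workloads
nvidia-smi -pm 1

# Limit the GPUs to 140 W (use -i <index> to target a single card)
nvidia-smi -pl 140

# Check the current, default and maximum power limits
nvidia-smi -q -d POWER
```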
@olivierlambert People might be a little crazy, but they also trust the test and RC lifecycle of XCP-ng, which has proven to be very reliable thanks to the dedication of the Vates team and the community. But I agree, for production use it's better to wait for a GA announcement. Anyway, congratulations on this important milestone in the development of XCP-ng.
@bleader There is an `xcp-ng-8.3.0.iso` on the ISO repository. Is that the release of XCP-ng 8.3? Looking forward to an official announcement.
@McHenry Maybe this old post helps: Error importing large vhd file. It also links to the documentation on how to Migrate to XCP-ng from Hyper-V, but I guess you have already read that.
@bleader The update worked well on my two-node homelab and everything looks and works normally after the reboot. I did some basic stuff like VM and storage migration, but nothing in depth. Let's see how things work out.
@marvine I am using two Nvidia P40s with passthrough to Debian VMs on XCP-ng 8.3 RC1. I had no issues with passing through, installing and running the P40s. For Windows VMs, this older post might give some more information.
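For completeness, a rough sketch of how PCI passthrough is typically done on XCP-ng: hide the GPU from dom0, reboot the host, then attach the device to the halted VM. The PCI address below is a placeholder (check `lspci` for yours), and the XCP-ng documentation covers the details and alternatives:

```bash
# In dom0: find the PCI address of the GPU
lspci | grep -i nvidia

# Hide the device from dom0 (takes effect after a host reboot)
/opt/xensource/libexec/xen-cmdline --set-dom0 "xen-pciback.hide=(0000:42:00.0)"

# Attach the hidden device to the halted VM
xe vm-param-set uuid=<vm-uuid> other-config:pci=0/0000:42:00.0
```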
Mh, that sounds interesting, but I have never done this. Can you suggest a starting point, example or documentation to get started?
Yes, that is my challenge.
In the Debian VM, I can use `nvidia-smi` to get GPU info from the GPUs that are passed through by XCP-ng to the VM. I cannot use `ipmitool` locally in the Debian VM to get host/server info and control the fan speed (most likely because `/dev/ipmi0` is visible in dom0 on my Dell R720 but not in the VM). One option would be to use IPMI over LAN to give the VM access to the iDRAC interface, but that is in the management VLAN.
My thought is to dynamically control the fan speeds from within the VM that creates the thermal load, or even turn the VM off when the load exceeds a certain critical threshold.
I am using a Dell PowerEdge R720 with two Nvidia P40s to learn about Large Language Models (LLMs) and machine learning in general. XCP-ng is installed and both P40s are pass-throughed (if that word exists) to a Debian 12 VM that runs Ollama in Docker. I do power-limit both GPUs, but during inference the passively cooled GPUs sometimes still get hot.
On a bare-metal Debian 12 install, I can use `nvidia-smi` to read the max. GPU temperature and `ipmitool` to read the max. inlet/outlet/CPU temperature and set the fan speed dynamically, again with `ipmitool`. That works reliably.
I don't want to use `ipmitool -H {ip_address} -U {username} -P {password} {command}` (which seems to be the #1 recommendation) because I don't want to punch a hole in the firewall to give this VM access to the management network. Another control VM that queries both temperature sources (iDRAC via `ipmitool`, the Ollama Debian VM with P40 passthrough via SSH and `nvidia-smi`) would also work, but feels too complicated.
Any idea if something like running `ipmitool` locally in a Debian 12 VM can be achieved with XCP-ng?
@bleader Installed on my playlab. Everything looks normal, let's see how it goes.