N100-based hosts? AMD 5800H?
-
Thanks. I probably should have specified that I "need" three hosts in a pool, it just mimics real life better than a single host. Rolling pool upgrades in a production system are just fantastic!
I'm leaning towards the AMD because of the higher core count per dollar, and I'm not sure I can trust Intel right now. I have 27 HP Z2 SFF computers that I just put in our classrooms, and they were "buggy" until the BIOS patches came out a few weeks ago. I still have one that I think may be damaged. The 10th-gen i5 computers we already had are stable and fast compared to these 14th-gen i7 machines (Windows 11 Education).
Thankfully the production servers I bought use a slightly older Xeon Silver and should be free of the self-destruction issue in the 13th- and 14th-gen processors.
I may go down to an AMD 4800H, which still has 8C/16T but at a lower frequency, so less power and heat. Still shopping. They're hard to find without RAM; I know I'll want at least 32GB, but 64GB would probably be better.
My last choice might be the HP t740, which is only 4C/8T, and only because I already have two of them that I was using for OPNsense and other testing. They'd need RAM and bigger drives. But I'd like to reduce the power a bit: these draw up to 65 watts total system power, and I'd really like to be under 10 watts operating. I may have to build a 2-host pool out of these and see how it performs; it might be good enough.
-
I started on XCPng with AMD mini PCs for the same reasons you're looking at them. While compute and memory were fine, the lack of faster networking made me switch away from them.
If you're managing things Proxmox-style with all local storage, then they work well. But that means you lose out on a lot of the niceties of XCP-ng. Rebooting a host means you either wait for a migration to another node or shut down/suspend the VMs. Using shared storage typically limits you to 1G or 2.5G speeds, which means your VMs get half the speed of SATA.
I've since moved to using older desktops for my hosts. I can max out their memory relatively cheaply and easily add a 25G or better NIC. While the setup uses more power than the mini PCs, it's actually not bad. IIRC, typical usage was around 20W. But as I mentioned, I'm not doing any heavy compute workloads.
-
Yes, I'm finding the speed limits on shared storage to be a factor when I move my production system around for work on one of the storage machines. Hoping there's a fix somewhere for this.
When on local storage, and assuming the management agent is installed, can the system automatically migrate VMs to other hosts during a rolling pool upgrade or rolling pool reboot? This is something I need to look into; I tried in the past but hit a problem. If I could keep everything on local storage and still have rolling upgrades, I'd probably install drives in my hosts to handle this and turn off my oldest TrueNAS system, especially if migrations were faster. I'll have to try this with my current lab; there's enough left on the system drive to handle one VM for testing.
-
@Greg_E What do you mean by speed limits on shared storage? With a shared SR you're just moving memory from host to host and not the disk.
Automatic migration, rolling updates, etc won't work with local storage. That's a part of the reason I moved to shared storage.
I don't have any local storage configured on the new pool. The only drive in each host is a small, cheap M.2 SATA drive to run XCP-ng. That's part of what allows the low power usage.
-
I've been playing in my lab and finding that the "maximum" I can get is gigabit speed when I migrate a VM from host to host with shared storage. Migrating more than one VM at a time gets more than gigabit in aggregate.
Migrating from host to host with local storage was very slow, under 100 Mbps slow.
And this is all over 10 Gbps networking. But it does prove out that I'll need to add a TrueNAS to whatever I build.
-
@Greg_E The current storage API has some limitations which mean that you won't saturate a 10G connection with any single disk operation. Vates has a new version that should fix those issues, but it's not here yet.
Even if the migrations happened at the same speed, shared storage will always be faster since you're moving just memory and not the disk as well. The other way to keep migrations fast is to keep your VMs as small as possible, using network shares for data as much as you can.
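That difference can be sketched with a back-of-the-envelope model. All the figures below (VM sizes, link speed, the assumption that the transfer is link-limited with no overhead) are hypothetical examples, not measurements from this thread:

```python
# Rough model of live-migration time. With a shared SR only RAM is
# copied between hosts; with local SRs the disk image must move too.

def migration_seconds(ram_gb: float, disk_gb: float, link_gbps: float,
                      shared_storage: bool) -> float:
    """Estimate seconds to migrate a VM over a link of link_gbps,
    assuming the link is the only bottleneck (it usually isn't)."""
    payload_gb = ram_gb if shared_storage else ram_gb + disk_gb
    # 8 bits per byte; protocol overhead ignored for this sketch.
    return payload_gb * 8 / link_gbps

# Example VM: 8 GB RAM, 100 GB virtual disk, 10 Gbps link.
shared = migration_seconds(8, 100, 10, shared_storage=True)
local = migration_seconds(8, 100, 10, shared_storage=False)
print(f"shared SR: ~{shared:.0f}s, local SR: ~{local:.0f}s")
```

Even in this idealized model the local-storage migration is an order of magnitude slower, and in practice the disk copy also runs through the storage API rather than straight memory-to-memory, which is why the gap observed above is even larger.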
-
Hi @Greg_E. I've setup a few homelabs with XCP-ng using older and newer mini PCs, so thought I'd share some of my experiences.
First pass, I used the Lenovo Tiny M710q PCs, bought for around £100 each on eBay. They had either the i5-6400T or i5-6500T processor. I added 32GB of Crucial RAM, added the SATA drive tray for a boot drive, and added a 1TB NVMe in each for storage. Since I don't use Wi-Fi on these, I removed the M.2 Wi-Fi card and added a cheap 2.5GbE NIC (https://www.amazon.co.uk/gp/product/B09YG8J7BP).
XCP-ng 8.2.1 works perfectly, no customisation or challenges. I did see the exact same storage performance trends as you, and see that @CJ has already correctly pointed out the limitation in the current storage API (SMAPIv1).
I've also built a homelab with the Trigkey G5 N100 mini PCs. Again, XCP-ng 8.2.1 works perfectly on the N100's four E-cores. This G5 model has dual 2.5GbE NICs, which is perfect for giving VMs a 2.5GbE link to the world and a separate 2.5GbE link for the host to use for storage. Be aware that if you split networking this way, Xen Orchestra needs to be present on both networks (management to talk to the XCP-ng hosts over HTTPS, and storage to talk to NFS and/or CIFS for backups/replication).
I've not measured the power draw much, but typically the Lenovos use around 15-25W and the Trigkey G5s about 10-18W. Fan noise on both is very low - I have them on a shelf in my desk, so I sit next to them all day. My daily driver is a dead-silent Mac Mini M2, so I'm very aware of surrounding noise, and there's nearly none.
The only challenge I had with the N100 was that Windows VMs seemed to think they only had a clock speed of 800MHz, so performance was poor. I didn't get around to trying any performance settings in the BIOS to force higher clock speeds; in my view that would mean additional power usage, unwanted extra heat, and more fan noise.
If you build a homelab with 3 XCP-ng hosts, slap a 1TB NVMe in each and trial XOSTOR as an alternative to network shared storage. In my case, I went down to running my workloads on a single Lenovo M710q, stored locally on NVMe. Xen Orchestra (a VM on the Lenovo) backs up and replicates the VMs to an NFS host (another Trigkey G5 with a 4TB NVMe, running Ubuntu Server and its native NFS server).
Typical network performance during backups/DR is around 150-200 MB/s on the 2.5GbE.
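As a quick sanity check, those figures line up with what a 2.5GbE link can actually carry. The overhead factor below is a rough assumption, not a measured value:

```python
# 2.5GbE line rate vs. the observed 150-200 MB/s backup throughput.
line_rate_mb = 2.5e9 / 8 / 1e6      # 2.5 Gbit/s -> 312.5 MB/s raw
usable_mb = line_rate_mb * 0.94     # assume ~6% Ethernet/TCP framing overhead
observed = (150, 200)               # MB/s, figures reported above

print(f"raw: {line_rate_mb:.0f} MB/s, usable: ~{usable_mb:.0f} MB/s")
print(f"observed {observed[0]}-{observed[1]} MB/s is "
      f"{observed[0]/usable_mb:.0%}-{observed[1]/usable_mb:.0%} of usable")
```

So the backups are running at roughly half to two-thirds of what the wire allows, which suggests the bottleneck is the disks or the storage API rather than the 2.5GbE network itself.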
Hope that helps!
-
Thanks, I'm still thinking about this and how I might solve my riddle.
I burned down my current lab storage and am building it back from fresh; it's not going great at the moment.
-
Hope you don't lose too much sleep thinking about it! There are so many right ways of doing it.
My short'n'sweet advice: keep it as simple as possible while providing what you actually need.
Full resilience at every level to tackle every potential fault often brings more complexity than it's worth. That's why I've boiled my homelab down to a single host, all VMs stored on local NVMe, with regular backups and replicas. Worst case: boot up another Lenovo host, restore, and carry on. Even when I used 3x Lenovo hosts in a pool, I found the shared storage performance wasn't worth needing 4 hosts sucking electricity.
-
One thing I feel I need to be able to simulate is updates/upgrades, with rolling pool updates being the focus. Most of the other features you get with multiple hosts I can probably live without. HA might be another consideration for multiple hosts.
I will say one thing: I need to find a PCIe-to-M.2 card that supports bifurcated PCIe lanes. Four M.2 drives on a single PCIe card, exposed as individual drives to TrueNAS, would be really nice for a lot of things. Price is the consideration and the reason I don't have any. I may need to drop back to having a single big drive shared out; I ran like this for years on a different lab.
-
@Greg_E if the motherboard supports PCIe lane bifurcation, these cards work well: https://www.amazon.co.uk/GLOTRENDS-Platform-Bifurcation-Motherboard-PA41/dp/B0BHWN7WKD
If your motherboard has no bifurcation support, then you need a PCIe card with a PCIe switch on it. They're much more expensive but typically solve the problem. I've used this one: https://www.amazon.co.uk/GLOTRENDS-PA40-Adapter-Bifurcation-Function/dp/B0CCNL7YD8
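The lane arithmetic behind both options is simple; this sketch uses PCIe 3.0 per-lane throughput as an example (the usable-rate figure is an approximation after 128b/130b encoding):

```python
# A x16 slot bifurcated as x4/x4/x4/x4 gives each M.2 drive its own
# four-lane link, which is why the motherboard (not the card) must
# support splitting the slot. A switch card does the splitting itself.
GBS_PER_LANE_PCIE3 = 0.985   # approx. usable GB/s per PCIe 3.0 lane

slot_lanes = 16
drives = 4
lanes_per_drive = slot_lanes // drives
per_drive_gbs = lanes_per_drive * GBS_PER_LANE_PCIE3

print(f"{lanes_per_drive} lanes per drive ≈ {per_drive_gbs:.2f} GB/s each")
```

With bifurcation each drive keeps a dedicated x4 link; a switch card shares the slot's upstream bandwidth among the drives instead, which only matters when several drives are busy at once.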
Just remember to add heatsinks and, if possible, additional active cooling. I ended up wedging a rubber-edged Noctua 80mm fan inside my DIY NAS to blow directly onto the NVMe drives, which dropped them from 60-70°C down to 30-40°C.
-
Currently I have 6 hosts:
AMD Ryzen 7 4800U with Radeon Graphics
AMD Ryzen 7 5800U with Radeon Graphics
AMD Ryzen 7 4800U with Radeon Graphics
AMD Ryzen 7 7840HS w/ Radeon 780M Graphics
AMD Ryzen 7 7840HS w/ Radeon 780M Graphics
AMD Ryzen 7 4800U with Radeon Graphics
I'm using a QNAP NAS for my storage, and the 2.5G NICs are all used for storage. It all works very well for my homelab.
-
I had not seen that adapter; finding a motherboard that supports bifurcation is probably easier than finding a card that does the bifurcating itself. I'll have to look at some of the devices I have and see if this is supported.