Can't boot a VM with 1TB memory / 128 CPUs
-
@olivierlambert Debian 12 template with PXE boot, local storage
Host has 2 x AMD EPYC 9534 64-Core Processor / 1.5 TiB of memory (24 x 64 GiB)
The issue comes from the number of CPUs, not the memory. I've done more testing: I created some VMs from XOA 5.107.1 with the Debian Bookworm 12 template, PXE boot, local storage (EXT4).
"no boot" means the console shows "Guest has not initialized the display"
"boot" means the Tianocore logo appears, followed by PXE boot
64x / 1 TiB => boot => upgrade to 128x => no boot
128x / 1TiB => no boot => downgrade to 64x => no boot
96x / 256 GiB => boot => upgrade to 102x => no boot => downgrade to 96x => no boot
97x / 256 GiB => no boot
99x / 512 GiB => no boot
There is a limit of 96 cores. I've tried different CPU topologies without any success.
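For anyone who wants to see how the 96 figure falls out of the runs above, here is a minimal sketch in Python. It only encodes the results listed in this post (first boots and upgrades). The downgrade attempts are deliberately left out, since they failed even at vCPU counts that boot on a fresh VM. The variable names are made up for illustration.

```python
# (vCPUs, booted) pairs taken from the test runs listed in this post.
# Only first boots and upgrades are included; downgrades are excluded
# because they failed even at counts that boot on a fresh VM.
observations = [
    (64, True),    # 64x / 1 TiB
    (128, False),  # upgrade from 64x to 128x, and fresh 128x / 1 TiB
    (96, True),    # 96x / 256 GiB
    (102, False),  # upgrade from 96x to 102x
    (97, False),   # 97x / 256 GiB
    (99, False),   # 99x / 512 GiB
]

# Highest vCPU count that booted and lowest count that did not.
highest_ok = max(n for n, ok in observations if ok)
lowest_fail = min(n for n, ok in observations if not ok)

print(f"highest booting count: {highest_ok}, lowest failing count: {lowest_fail}")
```

The two values are adjacent, which is what points at a hard cutoff rather than a memory or topology effect.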
-
Yeah that makes sense, beyond 64 there are potential topology issues. But I remember we tested internally up to 128 and it worked, probably on a single socket though.
The official limit for strict security support is 32 though: https://docs.xcp-ng.org/installation/requirements/#xcp-ng-83-lts-1
I'm pretty sure we managed to get 128 working. But still, there's a lot of topology work to be done in Xen upstream to go beyond. The real complex limit should be 256. So fixing <256 is probably more bug fixing than huge rework.
-
I remember we tested 128vCPU and it worked (Ubuntu VM, but Debian should be similar).
It was on a 2CRSi machine (1 socket however).
-
Can you try the guest in BIOS mode in case that could change anything?
-
@olivierlambert it works in BIOS mode. The system is a Dell PowerEdge R7625 with AMD EPYC
I can't go beyond 128 cores : xenopsd internal error: Xenctrl.Error("22: Invalid argument")
-
Okay so there are 2 things:
- The UEFI 96-core limitation is hard coded here: https://github.com/xcp-ng-rpms/edk2/blob/5af940036ace555e5315a78a301912d294277ec0/SPECS/edk2.spec#L138 (to be investigated)
- BIOS has no such limitation, but then you hit the next barrier, 128 cores, due to something else (up to 255 vCPUs)
- Then the "final" boss is beyond that
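To summarize the limits reported so far in this thread, here is a rough Python sketch. The numbers (96 for UEFI, 128 for BIOS) are only what was observed here on one host, not official limits, and `OBSERVED_LIMITS` and `expected_to_boot` are hypothetical names for illustration, not part of any XCP-ng tool.

```python
# vCPU ceilings observed in this thread, per guest firmware mode.
# These are observations from one setup, not documented limits.
OBSERVED_LIMITS = {"uefi": 96, "bios": 128}


def expected_to_boot(firmware: str, vcpus: int) -> bool:
    """Return True if a VM with this firmware and vCPU count booted in the thread's tests."""
    limit = OBSERVED_LIMITS[firmware.lower()]
    return 1 <= vcpus <= limit


for firmware, vcpus in [("uefi", 96), ("uefi", 97), ("bios", 128), ("bios", 129)]:
    print(firmware, vcpus, expected_to_boot(firmware, vcpus))
```

An unknown firmware string raises a `KeyError` here, which seems preferable to silently guessing a limit.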
-
@olivierlambert Thanks a lot, the limitations of 128 cores and BIOS are fine for my usage.
-
Thanks, feel free to reach out if you need more details on point 2 and our progress; we can prioritize it if your org needs it.
-
@olivierlambert it's for a rare use case and we can wait for a proper fix
-
I can't even imagine how long a memtest would take on a box with this much RAM. Ha
-
We are testing some machines with 6TiB RAM
-
@olivierlambert said in Can't boot a VM with 1TB memory / 128 CPUs:
We are testing some machines with 6TiB RAM
My god