@indyj said in Centos 8 is EOL in 2021, what will xcp-ng do?:
@jefftee I prefer Alpine Linux.
+1
Low resource footprint, no bloatware... They even have a pre-built Xen Hypervisor ISO flavor
I liked it as well. Easy to find the topics, and a good layout.
@olivierlambert congrats to the team and also to this great community!
@sasha It's worth noting that the BIOS (from 2019) is relatively old/outdated. It's recommended to update the BIOS to a more recent version.
@fred974 Yep, see the docs about NUMA/core affinity (soft/hard pinning).
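As a rough sketch (the UUID and CPU numbers are just placeholders): pinning can be set per VM through the VCPUs-params:mask attribute, applied when the VM starts, or adjusted live on a running domain with xl:
# Pin the VM's vCPUs to pCPUs 0-3 on next boot
xe vm-param-set uuid=<vm-uuid> VCPUs-params:mask=0,1,2,3
# Move vCPU 0 of a running domain to pCPU 2
xl vcpu-pin <domain-id> 0 2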
@Forza Take a look:
https://xcp-ng.org/forum/post/49400
At the time of this topic, I remember asking a coworker to boot a CentOS 7.9 VM with more than 64 vCPUs on a 48C/96T Xeon server. The VM started normally, but it didn't recognize the vCPUs beyond 64.
I haven't tested the VM param platform:acpi=0 as a possible solution, nor its trade-offs. In the past, some old RHEL 5.x VMs without ACPI support would simply power off (like pulling the power cord) instead of doing a clean shutdown on a vm-shutdown command.
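If anyone wants to experiment anyway (untested here, so try it on a halted test VM first; the UUID is a placeholder):
xe vm-param-set uuid=<vm-uuid> platform:acpi=0
# Confirm the platform keys afterwards
xe vm-param-get uuid=<vm-uuid> param-name=platform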
Regarding that CFD software, does it support a worker/farm design? vGPU offload? I'm not an HPC expert, but considering the EPYC MCM architecture, instead of one big VM, spreading the workload across many workers pinned to each CCD (or to each NUMA node on an NPS4 config) may be interesting.
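As a starting point, the host topology can be inspected from dom0 with the standard Xen tools before deciding how many workers to pin per node:
xl info -n
xenpm get-cpu-topology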
Before buying those monsters, I would ask AMD to deploy a PoC using the target server model. For such demands, it's very important to do some sort of certification/validation.
@erfant probably not, because the NVMe driver is loaded and there are no NVMe errors in the logs.
@olivierlambert thank you and your team for this great project and community! It's a nice place to share knowledge and learn new stuff. I learn a lot here!
@erfant after seeing your uploaded dmesg, the boot options from steps 2 & 3 can be put aside for now, because the error isn't the same as in the other topics.
The log shows MxGPU driver probe/initialization errors. After some digging, this could be a case of the GPU firmware being incompatible with UEFI. Do you have any spare server to test an XCP-ng boot in legacy/BIOS mode with this GPU?
[ 119.418930] gim error:(gim_probe:123) gim_probe(08:00.0)
[ 121.145663] gim error:(wait_cmd_complete:2387) wait_cmd_complete -- time out after 0.003044131 sec
[ 121.145719] gim error:(wait_cmd_complete:2390) Cmd = 0x17, Status = 0x0, cmd_Complete=0
[ 121.145984] gim error:(init_register_init_state:4643) Failed to INIT PF for initial register 'init-state'
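By the way, a quick way to confirm how the host is currently booted (a plain sysfs check, nothing XCP-ng specific):
[ -d /sys/firmware/efi ] && echo UEFI || echo "legacy BIOS"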
Edited for clarification.
@Appollonius said in Strange issue with booting XCP-NG:
It's only when I install the GPU and don't connect it to a monitor that it will not boot properly.
Maybe because, when there's a GPU installed but no monitor attached, the motherboard POST fails at the EDID probe? As stated, some boards/BIOS require an explicit setting in order to boot without a monitor/keyboard/mouse plugged in, e.g.:
https://www.supermicro.com/support/faqs/faq.cfm?faq=11902
@ptunstall when the GPU was pushed back to dom0, did you also remove the PCI address from the VM config?
What's the output of the following?
xe vm-param-get uuid=<...> param-name=other-config
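If a pci key is still present there, it can be dropped like this (assuming the usual other-config:pci passthrough key; the UUID is a placeholder):
xe vm-param-remove uuid=<vm-uuid> param-name=other-config param-key=pci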
@sluflyer06 In order to persist across reboots, you must set the cpufreq boot option. There's no need to rebuild grub, because the change is applied at the Xen level (instead of dom0):
/opt/xensource/libexec/xen-cmdline --set-xen cpufreq=xen:ondemand
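To double-check what was written, the same helper script should be able to read it back (assuming your build ships the --get-xen mode):
/opt/xensource/libexec/xen-cmdline --get-xen cpufreq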
After that, change the System power profile to Performance Per Watt (OS) in the BIOS.
Verifying the config:
Check if the current_governor attribute is set to ondemand:
xenpm get-cpufreq-para
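On hosts with many cores the output is long, so filtering helps:
xenpm get-cpufreq-para | grep current_governor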
Check the clock scaling:
xenpm start 1 | grep "Avg freq"
It could be. From a user's point of view, a single-host pool wouldn't make any sense, so they created the "implicit/explicit" concept and treat everything as a pool internally.
That's a question for the Citrix dev team.
Just FYI guys, XenCenter/XCP-ng Center have the menu option Pool > Make into standalone server. As pointed out by other members, every standalone host is in a pool, but that option reverts to an "implicit" one.
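The implicit pool object is also visible from the CLI, if that helps:
xe pool-list params=uuid,name-label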
Hope this helps.
@jeff In order to create a virtual NUMA topology and expose it to the guest, the vNUMA feature needs to be implemented at hypervisor level and accessible through XAPI. I'm not sure if that feature is fully supported at the moment. Maybe @olivierlambert can confirm this?
You could try adding the cores-per-socket attribute, following the physical NUMA topology (96 / 4 nodes = 24):
xe vm-param-set platform:cores-per-socket=24 uuid=<VM UUID>
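To confirm the value stuck (same placeholder UUID; the VM needs a restart to pick it up):
xe vm-param-get uuid=<VM UUID> param-name=platform param-key=cores-per-socket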
Let me know if it works.