tjkreidl

Hi, everyone. Nice to see this project turning into reality. I will try to spend as much time here as possible, which is hard when I'm already spread thin. I've been a XenServer user for around a decade and am as interested in learning as in contributing whatever knowledge might be helpful to the community.

Best regards,
-=Tobias

tjkreidl

And from the CLI:

xe host-list (to get the UUID of the host)
xe pool-eject host-uuid=<host_UUID>

tjkreidl

@robyt It depends on (1) licensing, if any, as some licenses go by cores vs. sockets, and (2) NUMA/vNUMA, depending on how critical performance is and on how the vCPUs get allocated across sockets or on a single socket. The best way, IMO, is to try each configuration and test with benchmarks. See, for example, this article and the previous two articles, as well as articles by Frank Denneman and others: https://blogs.mycugc.org/2019/04/30/a-tale-of-two-servers-part-3-the-influence-of-numa-cpus-and-sockets-cores-persocket-plus-other-vm-settings-on-apps-and-gpu-performance/

tjkreidl

@olivierlambert said in NUMA-impact - Xeon/Epyc - 1P vs 2P:

There is no universal answer (because it's mostly depending on your VM load and what do you expect). As usual, my advice is to keep it simple if you don't have a problem with it (ie: you are satisfied by the perf.). Even a default EPYC configuration will be likely always better than a Xeon one.

After that, if you want to go deeper and learn the details, it's OK, let me just ping @tjkreidl who did a remarkable job (if I remember correctly) on this very topic.

Thanks for the mention, @olivierlambert! Here's a link to part 3, which contains links back to parts 1 and 2. Note that NUMA will affect EPYC processors differently as they changed the die configuration at one point with the number of cores. I'm open to any questions on this topic. https://blogs.mycugc.org/2019/04/30/a-tale-of-two-servers-part-3-the-influence-of-numa-cpus-and-sockets-cores-persocket-plus-other-vm-settings-on-apps-and-gpu-performance/

tjkreidl

@epretorious I would add that you have to be careful about overprovisioning when NUMA/vNUMA kicks in. That happens when you allocate more vCPUs to a VM than there are physical CPUs in one bank, or more memory than is attached to that bank. Assume, for the sake of argument, that you have two banks of physical CPUs, each with direct access to one of two banks of memory. Things then get inefficient because a CPU may need to go across to the other bank of memory to access data, and there is additional overhead involved. See, for example, this article and the two preceding it:
https://blogs.mycugc.org/2019/04/30/a-tale-of-two-servers-part-3-the-influence-of-numa-cpus-and-sockets-cores-persocket-plus-other-vm-settings-on-apps-and-gpu-performance/
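As a rough illustration of the point above, here is a minimal sketch that checks whether a VM would fit inside a single bank (NUMA node) or spill across two. The function name and the example host numbers (16 cores and 128 GiB per node) are made up for illustration; the real topology has to come from your own host, and other VMs plus dom0 also consume node resources, which this simple check ignores.

```shell
#!/bin/sh
# fits_numa_node VCPUS VM_MEM_GIB CORES_PER_NODE NODE_MEM_GIB
# Prints "fits" when both the vCPUs and the memory of a VM fit inside one
# NUMA node, otherwise "spans" (the VM would need remote CPUs or memory).
# Hypothetical helper, not part of any XCP-ng or XenServer tooling.
fits_numa_node() {
    vcpus=$1
    vm_mem=$2
    node_cores=$3
    node_mem=$4
    if [ "$vcpus" -le "$node_cores" ] && [ "$vm_mem" -le "$node_mem" ]; then
        echo "fits"
    else
        echo "spans"
    fi
}

# Assumed host: 2 sockets, each with 16 cores and 128 GiB of local memory.
fits_numa_node 8 64 16 128     # prints "fits"
fits_numa_node 24 64 16 128    # prints "spans" (more vCPUs than one node has)
```

A VM reported as "spans" is not broken; it is just a candidate for benchmarking with fewer vCPUs or less memory to see whether staying inside one node helps.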

-=Tobias

tjkreidl

@MichaelCropper CPUs can be over-provisioned, but not memory. You can use DMC (dynamic memory control) to regulate how much memory a VM will actually use, but in total, you still cannot exceed the total amount of physical memory available on a server.

CPU over-provisioning is very common, especially if loads change significantly over time (day/night, weekday/weekend, special events and holidays vs. regular days, etc.).
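To put numbers on this, a small sketch, assuming you already know the vCPU and memory allocations of your VMs. Both function names and all figures are invented for the example. The first helper computes the vCPU-to-physical-CPU ratio, and the second checks the hard limit that the summed VM memory may not exceed host memory.

```shell
#!/bin/sh
# overcommit_ratio TOTAL_VCPUS PHYSICAL_CPUS
# Prints the vCPU:pCPU ratio with one decimal place.
overcommit_ratio() {
    awk -v v="$1" -v p="$2" 'BEGIN { printf "%.1f\n", v / p }'
}

# memory_fits HOST_MEM_GIB VM_MEM_GIB...
# Prints "ok" if the summed VM memory fits in host memory, else "over".
memory_fits() {
    host=$1
    shift
    total=0
    for m in "$@"; do
        total=$((total + m))
    done
    if [ "$total" -le "$host" ]; then
        echo "ok"
    else
        echo "over"
    fi
}

# Assumed example: 48 vCPUs handed out on a 16-core host.
overcommit_ratio 48 16       # prints 3.0
memory_fits 128 32 32 48     # prints "ok"   (112 GiB of 128 GiB)
memory_fits 128 64 48 32     # prints "over" (144 GiB of 128 GiB)
```

What ratio is acceptable depends entirely on the workload, so treat the result as a number to watch alongside real load figures rather than a threshold.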

Watching the load with top and xentop will give you an idea about overall performance of dom0 and all VMs, respectively.

As for a VM that is powered off, it will use up neither memory nor CPU resources.

There are a lot of subtleties involved that would entail a much longer discussion, but hopefully this will help for starters. You can google a lot of information about memory and VCPU allocation; there is a lot of information out there.

tjkreidl

@olivierlambert Agreed. The Citrix forum used to be very active, but especially since Citrix was taken over, https://community.citrix.com has had way less activity, sadly.
It's still gratifying that a lot of the functionality is common to both platforms, although as XCP-ng evolves, there will be progressively less commonality.

tjkreidl

@Chrome Cheers -- always glad to help out. I put in many thousands of posts on the old Citrix XenServer site, and am happy to share whatever knowledge I still have, as long as it's still relevant! In a few years, it probably won't be, so carpe diem!

tjkreidl

@Chrome Fantastic! Please mark my post as helpful if you found it as such. Was traveling much of today, hence the late response.

BTW, it's always good to make a backup and/or archive of your LVM configuration anytime you change it, as the restore option is the cleanest way to deal with connectivity issues if there is some sort of corruption. It's saved my rear end before, I can assure you!

Yeah, if the SSD drive got wiped, there's no option to get those back unless you made a backup somewhere of all that before you installed XCP-ng onto it.

BTW, another very useful command for LVM is "vgchange -ay" which will attempt to renew VG information if a VG seems missing or the like.
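A minimal sketch of both suggestions, assuming the standard LVM2 tools are available in dom0. The function names and the destination directory are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set, so you can review them first.

```shell
#!/bin/sh
# Print commands instead of executing them unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# backup_vg_metadata DEST_DIR
# Saves the metadata of every volume group to DEST_DIR/<vg_name>.vg
# (vgcfgbackup replaces %s in the file name with the VG name).
backup_vg_metadata() {
    run mkdir -p "$1"
    run vgcfgbackup -f "$1/%s.vg"
}

# reactivate_vgs
# Rescans for volume groups, then tries to activate all of them.
reactivate_vgs() {
    run vgscan
    run vgchange -ay
}

# /root/lvm-backup is an example path, not a required location.
backup_vg_metadata /root/lvm-backup
```

Copying the resulting files off the host is what makes them useful if the local disk itself is lost.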

tjkreidl

@Chrome As M. Lambert says, you may be able to use pbd-plug to re-attach the SR if you can sr-introduce the old SR back into the system.
If not, and if your LVM configuration has not been wiped out, here are some steps to try to recover it (it's an ugly process!):

Identify the LVM configuration:
  - Check for backups: Look for LVM metadata backups in /etc/lvm/archive/ or /etc/lvm/backup/.
  - Use vgscan: This command will search for volume groups and their metadata.
  - Use pvscan: This command will scan for physical volumes.
  - Use lvs: This command will list logical volumes and their status.
  - Use vgs: This command will list volume groups.
Restore from backup (if available):
  - Find the backup: Locate the LVM metadata backup file (e.g., /etc/lvm/backup/<vg_name>).
  - Boot into rescue mode: If you're unable to access the system, boot into a rescue environment.
  - Restore metadata: Use vgcfgrestore to restore the LVM configuration.
Recreate LVM configuration (if no backup):
  - Identify PVs: Use pvscan to list available physical volumes.
  - Identify VGs: Use vgscan to identify volume groups if they are present.
  - Recreate PVs: If necessary, use pvcreate to create physical volumes.
  - Create VGs: If necessary, use vgcreate to create a new volume group.
  - Create LVs: If necessary, use lvcreate to create logical volumes.
Mount and verify:
  - Mount the logical volumes: Mount the restored LVM volumes to their respective mount points.
  - Verify data: Check the integrity of the data on the restored LVM volumes.
Extend LVM (if adding capacity):
  - Add a new disk: Ensure the new disk is recognized by the system.
  - Create PV: Use pvcreate on the new disk.
  - Add PV to VG: Use vgextend to add the PV to the volume group.
  - Extend LV: Use lvextend to extend the size of an existing logical volume.
  - Extend filesystem: Use resize2fs (for ext2/ext3/ext4) to extend the filesystem on the LV.
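The "restore from backup" path above can be sketched roughly as follows. This is only an outline under assumptions: the function name is made up, the VG name is a placeholder, and the default backup path simply follows the /etc/lvm/backup/<vg_name> convention mentioned in the steps. Commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print commands instead of executing them unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# restore_vg VG_NAME [BACKUP_FILE]
# Lists the known metadata backups for the VG, restores the metadata from
# BACKUP_FILE (default: /etc/lvm/backup/VG_NAME), activates the VG and
# lists its logical volumes so the result can be checked.
restore_vg() {
    vg=$1
    file=${2:-/etc/lvm/backup/$vg}
    run vgcfgrestore --list "$vg"
    run vgcfgrestore -f "$file" "$vg"
    run vgchange -ay "$vg"
    run lvs "$vg"
}

# "VG_XenStorage-EXAMPLE" is a placeholder; use the name reported by vgs.
restore_vg VG_XenStorage-EXAMPLE
```

Reading the --list output before restoring is worthwhile, since the most recent backup is not always the one that predates the corruption.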

tjkreidl

@Aleksander With a standard 8.2 install and the associated drivers, it's reported that support exists for the following:
Tesla M6/M10/M60, P4/P6/P40/P100, V100, T4, A2/A10/A16/A40, and RTX A5000/A6000/6000/8000 series.
The appropriate NVIDIA licensing must of course also be obtained and installed, including a license server.

tjkreidl

@Andrew I thought dom0 had to have some sort of graphics card; is this a recent development? My understanding is that dom0 needed one, albeit not necessarily any with GPU capabilities. Thanks in advance for any clarifications.

tjkreidl

@Johny It would help knowing how the two GPUs were configured on the host. Is this host part of a pool or standalone?

tjkreidl

@vaewyn There is an xe command for an emergency transition to a new master. Also, make sure all your hosts are properly time-synchronized with each other or there can be pool issues.
Try first:
xe pool-designate-new-master host-uuid=<new-master-uuid>
If that fails, you will need to run this on the slave server you want to promote:
xe pool-emergency-transition-to-master
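Put together, the two options might look like the sketch below. The function names are made up, and the follow-up xe pool-recover-slaves call is my addition rather than part of the steps above (it is the usual way to point the remaining members at the new master). Everything is printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print commands instead of executing them unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# promote_master HOST_UUID
# Orderly handover; only works while the current master is reachable.
promote_master() {
    run xe pool-designate-new-master host-uuid="$1"
}

# emergency_promote
# Run on the slave that should take over when the old master is gone.
# pool-recover-slaves is an assumption added here, not from the post.
emergency_promote() {
    run xe pool-emergency-transition-to-master
    run xe pool-recover-slaves
}

# "<new-master-uuid>" is a placeholder; take the real UUID from xe host-list.
promote_master "<new-master-uuid>"
```

The emergency path is deliberately a separate function so it never runs as a silent fallback; it should be a conscious decision made on the host being promoted.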

tjkreidl

@olivierlambert Good idea. Also, they should make sure all hosts are at the same update/patch levels, the network is set up properly among the three or more hosts, there is a compatible HA shared storage properly set up, etc.
You folks have a good guide at: https://docs.xcp-ng.org/management/ha/

tjkreidl

@nikade Interesting, as that at some point used to be the case, at least with XenServer!
I stand corrected and learned something new.

tjkreidl

Note also that if HA is turned on or off, the host must be restarted for that change to take effect, if I recall correctly.

tjkreidl

@olivierlambert I've always preferred Intel NICs.

tjkreidl

@kagbasi-ngc See if this thread can help you out:
https://xcp-ng.org/forum/topic/6618/how-to-remove-this-sr-nfs-storage/

tjkreidl

@umbradark Maybe too obvious, but is your boot configuration set up to be in BIOS or UEFI mode?
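If the system in question is a running Linux install (dom0 or a Linux guest), one quick way to check is the sketch below. It relies on the common convention that the kernel exposes /sys/firmware/efi only when booted via UEFI; the function name is made up, and it tells you how the running system booted, not what the firmware setup screen is configured to do.

```shell
#!/bin/sh
# boot_mode
# Prints "UEFI" if the running Linux system was booted via UEFI
# (the kernel then exposes /sys/firmware/efi), otherwise "BIOS".
boot_mode() {
    if [ -d /sys/firmware/efi ]; then
        echo "UEFI"
    else
        echo "BIOS"
    fi
}

boot_mode
```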
