tjkreidl

Hi, everyone. Nice to see this project turning into reality. I will try to spend as much time here as possible, which is hard when I'm already spread thin. I've been a XenServer user for around a decade and am as interested in learning as in contributing whatever knowledge might be helpful to the community.

Best regards,
-=Tobias

tjkreidl

And from the CLI:

xe host-list (to get the UUID of the host)
xe pool-eject host-uuid=<host_UUID>
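
If you only know the host's name, the two steps can be combined. This is a minimal sketch, not a tested procedure: the host name "xcp-host2" and the all-zero UUID are placeholders, and the helper only prints the command so it can be reviewed first, because pool-eject reinitializes the ejected host and destroys the data on its local storage.

```shell
#!/bin/sh
# Sketch: build the pool-eject command for a host UUID.
# It only prints the command; nothing is ejected.

build_eject_cmd() {
    # $1 = host UUID, e.g. from: xe host-list name-label=<name> --minimal
    if [ -z "$1" ]; then
        echo "error: empty host UUID" >&2
        return 1
    fi
    echo "xe pool-eject host-uuid=$1"
}

# On a pool member the UUID could be looked up like this (hypothetical name):
#   HOST_UUID=$(xe host-list name-label="xcp-host2" --minimal)
HOST_UUID="00000000-0000-0000-0000-000000000000"   # placeholder UUID
build_eject_cmd "$HOST_UUID"
```

The empty-UUID check matters because a mistyped name-label makes the lookup return nothing.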

tjkreidl

@robyt It depends on (1) licensing, if any, as some licenses go by cores vs. sockets, and (2) NUMA/vNUMA, depending on how critical performance is and on how the VCPUs get allocated across sockets or on a single socket. The best way, IMO, is to try all the options and test with benchmarks. See, for example, this article and the previous two articles, as well as articles by Frank Denneman and others: https://blogs.mycugc.org/2019/04/30/a-tale-of-two-servers-part-3-the-influence-of-numa-cpus-and-sockets-cores-persocket-plus-other-vm-settings-on-apps-and-gpu-performance/
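
To try different socket/core layouts for such a benchmark, the topology presented to a VM can be changed with the platform:cores-per-socket parameter. A minimal sketch, assuming the VM is halted; the UUID is a placeholder and the helper only prints the xe commands after checking that the VCPU count divides evenly.

```shell
#!/bin/sh
# Sketch: print the xe commands that present VCPUS vCPUs to a VM as
# sockets of CORES_PER_SOCKET cores each. Nothing is executed.

topology_cmds() {
    vm_uuid=$1; vcpus=$2; cps=$3
    # The vCPU count has to be a whole multiple of cores-per-socket.
    if [ "$cps" -le 0 ] || [ $((vcpus % cps)) -ne 0 ]; then
        echo "error: $vcpus vCPUs cannot be split into sockets of $cps cores" >&2
        return 1
    fi
    echo "xe vm-param-set uuid=$vm_uuid VCPUs-max=$vcpus"
    echo "xe vm-param-set uuid=$vm_uuid VCPUs-at-startup=$vcpus"
    echo "xe vm-param-set uuid=$vm_uuid platform:cores-per-socket=$cps"
}

# Example: 8 vCPUs as 2 sockets x 4 cores (placeholder UUID).
topology_cmds "<vm_UUID>" 8 4
```

Re-running the same benchmark with, say, 8 and 1 as the last two arguments gives the "8 sockets x 1 core" case for comparison.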

tjkreidl

@olivierlambert said in NUMA-impact - Xeon/Epyc - 1P vs 2P:

There is no universal answer (because it's mostly depending on your VM load and what do you expect). As usual, my advice is to keep it simple if you don't have a problem with it (ie: you are satisfied by the perf.). Even a default EPYC configuration will be likely always better than a Xeon one.

After that, if you want to go deeper and learn the details, it's OK, let me just ping @tjkreidl who did a remarkable job (if I remember correctly) on this very topic.

Thanks for the mention, @olivierlambert ! Here's a link to part 3, which contains links back to parts 1 and 2. Note that NUMA will affect EPYC processors differently, as the die configuration changed at one point along with the number of cores. I'm open to any questions on this topic. https://blogs.mycugc.org/2019/04/30/a-tale-of-two-servers-part-3-the-influence-of-numa-cpus-and-sockets-cores-persocket-plus-other-vm-settings-on-apps-and-gpu-performance/

tjkreidl

@epretorious I would add that you have to be careful about overprovisioning when NUMA/vNUMA kicks in, that is, when you allocate more VCPUs than there are physical CPUs in one bank, or more memory than is attached to that bank. Assume, for the sake of argument, that you have two banks of physical CPUs and each has one of two banks of memory directly accessible to it. Things then get inefficient because a CPU may need to go across to the other bank of memory to access data, and there is additional overhead involved. See, for example, this article and the two preceding it:
https://blogs.mycugc.org/2019/04/30/a-tale-of-two-servers-part-3-the-influence-of-numa-cpus-and-sockets-cores-persocket-plus-other-vm-settings-on-apps-and-gpu-performance/
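
As a rough illustration of that rule of thumb, here is a small check of whether a VM fits inside one bank (NUMA node). It is only a sketch: the node sizes are made-up numbers, and on a real host the topology would have to be read from the hardware (for example with "xl info -n").

```shell
#!/bin/sh
# Sketch: does a VM fit inside a single NUMA node?
# Node sizes below are illustrative, not read from real hardware.

fits_one_node() {
    # $1 = VM vCPUs, $2 = VM memory (GiB)
    # $3 = cores per NUMA node, $4 = memory per NUMA node (GiB)
    if [ "$1" -le "$3" ] && [ "$2" -le "$4" ]; then
        echo "fits"
    else
        echo "spans-nodes"
    fi
}

# Hypothetical host: 2 nodes, each with 16 cores and 128 GiB.
fits_one_node 8 64 16 128     # prints: fits
fits_one_node 24 64 16 128    # prints: spans-nodes
```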

-=Tobias

tjkreidl

@olivierlambert Agreed. The Citrix forum used to be very active, but especially since Citrix was taken over, https://community.citrix.com has had way less activity, sadly.
It's gratifying that a lot of the functionality is still common to both platforms, although as XCP-ng evolves, there will be less and less commonality.

tjkreidl

@Chrome Cheers -- always glad to help out. I put in many thousands of posts on the old Citrix XenServer site, and am happy to share whatever knowledge I still have, as long as it's still relevant! In a few years, it probably won't be, so carpe diem!

tjkreidl

@Chrome Fantastic! Please mark my post as helpful if you found it as such. Was traveling much of today, hence the late response.

BTW, it's always good to make a backup and/or archive of your LVM configuration anytime you change it, as the restore option is the cleanest way to deal with connectivity issues if there is some sort of corruption. It's saved my rear end before, I can assure you!
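
A minimal sketch of such a backup, using vgcfgbackup with a dated file name. The VG name and destination directory are placeholders, and the helper only prints the command; run the printed line as root on the host.

```shell
#!/bin/sh
# Sketch: build a dated vgcfgbackup command for one volume group.
# Prints the command; run it as root on the host to take the backup.

backup_cmd() {
    # $1 = VG name, $2 = destination directory, $3 = date stamp
    echo "vgcfgbackup -f $2/$1-$3.vg $1"
}

STAMP=$(date +%Y%m%d)
# "VG_XenStorage-example" and /root/lvm-backups are placeholder names.
backup_cmd "VG_XenStorage-example" "/root/lvm-backups" "$STAMP"
```

Keeping a copy of the resulting file off the host is what makes it useful if the local disk is the thing that fails.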

Yeah, if the SSD drive got wiped, there's no option to get those back unless you made a backup somewhere of all that before you installed XCP-ng onto it.

BTW, another very useful command for LVM is "vgchange -ay", which activates the logical volumes in all volume groups and can help if a VG seems missing or the like.

tjkreidl

@Chrome As M. Lambert says, you may be able to use pbd-plug to re-attach the SR if you can sr-introduce the old SR back into the system.
If not, and if your LVM configuration has not been wiped out, here are some steps to try to recover it (it's an ugly process!):

1. Identify the LVM configuration:
   - Check for backups: Look for LVM metadata backups in /etc/lvm/archive/ or /etc/lvm/backup/.
   - Use vgscan: This command will search for volume groups and their metadata.
   - Use pvscan: This command will scan for physical volumes.
   - Use lvs: This command will list logical volumes and their status.
   - Use vgs: This command will list volume groups.
2. Restore from backup (if available):
   - Find the backup: Locate the LVM metadata backup file (e.g., /etc/lvm/backup/<vg_name>).
   - Boot into rescue mode: If you're unable to access the system, boot into a rescue environment.
   - Restore metadata: Use vgcfgrestore to restore the LVM configuration.
3. Recreate the LVM configuration (if no backup):
   - Identify PVs: Use pvscan to list available physical volumes.
   - Identify VGs: Use vgscan to identify volume groups if they are present.
   - Recreate PVs: If necessary, use pvcreate to create physical volumes.
   - Create VGs: If necessary, use vgcreate to create a new volume group.
   - Create LVs: If necessary, use lvcreate to create logical volumes.
4. Mount and verify:
   - Mount the logical volumes: Mount the restored LVM volumes to their respective mount points.
   - Verify data: Check the integrity of the data on the restored LVM volumes.
5. Extend LVM (if adding capacity):
   - Add a new disk: Ensure the new disk is recognized by the system.
   - Create PV: Use pvcreate on the new disk.
   - Add PV to VG: Use vgextend to add the PV to the volume group.
   - Extend LV: Use lvextend to extend the size of an existing logical volume.
   - Extend filesystem: Use resize2fs (for ext2, ext3, or ext4) to extend the filesystem on the LV.
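
The restore-from-backup path above can be sketched as a short command sequence. This is only an outline under assumptions: "VG_example" is a placeholder VG name, the backup path follows the /etc/lvm/backup/<vg_name> pattern, and the helper prints the commands for review instead of running them.

```shell
#!/bin/sh
# Sketch: print an LVM metadata recovery sequence for review.
# Nothing is executed; copy the lines you need and run them as root.

recovery_plan() {
    # $1 = VG name, $2 = metadata backup file
    echo "pvscan"                        # which physical volumes are visible?
    echo "vgscan"                        # which volume groups are visible?
    echo "vgcfgrestore --list $1"        # list known metadata backups/archives
    echo "vgcfgrestore -f $2 $1"         # restore metadata from the chosen file
    echo "vgchange -ay $1"               # activate the LVs in the restored VG
    echo "lvs $1"                        # confirm the LVs are back
}

# Placeholder VG name and backup file.
recovery_plan "VG_example" "/etc/lvm/backup/VG_example"
```

The first three commands only read state, so they are safe to run before deciding whether a restore is needed at all.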

tjkreidl

@Chrome Then just run "xe vm-list" and see if you recognize any VMs other than the dom0 instance of XCP-ng.
If there is nothing else showing up, you will need to try to find your other LVM storage.
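
A quick way to make that check less error-prone is to filter out dom0 and count what is left. A minimal sketch; the UUID list below is placeholder output standing in for what xe would print on the host.

```shell
#!/bin/sh
# Sketch: count VMs from the comma-separated UUID list that
# "xe vm-list ... --minimal" prints.

count_uuids() {
    # $1 = comma-separated UUIDs (may be empty)
    if [ -z "$1" ]; then
        echo 0
    else
        echo "$1" | tr ',' '\n' | grep -c .
    fi
}

# On the host (is-control-domain=false excludes dom0):
#   VMS=$(xe vm-list is-control-domain=false --minimal)
VMS="uuid-a,uuid-b"          # placeholder output
echo "guest VMs found: $(count_uuids "$VMS")"
```

A count of 0 means only dom0 is known to the host, which is the case where the other LVM storage has to be tracked down.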

tjkreidl

@archw Just write a shell script and use ssh to securely run the script to query that host for the status of that VM (rsh would send everything unencrypted). You may need to add the accessing hosts to /etc/hosts.allow (might be hosts_allow, I can't recall offhand).
See for example: https://linuxconfig.org/hosts-allow-format-and-example-on-linux
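
A minimal sketch of such a query, using ssh as the transport (an assumption here) and key-based root login to the host. The host name and VM name are placeholders, and the helper only prints the command it would run.

```shell
#!/bin/sh
# Sketch: build an ssh command that asks a remote host for a VM's power state.
# Assumes key-based root login; both names below are placeholders.
# VM names containing spaces would need extra quoting.

vm_state_cmd() {
    # $1 = host, $2 = VM name-label
    echo "ssh root@$1 xe vm-list name-label=$2 params=power-state --minimal"
}

vm_state_cmd "xcp-host1" "my-vm"
# Running the printed command returns the power state, e.g. "running" or "halted".
```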

That said, HA is clearly a better option, provided you have a compatible SR available.

tjkreidl

@MasterOSkillio Typically, C-states need to be changed in the BIOS. In some cases, it can be very helpful. I wrote a few blog posts that cover this topic, entitled "A Tale of Two Servers", but cannot readily find them online at the moment. Alas, Citrix has purged a lot of still-relevant older content over the years.

tjkreidl

@RS One option would be this, assuming in this case you want to run the job at midnight on Dec. 25:
/bin/echo "/path/to/your/script.sh" | at midnight Dec 25

While cron doesn't offer a specific one-time execution, you could also do this in cron but would have to remove the entry afterwards:
0 0 25 12 * /path/to/your/script.sh
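
If you go the cron route, the cleanup can be scripted so the entry does not fire again next year. A minimal sketch: remove_entry is a hypothetical helper that filters crontab text, and the second crontab line is made-up sample data.

```shell
#!/bin/sh
# Sketch: drop a one-shot entry from crontab text once the job has run.

remove_entry() {
    # Reads crontab text on stdin, prints it without lines containing $1.
    # grep exits 1 when every line was removed; "|| true" hides that.
    grep -vF -- "$1" || true
}

# With the helper defined in the script, its last line could then be:
#   crontab -l | remove_entry "/path/to/your/script.sh" | crontab -
printf '%s\n' '0 0 25 12 * /path/to/your/script.sh' '5 1 * * * /usr/bin/other' \
    | remove_entry "/path/to/your/script.sh"
```

The demo prints only the unrelated entry, which is what would be written back to the crontab.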

Also, take a look at this option: https://www.fastcron.com/guides/one-time-cronjobs/

tjkreidl

@DKirk That all makes sense, thanks for clarifying. Looks like there are further comments below that seem to pinpoint where the issue may lie. The key point you make is that this only started happening "after the last updates"!

tjkreidl

@DKirk Very odd. Maybe an electrical power issue? Do you see this if you run xentop on each host and, really important, does it happen at the same time on all your servers?
Any chance they are overheating and pausing briefly?

tjkreidl