XCP-ng
    • Following 0
    • Followers 7
    • Topics 0
    • Posts 224
    • Groups 1

    tjkreidl

    @tjkreidl

    Ambassador

    Originally an astronomer for 15 years and later an NAU employee in IT for 25+ years, most of them as a Team Lead. I was a Citrix CTP and NVIDIA NGCA for four years prior to retirement. Over 10 years' experience with XenServer/Citrix Hypervisor and close to that with NVIDIA GRID products. I was also a Red Hat Linux administrator and system programmer. Still trying to contribute what knowledge I have for the benefit of the IT community.

    Reputation: 112
    Profile views: 535
    Posts: 224
    Followers: 7
    Following: 0
    Website None
    Location Somewhere, USA


    Best posts made by tjkreidl

    • RE: Introduce yourself!

      Hi, everyone. Nice to see this project turning into reality. I will try to spend as much time here as possible, which is hard since I'm already spread thin. I've been a XenServer user for around a decade and am as interested in learning as in contributing whatever knowledge might be helpful to the community.

      Best regards,
      -=Tobias

      posted in Off topic
    • RE: Remove a host from a pool

      And from the CLI:

      1. xe host-list (to get the UUID of the host)
      2. xe pool-eject host-uuid=<host_UUID>
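
The two steps above can also be scripted. This is a minimal Python sketch, assuming it runs on a pool member with the `xe` CLI on the PATH; the helper names `find_host_uuid` and `build_eject_command` are hypothetical. It only builds the eject command and leaves running it to the operator.

```python
import subprocess
import uuid


def build_eject_command(host_uuid: str) -> list[str]:
    """Return the argv for `xe pool-eject` (step 2) after a basic UUID check."""
    uuid.UUID(host_uuid)  # raises ValueError on a malformed UUID
    return ["xe", "pool-eject", f"host-uuid={host_uuid}"]


def find_host_uuid(name_label: str) -> str:
    """Look up a host UUID by its name-label (step 1). Requires the xe CLI."""
    out = subprocess.run(
        ["xe", "host-list", f"name-label={name_label}", "--minimal"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    # --minimal prints the matching UUIDs as a comma-separated list
    uuids = [u for u in out.split(",") if u]
    if len(uuids) != 1:
        raise LookupError(f"expected one host named {name_label!r}, found {len(uuids)}")
    return uuids[0]


# Example with a made-up UUID: print the command instead of running it.
print(" ".join(build_eject_command("12345678-1234-1234-1234-123456789abc")))
```

Printing the command first makes it easy to double-check the UUID before ejecting the host.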
      posted in Management
    • RE: Socket/core configuration in VM

      @robyt It depends on (1) licensing, if any, since some licenses are counted per core and others per socket, and (2) NUMA/vNUMA, if performance is critical, since it matters whether the vCPUs are allocated across sockets or on a single socket. The best way, IMO, is to try each configuration and test with benchmarks. See, for example, this article and the previous two articles, as well as articles by Frank Denneman and others: https://blogs.mycugc.org/2019/04/30/a-tale-of-two-servers-part-3-the-influence-of-numa-cpus-and-sockets-cores-persocket-plus-other-vm-settings-on-apps-and-gpu-performance/

      posted in Compute
    • RE: NUMA-impact - Xeon/Epyc - 1P vs 2P

      @olivierlambert said in NUMA-impact - Xeon/Epyc - 1P vs 2P:

      There is no universal answer (because it mostly depends on your VM load and what you expect). As usual, my advice is to keep it simple if you don't have a problem with it (i.e., you are satisfied with the performance). Even a default EPYC configuration will likely always be better than a Xeon one.

      After that, if you want to go deeper and learn the details, it's OK, let me just ping @tjkreidl who did a remarkable job (if I remember correctly) on this very topic.

      Thanks for the mention, @olivierlambert! Here's a link to part 3, which contains links back to parts 1 and 2. Note that NUMA affects EPYC processors differently across generations, since AMD changed the die configuration, along with the number of cores, at one point. I'm open to any questions on this topic. 🙂 https://blogs.mycugc.org/2019/04/30/a-tale-of-two-servers-part-3-the-influence-of-numa-cpus-and-sockets-cores-persocket-plus-other-vm-settings-on-apps-and-gpu-performance/

      posted in Compute
    • RE: vCPU Over-Subscription...

      @epretorious I would add that you have to be careful about overprovisioning when NUMA/vNUMA kicks in, that is, when you allocate more vCPUs to a VM than there are physical CPUs in one bank, or more memory than is attached to that bank. Assume, for the sake of argument, that you have two banks of physical CPUs and each has one of two banks of memory directly accessible to it. Things then get inefficient because a CPU may need to reach across to the other bank of memory to access data, and that involves additional overhead. See, for example, this article and the two preceding it:
      https://blogs.mycugc.org/2019/04/30/a-tale-of-two-servers-part-3-the-influence-of-numa-cpus-and-sockets-cores-persocket-plus-other-vm-settings-on-apps-and-gpu-performance/
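
The two-bank example can be turned into a quick sanity check. This is a rough Python sketch under simplifying assumptions: each bank (NUMA node) is described only by its core count and locally attached memory, and the `NumaNode` class and the host figures are hypothetical. Real placement is decided by the hypervisor, so treat this as a planning aid only.

```python
from dataclasses import dataclass


@dataclass
class NumaNode:
    cores: int       # physical cores in this bank
    memory_gib: int  # memory directly attached to this bank


def fits_in_one_node(vcpus: int, vm_memory_gib: int, node: NumaNode) -> bool:
    """True if the VM's vCPUs and memory both fit inside a single bank."""
    return vcpus <= node.cores and vm_memory_gib <= node.memory_gib


# Hypothetical two-socket host: two banks of 16 cores, each with 128 GiB attached.
node = NumaNode(cores=16, memory_gib=128)
print(fits_in_one_node(8, 64, node))   # True: the VM can stay local to one bank
print(fits_in_one_node(24, 64, node))  # False: vCPUs spill onto the second bank
```

A VM that fails this check is a candidate for the cross-bank memory access described above, which is where benchmarking becomes worthwhile.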

      -=Tobias

      posted in Compute
    • RE: Overprovisioning CPU + RAM?

      @MichaelCropper CPUs can be over-provisioned, but not memory. You can use DMC (dynamic memory control) to regulate how much memory a VM will actually use, but in total, you still cannot exceed the amount of physical memory available on a server.

      CPU over-provisioning is very common, especially if loads change significantly over time (day/night, weekday/weekend, special events and holidays/regular days, etc.).

      Watching the load with top and xentop will give you an idea about overall performance of dom0 and all VMs, respectively.

      As for a powered-off VM, it uses neither memory nor CPU resources.

      There are a lot of subtleties involved that would entail a much longer discussion, but hopefully this will help for starters. There is a lot of information online about memory and vCPU allocation.
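
The two rules above (CPUs may be over-committed, memory may not) can be expressed as simple arithmetic. This is a minimal Python sketch with hypothetical function names and made-up host figures. It counts only running VMs, since powered-off VMs use no resources, and the dom0 reservation is an assumed parameter.

```python
def cpu_overcommit_ratio(vcpus_per_vm: list[int], physical_cpus: int) -> float:
    """Total vCPUs of running VMs divided by physical CPUs (above 1.0 means over-committed)."""
    return sum(vcpus_per_vm) / physical_cpus


def memory_fits(vm_memory_gib: list[float], host_memory_gib: float,
                dom0_memory_gib: float = 0.0) -> bool:
    """True if the running VMs plus dom0 stay within physical memory."""
    return sum(vm_memory_gib) + dom0_memory_gib <= host_memory_gib


# Hypothetical host: 16 physical CPUs, 128 GiB RAM, 4 GiB assumed for dom0.
print(cpu_overcommit_ratio([4, 4, 8, 8], 16))               # 1.5
print(memory_fits([32, 32, 64], 128, dom0_memory_gib=4))    # False
print(memory_fits([32, 32, 48], 128, dom0_memory_gib=4))    # True
```

A ratio of 1.5 may be perfectly fine for bursty loads. The memory check, by contrast, is a hard limit.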

      posted in Xen Orchestra
    • RE: How to Re-attach an SR

      @olivierlambert Agreed. The Citrix forum used to be very active, but especially since Citrix was taken over, https://community.citrix.com has had way less activity, sadly.
      It's gratifying that a lot of the functionality is still common to both platforms, although as XCP-ng evolves, there will be steadily less commonality.

      posted in XCP-ng
    • RE: How to Re-attach an SR

      @Chrome Cheers -- always glad to help out. I put in many thousands of posts on the old Citrix XenServer site, and am happy to share whatever knowledge I still have, as long as it's still relevant! In a few years, it probably won't be, so carpe diem!

      posted in XCP-ng
    • RE: How to Re-attach an SR

      @Chrome Fantastic! Please mark my post as helpful if you found it as such. Was traveling much of today, hence the late response.

      BTW, it's always good to make a backup and/or archive of your LVM configuration anytime you change it, as the restore option is the cleanest way to deal with connectivity issues if there is some sort of corruption. It's saved my rear end before, I can assure you!

      Yeah, if the SSD drive got wiped, there's no option to get those back unless you made a backup somewhere of all that before you installed XCP-ng onto it.

      BTW, another very useful command for LVM is "vgchange -ay", which activates the logical volumes in every volume group and can help if a VG seems missing or inactive.

      posted in XCP-ng
    • RE: How to Re-attach an SR

      @Chrome As M. Lambert says, you may be able to use pbd-plug to re-attach the SR if you can sr-introduce the old SR back into the system.
      If not, and if your LVM configuration has not been wiped out, here are some steps to try to recover it (it's an ugly process!):

      1. Identify the LVM configuration:
        Check for Backups: Look for LVM metadata backups in /etc/lvm/archive/ or /etc/lvm/backup/.
        Use vgscan: This command will search for volume groups and their metadata.
        Use pvscan: This command will scan for physical volumes.
        Use lvs: This command will list logical volumes and their status.
        Use vgs: This command will list volume groups.
      2. Restore from Backup (if available):
        Find the Backup: Locate the LVM metadata backup file (e.g., /etc/lvm/backup/<vg_name>).
        Boot into Rescue Mode: If you're unable to access the system, boot into a rescue environment.
        Restore Metadata: Use vgcfgrestore to restore the LVM configuration.
      3. Recreate LVM Configuration (if no backup):
        Identify PVs: Use pvscan to list available physical volumes.
        Identify VGs: Use vgscan to identify volume groups if they are present.
        Recreate PVs: If necessary, use pvcreate to create physical volumes.
        Create VGs: If necessary, use vgcreate to create a new volume group.
        Create LVs: If necessary, use lvcreate to create logical volumes.
      4. Mount and Verify:
        Mount the Logical Volumes: Mount the restored LVM volumes to their respective mount points.
        Verify Data: Check the integrity of the data on the restored LVM volumes.
      5. Extend LVM (if adding capacity):
        Add a new disk: Ensure the new disk is recognized by the system.
        Create PV: Use pvcreate on the new disk.
        Add PV to VG: Use vgextend to add the PV to the volume group.
        Extend LV: Use lvextend to extend the size of an existing logical volume.
        Extend Filesystem: Use resize2fs (which handles ext2, ext3, and ext4) to extend the filesystem on the LV.
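
Steps 1 and 2 above can be sketched as a dry run. This Python example only assembles and prints the commands, so nothing is changed on disk. The function names and the volume group name `my_vg` are hypothetical, and the backup path follows the /etc/lvm/backup/<vg_name> convention mentioned in step 2.

```python
from typing import Optional


def inspection_commands() -> list[list[str]]:
    """Read-only commands from step 1."""
    return [["vgscan"], ["pvscan"], ["vgs"], ["lvs"]]


def restore_commands(vg_name: str, backup_file: Optional[str] = None) -> list[list[str]]:
    """Commands for step 2: list known metadata backups, restore, then activate."""
    if backup_file is None:
        backup_file = f"/etc/lvm/backup/{vg_name}"
    return [
        ["vgcfgrestore", "--list", vg_name],            # show available backups/archives
        ["vgcfgrestore", "-f", backup_file, vg_name],   # restore the chosen metadata file
        ["vgchange", "-ay", vg_name],                   # activate the LVs in the restored VG
    ]


# Dry run: print the commands for review instead of executing them.
for argv in inspection_commands() + restore_commands("my_vg"):
    print(" ".join(argv))
```

Reviewing the printed commands before running them by hand is deliberate, since a restore from the wrong metadata file is hard to undo.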
      posted in XCP-ng

    Latest posts made by tjkreidl

    • RE: Too many snapshots

      @Pilow Agreed. You have to be sure that garbage collection has completed; otherwise it will never catch up if backups keep running before the coalesce finishes.

      posted in Backup
    • RE: Too many snapshots

      @Pilow The other thing to consider is being cognizant of how long your backups typically take (or even planning for a worst-case condition) and defining the backup intervals accordingly.
      In other words, if you know you cannot consistently do your incremental backups in less than an hour, schedule them 90 minutes or two hours apart. It's better IMO to have a solid backup less frequently than to have them fail on a regular basis.
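
As a worked example of that rule of thumb, here is a small Python sketch. The function name, the 1.5x safety margin, and the rounding to 30-minute slots are all assumptions chosen to reproduce the "one hour becomes 90 minutes" case. Tune them to your own worst-case measurements.

```python
import math


def safe_interval_minutes(worst_case_minutes: float, margin: float = 1.5,
                          step: int = 30) -> int:
    """Worst-case backup duration times a safety margin, rounded up to a multiple of `step`."""
    padded = worst_case_minutes * margin
    return int(math.ceil(padded / step) * step)


print(safe_interval_minutes(60))              # 90
print(safe_interval_minutes(60, margin=2.0))  # 120
print(safe_interval_minutes(45))              # 90
```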

      posted in Backup
    • RE: Too many snapshots

      @Pilow Right, just skip the currently planned backup if a coalesce is still in progress and check again at the next scheduled backup. This could potentially be implemented in the existing backup code.

      posted in Backup
    • RE: Too many snapshots

      @Pilow Ah, right. You'd have to check the time stamp if you worked on automating this.
      So maybe @McHenry could write a script to do the backups and, that way, ensure there is no ongoing task in progress before kicking off the next backup instance.
      It could be run periodically from a cron job and, if there's still ongoing activity, just exit and try again the next time.
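
The "exit and try again next time" idea can be sketched as follows. This is a minimal Python outline, not a working backup script. `run_if_idle` is a hypothetical helper, and both callables are placeholders: how to detect an ongoing coalesce (for example, by inspecting the task list or the storage log) is left open in this thread, so `is_busy` is deliberately injected and not implemented here.

```python
from typing import Callable


def run_if_idle(is_busy: Callable[[], bool], start_backup: Callable[[], None]) -> bool:
    """Start the backup only when nothing is in progress. Returns True if it started."""
    if is_busy():
        return False  # exit quietly; cron will invoke the script again at the next slot
    start_backup()
    return True


# Stand-in callables for illustration only.
started = run_if_idle(is_busy=lambda: True,
                      start_backup=lambda: print("backup started"))
print(started)  # False: something was still running, so this run is skipped
```

Keeping the busy check separate from the backup trigger means either one can be swapped out once a reliable way to detect garbage collection is settled on.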

      posted in Backup
    • RE: Too many snapshots

      @Pilow I agree, the error message is misleading. Indeed, garbage collection can take some time to complete, and in some cases it likely takes more than an hour.
      Is there an option to monitor garbage collection with task-list or some other utility? If so, one could write a script to kick off backups instead of using the cron pattern in the backup setting. Just a suggestion ...

      posted in Backup
    • RE: Too many snapshots

      @McHenry Are you sure that, with backups running that frequently, each backup completes successfully before the next one starts? How long does the full backup typically take (less than 7 hours?), and how long do the incrementals take (under 1 hour?)? Again, I'd suggest looking in /var/log/SMlog for any error conditions that might help identify an issue. Also, how fragmented is your storage? That can slow things down quite a bit, as can a lack of adequate CPU power or memory (run the top or xentop utility to view the load during backups).

      posted in Backup
    • RE: Too many snapshots

      @Pilow Yes, you are correct that the chain length is also limited. You might try to manually delete some of the snapshots. Though the limit is supposed to be 30, perhaps there are other factors involved? Does that VM have a particularly large amount of storage and a lot of changes between snapshots? Are any of your other VMs experiencing similar issues? Your SR appears to be mostly empty, correct? Are there any related errors showing up in /var/log/SMlog?
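
A quick way to look for such entries is a keyword filter over the log. This Python sketch assumes nothing about the SMlog line format. The keywords in `ERROR_PATTERN` are a guess at what is worth flagging, not an official list, and the function name is hypothetical.

```python
import os
import re
from typing import Iterable, Iterator

# Assumed keywords; adjust to whatever actually appears in your log.
ERROR_PATTERN = re.compile(r"error|exception|fail", re.IGNORECASE)


def suspicious_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines that contain any of the assumed error keywords."""
    for line in lines:
        if ERROR_PATTERN.search(line):
            yield line.rstrip("\n")


# Only scan the log when it exists, i.e. when run on the host itself.
LOG_PATH = "/var/log/SMlog"
if os.path.exists(LOG_PATH):
    with open(LOG_PATH, errors="replace") as log:
        for hit in suspicious_lines(log):
            print(hit)
```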

      posted in Backup
    • RE: Too many snapshots

      @Pilow Yes; it's been my understanding that this has been the default for many years now.

      posted in Backup
    • RE: Backup Suddenly Failing

      @JSylvia007 Sorry, I'm really late to this thread, but note that backups can become problematic if the SR is something like 90% or more full. There needs to be some buffer for storage as part of the process. The fact you could copy/clone VMs means your SR is working OK, but backups are a different situation. If need be, you can always migrate VMs to other storage, which is evidently what you ended up doing, and that frees up extra disk space. Also, backups are pretty intensive, so make sure you have both enough CPU capacity and memory to handle the load. Finally, a defective SR will definitely cause issues if there are I/O errors, so watch your /var/log/SMlog for any such entries.
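
The 90% rule of thumb is easy to check numerically. This is a minimal Python sketch with hypothetical function names and made-up sizes. It assumes the used and total sizes come from the SR's physical-utilisation and physical-size fields, for example as shown by `xe sr-list params=name-label,physical-utilisation,physical-size`.

```python
def sr_usage_fraction(physical_utilisation: int, physical_size: int) -> float:
    """Fraction of the SR in use (both values in the same unit, e.g. bytes)."""
    if physical_size <= 0:
        raise ValueError("physical_size must be positive")
    return physical_utilisation / physical_size


def sr_too_full(physical_utilisation: int, physical_size: int,
                threshold: float = 0.90) -> bool:
    """True if usage is at or above the threshold (90% by default)."""
    return sr_usage_fraction(physical_utilisation, physical_size) >= threshold


# Made-up example: a 1000 GB SR with 950 GB used, then with 500 GB used.
print(sr_too_full(950 * 10**9, 1000 * 10**9))  # True
print(sr_too_full(500 * 10**9, 1000 * 10**9))  # False
```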

      posted in Backup
    • RE: update: vGPU w NVIDIA Tesla P4

      @Aleksander With a standard 8.2 install and the associated drivers, it's reported that support exists for the following:
      Tesla M6/M10/M60, P4/P6/P40/P100, V100, T4, A2/A10/A16/A40, and RTX A5000/A6000/6000/8000 series.
      The appropriate NVIDIA licensing must of course also be obtained and installed, including a license server.

      posted in Compute