XCP-ng
    planedrop (@planedrop), Top contributor
    Reputation 88 | Profile views 89 | Posts 386 | Topics 27 | Groups 1 | Followers 0 | Following 0
    Age 29 | Location Portland, OR

    Best posts made by planedrop

    • RE: Veeam and XCP-ng

      jasonnix I've done extensive testing with this myself. First and foremost, Veeam is the one that would have to add support for XCP-ng, not the other way around.

      Second, it would be best to use XO for the backups; it's much more fluid and fully integrated. I've been doing this for some time, and it's been excellent in multiple production setups.

      I have also tested Veeam via agents inside the VMs themselves (just for test purposes; I still wouldn't really recommend it), and it worked exactly as expected.

      Using XO for this is still better, though: it's generally faster, easier to set up, more reliable, and restores are much faster and easier.

      If you are considering this as a comparison to VMware, it's worth noting that it's not really a positive thing that VMware requires you to buy a separate product entirely in order to handle backups.

      posted in XCP-ng
    • RE: XO - Restore Health Check

      I can confirm this is the case for me too. Not a huge deal, but it would be kinda nice if it could keep track of the name.

      posted in Advanced features
    • RE: Server Locks Up Periodically with ASRock X570D4I-2T AMD Ryzen 9 3900X and Intel X550-AT2

      olivierlambert Yup, I've had exactly that a few times, usually on used boards.

      R2rho If possible, however annoying, I would also take the CPU out and check the motherboard with a flashlight for bent pins.

      posted in XCP-ng
    • RE: VM Templates does choosing correct one matter?

      IIRC the templates help define some of the UEFI specs and things like that. Generally speaking, though, using something similar to what you're deploying, even if not the same version (e.g. an Ubuntu 20.04 template for Ubuntu 23.10), should be functional; at least in my experience this has never created an issue.

      posted in XCP-ng
    • RE: Backup Emails Don't Send If Backups Fail Due To License Issues

      olivierlambert Issue added. It's my first time doing this on GitHub, so apologies if I missed anything or put it in the wrong category.

      https://github.com/vatesfr/xen-orchestra/issues/7893

      vatesfr/xen-orchestra issue #7893 (closed): "Add Email Notification For License Issues To Prevent Backup Failure"

      posted in Backup
    • Failing Backups: Trying To Find Root Cause

      This may be something for me to put a ticket in for, but I wanted to post here publicly first since it could benefit others.

      One of the environments I manage has consistent backup failures, and I haven't been able to get to the root cause. This post will probably be long, with lots of details. The short of it is that I think it's only happening to large VMs, but I can't figure out why. The majority fail on "clean VM directory" and show missing VHDs or missing parent VHDs.

      To start, this setup has 2 backups that run for all VMs nightly: one is uploaded to Backblaze and the other is sent over SMB to a TrueNAS machine.

      I have a similar setup in my lab at home, and it has never failed once. But all my VMs are under 100GB, while this other environment has some that are more than 2TB, which is why I am starting to think that is the root cause.

      XOA is at version 5.93.1, so not 100% up to date (will update shortly), but this has been an ongoing issue for months, so I don't think it's version-specific.

      Backup Schedules

      First I want to explain my schedules in detail, then I'll go into the errors we are seeing.

      Both schedules back up the same number of VMs, 2 of which are slightly over 2TB in size (several VHDs).

      Backblaze Backup

      • This one is set up to run every night
      • Concurrency of 2
      • Timeout of 72 hours (since the VMs are large I set the timeout very high, but it usually finishes within a few hours, sometimes taking around 10)
      • Full Backup Interval is 15
      • NBD is enabled and set to 4 connections per disk
      • Speed is limited to 500MiB/s (this is never hit, though)
      • Snapshot mode is normal
      • Schedule is set to run every weekday at 5PM with a retention of 14 and force full backup disabled
      • Worth noting, the B2 bucket settings are:
        • Lifecycle is set to keep only the last version of each file (plan is to adjust this more later)
        • Object lock is enabled but no default is set, so nothing should be getting locked

      SMB NAS Backup

      • Concurrency of 1
      • Full Backup Interval of 30
      • NBD is disabled, number of connections is 1
      • Snapshot mode is normal
      • Schedule is set to run every weekday at 8PM with a retention of 7
      • This NAS also backs up this VM directory (an additional backup I run), but those start at 7PM and I have it set to snapshot the dataset before backing it up, so in theory nothing XCP-ng is touching should be affected
        • I've confirmed that TrueNAS's "snapshot first" feature (which runs before the backup starts) takes a snapshot, backs up the data from that snapshot, then deletes the snapshot; the whole point is to prevent file locking on a directory that other things are accessing

      I know the backup retention periods etc. are a bit odd here. If we think that could be causing an issue, I'm happy to adjust them; I was planning on reworking retention sometime soon anyway. But as far as I can tell, it shouldn't cause a major problem.

      The Errors

      Backblaze

      • Several VMs, including smaller ones, are seeing this issue, which maybe means my theory about this being a large-VM-specific issue is wrong?
      • It always happens during the clean VM directory process
      • The last log I have shows 3 VMs with the following:
        • UUID is Duplicated
        • Orphan Merge State
        • Parent VHD is missing (several times for each VM)
        • Unexpected number of entries in backup cache
        • Some VHDs linked to the backup are missing
      • On all of these, the Backblaze "transfer" section of the logs is green and successful, but the clean VM directory step is not; it seems the merge is failing
      • Retrying VMs will sometimes work but other times will just fail again

      SMB

      • Only seems to happen with big VMs; they work fine for a while (several weeks), then start erroring out
      • The only fix I've found is to wipe the entire VMs directory on the NAS so the backup starts fresh
      • The error is always parent VHD is missing (with a path to a VHD that, as far as I can tell, exists)
      • This is then followed by an "EBUSY: resource busy or locked, unlink (vhd path)" error
      • It's always a VHD whose name starts with a period, i.e. ".2024**********.vhd"
      • Checking the NAS via shell, the file definitely exists and has the same permissions as everything else in the directory
      • Another super interesting thing: if I go to the VM Restore page and select the VM that failed on SMB, it shows no original key backup, like so (most recent at top, oldest at bottom):
        • Incremental
        • Incremental
        • Incremental
        • Incremental
        • Key
        • Incremental
        • Incremental

      So as you can see, there's no original key for the bottom (oldest) 2 incrementals.
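As a rough illustration of why that listing looks wrong: in a chain, every incremental needs an older key somewhere below it to build on. The sketch below assumes the Restore page list is copied by hand as "Key"/"Incremental" strings, newest first; `orphan_incrementals` is a hypothetical helper, not an XO API.

```python
from typing import List


def orphan_incrementals(chain_newest_first: List[str]) -> List[int]:
    """Return indices (in newest-first order) of incrementals with no older key.

    Hypothetical helper for illustration only; it does not read XO data.
    """
    orphans = []
    seen_key = False
    # Walk from the oldest entry (end of the list) toward the newest.
    for idx in range(len(chain_newest_first) - 1, -1, -1):
        kind = chain_newest_first[idx]
        if kind == "Key":
            seen_key = True
        elif kind == "Incremental" and not seen_key:
            # No key older than this incremental, so nothing to restore from.
            orphans.append(idx)
    return sorted(orphans)


# The listing from the Restore page above, most recent first.
restore_list = ["Incremental", "Incremental", "Incremental", "Incremental",
                "Key", "Incremental", "Incremental"]
print(orphan_incrementals(restore_list))  # [5, 6]
```

Indices 5 and 6 are the two bottom entries, i.e. the incrementals that have no key beneath them.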

      Any ideas as to what could be causing this? I'm thinking they might be 2 entirely separate issues; it's just odd that they're both happening.

      I will do what I can to troubleshoot this directly as well and update this post with anything else I find.

      posted in Backup
    • RE: Delta backup questions

      I can confirm NFS is great on XCP-ng and would definitely encourage you to go that direction. TBH, FC and iSCSI are a tad outdated. There are still good use cases for them, but NFS is what I'd always aim for in this setup.

      And like Danp said, if it's thin provisioned, then no, it won't be using double the space.
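To illustrate the general idea behind thin provisioning (space is only allocated for blocks that are actually written), here is a minimal Python sketch using a sparse file on a POSIX filesystem. It is an analogy, not how XCP-ng's VHD format works internally, and it assumes a filesystem with sparse-file support (most Linux filesystems have it). `apparent_vs_allocated` is a hypothetical helper.

```python
import os
import tempfile
from typing import Tuple


def apparent_vs_allocated(path: str) -> Tuple[int, int]:
    """Return (apparent size, bytes actually allocated) for a file.

    st_blocks is reported in 512-byte units on Linux (see stat(2)).
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "thin.img")
    with open(path, "wb") as f:
        f.truncate(1 << 30)  # 1 GiB apparent size, no data written
    size, allocated = apparent_vs_allocated(path)
    print(f"apparent={size} allocated={allocated}")
```

On a filesystem that supports sparse files, the allocated figure stays near zero even though the file "is" 1 GiB, which is the same reason a thin-provisioned disk doesn't consume its full size up front.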

      posted in Backup
    • RE: Backup Feature

      If you want to work with the backup features but aren't using this in production, you can compile XO from the sources, deploy it that way, and then use the backup features.

      If this is for production, though, I'd definitely recommend getting support (i.e. purchasing XOA) in case something goes wrong.

      posted in Backup
    • RE: Cloud Backups Directly to BackBlaze B2

      I may be able to help out here a bit, I've done a lot of testing and production backups to B2.

      Naming-wise, Olivier is right: B2 is "S3 compatible", which is why it works, but it's not "officially" supported in that way. However, since the S3 compatibility layer in B2 has been super solid, backing up to it using the S3 protocol should be fine.

      As for your question about differentials, whatever backup jobs you set up are what will be put in B2. So if you set up full backups and select B2 as the remote, you get full backups, etc.

      I would recommend making your B2 backups a separate job from any local backups, though (if you're doing local for faster restores); it's nice having control over them separately.

      posted in Backup
    • RE: Explanation of backup tags on Restore UI?

      CJ I think there is still some misunderstanding.

      A key backup is part of an incremental backup; it's the full backup at the start of an incremental chain.

      When you set up an incremental backup, there still has to be an initial full backup before the increments start; this is called the key backup.

      So say you have an incremental backup set up with a periodic full every 5 backups; you'd see something like this from the start:

      Key > Increment > Increment > Increment > Increment > New Key > Increment

      Each key is a "full" backup, but they are called keys because they are part of the incremental chain, so they aren't standalone.
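The pattern above can be sketched in a few lines of Python. This is a simplified model of the post's example, assuming a new key is taken on every `full_every`-th run starting with the first; `chain_pattern` is a hypothetical helper, not XO's scheduler logic.

```python
from typing import List


def chain_pattern(num_backups: int, full_every: int) -> List[str]:
    """Model the Key/Increment sequence for a periodic-full incremental job.

    Simplified illustration: run 0 is the initial key, and every
    `full_every`-th run after that starts a new key.
    """
    if full_every < 1:
        raise ValueError("full_every must be >= 1")
    return ["Key" if i % full_every == 0 else "Increment"
            for i in range(num_backups)]


print(" > ".join(chain_pattern(7, 5)))
# Key > Increment > Increment > Increment > Increment > Key > Increment
```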

      posted in Backup

    Latest posts made by planedrop

    • RE: Cannot Import VMDK Through Import > Disk (migrating from ESXi, all methods not working)

      piotrlotr1 Most VMDKs should work, but the V2V tool is really what's meant for this. It lets you warm migrate from VMware to XCP-ng with very little downtime.

      posted in Xen Orchestra
    • RE: Cannot Import VMDK Through Import > Disk (migrating from ESXi, all methods not working)

      piotrlotr1 Maybe I missed some context in this thread, so apologies if I did.

      But the V2V tool should handle this. Is there a reason you want to do it as a VMDK import instead?

      I've used the V2V a lot and it works quite well.

      posted in Xen Orchestra
    • RE: Veeam and XCP-ng

      MAnon This is a valid point actually, and without additional work, you couldn't just restore to another hypervisor.

      However, check this blog post: https://xen-orchestra.com/blog/xen-orchestra-5-100/?utm_campaign=mail_5.100&utm_term=logo&ct=YTo1OntzOjY6InNvdXJjZSI7YToyOntpOjA7czo1OiJlbWFpbCI7aToxO2k6NjU7fXM6NToiZW1haWwiO2k6NjU7czo0OiJzdGF0IjtzOjIyOiI2NzIzODI1NDE4ZjVmMjE5NDI2OTYwIjtzOjQ6ImxlYWQiO3M6NToiODM5ODciO3M6NzoiY2hhbm5lbCI7YToxOntzOjU6ImVtYWlsIjtpOjY1O319

      Veeam is likely going to properly support XCP-ng.

      And for what it's worth, you can use agent-based Veeam backups inside the VMs, and that works fine.

      posted in XCP-ng
    • RE: Memory reporting incorrect values

      jebrown OK, good to know. This may differ between OSes, though; Windows may just report actually used RAM and leave out anything that is cached.

      posted in Compute
    • RE: Memory reporting incorrect values

      fred974 This is normal behavior; most modern OSes will use as much RAM as they possibly can for caching. Unused RAM is wasted RAM.

      I'd adjust your alert settings if you're getting high RAM usage alerts. You could also change things in the OS to prevent as much RAM being used for caching, but I'd avoid that if possible.

      The OS reports its RAM usage to the hypervisor via the tools, so if the OS is caching a lot, and therefore using most of its RAM, it will report that to the hypervisor.

      I think (could be wrong) Windows doesn't report cached RAM via the tools, so it may show "real" RAM usage, but this is OS-specific stuff and not really specific to XCP-ng or any hypervisor.
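On a Linux guest you can see the difference yourself in /proc/meminfo: MemFree excludes page cache, while MemAvailable (Linux 3.14+) estimates how much memory applications could actually get. The minimal Python sketch below parses meminfo-style text and compares the two views; the sample values are made up, and this is not a claim about which counter the guest tools actually report.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def usage_kb(info: Dict[str, int]) -> Dict[str, int]:
    total = info["MemTotal"]
    # Counting everything not "free" as used treats page cache as used RAM.
    naive_used = total - info["MemFree"]
    # MemAvailable accounts for cache the kernel can reclaim on demand.
    really_used = total - info["MemAvailable"]
    return {"naive_used": naive_used, "really_used": really_used}


# Made-up sample values for illustration.
sample = """MemTotal:        8000000 kB
MemFree:          200000 kB
MemAvailable:    6000000 kB
Cached:          5500000 kB
"""
print(usage_kb(parse_meminfo(sample)))
# {'naive_used': 7800000, 'really_used': 2000000}
```

On a real guest you would pass in the contents of /proc/meminfo instead of the sample string.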

      posted in Compute
    • RE: Wide VMs on XCP-ng

      plaidypus Ah gotcha, this makes sense.

      I second scaling out instead of up.

      If you're getting new hosts, I'd also keep in mind that newer CPUs have much higher per-core performance (not sure what your current hardware is), so you might be able to get away with fewer vCPUs and a lower likelihood of NUMA spanning.

      Either way though I think scaling out is the better direction to go.

      posted in XCP-ng
    • RE: Wide VMs on XCP-ng

      plaidypus High CPU usage doesn't necessarily mean that NUMA spanning will be much of an issue; it really comes down to latency at that point. I'd say you should be OK to just go with it, though I get the hesitation.

      posted in XCP-ng
    • RE: Intel Flex GPU with SR-IOV for GPU accelarated VDIs

      jebrown Yeah I just meant in general, not specific to you. I think most businesses aren't looking for this kind of specific workload so the demand isn't very high for it.

      Are you currently using Intel Flex GPUs for this use case? Or NVIDIA right now and just looking to change? It might be worth considering moving away from VDI; not sure if you're in a position to argue for that, but there are often better solutions now.

      Either way, I would personally also love to see this, I just think there are other things that more companies are asking for from XCP-ng right now.

      posted in Hardware
    • RE: Wide VMs on XCP-ng

      plaidypus I think the real question you should ask yourself is: do your workloads actually need to worry about the extra latency from a NUMA node span?

      If you're not doing something pretty darn extreme here, I don't think it really will matter. There has been lots of talk for decades about this on VMware, but I just don't think it's that relevant anymore. Latency between sockets has gotten pretty good, so unless it's some really special workload, I don't think you should worry about this much.

      The best way to validate is to just test and see how things go.

      But do you have info on what workload this VM will be running?

      posted in XCP-ng
    • RE: Intel Flex GPU with SR-IOV for GPU accelarated VDIs

      JamesG There probably is interest from homelab people, but for use in production setups, I don't really see a lot of businesses needing it.

      VDI isn't really used that much by businesses now (the ones that do use it are slowly moving off it), and most that do use it don't need GPU acceleration. Often it's for pretty basic applications, medical record systems, etc.

      So I think that is where the lack of demand is coming from.

      posted in Hardware