Hi. The only input I have is around your first question.
When anything is stored as lots of small files (especially thousands of files per GB of data), you get filesystem overhead and 'wasted' slack space on your storage device (the backup target). That can consume a surprising amount of additional capacity, and if you're already near capacity, the problem can scale up quickly into a big one.
A dumbed-down example using round numbers:
- On a filesystem with a 4K block size, a 1-byte file (e.g. a TXT file containing just the letter "a") still consumes a full 4K block of disk capacity.
- Scale that up to a thousand such files and you are consuming roughly 4,000,000 bytes of disk capacity for only 1,000 bytes of actual data (a quick sketch of this math follows below).
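To make the slack-space math concrete, here is a minimal Python sketch of the same round-number calculation. The 4,096-byte block size and the 1-byte files are just the assumptions from the example above, not properties of any particular filesystem.

```python
import math

def allocated_bytes(file_size: int, block_size: int = 4096) -> int:
    """Smallest multiple of block_size that can hold file_size bytes
    (at least one full block, even for a 1-byte file)."""
    return max(1, math.ceil(file_size / block_size)) * block_size

# Round-number example from above: a thousand 1-byte text files.
num_files = 1_000
data_bytes = num_files * 1                    # 1,000 bytes of actual data
disk_bytes = num_files * allocated_bytes(1)   # 4,096,000 bytes allocated on disk

print(f"data: {data_bytes:,} B  allocated: {disk_bytes:,} B  "
      f"overhead: {disk_bytes / data_bytes:,.0f}x")
```

On most POSIX systems you can compare a file's logical size (os.stat(path).st_size) against its actual allocation (st_blocks * 512) to see this overhead on real data.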
Also, if you are using any other apps/utils that scan, monitor, or sync at the filesystem level (for example a sync tool, anti-malware, or checksum verification), they will need to process many thousands of files instead of just a hundred or so, and the fixed per-file cost of opening and closing each file adds latency.
Again, it depends on scale, so another round-number example:
- Assume an app/util needs 200 milliseconds to open and close each file per operation.
- With 100 files, that is 20 seconds of 'wait time'.
- With 1,000,000 files, you are looking at roughly 55 hours of 'wait time' (the sketch after this list runs the same arithmetic).
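For completeness, a small Python sketch of the wait-time arithmetic. The 200 ms per-file cost and the strictly serial processing are assumptions carried over from the round numbers above, not measurements of any real tool.

```python
def serial_wait_seconds(num_files: int, per_file_seconds: float = 0.2) -> float:
    """Rough total wait if a scanner/sync tool spends a fixed amount of
    time opening and closing each file, one file at a time."""
    return num_files * per_file_seconds

for n in (100, 10_000, 1_000_000):
    s = serial_wait_seconds(n)
    print(f"{n:>9,} files -> {s:>9,.0f} s (~{s / 3600:.1f} h)")
# 1,000,000 files at 200 ms each works out to 200,000 s, about 55.6 hours.
```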
Not a very realistic example, but it is something to be aware of when you explode data into many, many small file containers.