Posts made by kagbasi-wgsdac | XCP-ng and XO forum

kagbasi-wgsdac

@johnny I think the Vates VMS (whether you go with paid support or not) would fit your use case perfectly. Since each host is essentially a pool of one, you would pretty much have multiple pools managed by a single Xen Orchestra instance (giving you a single pane of glass, if you will). You could augment this with XCP-ng Center.

As @olivierlambert rightly suggested, you should contact the Vates team and have an in-depth discussion about your situation.

kagbasi-wgsdac

@olivierlambert Yes sir, it is, and I'm glad I confirmed this for myself. Thanks also for helping me understand how the VM Suspend process works. Hopefully this post helps other newbies reach the same understanding in the future.

kagbasi-wgsdac

Here's the latest, and probably last, test. Disabling compression had no appreciable impact on performance. I am now fully convinced that SYNC is the major player here.

Screenshot 2025-07-23 012851.png

kagbasi-wgsdac

@olivierlambert Oh okay, thanks for responding.

So I turned off SYNC and COMPRESSION on the dataset and retested (by suspending 11 VMs). I immediately noticed a whopping performance improvement (essentially sustained wire speed AND roughly half the completion time):

Roughly 984 Mb/s sustained WRITE speeds (during VM suspension)
Roughly 984 Mb/s sustained READ speeds (during VM resumption)
Transfer time for both READ and WRITE is about 20 minutes (down from 40-45 mins)
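
For anyone wanting to sanity-check figures like these, here is a rough sketch of the arithmetic. It assumes "Mb/s" means megabits per second and that the link is 1 Gbps (1000 Mb/s); the function names are made up for illustration.

```python
def data_moved_gb(rate_megabits_per_s: float, minutes: float) -> float:
    """Decimal gigabytes moved at a sustained rate over the given duration."""
    megabytes_per_s = rate_megabits_per_s / 8  # 8 bits per byte
    return megabytes_per_s * minutes * 60 / 1000


def link_utilisation(rate_megabits_per_s: float,
                     link_megabits_per_s: float = 1000.0) -> float:
    """Fraction of the nominal link speed actually used (assumed 1 Gbps link)."""
    return rate_megabits_per_s / link_megabits_per_s


if __name__ == "__main__":
    # 984 Mb/s sustained for about 20 minutes
    print(round(data_moved_gb(984, 20), 1), "GB moved")
    print(f"{link_utilisation(984):.1%} of a 1 Gbps link")
```

Under those assumptions, 20 minutes at 984 Mb/s works out to roughly 148 GB, which is about as much as a 1 Gbps pipe can carry in that time.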

Screenshot 2025-07-22 170106.png

Gonna retest with SYNC disabled and COMPRESSION re-enabled and see if it degrades performance; stand by for another report.

kagbasi-wgsdac

@olivierlambert Yes, I do plan on testing with SYNC disabled and then again with several permutations of dataset changes on the TrueNAS side (like compression on/off, etc.).

Do you guys have a best practices document for setting up an NFS SR using TrueNAS? I browsed through the published XCP-ng documentation site but didn't find anything specific to TrueNAS or maybe I missed it.

kagbasi-wgsdac

Sharing this so others might benefit from what I'm learning.

So, I looked at the network performance on TrueNAS during the Smart Reboot of the second XCP-ng host (screenshot below). What I saw seems to suggest that I'm getting near wire speed during READ operations. However, WRITE operations seem to be hitting a ceiling and I have a feeling it might be due to me having SYNC enabled on the dataset.

Screenshot 2025-07-22 122432.png

kagbasi-wgsdac

@olivierlambert Aaah, I've always wondered what the Suspend SR on the Advanced tab of the Pool meant... now I know.

So, the NIC capacity/pipe between the hosts and the SR does really matter here.
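
To put a number on that, here is a back-of-the-envelope sketch. It assumes the data written to the Suspend SR is roughly the sum of the suspended VMs' memory, and it ignores protocol overhead and any sync-write penalty on the storage side. The VM sizes in the example are hypothetical, not my actual VMs.

```python
def suspend_minutes(vm_ram_gib: list[float],
                    throughput_megabits_per_s: float) -> float:
    """Estimate minutes needed to write the VMs' memory images to the Suspend SR.

    Assumption: image size is about equal to VM memory; real transfers
    will be slower because of overhead and storage latency.
    """
    total_bytes = sum(vm_ram_gib) * 1024**3
    bytes_per_s = throughput_megabits_per_s * 1_000_000 / 8
    return total_bytes / bytes_per_s / 60


if __name__ == "__main__":
    vms = [8, 8, 16]  # hypothetical VM memory sizes in GiB
    for link in (1000, 10000):  # 1 Gbps vs 10 Gbps, in Mb/s
        print(f"{link} Mb/s: ~{suspend_minutes(vms, link):.1f} min")
```

The estimate scales linearly with link speed, so a tenfold faster storage network would cut the best-case suspend time by the same factor, provided the storage itself can keep up.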

kagbasi-wgsdac

Good-day Folks,

I need help understanding the VM Suspend process: why does it take so long for XO to suspend a VM, triggered by a Smart Reboot of a Host?

My Environment:

HOSTs: XCP-ng 2-node pool at v8.3.0 on HP (ProLiant DL360p Gen8)
XO: Community Edition at commit c5ba7
NETWORKING: 1Gbps Management only
STORAGE: Shared NFS Storage Repository (hosted on a separate TrueNAS Server)

Today, while applying the latest host patches, I wanted to try doing a Smart Reboot. I pressed the button, acknowledged the prompt that VMs would be suspended and then resumed after the reboot, and quickly navigated to the Tasks page to monitor the progress. I immediately saw one VM quickly progress from 0% to 10%, 30% ... 100% and boom, it was suspended. The others, however, not so much. As you can see from the two screenshots below, they are still pending suspension and the Estimated End keeps shifting to the right.

I don't have a 10Gb Storage Network between the hosts yet (working on it). However, I didn't think that should have such an impact. Anyway, I don't think I have a proper understanding of how the suspend operation should be working, so if anyone cares to educate me, I would really appreciate it. Thank you.

Screenshot Taken After Pressing Smart Reboot on the Master Host
Screenshot 2025-07-22 091604.png

Screenshot Taken After Pressing Smart Reboot on the Master Host (10 minutes later)
Screenshot 2025-07-22 092641.png

kagbasi-wgsdac

@olivierlambert Yep.

Here's the video of the test - https://www.youtube.com/watch?v=tiVdR74PNjw

kagbasi-wgsdac

@olivierlambert Just ran the test on both XOCE and XOA, and it looks like it takes about 8 minutes to get to 8.6 GB downloaded before the transfer rate drops to zero. I also observed the transfer rate drop to zero at 8.3 GB for a short while before picking back up and stalling at the 8-minute mark.

I made a video of the test. It's uploading to my YouTube channel now; when it finishes, I'll update this post with the link.

kagbasi-wgsdac

@olivierlambert I didn't time it. I'm leaving home to drop the kids off at school, then off to work afterwards. I'll run it again and time it and report back.

kagbasi-wgsdac

@olivierlambert You were right. While we were troubleshooting this issue (about two days ago), I ran an xe vm-export command. Although it failed, enough of the XVA file was created to fill the partition up.

This didn't become obvious until I opened XCP-ng Center today and saw the following alarm:

Disk Usage for Control Domain on server 'WGSDAC-SV-VMH01' has reached 92%. 
XCP-NG's performance will be critically affected if this disk becomes full. 
Log files or other non-essential (user created) files should be removed.

So I deleted the culprit file and freed up the storage space, and now the export is downloading. However, it's timing out before the whole file is downloaded. It gets up to about 8.6 GB, then the transfer rate drops to 0 B/s.

I suspect this is due to the file being deleted on the host before the transfer completes. Is there a quick way to increase the timeout?
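
On the full-partition problem described above: a pre-flight check along these lines might have caught it before the export was written. This is only a sketch; the 90% threshold, the function names, and the example path and size are assumptions for illustration, not anything XCP-ng or XO provides.

```python
import shutil


def fits(total: int, used: int, expected_bytes: int,
         max_used_fraction: float = 0.9) -> bool:
    """True if writing expected_bytes keeps usage at or below the threshold."""
    return (used + expected_bytes) / total <= max_used_fraction


def has_room_for_export(target_dir: str, expected_bytes: int,
                        max_used_fraction: float = 0.9) -> bool:
    """Check the filesystem holding target_dir before writing an export there."""
    usage = shutil.disk_usage(target_dir)  # named tuple: total, used, free
    return fits(usage.total, usage.used, expected_bytes, max_used_fraction)


if __name__ == "__main__":
    # Hypothetical: a 10 GB export written to the current directory
    print(has_room_for_export(".", 10 * 1000**3))
```

If the check returns False, pointing the export at a different filesystem (or at a mounted share) would avoid filling the partition that holds the target directory.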

From XOCE Instance:
Screenshot 2024-12-12 123824.png

From XOA Instance:
Screenshot 2024-12-12 143105.png

kagbasi-wgsdac

@Danp Unfortunately, I don't see any errors in the log of the VM or on the Host. Within the browser's console, the only thing that comes up is the following:

Screenshot 2024-12-10 163035.png

Fortunately, when I attempt the operation via the xe CLI, it results in the following error:

Screenshot 2024-12-10 170021.png

Yet the XOCE pool tasks showed that it completed successfully:

Screenshot 2024-12-10 165607.png

kagbasi-wgsdac

@Danp As requested:

My Use Case:
I am performing a Proof-of-Concept within an air-gapped environment at work. One of the tests I'm performing is to validate the VM Export/Import functionality. So I exported this VM from the air-gapped lab environment I've set up at work and brought it home to test whether I could import it into the pool running at my church (on a physical host). However, that host suffered a catastrophic drive failure and I ended up having to rebuild it. With the rebuild done, I was able to import the VM successfully. Now I'm attempting to export it so I can take it back to work and import it, to complete my testing.

As requested, I tried both: neither clicking the Download VM link nor clicking the OK button results in a successful download. I'm using Chrome, and have also tried Edge. I don't have Firefox on my PC, but I could install it and test if you think it's necessary.

kagbasi-wgsdac

@Danp Just got in the car to drive to work. I'll try doing what you've asked when I get to work (in about an hour).

kagbasi-wgsdac

@Danp Yes, understood - but a short period of time is a rather vague statement, don't you think?

I tried doing the download immediately after I copied the link, and it failed each time. So I decided to wait for the export task to complete and then execute the download; it also fails. Do you know where the XVA file is stored temporarily? Perhaps I can access the file system and move it to another location before it gets deleted.

I recall that when I exported this VM from the test lab at work, I experienced a similar issue and I simply paused the download (within the browser) and then resumed it, and that's how I got the transfer to start. Unfortunately, I wasn't tracking the precise timing of that, as it seemed rather random. I also tried that on this system, to no avail.

Also, does the temporary file get stored on XO or on one of the hosts?

kagbasi-wgsdac

@Danp Of course, thanks for replying.

I go to the VM, and click on the export button at the top-right (screenshot #1). Then I set Type = xva and Compression = Zstd (screenshot #2), then click OK, copy the download link and wait for the task to complete, then I paste the link into a new browser tab and execute.

SCREENSHOT #1:
Screenshot 2024-12-10 071756.png

SCREENSHOT #2:
Screenshot 2024-12-10 071851.png

kagbasi-wgsdac

My Environment:

Xen Orchestra, commit c7657 (Master, commit c7657)
XCP-ng v8.3

Good-day Folks,

I am attempting to export a VM and not having much success downloading the exported file. The export task runs for about 10 minutes and appears to complete successfully (screenshot #1); however, when I attempt to download it, I get an error (screenshot #2).

Anybody seen this before and have any thoughts to share?

SCREENSHOT #1:
Screenshot 2024-12-10 062433.png

SCREENSHOT #2:
Screenshot 2024-12-10 063439.png

kagbasi-wgsdac

Good evening all,

Just a quick update. It's been a couple of days now after the rebuild and everything seems to be humming along fine, so I believe this topic can be marked as resolved. I can confidently conclude that this wasn't an XCP-ng issue, although the error message seems a bit misleading.

kagbasi-wgsdac

Unfortunately, looks like my issue was ultimately a failed disk in a RAID 0 array. The errors XCP-ng was throwing were definitely misleading.

I just rebuilt the array as RAID 10 and have reinstalled XCP-ng v8.3. I should have the entire virtual infrastructure rebuilt in no time, before services this weekend.