Best posts made by McHenry
-
RE: Backup Issue: "timeout reached while waiting for OpaqueRef"
I believe this issue was resolved when the health check system was changed to detect network connectivity at startup, so it no longer needs to wait for the entire VM to boot. It does require the Xen tools to be installed. I have not had an issue since this change.
-
RE: Large incremental backups
The server had high memory usage, so I expect lots of paging, which could explain the block writes. I've increased the memory and want to see what difference that makes.
-
RE: Job canceled to protect the VDI chain
Host started and issue resolved.
-
RE: Disaster Recovery hardware compatibility
Results are in...
Four VMs migrated. Three used warm migration and all worked. The fourth used a straight migration and hit a BSOD, but worked after a reboot.
-
RE: ZFS for a backup server
Thanks Oliver. We have used GFS with Veeam previously and it will be a great addition.
-
RE: VM association with shared storage
Perfect, thanks. The issue is we have an IP address locked to that host, so the router needs to live there. The host affinity looks like the correct solution.

Does host affinity also prevent the VM from being migrated manually?
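For reference, a minimal sketch of how affinity can be set from the CLI (the UUIDs are placeholders and the syntax is from memory, so treat it as an assumption rather than a recipe):
# assumption: set a VM's preferred host ("affinity") via the xe CLI
xe vm-param-set uuid=<vm-uuid> affinity=<host-uuid>
As far as I understand, this only influences where the VM is started, so it should be checked whether it blocks a manual migration at all.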
-
RE: Alarms in XO
This host does not run any VMs, it is just used for CR.
I've increased the dom0 RAM to 4GB and have had no more alarms.
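For anyone else hitting this, a rough sketch of one way to raise dom0 memory from the host console (values and exact syntax should be checked against the XCP-ng docs before running, so treat this as an assumption):
# assumption: raise the dom0 memory allocation to 4GiB, then reboot the host
/opt/xensource/libexec/xen-cmdline --set-xen dom0_mem=4096M,max:4096M
reboot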

-
RE: Windows11 VMs failing to boot
Thank you so much. If you want me I'll be at the pub.
-
RE: Zabbix on xcp-ng
We have successfully installed using:
rpm -Uvh https://repo.zabbix.com/zabbix/7.0/rhel/7/x86_64/zabbix-release-latest.el7.noarch.rpm
yum install zabbix-agent2 zabbix-agent2-plugin-* --enablerepo=base,updates
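A possible follow-up step, assuming a systemd-based host (not part of the original commands):
# assumption: enable and start the agent after installation
systemctl enable --now zabbix-agent2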
-
RE: Migrating a single host to an existing pool
Worked perfectly. Thanks guys.
Latest posts made by McHenry
-
RE: All VMs down
What a day...
We have two hosts and needed a RAM upgrade, so I scheduled these with OVH, one for 4pm and the other for 8pm. The plan was to migrate the VMs off the host being upgraded to maintain uptime.
I then got an email from OVH stating that, due to parts availability, the upgrade would not proceed. They then took both hosts down at 8pm to upgrade them at the same time, and that caused havoc today.
-
All VMs down
I have an issue with VMs appearing to be locked: they show as running, however they are not running.
I am unable to stop these VMs. I have tried shutting down the host HST106.
The VMs appear to be running on HST106, however HST106 is now powered off.

I need to start these VMs on HST107 instead.
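For anyone in a similar situation, a rough sketch of the xe commands that are sometimes used once the dead host is confirmed to be powered off (the host names are ours, everything else is an assumption to verify against the docs before running):
# assumption: tell the pool the powered-off host is gone, so its VMs can be recovered
xe host-list name-label=HST106 --minimal
xe host-declare-dead uuid=<HST106-uuid>
# assumption: clear the stale "running" state, then start the VM on the surviving host
xe vm-reset-powerstate uuid=<vm-uuid> force=true
xe vm-start uuid=<vm-uuid> on=HST107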
-
pfSense Guest Tools
I have been using pfSense with xcp-ng for a while now without installing the guest tools.
Due to some networking complications I have decided to install the guest tools to eliminate them as a possible cause.
Q1) Are the guest tools required on pfSense and what do they do?
Q2) Are these tools being maintained?
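For context, the rough install steps I have seen referenced for FreeBSD-based guests, with the caveat that the package and service names are from memory and may differ on current pfSense releases:
# assumption: install the FreeBSD guest utilities package and enable its service
pkg install xe-guest-utilities
sysrc xenguest_enable=YES
service xenguest start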

-
RE: Job canceled to protect the VDI chain
Host started and issue resolved.
-
RE: Job canceled to protect the VDI chain
Appears to be the same as:
https://xcp-ng.org/forum/topic/1751/smgc-stuck-with-xcp-ng-8-0?_=1761802212787
It appears this snapshot is locked by a slave host that is currently offline.
Oct 30 08:30:09 HST106 SMGC: [1866514] Checking with slave: ('OpaqueRef:16797af5-c5d1-08d5-0e26-e17149c2807b', 'nfs-on-slave', 'check'
When using shared storage, how does a snapshot become locked by a host?
In the scenario where a slave host is offline, how can this lock be cleared?
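For what it's worth, a sketch of the sort of checks I would try, using the SR UUID from the SMGC log lines in this thread (the rest is an assumption, not a verified procedure):
# assumption: look at the SM lock files for this SR on the pool master
ls -l /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/
# assumption: re-run an SR scan so the garbage collector re-evaluates the chain
xe sr-scan uuid=be743b1c-7803-1943-0a70-baf5fcbfeaaf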
-
RE: Job canceled to protect the VDI chain
As per yesterday, the backups are still being "Skipped". Checking the logs I see the following message being repeated:
Oct 30 08:30:09 HST106 SMGC: [1866514] Found 1 orphaned vdis
Oct 30 08:30:09 HST106 SM: [1866514] lock: tried lock /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/sr, acquired: True (exists: True)
Oct 30 08:30:09 HST106 SMGC: [1866514] Found 1 VDIs for deletion:
Oct 30 08:30:09 HST106 SMGC: [1866514] *d4a17b38(100.000G/21.652G?)
Oct 30 08:30:09 HST106 SMGC: [1866514] Deleting unlinked VDI *d4a17b38(100.000G/21.652G?)
Oct 30 08:30:09 HST106 SMGC: [1866514] Checking with slave: ('OpaqueRef:16797af5-c5d1-08d5-0e26-e17149c2807b', 'nfs-on-slave', 'check', {'path': '/var/run/sr-mount/be743b1c-7803-1943-0a70-baf5fcbfeaaf/d4a17b38-5a3c-438a-b394-fcbb64784499.vhd'})
Oct 30 08:30:09 HST106 SM: [1866514] lock: released /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/sr
Oct 30 08:30:09 HST106 SM: [1866514] lock: released /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/running
Oct 30 08:30:09 HST106 SMGC: [1866514] GC process exiting, no work left
Oct 30 08:30:09 HST106 SM: [1866514] lock: released /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/gc_active
Oct 30 08:30:09 HST106 SMGC: [1866514] In cleanup
Oct 30 08:30:09 HST106 SMGC: [1866514] SR be74 ('Shared NAS002') (166 VDIs in 27 VHD trees): no changes
Oct 30 08:30:09 HST106 SM: [1866514] lock: closed /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/running
Oct 30 08:30:09 HST106 SM: [1866514] lock: closed /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/gc_active
Oct 30 08:30:09 HST106 SM: [1866514] lock: closed /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/sr
It appears the unlinked VDI is never deleted. Could this be blocking, and should it be deleted manually?
Deleting unlinked VDI *d4a17b38(100.000G/21.652G?)
In regards to the following line, I can identify the VM UUID, however is the 2nd UUID a snapshot? (d4a17b38-5a3c-438a-b394-fcbb64784499.vhd)
Oct 30 08:30:09 HST106 SMGC: [1866514] Checking with slave: ('OpaqueRef:16797af5-c5d1-08d5-0e26-e17149c2807b', 'nfs-on-slave', 'check', {'path': '/var/run/sr-mount/be743b1c-7803-1943-0a70-baf5fcbfeaaf/d4a17b38-5a3c-438a-b394-fcbb64784499.vhd'})
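For anyone reading along, a quick sketch of how one might check whether that second UUID is a snapshot VDI (xe syntax as I recall it, so treat it as an assumption):
# assumption: the filename before .vhd is the VDI UUID; ask xapi what it is
xe vdi-list uuid=d4a17b38-5a3c-438a-b394-fcbb64784499 params=name-label,is-a-snapshot,vdi-type,managed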
-
RE: Job canceled to protect the VDI chain
I have the following entry in the logs, over and over. Not sure if this is a problem:
Oct 29 15:25:08 HST106 SMGC: [1009624] Found 1 orphaned vdis
Oct 29 15:25:08 HST106 SM: [1009624] lock: tried lock /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/sr, acquired: True (exists: True)
Oct 29 15:25:08 HST106 SMGC: [1009624] Found 1 VDIs for deletion:
Oct 29 15:25:08 HST106 SMGC: [1009624] *d4a17b38(100.000G/21.652G?)
Oct 29 15:25:08 HST106 SMGC: [1009624] Deleting unlinked VDI *d4a17b38(100.000G/21.652G?)
Oct 29 15:25:08 HST106 SMGC: [1009624] Checking with slave: ('OpaqueRef:16797af5-c5d1-08d5-0e26-e17149c2807b', 'nfs-on-slave', 'check', {'path': '/var/run/sr-mount/be743b1c-7803-1943-0a70-baf5fcbfeaaf/d4a17b38-5a3c-438a-b394-fcbb64784499.vhd'})
Oct 29 15:25:08 HST106 SM: [1009624] lock: released /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/sr
Oct 29 15:25:08 HST106 SM: [1009624] lock: released /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/running
Oct 29 15:25:08 HST106 SMGC: [1009624] GC process exiting, no work left
Oct 29 15:25:08 HST106 SM: [1009624] lock: released /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/gc_active
Oct 29 15:25:08 HST106 SMGC: [1009624] In cleanup
Oct 29 15:25:08 HST106 SMGC: [1009624] SR be74 ('Shared NAS002') (166 VDIs in 27 VHD trees): no changes
Oct 29 15:25:08 HST106 SM: [1009624] lock: closed /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/running
Oct 29 15:25:08 HST106 SM: [1009624] lock: closed /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/gc_active
Oct 29 15:25:08 HST106 SM: [1009624] lock: closed /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/sr
-
RE: Job canceled to protect the VDI chain
I spoke too soon. The backups started working, however the problem has returned.

I do see 44 items waiting to coalesce. This is new, as previously these would coalesce faster without causing this issue.

Is there a reason the coalesce is taking longer now or is there a way I can add resources to speed up the process?
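In the meantime, a small sketch of one way to watch coalesce activity from dom0, assuming the standard /var/log/SMlog location the SMGC entries above come from:
# watch the storage manager / garbage collector log for coalesce activity
tail -f /var/log/SMlog | grep -i -E 'coalesce|SMGC'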
-
RE: Job canceled to protect the VDI chain
Is it XO or xcp-ng that manages the coalescing? Can more resources be applied to assist?
