Backup of Windows VM failing Health Check [Solved]
-
Issue Encountered:
A backup of a Windows VM failed a health check overnight, with the error message:
waitObjectState: timeout reached before OpaqueRef.
Manual Check:
Performed a manual backup and health check, which resulted in the same error.
Diagnosis:
Connected to the VM console through Xen Orchestra (XO) and watched it boot. A Windows update was being applied during the boot process, so from XO's perspective the VM never finished booting.
Resolution:
By applying the outstanding Windows update and rebooting the VM, the issue was resolved, and the backup and health check were successful afterward.
The problem was essentially due to the VM being in an update state during the health check, which prevented XO from properly detecting and managing the VM's state. Applying the update and rebooting ensured the VM was in a proper state for backup and health check operations.
-
You can raise the timeout, but yes, I suppose Windows Update could cause issues, because the VM won't finish booting during the restore check.
-
How can the timeout be raised?
I have a Windows Server VM that is slow to boot. It eventually starts, but the health check reports a timeout: "timeout reached while waiting for OpaqueRef..."
-
In the XO config file:
[backups]
healthCheckTimeout = '10m' # the default
-
I still have the health check failing after 10 minutes. Can I confirm the syntax? I have tried:
- healthCheckTimeout = '15m'
- healthCheckTimeout = '15 minutes'
-
@McHenry can you try with the latest Citrix tools? It seems they fixed an issue on reporting the tool status.
-
I have already upgraded, and the health checks now work well. The issue I have is that our DR hardware is slow to boot some VMs (e.g. large DB servers) during a health check, so 10 minutes is not enough. Smaller servers boot quickly and their health checks pass.
-
Oh sorry, I didn't scroll up enough this morning.
We need to check internally whether the timeout works as designed. Let me ping @Bastien-Nollet or @stephane-m-dev
-
@olivierlambert I'll take a look
-
It appears that the value of healthCheckTimeout from the config file is indeed not taken into account.
We've created a card on our side to fix this issue, and we'll soon plan it with the team. If you can't wait for the fix to be released, you can modify the default value of healthCheckTimeout in the code in files @xen-orchestra/backups/_runners/VmsRemote.mjs and @xen-orchestra/backups/_runners/VmsXapi.mjs, then restart XO server, and this should fix it until next update.
-
Thank you.
-
Hi @McHenry
If you still have the problem, you can increase the healthCheckTimeout value as Olivier recommended (e.g. healthCheckTimeout = '30m'). However, this value should be in the [backups.defaultSettings] section of the configuration file (or in the [backups.vm.defaultSettings] section) rather than in the [backups] section. We've detailed the documentation a bit to make this more understandable: https://github.com/vatesfr/xen-orchestra/blob/a945888e63dd37227262eddc8a14850295fa303f/packages/xo-server/config.toml#L78-L79
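For anyone landing here later, the placement described above would look roughly like the fragment below. This is only a sketch: the section names and the '30m' value come from this thread, while the assumption that it goes in your own xo-server configuration file, and that XO server needs a restart to pick it up, has not been verified here. Note the straight quotes around the value.

```toml
# Sketch only: goes in your xo-server configuration file (location depends on your install).
# A value placed directly under [backups] was reported in this thread as not being picked up.

[backups.defaultSettings]
# Example value from this thread; the default is '10m'.
healthCheckTimeout = '30m'

# Alternative mentioned in this thread, to be used instead of the section above:
# [backups.vm.defaultSettings]
# healthCheckTimeout = '30m'
```

Pick a value that comfortably covers the slowest VM on the DR hardware, since the health check has to wait for that VM to finish booting.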