Perform VM Operations When xapi Is Not Available
-
Greetings to the fine folks of the XCP-ng community.
I recently ran into a scenario where an update appears to have corrupted the pool metadata (not super important, but I will attach some details below for context). As a result, xapi and the xe command are no longer available to perform operations on VMs (power, console, migration, etc.), while the VMs are still running fine.
I am wondering if there is an established method to perform some basic VM operations at an even lower level, without using the 'xe' command or the xapi toolstack. Perhaps by interacting directly with QEMU or the Xen hypervisor? Or are there any other lower-level recovery options available? Many thanks in advance.
Some details about the pool metadata corruption:
XCP-ng 8.3.0 Platform 3.4.0 Version 24.11.0
After an XCP-ng update, xapi appears to be stuck in some kind of crash loop: the xe command and XO connect to the pool for a few seconds, then become unresponsive. XO gets a connection reset error, and the xe command either hangs or gets a
'Connection refused (calling connect )'
error. But if I spam the same command with the right timing, simple commands like listing VMs will work.
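The "spam the command until it lands" workaround described above can be sketched as a small retry loop. This is only an illustration under stated assumptions: the `retry` helper, the attempt count, and the delays are hypothetical and not part of the XCP-ng tooling. The only real commands referenced are `xe vm-list` and coreutils `timeout`.

```shell
#!/bin/sh
# retry <attempts> <delay-seconds> <command...>
# Hypothetical helper: re-runs a command until it succeeds or the
# attempt budget is used up. Returns 0 on the first success, 1 otherwise.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Only wait if another attempt is still to come.
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}

# Example (not executed here): try for up to 30 attempts, one second apart,
# and give up on any single call that hangs for more than 10 seconds.
#   retry 30 1 timeout 10 xe vm-list params=uuid,name-label,power-state
```

Wrapping the call in `timeout` matters because the post reports that xe sometimes hangs rather than failing quickly; without it a single stuck call would block the loop.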
I can't make out anything meaningful from /var/log/xensource.log
other than
xapi: message repeated 2 times: [ [error||0 |Registering SMAPIv1 plugins D:a8f0c896a50f|smint] SM.feature: unknown feature ATOMIC_PAUSE]
which doesn't help much. I also had a pool metadata backup, but was not able to restore it due to a version mismatch. I will be playing around with it later and will perhaps start another thread if there is something interesting.
-
How many hosts in the pool? What method did you use to apply the updates?
-
@Danp The xapi issue itself happened on two separate pools: one pool has two hosts, the other has three. None of the VMs had HA enabled. The pool patches were rolled out via XO compiled from source (not XOA). One of the pools has already been rebuilt; I left the other one as is to investigate a bit. Both are just for testing and validation. This happened in late April. It did appear that XCP-ng/xapi was trying to perform some type of database migration after the update but was failing (I don't have the log readily available, but will look for it; it was something like a XEN_INCOMPATIBLE error causing the toolstack to keep restarting).
Would you like me to start another thread on the DB corruption? I am not super crazy about recovering it, and it might not be an XCP-ng issue, although the two pools show the same behavior while running different configurations (one has XOSTOR). I am rather looking to see whether it is possible or valuable to be able to perform lower-level operations without xapi.
Many thanks for your time.
-
To me, the issue is likely hosts that were not rebooted after the updates, or something like that. If you do low-level operations by bypassing XAPI, you can create other problems (the XAPI DB won't be aware of the change and will be out of sync with what's happening in Xen). If you really need to kill a VM,
xl list
then
xl destroy <VM ID>
can do it. But that's pretty much all you should do while managing XCP-ng.
-
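For reference, the `xl list` then `xl destroy` sequence mentioned above can be sketched as follows. This is a minimal sketch, not an official procedure: `list_guest_domains` and `destroy_guest` are hypothetical helper names, and the parsing assumes the usual `xl list` column layout (Name, ID, Mem, VCPUs, State, Time(s)). `xl destroy` is an immediate hard power-off, and xapi will not know about it.

```shell
#!/bin/sh
# Read `xl list` output on stdin and print "ID<TAB>Name" for every guest.
# The header line and Domain-0 (ID 0, the control domain) are skipped.
# Names may contain spaces, so the five rightmost columns are taken as
# ID, Mem, VCPUs, State and Time(s); everything before them is the name.
list_guest_domains() {
    awk 'NR > 1 && NF >= 6 {
        id = $(NF - 4)
        if (id == 0) next
        name = $1
        for (i = 2; i <= NF - 5; i++) name = name " " $i
        print id "\t" name
    }'
}

# Hypothetical guard around `xl destroy`: accepts only a numeric domain ID
# and refuses to touch Domain-0.
destroy_guest() {
    domid="$1"
    case "$domid" in
        '' | *[!0-9]*)
            echo "usage: destroy_guest <numeric domid>" >&2
            return 2
            ;;
        0)
            echo "refusing to destroy Domain-0" >&2
            return 1
            ;;
    esac
    xl destroy "$domid"
}

# Example (not executed here):
#   xl list | list_guest_domains
#   destroy_guest 5
```

The guard on ID 0 is there because destroying Domain-0 would take down the host's own control domain, which is never what you want when the goal is to stop a single VM.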
@wolfmon said in Perform VM Operations When xapi Is Not Available:
XCP-ng 8.3.0, one has XOSTOR
This is likely why you had issues at least on the one pool since XOSTOR hasn't been released for XCP-ng 8.3.
-
@Danp Only one pool has XOSTOR; the other one is just using NFS, but it showed the same behavior after the upgrade.
@olivierlambert Thanks, I will play around with the xl commands. I didn't even know they existed. Are there any docs available?
-
@wolfmon The xl documentation is here: https://xenbits.xen.org/docs/unstable/man/xl.1.html
But be careful, most operations of xl are not officially supported in XCP-ng.