XCP host rebooted: VM's wont start anymore :-(
-
[16:44 xcp ~]# xe vm-start uuid=143f7b5b-b252-346c-5cc0-3027a7dba627
There are no suitable hosts to start this VM on.
The following table provides per-host reasons for why the VM could not be started:

xcp : Cannot start here [VM requires access to SR: e1fb6d59-93c5-72bf-a018-184dd3ea3643 (Local storage)]

There were no servers available to complete the specified operation.

[16:44 xcp ~]# xe sr-list
uuid ( RO) : e1fb6d59-93c5-72bf-a018-184dd3ea3643
    name-label ( RW): Local storage
    name-description ( RW):
    host ( RO): xcp
    type ( RO): ext
    content-type ( RO): user

uuid ( RO) : d58928fe-4a00-3fb0-36c3-8437e3417296
    name-label ( RW): Local storage
    name-description ( RW):
    host ( RO): <not in database>
    type ( RO): ext
    content-type ( RO): user
-
[16:47 xcp ~]# xe diagnostic-vm-status uuid=143f7b5b-b252-346c-5cc0-3027a7dba627
uuid ( RO) : 143f7b5b-b252-346c-5cc0-3027a7dba627
    name-label ( RW): testserver
    power-state ( RO): halted
    possible-hosts ( RO):

Checking to see whether disks are attachable
uuid ( RO) : dd90f85b-755b-9227-aba7-8583e4b8544f
    vdi-uuid ( RO): <not in database>
    empty ( RO): true
    device ( RO): xvdd
    userdevice ( RW): 3
    mode ( RW): RO
    type ( RW): CD
    attachable ( RO): true
    storage-lock ( RO): false

uuid ( RO) : 23a51757-9fec-ab0b-a2d4-2790c8a4a63a
    vdi-uuid ( RO): 06f7760e-157f-4a18-83fe-ba48db06a5ef
    empty ( RO): false
    device ( RO): xvda
    userdevice ( RW): 0
    mode ( RW): RW
    type ( RW): Disk
    attachable ( RO): true
    storage-lock ( RO): false

Checking to see whether VM can boot on each host
xcp : Cannot start here [VM requires access to SR: e1fb6d59-93c5-72bf-a018-184dd3ea3643 (Local storage)]
VM is not agile because: VM requires access to non-shared SR: e1fb6d59-93c5-72bf-a018-184dd3ea3643 (Local storage). SR must both be marked as shared and a properly configured PBD must be plugged-in on every host
-
So you want to start a VM that has a disk on a host which you removed.
-
@olivierlambert
I can see the vhd file on the local disk of this host and it has always been there afaik. The other host wasn't used for this VM. And the VM was running fine without the other host being present for several months. So I assume the disk is on this host?
-
Well, you can display the VM disk list with:
xe vm-disk-list uuid=<VM UUID>
Then you can find info on each of those disks with:
xe vdi-param-list uuid=<VDI UUID>
Then you'll see which SR each disk is on, and you'll understand why the VM can't boot.
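The two lookups above can be chained for every disk of a VM. This is only a rough sketch: the function name vm_disk_srs is made up for illustration, it uses xe vbd-list and xe vdi-param-get instead of the two commands named above because their --minimal output is easier to script, and it assumes that empty CD drives report "<not in database>" as their VDI UUID, as in the diagnostic output earlier in this thread.

```shell
# Print "<VDI UUID> <SR UUID>" for each disk attached to a VM.
# vm_disk_srs is a hypothetical helper name, not part of xe.
vm_disk_srs() {
    vm_uuid=$1
    # --minimal prints the values as a single comma-separated line
    vdis=$(xe vbd-list vm-uuid="$vm_uuid" params=vdi-uuid --minimal)
    old_ifs=$IFS
    IFS=,
    for vdi in $vdis; do
        IFS=$old_ifs
        # Skip empty drives, which have no VDI behind them
        case $vdi in
            ""|*"not in database"*) continue ;;
        esac
        sr=$(xe vdi-param-get uuid="$vdi" param-name=sr-uuid)
        echo "$vdi $sr"
    done
    IFS=$old_ifs
}
```

Called as vm_disk_srs <VM UUID>, it prints one line per disk, and each SR UUID can then be compared with the output of xe sr-list.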
-
I'm trying to understand how it is possible that this VM has been running for a few months without the other host being present. Where would the vhd file have been stored?
I have a copy of the vhd file here; can I create a new VM with that?
-
[17:22 xcp ~]# xe vdi-param-list uuid=06f7760e-157f-4a18-83fe-ba48db06a5ef
uuid ( RO) : 06f7760e-157f-4a18-83fe-ba48db06a5ef
    name-label ( RW): mailserver
    name-description ( RW): Created by XO
    is-a-snapshot ( RO): false
    snapshot-of ( RO): <not in database>
    snapshots ( RO):
    snapshot-time ( RO): 19700101T00:00:00Z
    allowed-operations (SRO): generate_config; update; forget; destroy; snapshot; copy; clone
    current-operations (SRO):
    sr-uuid ( RO): e1fb6d59-93c5-72bf-a018-184dd3ea3643
    sr-name-label ( RO): Local storage
It says the sr-uuid is e1fb6d59-93c5-72bf-a018-184dd3ea3643. Isn't this the local storage SR of the current host??
-
Your mailserver disk is using SR e1fb6d59-93c5-72bf-a018-184dd3ea3643. This SR seems to belong to host ( RO): xcp, not xcp-ng-01.
-
@olivierlambert said in XCP host rebooted: VM's wont start anymore :
Your mailserver disk is using SR e1fb6d59-93c5-72bf-a018-184dd3ea3643. This SR seems to belong to host ( RO): xcp, not xcp-ng-01.
Yes, that's right: the host xcp is the one currently up, and the host xcp-ng-01 is the one that was 'lost'.
I really can't see the problem.
-
Then check your local SR (if it's correctly connected)
-
@olivierlambert said in XCP host rebooted: VM's wont start anymore :
Then check your local SR (if it's correctly connected)
What is the proper way to do that using the CLI?
-
@olivierlambert said in XCP host rebooted: VM's wont start anymore :
Then check your local SR (if it's correctly connected)
xe sr-scan uuid=e1fb6d59-93c5-72bf-a018-184dd3ea3643
The SR has no attached PBDs
sr: e1fb6d59-93c5-72bf-a018-184dd3ea3643 (Local storage)
How can I connect or attach a PBD ?
-
That's your problem, indeed
In XO, it's "connect", otherwise it's
xe pbd-connect
-
I managed to find the PBD, and it doesn't seem to be attached:
# xe pbd-list
uuid ( RO) : 1a9396ae-e59b-9ea7-1d1a-3c5b139a11cb
    host-uuid ( RO): f4d5a20d-e7f3-4e62-8804-e2caa6922a43
    sr-uuid ( RO): e1fb6d59-93c5-72bf-a018-184dd3ea3643
    device-config (MRO): device: /dev/disk/by-id/ata-WDC_WD1003FBYZ-012GB0_WD-WCAW3CYHV0PK
    currently-attached ( RO): false
-
@olivierlambert said in XCP host rebooted: VM's wont start anymore :
That's your problem, indeed
In XO, it's "connect", otherwise it's
xe pbd-connect
I think it is 'xe pbd-plug', because 'pbd-connect' doesn't seem to exist?
But using this command results in this:
[17:56 xcp ~]# xe pbd-plug uuid=1a9396ae-e59b-9ea7-1d1a-3c5b139a11cb
Error code: SR_BACKEND_FAILURE_40
Error parameters: , The SR scan failed [opterr=uuid=mailserver],
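For reference, finding the PBDs of an SR that are not attached can be scripted roughly as below. This is a sketch, not an official procedure: the helper name unattached_pbds is made up for illustration, and it only automates the lookup. A plug that fails with an SR backend error like the one above will still fail until the underlying cause is fixed.

```shell
# Print the UUID of every PBD of an SR that is not currently attached.
# unattached_pbds is a hypothetical helper name, not part of xe.
unattached_pbds() {
    sr_uuid=$1
    # --minimal prints the PBD UUIDs as one comma-separated line
    for pbd in $(xe pbd-list sr-uuid="$sr_uuid" params=uuid --minimal | tr ',' ' '); do
        attached=$(xe pbd-param-get uuid="$pbd" param-name=currently-attached)
        if [ "$attached" != "true" ]; then
            echo "$pbd"
        fi
    done
}

# Possible usage (commented out on purpose, it changes pool state):
# for pbd in $(unattached_pbds <SR UUID>); do xe pbd-plug uuid="$pbd"; done
```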
-
Yes indeed. Okay at least the error message is very visible.
Why do you have a disk with the UUID mailserver? Have you renamed your disk manually?
-
@olivierlambert said in XCP host rebooted: VM's wont start anymore :
Yes indeed. Okay at least the error message is very visible.
Why do you have a disk with the UUID mailserver? Have you renamed your disk manually?
To be honest, I really have no clue why it is called that.
Is there a way to fix this error, which was probably caused by the disk being shut off the hard way during the power failure?
-
No, the problem is a manual rename of the VHD in your SR.
-
@olivierlambert said in XCP host rebooted: VM's wont start anymore :
No, the problem is a manual rename of the VHD in your SR.
Checked /run/sr-mount/e1fb6d59-93c5-72bf-a018-184dd3ea3643 and there was a small 300 KB mailserver.vhd file dated October 15th?? No clue why it was there.
I have removed it and xe pbd-plug works now.
I also seem to be able to start the VM now.
-
Yes, you should never rename a file manually in the SR. It blocked the rescan, then the PBD plug, then the VM start.
Also, please remove the unused host from your pool.
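To spot this kind of stray file earlier, a small check like the one below may help. It is a sketch under the assumption that a file-based SR names each VHD after its VDI UUID, which matches what this thread shows; the function name stray_vhds is made up for illustration. It only lists candidates and deletes nothing, so each hit should be reviewed by hand before touching it.

```shell
# Print .vhd files in an SR mount directory whose name is not a UUID.
# stray_vhds is a hypothetical helper name; it never deletes anything.
stray_vhds() {
    sr_dir=$1
    for f in "$sr_dir"/*.vhd; do
        [ -e "$f" ] || continue   # the glob matched nothing
        name=$(basename "$f" .vhd)
        # Lowercase hex UUID in 8-4-4-4-12 form, as xe prints it
        if ! printf '%s\n' "$name" | grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'; then
            echo "$f"
        fi
    done
}
```

For the SR in this thread that would be stray_vhds /run/sr-mount/e1fb6d59-93c5-72bf-a018-184dd3ea3643, which would have listed the mailserver.vhd file.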