Lost access to all servers
-
Hi all,
I have a cluster of 4 hosts and I suddenly lost access to all of the VMs on the master.
When I looked at it, the master host was saying that it hasn't got any NICs.
But I am able to SSH to it. I have XOSTOR installed, but it is not used for production.
Could anyone please advise on how I can get the system back online?
On the other hosts I have the following
All the hosts can ping google.com.
Do I need to remove the host from the pool? How will that work with XOSTOR?
Which logs do I need to investigate?
I also get this on the screen
-
Log from master
[11:57 uk ~]# tail /var/log/SMlog
Apr 27 11:56:41 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:56:42 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:56:43 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:56:44 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:56:45 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:56:47 uk SM: message repeated 2 times: [ [9991] Raising exception [150, Failed to initialize XMLRPC connection]]
Apr 27 11:56:48 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:56:49 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:56:50 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:56:50 uk SM: [9991] Connecting from config to LINSTOR controller using: 172.16.10.47
[11:58 uk ~]# tail /var/log/VMSSlog
Apr 27 09:45:01 uk VMSS: [12832] ===Kicking cron job for VMSS===
Apr 27 09:45:01 uk VMSS: [12832] VMSS policy not enabled for this pool, Exiting cron job.
Apr 27 10:00:01 uk VMSS: [28354] ===Kicking cron job for VMSS===
Apr 27 10:00:01 uk VMSS: [28354] VMSS policy not enabled for this pool, Exiting cron job.
Apr 27 10:15:10 uk VMSS: [12571] ===Kicking cron job for VMSS===
Apr 27 10:15:10 uk VMSS: [12571] VMSS policy not enabled for this pool, Exiting cron job.
Apr 27 10:30:11 uk VMSS: [28160] ===Kicking cron job for VMSS===
Apr 27 10:30:11 uk VMSS: [28160] VMSS policy not enabled for this pool, Exiting cron job.
Apr 27 10:45:01 uk VMSS: [12520] ===Kicking cron job for VMSS===
Apr 27 10:45:01 uk VMSS: [12520] VMSS policy not enabled for this pool, Exiting cron job.
-
-
@fred974 Hi!
You can try a
xe-toolstack-restart
on the master; it will not harm your running VMs.
-
@AtaxyaNetwork said in Lost access to all servers:
@fred974 Hi!
You can try a xe-toolstack-restart on the master, it will not harm your running VMs
I am doing it now but I am not getting the cursor back.
Also got this:
[12:09 uk ~]# xe host-is-in-emergency-mode
true
[12:09 uk ~]# xe pool-recover-slaves
The server could not join the liveset because the HA daemon could not access the heartbeat disk.
-
Forgot to say the cluster has HA enabled.
-
-
"Forget to say HA was enabled": that's the main information here
Yes, disable HA first
-
@olivierlambert I disabled HA and set host 2 as the new master. The NICs are showing again, but I cannot SSH to or access any VM, including XO. In XCP-ng Center, all the hosts seem to be in maintenance mode.
-
[12:37 uk ~]# xe task-list
uuid ( RO) : c8fc2549-9939-8ced-2ab6-cd2b5b1d6a7d
    name-label ( RO): server_init
    name-description ( RO):
    status ( RO): pending
    progress ( RO): 0.000

uuid ( RO) : 30e9fb68-8326-df55-505d-39a5de71f9cd
    name-label ( RO): host.call_plugin
    name-description ( RO):
    status ( RO): pending
    progress ( RO): 0.000

uuid ( RO) : 662298de-07e3-c3a2-6559-61e5d79c6d31
    name-label ( RO): server_init
    name-description ( RO):
    status ( RO): pending
    progress ( RO): 0.000

uuid ( RO) : 24b360e3-dc0c-b05b-61b2-1352f23d4b44
    name-label ( RO): server_init
    name-description ( RO):
    status ( RO): pending
    progress ( RO): 0.000

uuid ( RO) : 24d698b2-dec9-2c20-81a2-9d75a3118705
    name-label ( RO): host.call_plugin
    name-description ( RO):
    status ( RO): pending
    progress ( RO): 0.000

uuid ( RO) : 11fa18e5-3d4a-7036-7eb8-d91dc7a399c0
    name-label ( RO): host.call_plugin
    name-description ( RO):
    status ( RO): pending
    progress ( RO): 0.000

uuid ( RO) : d8651251-306a-22d6-fd74-231cfb570d12
    name-label ( RO): server_init
    name-description ( RO):
    status ( RO): pending
    progress ( RO): 0.000
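With several tasks stuck in pending, a quick tally by name-label and status can make the picture easier to read. The following is only a minimal sketch: `parse_xe_records` is a hypothetical helper (not part of XAPI or the xe CLI), and it assumes the "field ( RO): value" layout seen in the output above, with the sample text copied from the first three records.

```python
from collections import Counter

# Sample copied from the `xe task-list` output above (first three records).
SAMPLE = """\
uuid ( RO) : c8fc2549-9939-8ced-2ab6-cd2b5b1d6a7d
    name-label ( RO): server_init
    name-description ( RO):
    status ( RO): pending
    progress ( RO): 0.000

uuid ( RO) : 30e9fb68-8326-df55-505d-39a5de71f9cd
    name-label ( RO): host.call_plugin
    name-description ( RO):
    status ( RO): pending
    progress ( RO): 0.000

uuid ( RO) : 662298de-07e3-c3a2-6559-61e5d79c6d31
    name-label ( RO): server_init
    name-description ( RO):
    status ( RO): pending
    progress ( RO): 0.000
"""


def parse_xe_records(text):
    """Hypothetical helper: turn `xe <object>-list` text into a list of dicts.

    Assumes every record starts with a `uuid` field, as in the sample.
    """
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key_part, sep, value = line.partition(":")
        if not sep:
            continue
        # Field names look like "uuid ( RO)"; drop the "( RO)" access marker.
        key = key_part.split("(")[0].strip()
        if key == "uuid" and current:
            records.append(current)
            current = {}
        current[key] = value.strip()
    if current:
        records.append(current)
    return records


records = parse_xe_records(SAMPLE)
tally = Counter((r.get("name-label"), r.get("status")) for r in records)
for (label, status), count in sorted(tally.items()):
    print(f"{label:20} {status:10} {count}")
```

To use it on a live host, the text could come from running `xe task-list` and capturing its output; the sketch does not do that itself.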
-
@olivierlambert I got all the systems back up and running now with HA disabled. I think I need HA enabled to get my XOSTOR SR working again, but I am not happy about not understanding what happened today, as XCP-ng went wrong out of the blue. Could you please tell me what I need to do to investigate the source of the issue?
-
@fred974 You can check whether there is anything in /var/crash, and look at /var/log/xensource.log and /var/log/SMlog.
And maybe a dmesg.
-
@AtaxyaNetwork the crash happened at 11am and here are the relevant extracts for that timeframe
/var/crash is empty
more /var/log/SMlog -f
Apr 27 10:57:41 uk SM: [26334] sr_update {'sr_uuid': 'a20ee08c-40d0-9818-084f-282bbca1f217', 'subtask_of': 'DummyRef:|129646ab-9048-4d66-b873-789ffd07fb00|SR.stat', 'args': [], 'host_ref': 'OpaqueRef:359a920d-7bb1-4088-8b3e-42254f111f51', 'session_ref': 'OpaqueRef:69e52662-3118-4cf4-8b03-9741dbf3b312', 'device_config': {'group-name': 'linstor_group/thin_device', 'redundancy': '3', 'hosts': 'uk.dc1.xcp-ng-hyper1,uk.dc1.xcp-ng-hyper2,uk.dc1.xcp-ng-hyper3,uk.dc1.xcp-ng-hyper4', 'SRmaster': 'true', 'provisioning': 'thin'}, 'command': 'sr_update', 'sr_ref': 'OpaqueRef:f62acb08-116b-42e4-90df-e7d2153ed610', 'local_cache_sr': '28b8eb58-a6a2-c2fa-ad1e-b339b531330f'}
Apr 27 10:57:41 uk SM: [25812] pread SUCCESS
Apr 27 10:57:41 uk SM: [25812] lock: released /var/lock/sm/.nil/lvm
Apr 27 10:57:41 uk SM: [25812] Updating metadata : {'objtype': 'sr', 'name_description': 'iSCSI Storage on TrueNAS Core - HDD', 'name_label': 'TrueStoreHDD_iSCSI'}
Apr 27 10:57:41 uk SM: [25812] entering updateSR
Apr 27 10:57:41 uk SM: [25812] lock: released /var/lock/sm/f7d16827-19e0-c57d-a720-c7fba180d4af/sr
Apr 27 10:57:41 uk SMGC: [26291] GC process exiting, no work left
Apr 27 10:57:41 uk SM: [26291] lock: released /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/gc_active
Apr 27 10:57:41 uk SMGC: [26291] In cleanup
Apr 27 10:57:41 uk SMGC: [26291] SR a20e ('XOSTOR') (23 VDIs in 16 VHD trees): no changes
Apr 27 10:57:41 uk SMGC: [26291] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Apr 27 10:57:41 uk SMGC: [26291] ***********************
Apr 27 10:57:41 uk SMGC: [26291] * E X C E P T I O N *
Apr 27 10:57:41 uk SMGC: [26291] ***********************
Apr 27 10:57:41 uk SMGC: [26291] gc: EXCEPTION <class 'XenAPI.Failure'>, ['UUID_INVALID', 'VDI', 'DELETED_267dfbbd-bc85-4f61-92ad-0fb2703fdd49']
Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 3413, in gc
Apr 27 10:57:41 uk SMGC: [26291] _gc(None, srUuid, dryRun)
Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 3298, in _gc
Apr 27 10:57:41 uk SMGC: [26291] _gcLoop(sr, dryRun)
Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 3209, in _gcLoop
Apr 27 10:57:41 uk SMGC: [26291] if not sr.hasWork():
Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 1652, in hasWork
Apr 27 10:57:41 uk SMGC: [26291] if self.findLeafCoalesceable():
Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 1734, in findLeafCoalesceable
Apr 27 10:57:41 uk SMGC: [26291] self.gatherLeafCoalesceable(candidates)
Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 1766, in gatherLeafCoalesceable
Apr 27 10:57:41 uk SMGC: [26291] if vdi.getConfig(vdi.DB_ONBOOT) == vdi.ONBOOT_RESET:
Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 531, in getConfig
Apr 27 10:57:41 uk SMGC: [26291] config = self.sr.xapi.getConfigVDI(self, key)
Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 385, in getConfigVDI
Apr 27 10:57:41 uk SMGC: [26291] cfg = self.session.xenapi.VDI.get_on_boot(vdi.getRef())
Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 527, in getRef
Apr 27 10:57:41 uk SMGC: [26291] self._vdiRef = self.sr.xapi.getRefVDI(self)
Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 356, in getRefVDI
Apr 27 10:57:41 uk SMGC: [26291] return self._getRefVDI(vdi.uuid)
Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 353, in _getRefVDI
Apr 27 10:57:41 uk SMGC: [26291] return self.session.xenapi.VDI.get_by_uuid(uuid)
Apr 27 10:57:41 uk SMGC: [26291] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 264, in __call__
Apr 27 10:57:41 uk SMGC: [26291] return self.__send(self.__name, args)
Apr 27 10:57:41 uk SMGC: [26291] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 160, in xenapi_request
Apr 27 10:57:41 uk SMGC: [26291] result = _parse_result(getattr(self, methodname)(*full_params))
Apr 27 10:57:41 uk SMGC: [26291] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 238, in _parse_result
Apr 27 10:57:41 uk SMGC: [26291] raise Failure(result['ErrorDescription'])
Apr 27 10:57:41 uk SMGC: [26291]
Apr 27 10:57:41 uk SMGC: [26291] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Apr 27 10:57:41 uk SMGC: [26291] * * * * * SR a20ee08c-40d0-9818-084f-282bbca1f217: ERROR
Apr 27 10:57:41 uk SMGC: [26291]
Apr 27 10:57:41 uk SM: [26334] Failed to join node(s): set([u'uk.dc1.xcp-ng-hyper3'])
Apr 27 10:57:41 uk SM: [26334] Synchronize metadata...
Apr 27 10:57:41 uk SM: [26334] LinstorSR.update for a20ee08c-40d0-9818-084f-282bbca1f217
Apr 27 11:02:20 uk SM: [2783] lock: opening lock file /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr
Apr 27 11:02:20 uk SM: [2783] lock: acquired /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr
Apr 27 11:02:22 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:24 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:25 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:26 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:28 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
Apr 27 11:02:29 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:30 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:32 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
Apr 27 11:02:32 uk SM: [2783] Connecting from config to LINSTOR controller using: 172.16.10.48
Apr 27 11:02:36 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:37 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:39 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
Apr 27 11:02:40 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:41 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:43 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
Apr 27 11:02:44 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:45 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:45 uk SM: [2783] Got exception: Unable to find controller uri.... Retry number: 0
Apr 27 11:02:45 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:46 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:47 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:48 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:49 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:51 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:52 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:54 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
Apr 27 11:02:55 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:55 uk SM: [2783] Connecting from config to LINSTOR controller using: 172.16.10.49
Apr 27 11:02:55 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:56 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:59 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
Apr 27 11:03:00 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:01 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:03 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
Apr 27 11:03:04 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:04 uk SM: [2783] Got exception: Unable to find controller uri.... Retry number: 0
Apr 27 11:03:04 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:06 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:09 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
Apr 27 11:03:10 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:11 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:14 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
Apr 27 11:03:14 uk SM: [2783] Connecting from config to LINSTOR controller using: 172.16.10.47
Apr 27 11:03:15 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:16 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:18 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
Apr 27 11:03:19 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:21 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:22 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:24 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
Apr 27 11:03:25 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:25 uk SM: [2783] Got exception: Unable to find controller uri.... Retry number: 0
Apr 27 11:03:25 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:26 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:29 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
Apr 27 11:03:30 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:31 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:33 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
Apr 27 11:03:34 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:34 uk SM: [2783] Connecting from config to LINSTOR controller using: 172.16.10.46
Apr 27 11:03:34 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:36 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:39 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
Apr 27 11:03:40 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:41 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:42 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:43 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:44 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:44 uk SM: [2783] Got exception: Unable to find controller uri.... Retry number: 0
Apr 27 11:03:44 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:45 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:46 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:48 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
Apr 27 11:03:49 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:50 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:03:54 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
Apr 27 11:03:54 uk SM: [2783] Raising exception [47, The SR is not available [opterr=No valid controller URI to attach/detach from config]]
Apr 27 11:03:54 uk SM: [2783] lock: released /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr
Apr 27 11:03:54 uk SM: [2783] ***** generic exception: vdi_attach_from_config: EXCEPTION <class 'SR.SROSError'>, The SR is not available [opterr=No valid controller URI to attach/detach from config]
Apr 27 11:03:54 uk SM: [2783] File "/opt/xensource/sm/SRCommand.py", line 110, in run
Apr 27 11:03:54 uk SM: [2783] return self._run_locked(sr)
Apr 27 11:03:54 uk SM: [2783] File "/opt/xensource/sm/SRCommand.py", line 153, in _run_locked
Apr 27 11:03:54 uk SM: [2783] target = sr.vdi(self.vdi_uuid)
Apr 27 11:03:54 uk SM: [2783] File "/opt/xensource/sm/LinstorSR", line 634, in wrap
Apr 27 11:03:54 uk SM: [2783] return load(self, *args, **kwargs)
Apr 27 11:03:54 uk SM: [2783] File "/opt/xensource/sm/LinstorSR", line 504, in load
Apr 27 11:03:54 uk SM: [2783] opterr='No valid controller URI to attach/detach from config'
Apr 27 11:03:54 uk SM: [2783]
Apr 27 11:03:54 uk SM: [2783] ***** LINSTOR resources on XCP-ng: EXCEPTION <class 'SR.SROSError'>, The SR is not available [opterr=No valid controller URI to attach/detach from config]
Apr 27 11:03:54 uk SM: [2783] File "/opt/xensource/sm/SRCommand.py", line 378, in run
Apr 27 11:03:54 uk SM: [2783] ret = cmd.run(sr)
Apr 27 11:03:54 uk SM: [2783] File "/opt/xensource/sm/SRCommand.py", line 110, in run
Apr 27 11:03:54 uk SM: [2783] return self._run_locked(sr)
Apr 27 11:03:54 uk SM: [2783] File "/opt/xensource/sm/SRCommand.py", line 153, in _run_locked
Apr 27 11:03:54 uk SM: [2783] target = sr.vdi(self.vdi_uuid)
Apr 27 11:03:54 uk SM: [2783] File "/opt/xensource/sm/LinstorSR", line 634, in wrap
Apr 27 11:03:54 uk SM: [2783] return load(self, *args, **kwargs)
Apr 27 11:03:54 uk SM: [2783] File "/opt/xensource/sm/LinstorSR", line 504, in load
Apr 27 11:03:54 uk SM: [2783] opterr='No valid controller URI to attach/detach from config'
Apr 27 11:03:54 uk SM: [2783]
Apr 27 11:03:59 uk SM: [4037] Warning: vdi_[de]activate present for dummy
Apr 27 11:04:00 uk SM: [4164] lock: opening lock file /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr
Apr 27 11:04:00 uk SM: [4164] lock: acquired /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr
Apr 27 11:04:00 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:01 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:02 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:03 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:04 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:05 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:06 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:07 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:09 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:10 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:10 uk SM: [4164] Connecting from config to LINSTOR controller using: 172.16.10.48
Apr 27 11:04:10 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:11 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:12 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:13 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:14 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:15 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:16 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:17 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:18 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:19 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:19 uk SM: [4164] Got exception: Unable to find controller uri.... Retry number: 0
Apr 27 11:04:19 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:20 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:21 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:22 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:24 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:25 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:26 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:27 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:28 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:29 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:29 uk SM: [4164] Connecting from config to LINSTOR controller using: 172.16.10.49
Apr 27 11:04:29 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:30 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:31 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:32 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:33 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:34 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:35 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:36 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:37 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:39 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:39 uk SM: [4164] Got exception: Unable to find controller uri.... Retry number: 0
Apr 27 11:04:39 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:40 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:41 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:04:42 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
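An extract this repetitive is easier to read as a summary: how many XMLRPC failures occurred in a time window, and which LINSTOR controller addresses were tried. Here is a minimal sketch of that idea. The `summarize` function and its regular expressions are assumptions based only on the line layout visible in this extract, and the sample lines are copied from it.

```python
import re
from collections import Counter

# Layout assumed from the extract: "Apr 27 11:02:32 uk SM: <message>"
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+)\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<tag>\w+):\s+(?P<msg>.*)$"
)
CONTROLLER_RE = re.compile(
    r"Connecting from config to LINSTOR controller using: (\S+)"
)
REPEAT_RE = re.compile(r"message repeated (\d+) times")

SAMPLE = """\
Apr 27 10:57:41 uk SM: [25812] pread SUCCESS
Apr 27 11:02:29 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:30 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
Apr 27 11:02:32 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
Apr 27 11:02:32 uk SM: [2783] Connecting from config to LINSTOR controller using: 172.16.10.48
Apr 27 11:02:55 uk SM: [2783] Connecting from config to LINSTOR controller using: 172.16.10.49
"""


def summarize(lines, start="00:00:00", end="23:59:59"):
    """Count XMLRPC failures and controller attempts between two HH:MM:SS times."""
    xmlrpc_failures = 0
    controllers = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        # Zero-padded HH:MM:SS strings compare correctly as plain strings.
        if not match or not (start <= match["time"] <= end):
            continue
        msg = match["msg"]
        if "Failed to initialize XMLRPC connection" in msg:
            # A syslog "message repeated N times" line is counted as N failures.
            repeat = REPEAT_RE.search(msg)
            xmlrpc_failures += int(repeat.group(1)) if repeat else 1
        attempt = CONTROLLER_RE.search(msg)
        if attempt:
            controllers[attempt.group(1)] += 1
    return xmlrpc_failures, controllers


failures, controllers = summarize(SAMPLE.splitlines(), "11:00:00", "11:05:00")
print("XMLRPC failures:", failures)
for address, attempts in sorted(controllers.items()):
    print("controller tried:", address, "x", attempts)
```

On a host, the same function could be fed the lines of /var/log/SMlog instead of the sample. It only counts and groups; it does not explain why no controller answered.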
-
/var/log/xensource.log
I am not sure what to look for, so I hope this is right.
-
Hope someone can help me understand what the issue is
-
Ronan is on vacation now, but he'll take a look when he's back (tomorrow maybe, Monday I'm pretty sure)
-
@olivierlambert Thank you very much for letting me know
-
-
@fred974 Hi, well first, how many hosts do you have?
We recommend using at least 3 hosts (4 is more robust). Also, what's the replication count on your LINSTOR SR?
I ask these questions because it's possible that a problem on one host caused reboots across the whole pool and finally the emergency state.
Now: can you share the kern.log files of each host? And execute this command (on each machine) please:
drbdsetup status xcp-persistent-database
-
@ronan-a said in Lost access to all servers:
well first, how many hosts do you have?
We have 4 hosts.
Host1 was the original master (host2 is the new master) and I think the DRBD replication count is 3 (how can I double check?)
Host1:
[21:15 uk ~]# drbdsetup status xcp-persistent-database
xcp-persistent-database role:Secondary
  disk:Diskless quorum:no
  uk.dc1.xcp-ng-hyper2 connection:Connecting
  uk.dc1.xcp-ng-hyper3 connection:Connecting
  uk.dc1.xcp-ng-hyper4 connection:Connecting
Hosts 2, 3 and 4 have:
[21:18 uk ~]# drbdsetup status xcp-persistent-database
# No currently configured DRBD found.
xcp-persistent-database: No such resource
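When collecting this from four hosts, it can help to reduce each output to the few fields that matter here: local role, disk state, quorum, and the connection state of each peer. The sketch below is only an illustration. `parse_drbd_status` is a hypothetical helper, it handles just the fields that appear in the two outputs above, and richer output (for example several volumes per resource) would need more care because keys could collide.

```python
HOST1_OUTPUT = """\
xcp-persistent-database role:Secondary
  disk:Diskless quorum:no
  uk.dc1.xcp-ng-hyper2 connection:Connecting
  uk.dc1.xcp-ng-hyper3 connection:Connecting
  uk.dc1.xcp-ng-hyper4 connection:Connecting
"""

OTHER_HOSTS_OUTPUT = """\
# No currently configured DRBD found.
xcp-persistent-database: No such resource
"""


def parse_drbd_status(text):
    """Hypothetical helper: summarize `drbdsetup status <resource>` text.

    Returns None when the resource is not configured on the host.
    Tokens containing ":" are treated as key:value pairs; any other token
    is taken as a name (the first is the resource, later ones are peers).
    """
    if "No such resource" in text:
        return None
    resource = None
    peers = {}
    current = None
    for token in text.split():
        if ":" in token:
            key, value = token.split(":", 1)
            if current is not None:
                current[key] = value
        else:
            current = {}
            if resource is None:
                resource = {"name": token, "local": current}
            else:
                peers[token] = current
    if resource is None:
        return None
    resource["peers"] = peers
    return resource


status = parse_drbd_status(HOST1_OUTPUT)
print(status["name"], status["local"])
for peer, state in status["peers"].items():
    print(" ", peer, state)
print(parse_drbd_status(OTHER_HOSTS_OUTPUT))
```

Because the parser works on whitespace-separated tokens, it should also accept the output when it has been pasted as a single line.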
kern.log files host1
host1_kern.log.txt
kern.log files host2
host2_kern.log.txt
kern.log files host3
host3_kern.log.txt
kern.log files host4
host4_kern.log.txt
Our monitoring reported the first VM being down at 11am, which is reflected in the log files. We also take hourly snapshots, so I was wondering whether this could also be the reason. I hope the files above can help us understand the issue. Also, should I put host1 back as master?
Thank you