Lost access to all servers
-
[12:37 uk ~]# xe task-list
uuid ( RO)                : c8fc2549-9939-8ced-2ab6-cd2b5b1d6a7d
          name-label ( RO): server_init
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : 30e9fb68-8326-df55-505d-39a5de71f9cd
          name-label ( RO): host.call_plugin
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : 662298de-07e3-c3a2-6559-61e5d79c6d31
          name-label ( RO): server_init
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : 24b360e3-dc0c-b05b-61b2-1352f23d4b44
          name-label ( RO): server_init
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : 24d698b2-dec9-2c20-81a2-9d75a3118705
          name-label ( RO): host.call_plugin
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : 11fa18e5-3d4a-7036-7eb8-d91dc7a399c0
          name-label ( RO): host.call_plugin
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : d8651251-306a-22d6-fd74-231cfb570d12
          name-label ( RO): server_init
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000
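With a long task list it can help to count the pending tasks per name-label rather than read every record. The following is only a minimal Python sketch that parses text in the `field ( RO): value` layout shown above; the function names `parse_xe_records` and `pending_by_label` are made up for illustration and are not part of the xe CLI, and the sample holds just the first two records from the listing.

```python
from collections import Counter

# Two records copied from the listing above (sample data only).
SAMPLE = """\
uuid ( RO)                : c8fc2549-9939-8ced-2ab6-cd2b5b1d6a7d
          name-label ( RO): server_init
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : 30e9fb68-8326-df55-505d-39a5de71f9cd
          name-label ( RO): host.call_plugin
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000
"""


def parse_xe_records(text):
    """Parse `field ( RO): value` lines into a list of dicts (hypothetical helper)."""
    records, current = [], {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or "(" not in key:
            continue  # blank line, shell prompt, or anything that is not a field
        name = key.split("(")[0].strip()
        if name == "uuid" and current:
            records.append(current)  # each uuid field starts a new record
            current = {}
        current[name] = value.strip()
    if current:
        records.append(current)
    return records


def pending_by_label(records):
    """Count records whose status is 'pending', grouped by name-label."""
    return Counter(
        r.get("name-label", "?") for r in records if r.get("status") == "pending"
    )


if __name__ == "__main__":
    for label, count in pending_by_label(parse_xe_records(SAMPLE)).items():
        print(label, count)
```

Run against the full listing, this would simply group the seven pending tasks by their two labels, server_init and host.call_plugin.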
-
@olivierlambert I got all the systems back up and running now with HA disabled. I think I need HA enabled to get my XOSTOR SR working again, but I am not happy about not understanding what happened today, as XCP-ng went wrong out of the blue. Could you please tell me what I need to do to investigate the source of the issue?
-
@fred974 You can check /var/crash to see if anything is there, as well as /var/log/xensource.log and /var/log/SMlog.
And maybe dmesg too.
-
@AtaxyaNetwork the crash happened at 11am and here are the relevant extracts for that timeframe.
/var/crash is empty.
more /var/log/SMlog -f
Apr 27 10:57:41 uk SM: [26334] sr_update {'sr_uuid': 'a20ee08c-40d0-9818-084f-282bbca1f217', 'subtask_of': 'DummyRef:|129646ab-9048-4d66-b873-789ffd07fb00|SR.stat', 'args': [], 'host_ref': 'OpaqueRef:359a920d- 7bb1-4088-8b3e-42254f111f51', 'session_ref': 'OpaqueRef:69e52662-3118-4cf4-8b03-9741dbf3b312', 'device_config': {'group-name': 'linstor_group/thin_device', 'redundancy': '3', 'hosts': 'uk.dc1.xcp-ng-hyper1,uk. dc1.xcp-ng-hyper2,uk.dc1.xcp-ng-hyper3,uk.dc1.xcp-ng-hyper4', 'SRmaster': 'true', 'provisioning': 'thin'}, 'command': 'sr_update', 'sr_ref': 'OpaqueRef:f62acb08-116b-42e4-90df-e7d2153ed610', 'local_cache_sr': '28b8eb58-a6a2-c2fa-ad1e-b339b531330f'} Apr 27 10:57:41 uk SM: [25812] pread SUCCESS Apr 27 10:57:41 uk SM: [25812] lock: released /var/lock/sm/.nil/lvm Apr 27 10:57:41 uk SM: [25812] Updating metadata : {'objtype': 'sr', 'name_description': 'iSCSI Storage on TrueNAS Core - HDD', 'name_label': 'TrueStoreHDD_iSCSI'} Apr 27 10:57:41 uk SM: [25812] entering updateSR Apr 27 10:57:41 uk SM: [25812] lock: released /var/lock/sm/f7d16827-19e0-c57d-a720-c7fba180d4af/sr Apr 27 10:57:41 uk SMGC: [26291] GC process exiting, no work left Apr 27 10:57:41 uk SM: [26291] lock: released /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/gc_active Apr 27 10:57:41 uk SMGC: [26291] In cleanup Apr 27 10:57:41 uk SMGC: [26291] SR a20e ('XOSTOR') (23 VDIs in 16 VHD trees): no changes Apr 27 10:57:41 uk SMGC: [26291] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~* Apr 27 10:57:41 uk SMGC: [26291] *********************** Apr 27 10:57:41 uk SMGC: [26291] * E X C E P T I O N * Apr 27 10:57:41 uk SMGC: [26291] *********************** Apr 27 10:57:41 uk SMGC: [26291] gc: EXCEPTION <class 'XenAPI.Failure'>, ['UUID_INVALID', 'VDI', 'DELETED_267dfbbd-bc85-4f61-92ad-0fb2703fdd49'] Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 3413, in gc Apr 27 10:57:41 uk SMGC: [26291] _gc(None, srUuid, dryRun) Apr 27 10:57:41 uk SMGC: [26291] File 
"/opt/xensource/sm/cleanup.py", line 3298, in _gc Apr 27 10:57:41 uk SMGC: [26291] _gcLoop(sr, dryRun) Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 3209, in _gcLoop Apr 27 10:57:41 uk SMGC: [26291] if not sr.hasWork(): Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 1652, in hasWork Apr 27 10:57:41 uk SMGC: [26291] if self.findLeafCoalesceable(): Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 1734, in findLeafCoalesceable Apr 27 10:57:41 uk SMGC: [26291] self.gatherLeafCoalesceable(candidates) Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 1766, in gatherLeafCoalesceable Apr 27 10:57:41 uk SMGC: [26291] if vdi.getConfig(vdi.DB_ONBOOT) == vdi.ONBOOT_RESET: Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 531, in getConfig Apr 27 10:57:41 uk SMGC: [26291] config = self.sr.xapi.getConfigVDI(self, key) Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 385, in getConfigVDI Apr 27 10:57:41 uk SMGC: [26291] cfg = self.session.xenapi.VDI.get_on_boot(vdi.getRef()) Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 527, in getRef Apr 27 10:57:41 uk SMGC: [26291] self._vdiRef = self.sr.xapi.getRefVDI(self) Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 356, in getRefVDI Apr 27 10:57:41 uk SMGC: [26291] return self._getRefVDI(vdi.uuid) Apr 27 10:57:41 uk SMGC: [26291] File "/opt/xensource/sm/cleanup.py", line 353, in _getRefVDI Apr 27 10:57:41 uk SMGC: [26291] return self.session.xenapi.VDI.get_by_uuid(uuid) Apr 27 10:57:41 uk SMGC: [26291] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 264, in __call__ Apr 27 10:57:41 uk SMGC: [26291] return self.__send(self.__name, args) Apr 27 10:57:41 uk SMGC: [26291] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 160, in xenapi_request Apr 27 10:57:41 uk SMGC: [26291] result = _parse_result(getattr(self, 
methodname)(*full_params)) Apr 27 10:57:41 uk SMGC: [26291] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 238, in _parse_result Apr 27 10:57:41 uk SMGC: [26291] raise Failure(result['ErrorDescription']) Apr 27 10:57:41 uk SMGC: [26291] Apr 27 10:57:41 uk SMGC: [26291] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~* Apr 27 10:57:41 uk SMGC: [26291] * * * * * SR a20ee08c-40d0-9818-084f-282bbca1f217: ERROR Apr 27 10:57:41 uk SMGC: [26291] Apr 27 10:57:41 uk SM: [26334] Failed to join node(s): set([u'uk.dc1.xcp-ng-hyper3']) Apr 27 10:57:41 uk SM: [26334] Synchronize metadata... Apr 27 10:57:41 uk SM: [26334] LinstorSR.update for a20ee08c-40d0-9818-084f-282bbca1f217 Apr 27 11:02:20 uk SM: [2783] lock: opening lock file /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr Apr 27 11:02:20 uk SM: [2783] lock: acquired /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr Apr 27 11:02:22 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:24 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:25 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:26 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:28 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]] Apr 27 11:02:29 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:30 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:32 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]] Apr 27 11:02:32 uk SM: [2783] Connecting from config to LINSTOR controller using: 172.16.10.48 Apr 27 11:02:36 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:37 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:39 uk SM: 
message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]] Apr 27 11:02:40 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:41 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:43 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]] Apr 27 11:02:44 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:45 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:45 uk SM: [2783] Got exception: Unable to find controller uri.... Retry number: 0 Apr 27 11:02:45 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:46 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:47 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:48 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:49 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:51 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:52 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:54 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]] Apr 27 11:02:55 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:55 uk SM: [2783] Connecting from config to LINSTOR controller using: 172.16.10.49 Apr 27 11:02:55 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:56 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:02:59 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]] Apr 27 11:03:00 uk SM: 
[2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:01 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:03 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]] Apr 27 11:03:04 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:04 uk SM: [2783] Got exception: Unable to find controller uri.... Retry number: 0 Apr 27 11:03:04 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:06 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:09 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]] Apr 27 11:03:10 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:11 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:14 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]] Apr 27 11:03:14 uk SM: [2783] Connecting from config to LINSTOR controller using: 172.16.10.47 Apr 27 11:03:15 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:16 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:18 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]] Apr 27 11:03:19 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:21 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:22 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:24 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]] Apr 27 11:03:25 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC 
connection] Apr 27 11:03:25 uk SM: [2783] Got exception: Unable to find controller uri.... Retry number: 0 Apr 27 11:03:25 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:26 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:29 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]] Apr 27 11:03:30 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:31 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:33 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]] Apr 27 11:03:34 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:34 uk SM: [2783] Connecting from config to LINSTOR controller using: 172.16.10.46 Apr 27 11:03:34 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:36 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:39 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]] Apr 27 11:03:40 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:41 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:42 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:43 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:44 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:44 uk SM: [2783] Got exception: Unable to find controller uri.... 
Retry number: 0 Apr 27 11:03:44 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:45 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:46 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:48 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]] Apr 27 11:03:49 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:50 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:03:54 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]] Apr 27 11:03:54 uk SM: [2783] Raising exception [47, The SR is not available [opterr=No valid controller URI to attach/detach from config]] Apr 27 11:03:54 uk SM: [2783] lock: released /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr Apr 27 11:03:54 uk SM: [2783] ***** generic exception: vdi_attach_from_config: EXCEPTION <class 'SR.SROSError'>, The SR is not available [opterr=No valid controller URI to attach/detach from config] Apr 27 11:03:54 uk SM: [2783] File "/opt/xensource/sm/SRCommand.py", line 110, in run Apr 27 11:03:54 uk SM: [2783] return self._run_locked(sr) Apr 27 11:03:54 uk SM: [2783] File "/opt/xensource/sm/SRCommand.py", line 153, in _run_locked Apr 27 11:03:54 uk SM: [2783] target = sr.vdi(self.vdi_uuid) Apr 27 11:03:54 uk SM: [2783] File "/opt/xensource/sm/LinstorSR", line 634, in wrap Apr 27 11:03:54 uk SM: [2783] return load(self, *args, **kwargs) Apr 27 11:03:54 uk SM: [2783] File "/opt/xensource/sm/LinstorSR", line 504, in load Apr 27 11:03:54 uk SM: [2783] opterr='No valid controller URI to attach/detach from config' Apr 27 11:03:54 uk SM: [2783] Apr 27 11:03:54 uk SM: [2783] ***** LINSTOR resources on XCP-ng: EXCEPTION <class 'SR.SROSError'>, The SR is not available [opterr=No valid controller URI to attach/detach from 
config] Apr 27 11:03:54 uk SM: [2783] File "/opt/xensource/sm/SRCommand.py", line 378, in run Apr 27 11:03:54 uk SM: [2783] ret = cmd.run(sr) Apr 27 11:03:54 uk SM: [2783] File "/opt/xensource/sm/SRCommand.py", line 110, in run Apr 27 11:03:54 uk SM: [2783] return self._run_locked(sr) Apr 27 11:03:54 uk SM: [2783] File "/opt/xensource/sm/SRCommand.py", line 153, in _run_locked Apr 27 11:03:54 uk SM: [2783] target = sr.vdi(self.vdi_uuid) Apr 27 11:03:54 uk SM: [2783] File "/opt/xensource/sm/LinstorSR", line 634, in wrap Apr 27 11:03:54 uk SM: [2783] return load(self, *args, **kwargs) Apr 27 11:03:54 uk SM: [2783] File "/opt/xensource/sm/LinstorSR", line 504, in load Apr 27 11:03:54 uk SM: [2783] opterr='No valid controller URI to attach/detach from config' Apr 27 11:03:54 uk SM: [2783] Apr 27 11:03:59 uk SM: [4037] Warning: vdi_[de]activate present for dummy Apr 27 11:04:00 uk SM: [4164] lock: opening lock file /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr Apr 27 11:04:00 uk SM: [4164] lock: acquired /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr Apr 27 11:04:00 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:01 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:02 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:03 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:04 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:05 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:06 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:07 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:09 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:10 uk SM: [4164] Raising exception [150, Failed to initialize 
XMLRPC connection] Apr 27 11:04:10 uk SM: [4164] Connecting from config to LINSTOR controller using: 172.16.10.48 Apr 27 11:04:10 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:11 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:12 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:13 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:14 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:15 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:16 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:17 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:18 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:19 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:19 uk SM: [4164] Got exception: Unable to find controller uri.... 
Retry number: 0 Apr 27 11:04:19 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:20 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:21 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:22 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:24 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:25 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:26 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:27 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:28 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:29 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:29 uk SM: [4164] Connecting from config to LINSTOR controller using: 172.16.10.49 Apr 27 11:04:29 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:30 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:31 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:32 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:33 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:34 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:35 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:36 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:37 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:39 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 
11:04:39 uk SM: [4164] Got exception: Unable to find controller uri.... Retry number: 0 Apr 27 11:04:39 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:40 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:41 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection] Apr 27 11:04:42 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
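The extract above is mostly one error repeated, so a short summary makes the pattern easier to see: which LINSTOR controller addresses the driver tried and how often each SM error code was raised. The sketch below is an assumption-laden Python example, not part of the SM driver; `summarize_smlog` is a hypothetical name, and the regular expressions only match the message shapes visible in this extract.

```python
import re
from collections import Counter

# Patterns based only on the message formats seen in the extract above.
CONTROLLER_RE = re.compile(
    r"Connecting from config to LINSTOR controller using: (\S+)"
)
EXCEPTION_RE = re.compile(r"Raising exception \[(\d+),")
REPEAT_RE = re.compile(r"message repeated (\d+) times")

# A few lines copied from the extract (sample data only).
SAMPLE = [
    "Apr 27 11:02:26 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]",
    "Apr 27 11:02:28 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]",
    "Apr 27 11:02:32 uk SM: [2783] Connecting from config to LINSTOR controller using: 172.16.10.48",
    "Apr 27 11:03:54 uk SM: [2783] Raising exception [47, The SR is not available [opterr=No valid controller URI to attach/detach from config]]",
]


def summarize_smlog(lines):
    """Return (controller addresses tried, Counter of SM error codes)."""
    controllers = []
    errors = Counter()
    for line in lines:
        match = CONTROLLER_RE.search(line)
        if match:
            controllers.append(match.group(1))
            continue
        match = EXCEPTION_RE.search(line)
        if match:
            # syslog collapses duplicates into "message repeated N times"
            repeat = REPEAT_RE.search(line)
            errors[int(match.group(1))] += int(repeat.group(1)) if repeat else 1
    return controllers, errors


if __name__ == "__main__":
    tried, codes = summarize_smlog(SAMPLE)
    print("controllers tried:", tried)
    print("error counts:", dict(codes))
```

In the extract itself, process 2783 tries 172.16.10.48, .49, .47 and .46 in turn, each attempt ends with "Unable to find controller uri", and the run finishes with error 47 (SR not available).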
-
/var/log/xensource.log
I am not sure what to look for so I hope this is right
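When it is unclear what to look for, a reasonable first step is to cut each log down to the minutes around the incident (here, roughly 11am on Apr 27). This is a minimal Python sketch under a few assumptions: the lines start with a syslog-style timestamp such as `Apr 27 10:57:41`, month names are in English, and because these timestamps carry no year, the `year` parameter is an arbitrary placeholder. `lines_in_window` is a hypothetical helper name.

```python
from datetime import datetime

TS_FORMAT = "%Y %b %d %H:%M:%S"

# Lines copied from the SMlog extract earlier in the thread (sample data only).
SAMPLE = [
    "Apr 27 10:57:41 uk SM: [25812] pread SUCCESS",
    "Apr 27 11:02:20 uk SM: [2783] lock: opening lock file /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr",
    "Apr 27 11:04:00 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]",
]


def lines_in_window(lines, start, end, year=2000):
    """Yield lines whose timestamp lies in [start, end].

    start/end look like 'Apr 27 10:55:00'. The year is a placeholder
    (assumption): the log format does not record it.
    """
    lo = datetime.strptime(f"{year} {start}", TS_FORMAT)
    hi = datetime.strptime(f"{year} {end}", TS_FORMAT)
    for line in lines:
        try:
            # The first 15 characters hold 'Mon DD HH:MM:SS'
            stamp = datetime.strptime(f"{year} {line[:15]}", TS_FORMAT)
        except ValueError:
            continue  # continuation line or different format: skip it
        if lo <= stamp <= hi:
            yield line


if __name__ == "__main__":
    for line in lines_in_window(SAMPLE, "Apr 27 10:55:00", "Apr 27 11:03:00"):
        print(line)
```

The same function can be fed `open("/var/log/xensource.log")` or `open("/var/log/SMlog")`, provided those files use the timestamp layout assumed here.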
-
Hope someone can help me understand what the issue is
-
Ronan is on vacation now, but he'll take a look when he's back (maybe tomorrow, Monday I'm pretty sure)
-
@olivierlambert Thank you very much for letting me know
-
@fred974 Hi, well first, how many hosts do you have?
We recommend using at least 3 hosts (4 is more robust). Also, what's the replication count on your LINSTOR SR?
I ask these questions because it's possible that a problem on one host caused reboots across the whole pool and eventually the emergency state.
Now: can you share the kern.log files of each host? And please execute this command on each machine:
drbdsetup status xcp-persistent-database
-
@ronan-a said in Lost access to all servers:
well first, how many hosts do you have?
We have 4x hosts.
Host1 was the original master (host2 is new master) and I think the DRBD replication count is 3 (how can I double check?)
Host1:
[21:15 uk ~]# drbdsetup status xcp-persistent-database
xcp-persistent-database role:Secondary
  disk:Diskless quorum:no
  uk.dc1.xcp-ng-hyper2 connection:Connecting
  uk.dc1.xcp-ng-hyper3 connection:Connecting
  uk.dc1.xcp-ng-hyper4 connection:Connecting
Hosts 2, 3 and 4 have:
[21:18 uk ~]# drbdsetup status xcp-persistent-database
# No currently configured DRBD found.
xcp-persistent-database: No such resource
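To compare this status across four hosts, it can be turned into a small dictionary. The Python sketch below assumes the usual `drbdsetup status` layout (resource name first, then `key:value` tokens, with peer names appearing as bare words); `parse_drbd_status` and `disconnected_peers` are hypothetical helper names, and multi-volume resources are not handled.

```python
# Host1 output from above, with the line breaks drbdsetup normally prints
# (the exact indentation is an assumption).
SAMPLE = """\
xcp-persistent-database role:Secondary
  disk:Diskless quorum:no
  uk.dc1.xcp-ng-hyper2 connection:Connecting
  uk.dc1.xcp-ng-hyper3 connection:Connecting
  uk.dc1.xcp-ng-hyper4 connection:Connecting
"""


def parse_drbd_status(text):
    """Parse single-resource status text into a dict (hypothetical helper)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty status output")
    name, *attrs = lines[0].split()
    status = {"resource": name, "peers": {}}
    target = status  # key:value tokens go here until a peer name appears
    for tokens in [attrs] + [ln.split() for ln in lines[1:]]:
        if tokens and ":" not in tokens[0]:
            # A bare word starts a peer section; later tokens belong to it
            target = status["peers"].setdefault(tokens[0], {})
            tokens = tokens[1:]
        for token in tokens:
            key, _, value = token.partition(":")
            target[key] = value
    return status


def disconnected_peers(status):
    """Peers that report a connection state other than 'Connected'."""
    return sorted(
        peer
        for peer, fields in status["peers"].items()
        if fields.get("connection", "Connected") != "Connected"
    )


if __name__ == "__main__":
    st = parse_drbd_status(SAMPLE)
    print(st["role"], st["disk"], st["quorum"])
    print(disconnected_peers(st))
```

For Host1 this yields a Secondary, Diskless resource without quorum and three peers still in Connecting, which matches the other hosts reporting no such resource at all.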
kern.log files host1
host1_kern.log.txt
kern.log files host2
host2_kern.log.txt
kern.log files host3
host3_kern.log.txt
kern.log files host4
host4_kern.log.txt
Our monitor reported the first VM being down at 11am, which is reflected in the log file. We also have hourly snapshots, so I was wondering if this could also have been the reason. I hope the files above can help us understand the issue. Also, should I put host1 back as master?
Thank you
-
@fred974 I'll take a look at the logs. Thanks. What's the output of lvs?
If the database is not active, execute: vgchange -ay linstor_group
-
@ronan-a said in Lost access to all servers:
Thanks. What's the output of lvs
host1
[11:25 uk ~]# lvs Device read short 82432 bytes remaining Device read short 40960 bytes remaining Device read short 40960 bytes remaining Device read short 40960 bytes remaining Device read short 40960 bytes remaining Device read short 98304 bytes remaining LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert MGT VG_XenStorage-f7d16827-19e0-c57d-a720-c7fba180d4af -wi------- 4.00m 28b8eb58-a6a2-c2fa-ad1e-b339b531330f XSLocalEXT-28b8eb58-a6a2-c2fa-ad1e-b339b531330f -wi-ao---- <517.40g thin_device linstor_group twi-aotz-- 2.18t 3.28 12.11 xcp-persistent-ha-statefile_00000 linstor_group Vwi-a-tz-- 8.00m thin_device 50.00 xcp-persistent-redo-log_00000 linstor_group Vwi-a-tz-- 260.00m thin_device 2.31 xcp-volume-126b1370-6042-40ce-8184-22a771fbf1e4_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 0.11 xcp-volume-24413a81-84b6-4242-a245-6076d5670bb4_00000 linstor_group Vwi-a-tz-- 20.05g thin_device 42.69 xcp-volume-3c45b809-33b7-40a3-a602-01e1511327e7_00000 linstor_group Vwi-a-tz-- 10.03g thin_device 45.23 xcp-volume-46caa8c3-2585-4296-a756-2d96cf2141df_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 8.10 xcp-volume-6b125e04-7134-40e3-85ce-19417c186ac5_00000 linstor_group Vwi-a-tz-- <50.12g thin_device 54.75 xcp-volume-78744846-0432-44e6-a135-021f6b5dc072_00000 linstor_group Vwi-a-tz-- 20.00m thin_device 90.00 xcp-volume-7a83c50f-6bd5-4d7e-89a5-c3dee95bdd0b_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 1.13 xcp-volume-91a4068e-73cc-402f-87eb-2f631e66d6e2_00000 linstor_group Vwi-a-tz-- 20.05g thin_device 64.23 xcp-volume-ac921e7e-71ab-4ee9-8d61-5a55fe5fc369_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 0.11 xcp-volume-c1a113f6-9d1e-45f6-9b7d-656327523ce3_00000 linstor_group Vwi-a-tz-- 20.00m thin_device 90.00 xcp-volume-c31378ba-1ec6-4756-ab54-67c49b2ecd51_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 17.40 xcp-volume-d017c7e9-c2bc-422e-a94c-580d7001f5d0_00000 linstor_group Vwi-a-tz-- 20.00m thin_device 90.00 
xcp-volume-d0b249b1-fb43-4013-bd6e-67c5fbdcd9b5_00000 linstor_group Vwi-a-tz-- 20.00m thin_device 90.00 xcp-volume-d8d37107-bb28-4884-9fb2-e771b4df1c70_00000 linstor_group Vwi-a-tz-- 10.03g thin_device 45.33 xcp-volume-fb436790-958a-46a3-b38f-aaca7d6738c8_00000 linstor_group Vwi-a-tz-- <50.12g thin_device 0.11 xcp-volume-fec259ee-bee0-4118-b5ba-09035aad8ca2_00000 linstor_group Vwi-a-tz-- 10.03g thin_device 44.47
host2
[11:28 uk ~]# lvs Device read short 82432 bytes remaining Device read short 40960 bytes remaining Device read short 40960 bytes remaining Device read short 40960 bytes remaining Device read short 40960 bytes remaining Device read short 98304 bytes remaining LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert MGT VG_XenStorage-f7d16827-19e0-c57d-a720-c7fba180d4af -wi-a----- 4.00m ae8a3b6f-b412-0294-43f6-6c11250c6927 XSLocalEXT-ae8a3b6f-b412-0294-43f6-6c11250c6927 -wi-ao---- <517.40g thin_device linstor_group twi-aotz-- 2.18t 3.49 12.21 xcp-persistent-database_00000 linstor_group Vwi-a-tz-- 1.00g thin_device 6.03 xcp-persistent-ha-statefile_00000 linstor_group Vwi-a-tz-- 8.00m thin_device 50.00 xcp-persistent-redo-log_00000 linstor_group Vwi-a-tz-- 260.00m thin_device 2.31 xcp-volume-24413a81-84b6-4242-a245-6076d5670bb4_00000 linstor_group Vwi-a-tz-- 20.05g thin_device 42.69 xcp-volume-2ac918c0-1feb-4ad9-97d6-dcc561832b5d_00000 linstor_group Vwi-a-tz-- 10.03g thin_device 73.19 xcp-volume-3c45b809-33b7-40a3-a602-01e1511327e7_00000 linstor_group Vwi-a-tz-- 10.03g thin_device 45.23 xcp-volume-3e025e5e-c339-4e0c-b8ca-eb4e509ce24d_00000 linstor_group Vwi-a-tz-- <4.02g thin_device 51.99 xcp-volume-46caa8c3-2585-4296-a756-2d96cf2141df_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 8.10 xcp-volume-6b125e04-7134-40e3-85ce-19417c186ac5_00000 linstor_group Vwi-a-tz-- <50.12g thin_device 54.75 xcp-volume-78744846-0432-44e6-a135-021f6b5dc072_00000 linstor_group Vwi-a-tz-- 20.00m thin_device 90.00 xcp-volume-7a83c50f-6bd5-4d7e-89a5-c3dee95bdd0b_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 1.13 xcp-volume-9e32d56e-7c7d-443b-955a-57015b968375_00000 linstor_group Vwi-a-tz-- 6.02g thin_device 96.63 xcp-volume-ac921e7e-71ab-4ee9-8d61-5a55fe5fc369_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 0.11 xcp-volume-c176df5f-5ef6-46b3-841e-93ab0b5af30e_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 11.33 xcp-volume-c31378ba-1ec6-4756-ab54-67c49b2ecd51_00000 
linstor_group Vwi-a-tz-- <40.10g thin_device 17.40 xcp-volume-d017c7e9-c2bc-422e-a94c-580d7001f5d0_00000 linstor_group Vwi-a-tz-- 20.00m thin_device 90.00 xcp-volume-f6901916-1ce0-4757-88b6-642b96c4ab80_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 6.03 xcp-volume-fb436790-958a-46a3-b38f-aaca7d6738c8_00000 linstor_group Vwi-a-tz-- <50.12g thin_device 0.11 xcp-volume-fec259ee-bee0-4118-b5ba-09035aad8ca2_00000 linstor_group Vwi-a-tz-- 10.03g thin_device 44.47
host3
[11:25 uk ~]# lvs Device read short 82432 bytes remaining Device read short 40960 bytes remaining Device read short 40960 bytes remaining Device read short 40960 bytes remaining Device read short 40960 bytes remaining Device read short 98304 bytes remaining LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert MGT VG_XenStorage-f7d16827-19e0-c57d-a720-c7fba180d4af -wi------- 4.00m 5792308f-7a3c-e62d-07c5-21ac24d3a56a XSLocalEXT-5792308f-7a3c-e62d-07c5-21ac24d3a56a -wi-ao---- <517.40g thin_device linstor_group twi-aotz-- 2.18t 3.54 12.23 xcp-persistent-database_00000 linstor_group Vwi-a-tz-- 1.00g thin_device 6.03 xcp-persistent-ha-statefile_00000 linstor_group Vwi-a-tz-- 8.00m thin_device 50.00 xcp-persistent-redo-log_00000 linstor_group Vwi-a-tz-- 260.00m thin_device 2.31 xcp-volume-126b1370-6042-40ce-8184-22a771fbf1e4_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 0.11 xcp-volume-24413a81-84b6-4242-a245-6076d5670bb4_00000 linstor_group Vwi-a-tz-- 20.05g thin_device 42.69 xcp-volume-2ac918c0-1feb-4ad9-97d6-dcc561832b5d_00000 linstor_group Vwi-a-tz-- 10.03g thin_device 73.19 xcp-volume-3e025e5e-c339-4e0c-b8ca-eb4e509ce24d_00000 linstor_group Vwi-a-tz-- <4.02g thin_device 51.99 xcp-volume-46caa8c3-2585-4296-a756-2d96cf2141df_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 8.10 xcp-volume-6b125e04-7134-40e3-85ce-19417c186ac5_00000 linstor_group Vwi-a-tz-- <50.12g thin_device 54.75 xcp-volume-91a4068e-73cc-402f-87eb-2f631e66d6e2_00000 linstor_group Vwi-a-tz-- 20.05g thin_device 64.23 xcp-volume-9e32d56e-7c7d-443b-955a-57015b968375_00000 linstor_group Vwi-a-tz-- 6.02g thin_device 96.63 xcp-volume-c176df5f-5ef6-46b3-841e-93ab0b5af30e_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 11.33 xcp-volume-c1a113f6-9d1e-45f6-9b7d-656327523ce3_00000 linstor_group Vwi-a-tz-- 20.00m thin_device 90.00 xcp-volume-d0b249b1-fb43-4013-bd6e-67c5fbdcd9b5_00000 linstor_group Vwi-a-tz-- 20.00m thin_device 90.00 xcp-volume-d8d37107-bb28-4884-9fb2-e771b4df1c70_00000 
linstor_group Vwi-a-tz-- 10.03g thin_device 45.33 xcp-volume-f6901916-1ce0-4757-88b6-642b96c4ab80_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 6.03 xcp-volume-fb436790-958a-46a3-b38f-aaca7d6738c8_00000 linstor_group Vwi-a-tz-- <50.12g thin_device 0.11
host4
[11:25 uk ~]# lvs Device read short 82432 bytes remaining Device read short 40960 bytes remaining Device read short 40960 bytes remaining Device read short 40960 bytes remaining Device read short 40960 bytes remaining Device read short 98304 bytes remaining LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert MGT VG_XenStorage-f7d16827-19e0-c57d-a720-c7fba180d4af -wi------- 4.00m 3d07204c-eec9-caf1-f86a-fab419537889 XSLocalEXT-3d07204c-eec9-caf1-f86a-fab419537889 -wi-ao---- <517.40g thin_device linstor_group twi-aotz-- 2.18t 2.52 11.73 xcp-persistent-database_00000 linstor_group Vwi-a-tz-- 1.00g thin_device 6.03 xcp-volume-126b1370-6042-40ce-8184-22a771fbf1e4_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 0.11 xcp-volume-2ac918c0-1feb-4ad9-97d6-dcc561832b5d_00000 linstor_group Vwi-a-tz-- 10.03g thin_device 73.19 xcp-volume-3c45b809-33b7-40a3-a602-01e1511327e7_00000 linstor_group Vwi-a-tz-- 10.03g thin_device 45.23 xcp-volume-3e025e5e-c339-4e0c-b8ca-eb4e509ce24d_00000 linstor_group Vwi-a-tz-- <4.02g thin_device 51.99 xcp-volume-78744846-0432-44e6-a135-021f6b5dc072_00000 linstor_group Vwi-a-tz-- 20.00m thin_device 90.00 xcp-volume-7a83c50f-6bd5-4d7e-89a5-c3dee95bdd0b_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 1.13 xcp-volume-91a4068e-73cc-402f-87eb-2f631e66d6e2_00000 linstor_group Vwi-a-tz-- 20.05g thin_device 64.23 xcp-volume-9e32d56e-7c7d-443b-955a-57015b968375_00000 linstor_group Vwi-a-tz-- 6.02g thin_device 96.63 xcp-volume-ac921e7e-71ab-4ee9-8d61-5a55fe5fc369_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 0.11 xcp-volume-c176df5f-5ef6-46b3-841e-93ab0b5af30e_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 11.33 xcp-volume-c1a113f6-9d1e-45f6-9b7d-656327523ce3_00000 linstor_group Vwi-a-tz-- 20.00m thin_device 90.00 xcp-volume-c31378ba-1ec6-4756-ab54-67c49b2ecd51_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 17.40 xcp-volume-d017c7e9-c2bc-422e-a94c-580d7001f5d0_00000 linstor_group Vwi-a-tz-- 20.00m thin_device 90.00 
xcp-volume-d0b249b1-fb43-4013-bd6e-67c5fbdcd9b5_00000 linstor_group Vwi-a-tz-- 20.00m thin_device 90.00 xcp-volume-d8d37107-bb28-4884-9fb2-e771b4df1c70_00000 linstor_group Vwi-a-tz-- 10.03g thin_device 45.33 xcp-volume-f6901916-1ce0-4757-88b6-642b96c4ab80_00000 linstor_group Vwi-a-tz-- <40.10g thin_device 6.03 xcp-volume-fec259ee-bee0-4118-b5ba-09035aad8ca2_00000 linstor_group Vwi-a-tz-- 10.03g thin_device 44.47
@ronan-a said in Lost access to all servers:
If the database is not active, execute: vgchange -ay linstor_group.
How do I know if the database is active or not?
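On the "active or not" question: in `lvs` output, the fifth character of the Attr column is the state flag, and `a` means the logical volume is active (the sixth is `o` when the device is open). In the listings above, xcp-persistent-database_00000 shows `Vwi-a-tz--` on hosts 2, 3 and 4 and does not appear on host1. A minimal Python sketch of that check follows; it assumes input produced by `lvs --noheadings -o lv_name,vg_name,lv_attr`, and `parse_lvs` and `is_active` are hypothetical helper names.

```python
# Three-column sample built from values in the host2 listing above.
SAMPLE = """\
  MGT VG_XenStorage-f7d16827-19e0-c57d-a720-c7fba180d4af -wi-a-----
  thin_device linstor_group twi-aotz--
  xcp-persistent-database_00000 linstor_group Vwi-a-tz--
"""


def parse_lvs(text):
    """Map (vg_name, lv_name) to the Attr string for each 3-column row."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            lv_name, vg_name, attr = parts
            rows[(vg_name, lv_name)] = attr
    return rows


def is_active(rows, vg_name, lv_name):
    """True or False from the state flag, or None if the LV is not listed."""
    attr = rows.get((vg_name, lv_name))
    if attr is None:
        return None
    # Fifth character of lv_attr: 'a' means active
    return len(attr) > 4 and attr[4] == "a"


if __name__ == "__main__":
    rows = parse_lvs(SAMPLE)
    print(is_active(rows, "linstor_group", "xcp-persistent-database_00000"))
```

A `None` result, as host1's listing would give for the database volume, means the volume is not present on that host at all, which is a different situation from a volume that exists but is inactive.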
-
@ronan-a did you get a chance to review the log? Did you see anything that can help me move forward?
Thank you
-
@fred974 I was a little bit busy, I can take a look at your problems tomorrow.
In the worst case, do you have a way to open an SSH connection to your servers?
-
@ronan-a thank you very much. Do you want me to open a tunnel via Xen Orchestra?
-
@fred974 If you can, yes. Send me the code using the chat.
-
@ronan-a Thank you very much for helping to fix my pool
-
@fred974
It would be great if you could write a few lines about the issue and how it was fixed.
-
@KPS The DRBD volume of the LINSTOR database was not created by the driver. We just restarted a few services and the hosts to fix that. Unfortunately, we have no explanation for what could have happened, so I don't have much more information to give. However, if someone ends up in this situation again, I can assist to see if we can obtain more useful logs.