XCP-ng

    Lost access to all servers

      fred974 @fred974

      Log from master

      [11:57 uk ~]# tail /var/log/SMlog
      Apr 27 11:56:41 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
      Apr 27 11:56:42 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
      Apr 27 11:56:43 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
      Apr 27 11:56:44 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
      Apr 27 11:56:45 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
      Apr 27 11:56:47 uk SM: message repeated 2 times: [ [9991] Raising exception [150, Failed to initialize XMLRPC connection]]
      Apr 27 11:56:48 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
      Apr 27 11:56:49 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
      Apr 27 11:56:50 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
      Apr 27 11:56:50 uk SM: [9991] Connecting from config to LINSTOR controller using: 172.16.10.47
      
      [11:58 uk ~]# tail /var/log/VMSSlog
      Apr 27 09:45:01 uk VMSS: [12832] ===Kicking cron job for VMSS===
      Apr 27 09:45:01 uk VMSS: [12832] VMSS policy not enabled for this pool, Exiting cron job.
      Apr 27 10:00:01 uk VMSS: [28354] ===Kicking cron job for VMSS===
      Apr 27 10:00:01 uk VMSS: [28354] VMSS policy not enabled for this pool, Exiting cron job.
      Apr 27 10:15:10 uk VMSS: [12571] ===Kicking cron job for VMSS===
      Apr 27 10:15:10 uk VMSS: [12571] VMSS policy not enabled for this pool, Exiting cron job.
      Apr 27 10:30:11 uk VMSS: [28160] ===Kicking cron job for VMSS===
      Apr 27 10:30:11 uk VMSS: [28160] VMSS policy not enabled for this pool, Exiting cron job.
      Apr 27 10:45:01 uk VMSS: [12520] ===Kicking cron job for VMSS===
      Apr 27 10:45:01 uk VMSS: [12520] VMSS policy not enabled for this pool, Exiting cron job.
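The SMlog excerpt in this post shows the SM driver trying a LINSTOR controller at 172.16.10.47. One quick thing to check from the host is whether anything accepts TCP connections on the controller port at all. The following is a minimal sketch, assuming the controller listens on LINSTOR's default REST port 3370 (an assumption, the thread does not state the port); the function names are hypothetical.

```python
import socket


def controller_reachable(host, port=3370, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 3370 is assumed to be the LINSTOR controller port; adjust if your
    setup differs.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, unresolved name, or timed out.
        return False
    sock.close()
    return True


def probe_controllers(hosts, port=3370, timeout=2.0):
    """Map each candidate controller address to its reachability."""
    return {host: controller_reachable(host, port, timeout) for host in hosts}
```

For example, `probe_controllers(["172.16.10.47"])` returns a dict keyed by address. A `False` only says the TCP connect failed; it does not say why.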
      
        fred974 @fred974

        1b3dc9f6-2029-41e8-9146-2f600e284daa-image.png @fred974

          AtaxyaNetwork Ambassador @fred974

          @fred974 Hi !

          You can try running xe-toolstack-restart on the master; it will not harm your running VMs.

            fred974 @AtaxyaNetwork

            @AtaxyaNetwork said in Lost access to all servers:

            @fred974 Hi !
            You can try a xe-toolstack-restart on the master, it will not harm your running VMs

            I am running it now, but I am not getting the prompt back.
            I also got this:

            [12:09 uk ~]# xe host-is-in-emergency-mode
            true
            [12:09 uk ~]# xe pool-recover-slaves
            The server could not join the liveset because the HA daemon could not access the heartbeat disk.
            
              fred974 @fred974

              Forgot to say the cluster has HA enabled.

                fred974 @fred974

                Should I follow this?
                https://support.citrix.com/article/CTX131127/unable-to-connect-to-high-availability-enabled-xensever-pool-and-all-servers-in-pool-are-in-emergency-mode

                  olivierlambert Vates 🪐 Co-Founder CEO

                  "Forgot to say HA was enabled": that's the main information here 😆

                  Yes, disable HA first 🙂
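For readers in the same situation, here is a hedged sketch of what "disable HA first" could look like as a dry-run plan. The three `xe` subcommands exist in the xe CLI, but the order shown, and whether all three are needed, is an assumption that this thread does not confirm. The helper names are hypothetical. By default nothing is executed; the commands are only listed.

```python
import subprocess


def ha_recovery_plan():
    """Candidate xe invocations, intended for the host chosen as master.

    The ordering is an assumption, not a procedure confirmed by the thread.
    """
    return [
        # Force-disable HA locally when the pool is in emergency mode.
        ["xe", "host-emergency-ha-disable", "--force"],
        # Make this host the pool master.
        ["xe", "pool-emergency-transition-to-master"],
        # Ask the other pool members to reconnect to the new master.
        ["xe", "pool-recover-slaves"],
    ]


def run_plan(plan, dry_run=True):
    """Return the commands as strings; execute them only if dry_run is False."""
    shown = []
    for argv in plan:
        shown.append(" ".join(argv))
        if not dry_run:
            subprocess.check_call(argv)  # stops at the first failing command
    return shown
```

Calling `run_plan(ha_recovery_plan())` only returns the command lines so they can be reviewed before anything is run on a production pool.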

                    fred974 @olivierlambert

                    @olivierlambert I disabled HA and set host 2 as the new master. The NICs are showing again, but I cannot SSH into or access any VM, including XO. In XCP-ng Center, all the hosts seem to be in maintenance mode.
                    633c06a7-f79e-4523-8430-bcaaaa982e74-image.png

                      fred974 @fred974

                      [12:37 uk ~]# xe task-list
                      uuid ( RO)                : c8fc2549-9939-8ced-2ab6-cd2b5b1d6a7d
                                name-label ( RO): server_init
                          name-description ( RO):
                                    status ( RO): pending
                                  progress ( RO): 0.000
                      
                      
                      uuid ( RO)                : 30e9fb68-8326-df55-505d-39a5de71f9cd
                                name-label ( RO): host.call_plugin
                          name-description ( RO):
                                    status ( RO): pending
                                  progress ( RO): 0.000
                      
                      
                      uuid ( RO)                : 662298de-07e3-c3a2-6559-61e5d79c6d31
                                name-label ( RO): server_init
                          name-description ( RO):
                                    status ( RO): pending
                                  progress ( RO): 0.000
                      
                      
                      uuid ( RO)                : 24b360e3-dc0c-b05b-61b2-1352f23d4b44
                                name-label ( RO): server_init
                          name-description ( RO):
                                    status ( RO): pending
                                  progress ( RO): 0.000
                      
                      
                      uuid ( RO)                : 24d698b2-dec9-2c20-81a2-9d75a3118705
                                name-label ( RO): host.call_plugin
                          name-description ( RO):
                                    status ( RO): pending
                                  progress ( RO): 0.000
                      
                      
                      uuid ( RO)                : 11fa18e5-3d4a-7036-7eb8-d91dc7a399c0
                                name-label ( RO): host.call_plugin
                          name-description ( RO):
                                    status ( RO): pending
                                  progress ( RO): 0.000
                      
                      
                      uuid ( RO)                : d8651251-306a-22d6-fd74-231cfb570d12
                                name-label ( RO): server_init
                          name-description ( RO):
                                    status ( RO): pending
                                  progress ( RO): 0.000
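With many pending tasks, it can help to group them instead of reading the listing by eye. The following is a small sketch that parses `xe task-list` style output, as pasted above, into records and counts pending tasks per name-label. The function names are hypothetical, and the parser assumes only the `key ( RO): value` layout visible in this post.

```python
import re
from collections import Counter

# One field per line, e.g. "    name-label ( RO): server_init"
FIELD_RE = re.compile(r"^\s*(?P<key>[\w-]+)\s*\(\s*[A-Z]+\)\s*:\s?(?P<value>.*)$")


def parse_xe_records(text):
    """Split xe listing output into a list of {field: value} dicts."""
    records, current = [], {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match is None:
            # Blank or unrecognised line: close the record in progress.
            if current:
                records.append(current)
                current = {}
            continue
        if match.group("key") == "uuid" and current:
            # A new uuid line also starts a new record.
            records.append(current)
            current = {}
        current[match.group("key")] = match.group("value").strip()
    if current:
        records.append(current)
    return records


def pending_by_name(records):
    """Count records whose status is 'pending', grouped by name-label."""
    return Counter(
        rec.get("name-label", "") for rec in records if rec.get("status") == "pending"
    )
```

Feeding it the captured output of `xe task-list` gives one dict per task, and `pending_by_name` then shows how many `server_init` and `host.call_plugin` tasks are stuck.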
                      
                        fred974 @fred974

                        @olivierlambert I got all the systems back up and running with HA disabled. I think I need HA enabled to get my XOSTOR SR working again, but I am not happy about not understanding what happened today, as XCP-ng went wrong out of the blue. Could you please tell me what I need to do to investigate the source of the issue?

                          AtaxyaNetwork Ambassador @fred974

                          @fred974 You can check whether there is anything in /var/crash, and look at /var/log/xensource.log and /var/log/SMlog.
                          And maybe the dmesg output.
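When SMlog is long and repetitive, a short summary can make the timeline easier to read. The following is a sketch that counts "Failed to initialize XMLRPC connection" errors per SM process and lists the LINSTOR controller addresses the driver tried, based only on the line formats visible in this thread. It also expands syslog's "message repeated N times" lines. The function name is hypothetical.

```python
import re
from collections import Counter

# Normal line: "<Mon DD HH:MM:SS> <host> <tag>: [<pid>] <message>"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+ \d{2}:\d{2}:\d{2} \S+ \w+: \[(?P<pid>\d+)\] ?(?P<msg>.*)$"
)
# Compressed line: "... <tag>: message repeated N times: [ [<pid>] <message>]"
REPEAT_RE = re.compile(
    r"^\w{3}\s+\d+ \d{2}:\d{2}:\d{2} \S+ \w+: "
    r"message repeated (?P<n>\d+) times: \[ \[(?P<pid>\d+)\] (?P<msg>.*)\]$"
)
CONTROLLER_RE = re.compile(r"LINSTOR controller using: (\S+)")
XMLRPC_ERR = "Failed to initialize XMLRPC connection"


def summarize_smlog(lines):
    """Return (XMLRPC failure count per pid, controller addresses in order seen)."""
    failures = Counter()
    controllers = []
    for raw in lines:
        line = raw.strip()
        repeat = REPEAT_RE.match(line)
        match = repeat or LINE_RE.match(line)
        if match is None:
            continue  # wrapped continuation or unrecognised line
        count = int(repeat.group("n")) if repeat else 1
        msg = match.group("msg")
        if XMLRPC_ERR in msg:
            failures[match.group("pid")] += count
        found = CONTROLLER_RE.search(msg)
        if found and found.group(1) not in controllers:
            controllers.append(found.group(1))
    return failures, controllers

# Possible use on a copy of the log:
#   with open("SMlog") as handle:
#       failures, controllers = summarize_smlog(handle)
```

The same approach could be extended to /var/log/xensource.log, though its line format differs and the patterns above would need adjusting.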

                            fred974 @AtaxyaNetwork

                            @AtaxyaNetwork The crash happened at 11am; here are the relevant extracts for that timeframe.
                            /var/crash is empty.

                            more /var/log/SMlog -f

                            Apr 27 10:57:41 uk SM: [26334] sr_update {'sr_uuid': 'a20ee08c-40d0-9818-084f-282bbca1f217', 'subtask_of': 'DummyRef:|129646ab-9048-4d66-b873-789ffd07fb00|SR.stat', 'args': [], 'host_ref': 'OpaqueRef:359a920d-7bb1-4088-8b3e-42254f111f51', 'session_ref': 'OpaqueRef:69e52662-3118-4cf4-8b03-9741dbf3b312', 'device_config': {'group-name': 'linstor_group/thin_device', 'redundancy': '3', 'hosts': 'uk.dc1.xcp-ng-hyper1,uk.dc1.xcp-ng-hyper2,uk.dc1.xcp-ng-hyper3,uk.dc1.xcp-ng-hyper4', 'SRmaster': 'true', 'provisioning': 'thin'}, 'command': 'sr_update', 'sr_ref': 'OpaqueRef:f62acb08-116b-42e4-90df-e7d2153ed610', 'local_cache_sr': '28b8eb58-a6a2-c2fa-ad1e-b339b531330f'}
                            Apr 27 10:57:41 uk SM: [25812]   pread SUCCESS
                            Apr 27 10:57:41 uk SM: [25812] lock: released /var/lock/sm/.nil/lvm
                            Apr 27 10:57:41 uk SM: [25812] Updating metadata : {'objtype': 'sr', 'name_description': 'iSCSI Storage on TrueNAS Core - HDD', 'name_label': 'TrueStoreHDD_iSCSI'}
                            Apr 27 10:57:41 uk SM: [25812] entering updateSR
                            Apr 27 10:57:41 uk SM: [25812] lock: released /var/lock/sm/f7d16827-19e0-c57d-a720-c7fba180d4af/sr
                            Apr 27 10:57:41 uk SMGC: [26291] GC process exiting, no work left
                            Apr 27 10:57:41 uk SM: [26291] lock: released /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/gc_active
                            Apr 27 10:57:41 uk SMGC: [26291] In cleanup
                            Apr 27 10:57:41 uk SMGC: [26291] SR a20e ('XOSTOR') (23 VDIs in 16 VHD trees): no changes
                            Apr 27 10:57:41 uk SMGC: [26291] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
                            Apr 27 10:57:41 uk SMGC: [26291]          ***********************
                            Apr 27 10:57:41 uk SMGC: [26291]          *  E X C E P T I O N  *
                            Apr 27 10:57:41 uk SMGC: [26291]          ***********************
                            Apr 27 10:57:41 uk SMGC: [26291] gc: EXCEPTION <class 'XenAPI.Failure'>, ['UUID_INVALID', 'VDI', 'DELETED_267dfbbd-bc85-4f61-92ad-0fb2703fdd49']
                            Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 3413, in gc
                            Apr 27 10:57:41 uk SMGC: [26291]     _gc(None, srUuid, dryRun)
                            Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 3298, in _gc
                            Apr 27 10:57:41 uk SMGC: [26291]     _gcLoop(sr, dryRun)
                            Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 3209, in _gcLoop
                            Apr 27 10:57:41 uk SMGC: [26291]     if not sr.hasWork():
                            Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 1652, in hasWork
                            Apr 27 10:57:41 uk SMGC: [26291]     if self.findLeafCoalesceable():
                            Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 1734, in findLeafCoalesceable
                            Apr 27 10:57:41 uk SMGC: [26291]     self.gatherLeafCoalesceable(candidates)
                            Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 1766, in gatherLeafCoalesceable
                            Apr 27 10:57:41 uk SMGC: [26291]     if vdi.getConfig(vdi.DB_ONBOOT) == vdi.ONBOOT_RESET:
                            Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 531, in getConfig
                            Apr 27 10:57:41 uk SMGC: [26291]     config = self.sr.xapi.getConfigVDI(self, key)
                            Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 385, in getConfigVDI
                            Apr 27 10:57:41 uk SMGC: [26291]     cfg = self.session.xenapi.VDI.get_on_boot(vdi.getRef())
                            Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 527, in getRef
                            Apr 27 10:57:41 uk SMGC: [26291]     self._vdiRef = self.sr.xapi.getRefVDI(self)
                            Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 356, in getRefVDI
                            Apr 27 10:57:41 uk SMGC: [26291]     return self._getRefVDI(vdi.uuid)
                            Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 353, in _getRefVDI
                            Apr 27 10:57:41 uk SMGC: [26291]     return self.session.xenapi.VDI.get_by_uuid(uuid)
                            Apr 27 10:57:41 uk SMGC: [26291]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 264, in __call__
                            Apr 27 10:57:41 uk SMGC: [26291]     return self.__send(self.__name, args)
                            Apr 27 10:57:41 uk SMGC: [26291]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 160, in xenapi_request
                            Apr 27 10:57:41 uk SMGC: [26291]     result = _parse_result(getattr(self, methodname)(*full_params))
                            Apr 27 10:57:41 uk SMGC: [26291]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 238, in _parse_result
                            Apr 27 10:57:41 uk SMGC: [26291]     raise Failure(result['ErrorDescription'])
                            Apr 27 10:57:41 uk SMGC: [26291]
                            Apr 27 10:57:41 uk SMGC: [26291] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
                            Apr 27 10:57:41 uk SMGC: [26291] * * * * * SR a20ee08c-40d0-9818-084f-282bbca1f217: ERROR
                            Apr 27 10:57:41 uk SMGC: [26291]
                            Apr 27 10:57:41 uk SM: [26334] Failed to join node(s): set([u'uk.dc1.xcp-ng-hyper3'])
                            Apr 27 10:57:41 uk SM: [26334] Synchronize metadata...
                            Apr 27 10:57:41 uk SM: [26334] LinstorSR.update for a20ee08c-40d0-9818-084f-282bbca1f217
                            Apr 27 11:02:20 uk SM: [2783] lock: opening lock file /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr
                            Apr 27 11:02:20 uk SM: [2783] lock: acquired /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr
                            Apr 27 11:02:22 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:24 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:25 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:26 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:28 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                            Apr 27 11:02:29 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:30 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:32 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                            Apr 27 11:02:32 uk SM: [2783] Connecting from config to LINSTOR controller using: 172.16.10.48
                            Apr 27 11:02:36 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:37 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:39 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                            Apr 27 11:02:40 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:41 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:43 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                            Apr 27 11:02:44 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:45 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:45 uk SM: [2783] Got exception: Unable to find controller uri.... Retry number: 0
                            Apr 27 11:02:45 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:46 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:47 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:48 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:49 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:51 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:52 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:54 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                            Apr 27 11:02:55 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:55 uk SM: [2783] Connecting from config to LINSTOR controller using: 172.16.10.49
                            Apr 27 11:02:55 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:56 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:02:59 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                            Apr 27 11:03:00 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:01 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:03 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                            Apr 27 11:03:04 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:04 uk SM: [2783] Got exception: Unable to find controller uri.... Retry number: 0
                            Apr 27 11:03:04 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:06 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:09 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                            Apr 27 11:03:10 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:11 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:14 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                            Apr 27 11:03:14 uk SM: [2783] Connecting from config to LINSTOR controller using: 172.16.10.47
                            Apr 27 11:03:15 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:16 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:18 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                            Apr 27 11:03:19 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:21 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:22 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:24 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                            Apr 27 11:03:25 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:25 uk SM: [2783] Got exception: Unable to find controller uri.... Retry number: 0
                            Apr 27 11:03:25 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:26 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:29 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                            Apr 27 11:03:30 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:31 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:33 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                            Apr 27 11:03:34 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:34 uk SM: [2783] Connecting from config to LINSTOR controller using: 172.16.10.46
                            Apr 27 11:03:34 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:36 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:39 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                            Apr 27 11:03:40 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:41 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:42 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:43 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:44 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:44 uk SM: [2783] Got exception: Unable to find controller uri.... Retry number: 0
                            Apr 27 11:03:44 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:45 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:46 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:48 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                            Apr 27 11:03:49 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:50 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:03:54 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                            Apr 27 11:03:54 uk SM: [2783] Raising exception [47, The SR is not available [opterr=No valid controller URI to attach/detach from config]]
                            Apr 27 11:03:54 uk SM: [2783] lock: released /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr
                            Apr 27 11:03:54 uk SM: [2783] ***** generic exception: vdi_attach_from_config: EXCEPTION <class 'SR.SROSError'>, The SR is not available [opterr=No valid controller URI to attach/detach from config]
                            Apr 27 11:03:54 uk SM: [2783]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
                            Apr 27 11:03:54 uk SM: [2783]     return self._run_locked(sr)
                            Apr 27 11:03:54 uk SM: [2783]   File "/opt/xensource/sm/SRCommand.py", line 153, in _run_locked
                            Apr 27 11:03:54 uk SM: [2783]     target = sr.vdi(self.vdi_uuid)
                            Apr 27 11:03:54 uk SM: [2783]   File "/opt/xensource/sm/LinstorSR", line 634, in wrap
                            Apr 27 11:03:54 uk SM: [2783]     return load(self, *args, **kwargs)
                            Apr 27 11:03:54 uk SM: [2783]   File "/opt/xensource/sm/LinstorSR", line 504, in load
                            Apr 27 11:03:54 uk SM: [2783]     opterr='No valid controller URI to attach/detach from config'
                            Apr 27 11:03:54 uk SM: [2783]
                            Apr 27 11:03:54 uk SM: [2783] ***** LINSTOR resources on XCP-ng: EXCEPTION <class 'SR.SROSError'>, The SR is not available [opterr=No valid controller URI to attach/detach from config]
                            Apr 27 11:03:54 uk SM: [2783]   File "/opt/xensource/sm/SRCommand.py", line 378, in run
                            Apr 27 11:03:54 uk SM: [2783]     ret = cmd.run(sr)
                            Apr 27 11:03:54 uk SM: [2783]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
                            Apr 27 11:03:54 uk SM: [2783]     return self._run_locked(sr)
                            Apr 27 11:03:54 uk SM: [2783]   File "/opt/xensource/sm/SRCommand.py", line 153, in _run_locked
                            Apr 27 11:03:54 uk SM: [2783]     target = sr.vdi(self.vdi_uuid)
                            Apr 27 11:03:54 uk SM: [2783]   File "/opt/xensource/sm/LinstorSR", line 634, in wrap
                            Apr 27 11:03:54 uk SM: [2783]     return load(self, *args, **kwargs)
                            Apr 27 11:03:54 uk SM: [2783]   File "/opt/xensource/sm/LinstorSR", line 504, in load
                            Apr 27 11:03:54 uk SM: [2783]     opterr='No valid controller URI to attach/detach from config'
                            Apr 27 11:03:54 uk SM: [2783]
                            Apr 27 11:03:59 uk SM: [4037] Warning: vdi_[de]activate present for dummy
                            Apr 27 11:04:00 uk SM: [4164] lock: opening lock file /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr
                            Apr 27 11:04:00 uk SM: [4164] lock: acquired /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr
                            Apr 27 11:04:00 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:01 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:02 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:03 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:04 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:05 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:06 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:07 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:09 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:10 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:10 uk SM: [4164] Connecting from config to LINSTOR controller using: 172.16.10.48
                            Apr 27 11:04:10 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:11 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:12 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:13 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:14 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:15 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:16 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:17 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:18 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:19 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:19 uk SM: [4164] Got exception: Unable to find controller uri.... Retry number: 0
                            Apr 27 11:04:19 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:20 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:21 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:22 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:24 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:25 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:26 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:27 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:28 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:29 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:29 uk SM: [4164] Connecting from config to LINSTOR controller using: 172.16.10.49
                            Apr 27 11:04:29 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:30 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:31 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:32 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:33 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:34 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:35 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:36 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:37 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:39 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:39 uk SM: [4164] Got exception: Unable to find controller uri.... Retry number: 0
                            Apr 27 11:04:39 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:40 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:41 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                            Apr 27 11:04:42 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]                     
                            
                            • F Offline
                              fred974 @fred974
                              last edited by

                              /var/log/xensource.log

                              https://pastebin.com/NUqGsSk6

                              I am not sure what to look for, so I hope this is right.

                              • F Offline
                                fred974 @fred974
                                last edited by

                                I hope someone can help me understand what the issue is.

                                • olivierlambertO Offline
                                  olivierlambert Vates 🪐 Co-Founder CEO
                                  last edited by olivierlambert

                                  Ronan is on vacation now, but he'll take a look when he's back 🙂 (maybe tomorrow, and by Monday I'm pretty sure)

                                  • F Offline
                                    fred974 @olivierlambert
                                    last edited by

                                    @olivierlambert Thank you very much for letting me know

                                    • F Offline
                                      fred974 @fred974
                                      last edited by

                                      @ronan-a are you able to help me with this problem? I have added more information and log files to this thread too.

                                      • ronan-aR Offline
                                        ronan-a Vates 🪐 XCP-ng Team @fred974
                                        last edited by

                                        @fred974 Hi, well first, how many hosts do you have?
                                        We recommend using at least 3 hosts (4 is more robust). Also, what's the replication count on your LINSTOR SR?
                                        I ask these questions because it's possible that a problem on a host has caused reboots on the whole pool and finally the emergency state.

                                        Now: can you share the kern.log files of each host? And execute this command (on each machine) please:

                                        drbdsetup status xcp-persistent-database
                                        
                                        • F Offline
                                          fred974 @ronan-a
                                          last edited by fred974

                                          @ronan-a said in Lost access to all servers:

                                          well first, how many hosts do you have?

                                          We have 4x hosts.
                                          Host1 was the original master (host2 is new master) and I think the DRBD replication count is 3 (how can I double check?)
                                          Host1:

                                          [21:15 uk ~]# drbdsetup status xcp-persistent-database
                                          xcp-persistent-database role:Secondary
                                            disk:Diskless quorum:no
                                            uk.dc1.xcp-ng-hyper2 connection:Connecting
                                            uk.dc1.xcp-ng-hyper3 connection:Connecting
                                            uk.dc1.xcp-ng-hyper4 connection:Connecting
                                          

                                          Host2, 3 and 4 have:

                                          [21:18 uk ~]# drbdsetup status xcp-persistent-database
                                          # No currently configured DRBD found.
                                          xcp-persistent-database: No such resource
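
The two results above can be condensed into a one-line summary per host, which makes it easier to compare four machines at a glance. The helper below is only a sketch: the function name `summarize_drbd_status` is hypothetical, and it assumes the plain-text layout shown in this thread, where a peer line carries a `connection:` field only while that peer is not fully connected.

```shell
#!/bin/sh
# Hypothetical helper: summarise `drbdsetup status <resource>` output read from stdin.
# Assumes the text layout pasted in this thread; other layouts are not handled.
summarize_drbd_status() {
    awk '
        # Host reports that the resource is not configured at all.
        /No such resource/ { missing = 1 }
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^quorum:/)             quorum = substr($i, 8)
                if ($i ~ /^disk:/ && disk == "") disk = substr($i, 6)
                # Count peers that still show a connection state (e.g. Connecting).
                if ($i ~ /^connection:/)         pending++
            }
        }
        END {
            if (missing) { print "resource not configured on this host"; exit }
            printf "disk=%s quorum=%s peers_not_connected=%d\n", disk, quorum, pending
        }
    '
}
```

A possible invocation on each host would be `drbdsetup status xcp-persistent-database 2>&1 | summarize_drbd_status`; the `2>&1` is there on the assumption that the "No such resource" message may be written to standard error.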
                                          

                                          kern.log files host1
                                          host1_kern.log.txt

                                          kern.log files host2
                                          host2_kern.log.txt

                                          kern.log files host3
                                          host3_kern.log.txt

                                          kern.log files host4
                                          host4_kern.log.txt

                                          Our monitoring reported the first VM as down at 11am, which is reflected in the log file. We also take hourly snapshots, so I was wondering whether that could be the cause. I hope the files above help us understand the issue. Also, should I put host1 back as master?

                                          Thank you

                                          • ronan-aR Offline
                                            ronan-a Vates 🪐 XCP-ng Team @fred974
                                            last edited by

                                            @fred974 I'll take a look at the logs. Thanks. What's the output of lvs? If the database is not active, execute: vgchange -ay linstor_group.
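
One way to check the "not active" condition mentioned above without reading the full lvs table by eye is to filter on the attribute column. This is a minimal sketch, not an official procedure: the function name `inactive_lvs` is hypothetical, and it relies on the fifth character of `lv_attr` being `a` for an active logical volume.

```shell
#!/bin/sh
# Hypothetical helper: print the names of logical volumes that are not active.
# Expects two columns on stdin: lv_name and lv_attr.
inactive_lvs() {
    # The 5th character of lv_attr is the state flag; "a" means active.
    awk 'NF >= 2 && substr($2, 5, 1) != "a" { print $1 }'
}

# Possible usage on a host (volume group name taken from the post above):
#   lvs --noheadings -o lv_name,lv_attr linstor_group | inactive_lvs
# If the database volume is listed, activate the group as suggested:
#   vgchange -ay linstor_group
```

If the filter prints nothing, every volume in the group is already active and the vgchange step should not be needed.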
