XCP-ng

    Lost access to all servers

      fred974

      Hi all,

      I have a cluster of 4 hosts, and suddenly I lost access to all of the VMs on the master.
      When I looked at it, the master host reported that it has no NICs:

      [screenshot]
      But I am able to SSH into it.

      I have XOSTOR installed, but it is not used for production.

      Could anyone please advise on how I can get the system back online?

      On the other hosts, I see the following:
      [screenshot]

      All the hosts can ping google.com.

      Do I need to remove the host from the pool? How would that work with XOSTOR?
      Which logs should I investigate?

      I also see this on the screen:
      [screenshot]
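Before changing anything, it can help to confirm whether the symptoms come from HA rather than from the network itself. The sketch below is hypothetical: the `ha_triage` function name and the `XE` override variable (used only so the script can be exercised with a stub) are our own inventions. It assumes that `xe pool-list params=ha-enabled --minimal` prints the pool's HA flag; that call may fail while the host is in emergency mode, in which case the script reports "unknown".

```shell
#!/usr/bin/env bash
# Hypothetical triage helper. XE defaults to the real CLI but can be
# pointed at a stub command or function for testing.
XE="${XE:-xe}"

ha_triage() {
  local emergency ha_enabled
  # "true" when this host has lost contact with the pool database
  emergency="$("$XE" host-is-in-emergency-mode 2>/dev/null || echo unknown)"
  # Pool-level HA flag; --minimal prints just the value (assumed to fail in emergency mode)
  ha_enabled="$("$XE" pool-list params=ha-enabled --minimal 2>/dev/null || echo unknown)"
  echo "emergency-mode=${emergency} ha-enabled=${ha_enabled}"
  if [ "$emergency" = "true" ] && [ "$ha_enabled" = "true" ]; then
    echo "HA is enabled and the host is in emergency mode: check HA and the heartbeat SR first"
  fi
}
```

On the master, `ha_triage` can simply be run from the SSH session that still works.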

        fred974 @fred974

        Logs from the master:

        [11:57 uk ~]# tail /var/log/SMlog
        Apr 27 11:56:41 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
        Apr 27 11:56:42 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
        Apr 27 11:56:43 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
        Apr 27 11:56:44 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
        Apr 27 11:56:45 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
        Apr 27 11:56:47 uk SM: message repeated 2 times: [ [9991] Raising exception [150, Failed to initialize XMLRPC connection]]
        Apr 27 11:56:48 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
        Apr 27 11:56:49 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
        Apr 27 11:56:50 uk SM: [9991] Raising exception [150, Failed to initialize XMLRPC connection]
        Apr 27 11:56:50 uk SM: [9991] Connecting from config to LINSTOR controller using: 172.16.10.47
        
        [11:58 uk ~]# tail /var/log/VMSSlog
        Apr 27 09:45:01 uk VMSS: [12832] ===Kicking cron job for VMSS===
        Apr 27 09:45:01 uk VMSS: [12832] VMSS policy not enabled for this pool, Exiting cron job.
        Apr 27 10:00:01 uk VMSS: [28354] ===Kicking cron job for VMSS===
        Apr 27 10:00:01 uk VMSS: [28354] VMSS policy not enabled for this pool, Exiting cron job.
        Apr 27 10:15:10 uk VMSS: [12571] ===Kicking cron job for VMSS===
        Apr 27 10:15:10 uk VMSS: [12571] VMSS policy not enabled for this pool, Exiting cron job.
        Apr 27 10:30:11 uk VMSS: [28160] ===Kicking cron job for VMSS===
        Apr 27 10:30:11 uk VMSS: [28160] VMSS policy not enabled for this pool, Exiting cron job.
        Apr 27 10:45:01 uk VMSS: [12520] ===Kicking cron job for VMSS===
        Apr 27 10:45:01 uk VMSS: [12520] VMSS policy not enabled for this pool, Exiting cron job.
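The SMlog excerpt above is dominated by repeated XMLRPC failures while the storage manager tries LINSTOR controller addresses one after another. A minimal sketch for summarising that over the whole file follows; `summarize_linstor_attempts` is a hypothetical helper, and its patterns assume the exact message wording seen in this log. rsyslog's "message repeated N times" lines are expanded so the failure count is not understated.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise LINSTOR controller attempts in an SMlog file.
summarize_linstor_attempts() {
  local log="${1:-/var/log/SMlog}"
  echo "controller IPs tried:"
  # Each "Connecting ... using: <ip>" line marks one controller candidate
  grep -o 'LINSTOR controller using: [0-9.]*' "$log" | awk '{print $NF}' | sort | uniq -c
  # Count XMLRPC failures; "message repeated N times" stands for N extra occurrences
  awk '/Failed to initialize XMLRPC connection/ {
         if (match($0, /message repeated [0-9]+ times/)) {
           # "message repeated " is 17 characters, " times" is 6
           total += substr($0, RSTART + 17, RLENGTH - 23)
         } else {
           total += 1
         }
       }
       END { print "XMLRPC failures: " total + 0 }' "$log"
}
```

Run as `summarize_linstor_attempts /var/log/SMlog` on each host; a host where every controller address fails is a hint that the LINSTOR controller was not running anywhere at that time.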
        
          fred974 @fred974

          [screenshot]

            AtaxyaNetwork Ambassador @fred974

            @fred974 Hi!

            You can try running xe-toolstack-restart on the master; it will not harm your running VMs.

              fred974 @AtaxyaNetwork

              @AtaxyaNetwork said in Lost access to all servers:

              @fred974 Hi!
              You can try running xe-toolstack-restart on the master; it will not harm your running VMs.

              I am running it now, but I am not getting the prompt back.
              I also got this:

              [12:09 uk ~]# xe host-is-in-emergency-mode
              true
              [12:09 uk ~]# xe pool-recover-slaves
              The server could not join the liveset because the HA daemon could not access the heartbeat disk.
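When xe-toolstack-restart does not return, a second SSH session can poll xapi instead of guessing whether it has come back. The sketch below is hypothetical (`wait_for_xapi` and the `XE` override are our own names); it treats any successful `xe host-list` call as "xapi is answering", which says nothing about whether HA or the pool is healthy.

```shell
#!/usr/bin/env bash
# Hypothetical poller: wait until xapi answers a trivial CLI call.
XE="${XE:-xe}"

wait_for_xapi() {
  local tries="${1:-30}" delay="${2:-10}" i
  for ((i = 1; i <= tries; i++)); do
    # Any successful read-only call means xapi is accepting requests
    if "$XE" host-list params=name-label --minimal >/dev/null 2>&1; then
      echo "xapi answered after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "xapi did not answer after $tries attempts" >&2
  return 1
}
```

For example, `wait_for_xapi 30 10` waits up to about five minutes. If it never answers and `xe host-is-in-emergency-mode` keeps printing `true`, the problem is likely HA rather than the restart itself.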
              
                fred974 @fred974

                Forgot to say: the cluster has HA enabled.

                  fred974 @fred974

                  Should I follow this?
                  https://support.citrix.com/article/CTX131127/unable-to-connect-to-high-availability-enabled-xensever-pool-and-all-servers-in-pool-are-in-emergency-mode

                    olivierlambert Vates 🪐 Co-Founder CEO

                    "Forget to say HA was enabled": that's the main information here 😆

                    Yes, disable HA first 🙂
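Disabling HA while the pool is in emergency mode is a sequence of commands rather than a single switch. The sketch below follows our reading of the Citrix article linked above (CTX131127) and is not an authoritative procedure: check the article before running anything. It is dry-run by default (it only prints the commands); `run`, `emergency_ha_recovery` and `DRY_RUN` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical emergency HA recovery sketch. DRY_RUN=1 (default) only prints.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

emergency_ha_recovery() {
  # 1. Disable HA locally, ignoring the unreachable heartbeat/statefile
  run xe host-emergency-ha-disable --force
  # 2. Restart xapi so it starts without HA
  run xe-toolstack-restart
  # 3. Only if the host still reports emergency mode: make it a master again
  if [ "$DRY_RUN" = "1" ] || [ "$(xe host-is-in-emergency-mode)" = "true" ]; then
    run xe pool-emergency-transition-to-master
  fi
  # 4. Ask the other pool members to reconnect to this master
  run xe pool-recover-slaves
}
```

Running `emergency_ha_recovery` prints the plan; `DRY_RUN=0 emergency_ha_recovery` would execute it on the host you intend to keep as master.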

                      fred974 @olivierlambert

                      @olivierlambert I disabled HA and set host 2 as the new master. The NICs are showing again, but I cannot SSH into or access any VM, including XO. In XCP-ng Center, all the hosts seem to be in maintenance mode.
                      [screenshot]
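Hosts shown in maintenance mode are, as far as xapi is concerned, disabled hosts. A hedged sketch for listing and re-enabling them from the new master follows; `enable_disabled_hosts` and the `XE` override are hypothetical. It assumes xapi has finished starting (re-enabling a host that is still initialising is likely to fail) and that you actually want every disabled host back in service.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-enable every host xapi reports as disabled.
XE="${XE:-xe}"

enable_disabled_hosts() {
  local uuids uuid
  # --minimal prints the requested field as a comma-separated list
  uuids="$("$XE" host-list enabled=false params=uuid --minimal)"
  if [ -z "$uuids" ]; then
    echo "all hosts enabled"
    return 0
  fi
  # UUIDs contain no spaces, so splitting on the substituted spaces is safe
  for uuid in ${uuids//,/ }; do
    echo "enabling $uuid"
    "$XE" host-enable uuid="$uuid"
  done
}
```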

                        fred974 @fred974

                        [12:37 uk ~]# xe task-list
                        uuid ( RO)                : c8fc2549-9939-8ced-2ab6-cd2b5b1d6a7d
                                  name-label ( RO): server_init
                            name-description ( RO):
                                      status ( RO): pending
                                    progress ( RO): 0.000
                        
                        
                        uuid ( RO)                : 30e9fb68-8326-df55-505d-39a5de71f9cd
                                  name-label ( RO): host.call_plugin
                            name-description ( RO):
                                      status ( RO): pending
                                    progress ( RO): 0.000
                        
                        
                        uuid ( RO)                : 662298de-07e3-c3a2-6559-61e5d79c6d31
                                  name-label ( RO): server_init
                            name-description ( RO):
                                      status ( RO): pending
                                    progress ( RO): 0.000
                        
                        
                        uuid ( RO)                : 24b360e3-dc0c-b05b-61b2-1352f23d4b44
                                  name-label ( RO): server_init
                            name-description ( RO):
                                      status ( RO): pending
                                    progress ( RO): 0.000
                        
                        
                        uuid ( RO)                : 24d698b2-dec9-2c20-81a2-9d75a3118705
                                  name-label ( RO): host.call_plugin
                            name-description ( RO):
                                      status ( RO): pending
                                    progress ( RO): 0.000
                        
                        
                        uuid ( RO)                : 11fa18e5-3d4a-7036-7eb8-d91dc7a399c0
                                  name-label ( RO): host.call_plugin
                            name-description ( RO):
                                      status ( RO): pending
                                    progress ( RO): 0.000
                        
                        
                        uuid ( RO)                : d8651251-306a-22d6-fd74-231cfb570d12
                                  name-label ( RO): server_init
                            name-description ( RO):
                                      status ( RO): pending
                                    progress ( RO): 0.000
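A long `xe task-list` dump is easier to read as counts. The sketch below is a hypothetical `summarize_pending_tasks` filter that reads the record format shown above on stdin and counts pending tasks per name-label; here it would show several `server_init` and `host.call_plugin` tasks stuck at 0%, consistent with hosts whose xapi had not finished starting.

```shell
#!/usr/bin/env bash
# Hypothetical filter: count pending tasks per name-label in `xe task-list` output.
summarize_pending_tasks() {
  awk -F': ' '
    # Lines look like "          name-label ( RO): server_init"
    $1 ~ /name-label/ { name = $2 }
    $1 ~ /status/ {
      sub(/[ \t]+$/, "", $2)          # drop any trailing whitespace
      if ($2 == "pending") count[name]++
    }
    END { for (n in count) printf "%d %s\n", count[n], n }
  ' | sort -k2
}
```

Usage: `xe task-list | summarize_pending_tasks`.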
                        
                          fred974 @fred974

                          @olivierlambert I got all the systems back up and running now with HA disabled. I think I need HA enabled to get my XOSTOR SR working again, but I am not happy not understanding what happened today, as XCP-ng went wrong out of the blue. Could you please tell me what I need to do to investigate the source of the issue?

                            AtaxyaNetwork Ambassador @fred974

                            @fred974 You can check whether there is anything in /var/crash, as well as /var/log/xensource.log and /var/log/SMlog.
                            And maybe dmesg too.
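To compare those logs around the time of the incident, it helps to cut them to the same window. The sketch below is a hypothetical `logs_in_window` helper; it assumes the syslog-style prefix seen in this thread ("Apr 27 10:57:41 host ..."), which SMlog and xensource.log use, and it compares zero-padded HH:MM:SS times as strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print lines whose syslog timestamp falls in [start, end].
# Usage: logs_in_window "Apr 27" 10:50:00 11:05:00 file...
logs_in_window() {
  local day="$1" start="$2" end="$3"
  shift 3
  awk -v day="$day" -v start="$start" -v end="$end" '
    # $1 $2 = month and day, $3 = HH:MM:SS (string comparison works when zero-padded)
    ($1 " " $2) == day && $3 >= start && $3 <= end { print FILENAME ": " $0 }
  ' "$@"
}
```

For example, `logs_in_window "Apr 27" 10:50:00 11:05:00 /var/log/SMlog /var/log/xensource.log`. Rotated, compressed logs are not covered, and kernel messages need `dmesg -T`, whose timestamp format differs.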

                              fred974 @AtaxyaNetwork

                              @AtaxyaNetwork The crash happened at 11am; here are the relevant extracts for that timeframe.
                              /var/crash is empty.

                              more /var/log/SMlog -f

                              Apr 27 10:57:41 uk SM: [26334] sr_update {'sr_uuid': 'a20ee08c-40d0-9818-084f-282bbca1f217', 'subtask_of': 'DummyRef:|129646ab-9048-4d66-b873-789ffd07fb00|SR.stat', 'args': [], 'host_ref': 'OpaqueRef:359a920d-7bb1-4088-8b3e-42254f111f51', 'session_ref': 'OpaqueRef:69e52662-3118-4cf4-8b03-9741dbf3b312', 'device_config': {'group-name': 'linstor_group/thin_device', 'redundancy': '3', 'hosts': 'uk.dc1.xcp-ng-hyper1,uk.dc1.xcp-ng-hyper2,uk.dc1.xcp-ng-hyper3,uk.dc1.xcp-ng-hyper4', 'SRmaster': 'true', 'provisioning': 'thin'}, 'command': 'sr_update', 'sr_ref': 'OpaqueRef:f62acb08-116b-42e4-90df-e7d2153ed610', 'local_cache_sr': '28b8eb58-a6a2-c2fa-ad1e-b339b531330f'}
                              Apr 27 10:57:41 uk SM: [25812]   pread SUCCESS
                              Apr 27 10:57:41 uk SM: [25812] lock: released /var/lock/sm/.nil/lvm
                              Apr 27 10:57:41 uk SM: [25812] Updating metadata : {'objtype': 'sr', 'name_description': 'iSCSI Storage on TrueNAS Core - HDD', 'name_label': 'TrueStoreHDD_iSCSI'}
                              Apr 27 10:57:41 uk SM: [25812] entering updateSR
                              Apr 27 10:57:41 uk SM: [25812] lock: released /var/lock/sm/f7d16827-19e0-c57d-a720-c7fba180d4af/sr
                              Apr 27 10:57:41 uk SMGC: [26291] GC process exiting, no work left
                              Apr 27 10:57:41 uk SM: [26291] lock: released /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/gc_active
                              Apr 27 10:57:41 uk SMGC: [26291] In cleanup
                              Apr 27 10:57:41 uk SMGC: [26291] SR a20e ('XOSTOR') (23 VDIs in 16 VHD trees): no changes
                              Apr 27 10:57:41 uk SMGC: [26291] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
                              Apr 27 10:57:41 uk SMGC: [26291]          ***********************
                              Apr 27 10:57:41 uk SMGC: [26291]          *  E X C E P T I O N  *
                              Apr 27 10:57:41 uk SMGC: [26291]          ***********************
                              Apr 27 10:57:41 uk SMGC: [26291] gc: EXCEPTION <class 'XenAPI.Failure'>, ['UUID_INVALID', 'VDI', 'DELETED_267dfbbd-bc85-4f61-92ad-0fb2703fdd49']
                              Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 3413, in gc
                              Apr 27 10:57:41 uk SMGC: [26291]     _gc(None, srUuid, dryRun)
                              Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 3298, in _gc
                              Apr 27 10:57:41 uk SMGC: [26291]     _gcLoop(sr, dryRun)
                              Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 3209, in _gcLoop
                              Apr 27 10:57:41 uk SMGC: [26291]     if not sr.hasWork():
                              Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 1652, in hasWork
                              Apr 27 10:57:41 uk SMGC: [26291]     if self.findLeafCoalesceable():
                              Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 1734, in findLeafCoalesceable
                              Apr 27 10:57:41 uk SMGC: [26291]     self.gatherLeafCoalesceable(candidates)
                              Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 1766, in gatherLeafCoalesceable
                              Apr 27 10:57:41 uk SMGC: [26291]     if vdi.getConfig(vdi.DB_ONBOOT) == vdi.ONBOOT_RESET:
                              Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 531, in getConfig
                              Apr 27 10:57:41 uk SMGC: [26291]     config = self.sr.xapi.getConfigVDI(self, key)
                              Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 385, in getConfigVDI
                              Apr 27 10:57:41 uk SMGC: [26291]     cfg = self.session.xenapi.VDI.get_on_boot(vdi.getRef())
                              Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 527, in getRef
                              Apr 27 10:57:41 uk SMGC: [26291]     self._vdiRef = self.sr.xapi.getRefVDI(self)
                              Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 356, in getRefVDI
                              Apr 27 10:57:41 uk SMGC: [26291]     return self._getRefVDI(vdi.uuid)
                              Apr 27 10:57:41 uk SMGC: [26291]   File "/opt/xensource/sm/cleanup.py", line 353, in _getRefVDI
                              Apr 27 10:57:41 uk SMGC: [26291]     return self.session.xenapi.VDI.get_by_uuid(uuid)
                              Apr 27 10:57:41 uk SMGC: [26291]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 264, in __call__
                              Apr 27 10:57:41 uk SMGC: [26291]     return self.__send(self.__name, args)
                              Apr 27 10:57:41 uk SMGC: [26291]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 160, in xenapi_request
                              Apr 27 10:57:41 uk SMGC: [26291]     result = _parse_result(getattr(self, methodname)(*full_params))
                              Apr 27 10:57:41 uk SMGC: [26291]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 238, in _parse_result
                              Apr 27 10:57:41 uk SMGC: [26291]     raise Failure(result['ErrorDescription'])
                              Apr 27 10:57:41 uk SMGC: [26291]
                              Apr 27 10:57:41 uk SMGC: [26291] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
                              Apr 27 10:57:41 uk SMGC: [26291] * * * * * SR a20ee08c-40d0-9818-084f-282bbca1f217: ERROR
                              Apr 27 10:57:41 uk SMGC: [26291]
                              Apr 27 10:57:41 uk SM: [26334] Failed to join node(s): set([u'uk.dc1.xcp-ng-hyper3'])
                              Apr 27 10:57:41 uk SM: [26334] Synchronize metadata...
                              Apr 27 10:57:41 uk SM: [26334] LinstorSR.update for a20ee08c-40d0-9818-084f-282bbca1f217
                              Apr 27 11:02:20 uk SM: [2783] lock: opening lock file /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr
                              Apr 27 11:02:20 uk SM: [2783] lock: acquired /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr
                              Apr 27 11:02:22 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:24 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:25 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:26 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:28 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                              Apr 27 11:02:29 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:30 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:32 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                              Apr 27 11:02:32 uk SM: [2783] Connecting from config to LINSTOR controller using: 172.16.10.48
                              Apr 27 11:02:36 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:37 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:39 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                              Apr 27 11:02:40 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:41 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:43 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                              Apr 27 11:02:44 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:45 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:45 uk SM: [2783] Got exception: Unable to find controller uri.... Retry number: 0
                              Apr 27 11:02:45 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:46 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:47 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:48 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:49 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:51 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:52 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:54 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                              Apr 27 11:02:55 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:55 uk SM: [2783] Connecting from config to LINSTOR controller using: 172.16.10.49
                              Apr 27 11:02:55 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:56 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:02:59 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                              Apr 27 11:03:00 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:01 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:03 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                              Apr 27 11:03:04 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:04 uk SM: [2783] Got exception: Unable to find controller uri.... Retry number: 0
                              Apr 27 11:03:04 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:06 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:09 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                              Apr 27 11:03:10 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:11 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:14 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                              Apr 27 11:03:14 uk SM: [2783] Connecting from config to LINSTOR controller using: 172.16.10.47
                              Apr 27 11:03:15 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:16 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:18 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                              Apr 27 11:03:19 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:21 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:22 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:24 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                              Apr 27 11:03:25 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:25 uk SM: [2783] Got exception: Unable to find controller uri.... Retry number: 0
                              Apr 27 11:03:25 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:26 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:29 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                              Apr 27 11:03:30 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:31 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:33 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                              Apr 27 11:03:34 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:34 uk SM: [2783] Connecting from config to LINSTOR controller using: 172.16.10.46
                              Apr 27 11:03:34 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:36 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:39 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                              Apr 27 11:03:40 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:41 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:42 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:43 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:44 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:44 uk SM: [2783] Got exception: Unable to find controller uri.... Retry number: 0
                              Apr 27 11:03:44 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:45 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:46 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:48 uk SM: message repeated 2 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                              Apr 27 11:03:49 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:50 uk SM: [2783] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:03:54 uk SM: message repeated 3 times: [ [2783] Raising exception [150, Failed to initialize XMLRPC connection]]
                              Apr 27 11:03:54 uk SM: [2783] Raising exception [47, The SR is not available [opterr=No valid controller URI to attach/detach from config]]
                              Apr 27 11:03:54 uk SM: [2783] lock: released /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr
                              Apr 27 11:03:54 uk SM: [2783] ***** generic exception: vdi_attach_from_config: EXCEPTION <class 'SR.SROSError'>, The SR is not available [opterr=No valid controller URI to attach/detach from config]
                              Apr 27 11:03:54 uk SM: [2783]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
                              Apr 27 11:03:54 uk SM: [2783]     return self._run_locked(sr)
                              Apr 27 11:03:54 uk SM: [2783]   File "/opt/xensource/sm/SRCommand.py", line 153, in _run_locked
                              Apr 27 11:03:54 uk SM: [2783]     target = sr.vdi(self.vdi_uuid)
                              Apr 27 11:03:54 uk SM: [2783]   File "/opt/xensource/sm/LinstorSR", line 634, in wrap
                              Apr 27 11:03:54 uk SM: [2783]     return load(self, *args, **kwargs)
                              Apr 27 11:03:54 uk SM: [2783]   File "/opt/xensource/sm/LinstorSR", line 504, in load
                              Apr 27 11:03:54 uk SM: [2783]     opterr='No valid controller URI to attach/detach from config'
                              Apr 27 11:03:54 uk SM: [2783]
                              Apr 27 11:03:54 uk SM: [2783] ***** LINSTOR resources on XCP-ng: EXCEPTION <class 'SR.SROSError'>, The SR is not available [opterr=No valid controller URI to attach/detach from config]
                              Apr 27 11:03:54 uk SM: [2783]   File "/opt/xensource/sm/SRCommand.py", line 378, in run
                              Apr 27 11:03:54 uk SM: [2783]     ret = cmd.run(sr)
                              Apr 27 11:03:54 uk SM: [2783]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
                              Apr 27 11:03:54 uk SM: [2783]     return self._run_locked(sr)
                              Apr 27 11:03:54 uk SM: [2783]   File "/opt/xensource/sm/SRCommand.py", line 153, in _run_locked
                              Apr 27 11:03:54 uk SM: [2783]     target = sr.vdi(self.vdi_uuid)
                              Apr 27 11:03:54 uk SM: [2783]   File "/opt/xensource/sm/LinstorSR", line 634, in wrap
                              Apr 27 11:03:54 uk SM: [2783]     return load(self, *args, **kwargs)
                              Apr 27 11:03:54 uk SM: [2783]   File "/opt/xensource/sm/LinstorSR", line 504, in load
                              Apr 27 11:03:54 uk SM: [2783]     opterr='No valid controller URI to attach/detach from config'
                              Apr 27 11:03:54 uk SM: [2783]
                              Apr 27 11:03:59 uk SM: [4037] Warning: vdi_[de]activate present for dummy
                              Apr 27 11:04:00 uk SM: [4164] lock: opening lock file /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr
                              Apr 27 11:04:00 uk SM: [4164] lock: acquired /var/lock/sm/a20ee08c-40d0-9818-084f-282bbca1f217/sr
                              Apr 27 11:04:00 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:01 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:02 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:03 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:04 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:05 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:06 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:07 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:09 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:10 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:10 uk SM: [4164] Connecting from config to LINSTOR controller using: 172.16.10.48
                              Apr 27 11:04:10 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:11 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:12 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:13 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:14 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:15 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:16 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:17 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:18 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:19 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:19 uk SM: [4164] Got exception: Unable to find controller uri.... Retry number: 0
                              Apr 27 11:04:19 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:20 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:21 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:22 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:24 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:25 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:26 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:27 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:28 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:29 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:29 uk SM: [4164] Connecting from config to LINSTOR controller using: 172.16.10.49
                              Apr 27 11:04:29 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:30 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:31 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:32 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:33 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:34 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:35 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:36 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:37 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:39 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:39 uk SM: [4164] Got exception: Unable to find controller uri.... Retry number: 0
                              Apr 27 11:04:39 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:40 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:41 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]
                              Apr 27 11:04:42 uk SM: [4164] Raising exception [150, Failed to initialize XMLRPC connection]                     
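The SMlog excerpt above is almost entirely one repeated error. Here is a minimal POSIX shell sketch for seeing the pattern at a glance. It assumes the line format shown in the paste, and the function name `summarize_smlog` is hypothetical, not an XCP-ng tool. It lists the LINSTOR controller IPs that appear in the "Connecting from config" lines, in order, and counts the XMLRPC failures and failed controller-URI lookups:

```shell
# Hypothetical helper (not shipped with XCP-ng): summarise LINSTOR controller
# connection attempts in an SMlog file. Assumes the line format shown above.
summarize_smlog() {
    log="${1:-/var/log/SMlog}"
    echo "Controller IPs tried:"
    # Extract the IP from each "Connecting from config" line, keep first-seen order
    grep -o 'LINSTOR controller using: [0-9.]*' "$log" \
        | awk '{print $NF}' \
        | awk '!seen[$0]++'
    # grep -c counts matching lines, so a syslog "message repeated N times" entry counts once
    printf 'XMLRPC failures: %s\n' "$(grep -c 'Failed to initialize XMLRPC connection' "$log")"
    printf 'Controller URI lookups failed: %s\n' "$(grep -c 'Unable to find controller uri' "$log")"
}
```

On a host, running `summarize_smlog /var/log/SMlog` would show whether the SM driver only ever sees unreachable controller addresses. That is consistent with the "Unable to find controller uri" lines in the paste.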
                              
                              • fred974 @fred974

                                /var/log/xensource.log

                                https://pastebin.com/NUqGsSk6

                                I'm not sure what to look for, so I hope this is the right log.

                                • fred974 @fred974

                                  I hope someone can help me understand what the issue is.

                                  • olivierlambert Vates 🪐 Co-Founder CEO
                                    last edited by olivierlambert

                                    Ronan is on vacation right now, but he'll take a look when he's back 🙂 (maybe tomorrow; Monday, I'm pretty sure)

                                    • fred974 @olivierlambert

                                      @olivierlambert Thank you very much for letting me know

                                      • fred974 @fred974

                                        @ronan-a are you able to help me with this problem? I've added more info and log files to this thread.

                                        • ronan-a Vates 🪐 XCP-ng Team @fred974

                                          @fred974 Hi, well first, how many hosts do you have?
                                          We recommend using at least 3 hosts (4 is more robust). Also, what's the replication count on your LINSTOR SR?
                                          I ask because it's possible that a problem on one host caused reboots across the whole pool and, in the end, the emergency state.

                                          Next, can you share the kern.log file from each host? And please run this command on each machine:

                                          drbdsetup status xcp-persistent-database
                                          
                                          • fred974 @ronan-a
                                            last edited by fred974

                                            @ronan-a said in Lost access to all servers:

                                            well first, how many hosts do you have?

                                            We have 4 hosts.
                                            Host1 was the original master (host2 is now the master), and I think the DRBD replication count is 3 (how can I double-check?).
                                            Host1:

                                            [21:15 uk ~]# drbdsetup status xcp-persistent-database
                                            xcp-persistent-database role:Secondary
                                              disk:Diskless quorum:no
                                              uk.dc1.xcp-ng-hyper2 connection:Connecting
                                              uk.dc1.xcp-ng-hyper3 connection:Connecting
                                              uk.dc1.xcp-ng-hyper4 connection:Connecting
                                            

                                            Hosts 2, 3 and 4 show:

                                            [21:18 uk ~]# drbdsetup status xcp-persistent-database
                                            # No currently configured DRBD found.
                                            xcp-persistent-database: No such resource
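To compare what each host reports without reading the raw output line by line, here is a minimal sketch that flattens `drbdsetup status` text into one `key=value` line per state. The function name `drbd_states` is hypothetical, and the parsing assumes the output format shown in this post (a header with `role:`, then `disk:`/`quorum:` and per-peer `connection:` tokens):

```shell
# Hypothetical helper (not part of DRBD or XCP-ng): reads `drbdsetup status`
# text on stdin and prints one "key=value" line per state it finds.
drbd_states() {
    awk '
        # The resource is not configured on this host at all
        /No such resource/ { print "resource=missing"; next }
        # Header line, e.g. "xcp-persistent-database role:Secondary"
        / role:/ { print "resource=" $1 }
        {
            for (i = 1; i <= NF; i++) {
                # split() clears kv before filling it
                split($i, kv, ":")
                # Peer lines look like "<peer-node> connection:<state>"
                if (kv[1] == "role" || kv[1] == "disk" || kv[1] == "quorum") {
                    print kv[1] "=" kv[2]
                } else if (kv[1] == "connection") {
                    print "peer:" $1 "=" kv[2]
                }
            }
        }'
}
```

On each host, something like `drbdsetup status xcp-persistent-database 2>&1 | drbd_states` would print, for host1, `disk=Diskless`, `quorum=no` and every peer as `Connecting`. For hosts 2 to 4 it would print `resource=missing`. The `2>&1` is there because it is not certain which stream the "No such resource" message goes to.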
                                            

                                            kern.log files:
                                            host1: host1_kern.log.txt
                                            host2: host2_kern.log.txt
                                            host3: host3_kern.log.txt
                                            host4: host4_kern.log.txt

                                            Our monitoring reported the first VM down at 11am, which matches the log files. We also take hourly snapshots, so I was wondering whether that could be the cause. I hope the files above help us understand the issue. Also, should I make host1 the master again?

                                            Thank you
