XCP-ng

    Orphaned host after pool metadata restore

    Xen Orchestra
    • shep

      After restoring pool metadata, the hosts rebooted, but one of my hosts now seems orphaned. I tried to re-add the host with New -> Server, but after connecting I get the following error:
      "Unknown error
      this pool is already connected"

      Going to the pool and trying to add a host there doesn't show any hosts available.

      Looking closer at the xcp-ng host (8.2) shows some weirdness.

      The status display on the xcp-ng console shows Management Network Parameters: <No network configured>
      Network and Management Interface -> Configure Management Interface shows "<No interfaces present>", and "Display NICs" is empty.
      I performed the "Emergency Network Reset". I chose the correct interface and entered the master host's IP, but this didn't seem to change anything once the host had rebooted.

      From the shell, the network interfaces look correct and I can ping the master's IP. I was also getting these errors:

      Broadcast message from systemd-journald@xcp-hostname (Mon 2021-06-14 20:53:38 EDT):
      xapi-nbd[22551]: main:   "Failed to log in via xapi's Unix domain socket in 300.000000 seconds")
      

      Some entries from xensource.log on host:

      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 ||dummytaskhelper] task Starting SM xapi event service D:2e9a7197ef49 created by task D:69a2c28634ac
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 ||startup] task [Killing stray sparse_dd processes]
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 ||dummytaskhelper] task Killing stray sparse_dd processes D:46f097b12a6c created by task D:69a2c28634ac
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 ||startup] task [Registering http handlers]
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 ||dummytaskhelper] task Registering http handlers D:d50d84df4380 created by task D:69a2c28634ac
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 ||startup] task [Listening unix socket]
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 ||dummytaskhelper] task Listening unix socket D:bc4b151de98d created by task D:69a2c28634ac
      Jun 14 20:42:05 xcp-blade4 xapi: [ info||0 |Listening unix socket D:bc4b151de98d|xapi_http] Successfully bound socket to: UNIX /var/lib/xcp/xapi
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 ||startup] task [Checking HA configuration]
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 ||dummytaskhelper] task Checking HA configuration D:5a0296150025 created by task D:69a2c28634ac
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 ||startup] task [Checking for non-HA redo-log]
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 ||dummytaskhelper] task Checking for non-HA redo-log D:0352bb5751ec created by task D:69a2c28634ac
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 ||startup] task [Setup DB configuration]
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 ||dummytaskhelper] task Setup DB configuration D:bf13fd588924 created by task D:69a2c28634ac
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 |Setup DB configuration D:bf13fd588924|xapi] parsing db config file
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 |Setup DB configuration D:bf13fd588924|parse_db_conf] [/var/lib/xcp/state.db]
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 |Setup DB configuration D:bf13fd588924|parse_db_conf] mode:no_limit
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 |Setup DB configuration D:bf13fd588924|parse_db_conf] format:xml
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 |Setup DB configuration D:bf13fd588924|parse_db_conf] available_this_boot:true
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 |Setup DB configuration D:bf13fd588924|parse_db_conf]
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 ||startup] task [bringing up management interface]
      Jun 14 20:42:05 xcp-blade4 xapi: [debug||0 ||dummytaskhelper] task bringing up management interface D:4956e5ae7e45 created by task D:69a2c28634ac
      Jun 14 20:42:06 xcp-blade4 xapi: [debug||0 |bringing up management interface D:4956e5ae7e45|xapi_mgmt_iface] Shutting down the old management interface (if any)
      Jun 14 20:42:06 xcp-blade4 xapi: [ info||0 |bringing up management interface D:4956e5ae7e45|xapi_mgmt_iface] Starting new server (listening on all IP addresses)
      Jun 14 20:42:06 xcp-blade4 xapi: [ info||0 |bringing up management interface D:4956e5ae7e45|xapi_http] Successfully bound socket to: INET :::80
      Jun 14 20:42:06 xcp-blade4 xapi: [ info||0 |bringing up management interface D:4956e5ae7e45|xapi_mgmt_iface] Restarting stunnel (accepting connections on :::443)
      Jun 14 20:42:06 xcp-blade4 xapi: [debug||9 ||helpers] about to call script: /usr/bin/systemctl
      Jun 14 20:42:06 xcp-blade4 xapi: [debug||9 ||helpers] /usr/bin/systemctl is-enabled stunnel@xapi succeeded [ output = 'enabled\x0A' ]
      Jun 14 20:42:06 xcp-blade4 xapi: [debug||9 ||helpers] about to call script: /usr/bin/systemctl
      Jun 14 20:42:06 xcp-blade4 xapi: [debug||0 |bringing up management interface D:4956e5ae7e45|xapi] Management IP address is: 10.10.x.x
      Jun 14 20:42:06 xcp-blade4 xapi: [error||0 |bringing up management interface D:4956e5ae7e45|master_connection] Caught Master_connection.Goto_handler
      Jun 14 20:42:06 xcp-blade4 xapi: [debug||11 ||dummytaskhelper] task dom0 networking update D:110afb84267b created by task D:69a2c28634ac
      Jun 14 20:42:06 xcp-blade4 xapi: [debug||0 |bringing up management interface D:4956e5ae7e45|master_connection] Connection to master died. I will continue to retry indefinitely (supressing future logging of this message).
      Jun 14 20:42:06 xcp-blade4 xapi: [debug||11 |dom0 networking update D:110afb84267b|xapi_mgmt_iface] Checking to see if hostname or management IP has changed
      Jun 14 20:42:06 xcp-blade4 xapi: [error||0 |bringing up management interface D:4956e5ae7e45|master_connection] Connection to master died. I will continue to retry indefinitely (supressing future logging of this message).
      Jun 14 20:42:06 xcp-blade4 xapi: [debug||11 |dom0 networking update D:110afb84267b|helpers] Updating IP addresses in DB for DHCP and autoconf PIFs
      Jun 14 20:42:06 xcp-blade4 xapi: [debug||0 |bringing up management interface D:4956e5ae7e45|master_connection] Sleeping 2.000000 seconds before retrying master connection...
      Jun 14 20:42:06 xcp-blade4 xapi: [debug||9 ||helpers] /usr/bin/systemctl restart stunnel@xapi succeeded [ output = '' ]
      Jun 14 20:42:08 xcp-blade4 xapi: [debug||0 |bringing up management interface D:4956e5ae7e45|master_connection] stunnel: stunnel start\x0A
      Jun 14 20:42:13 xcp-blade4 xapi: [ info||0 |bringing up management interface D:4956e5ae7e45|master_connection] stunnel connected pid=16456 fd=23
      

      Any help is much appreciated.

      • olivierlambert Vates 🪐 Co-Founder CEO

        Hi!

        Can you tell me why you restored metadata on a working host? The initial behavior is normal; restoring metadata isn't without consequences.

        In any case, the master can't be reached anymore, which is why XAPI can't start on this host.
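
The `master_connection` entries in the log excerpt from the first post are what show this. As a minimal sketch (not an official tool), the Python below filters such entries out of xensource.log lines. The line layout `[level||thread|task|module] message` is inferred only from the excerpt in this thread, and the names `XAPI_LINE`, `master_connection_events` and `SAMPLE` are hypothetical helpers introduced for illustration.

```python
import re

# Layout inferred from the excerpt in this thread (an assumption, not a spec):
#   "<timestamp> <host> xapi: [level||thread|task|module] message"
XAPI_LINE = re.compile(
    r"xapi: \[\s*(?P<level>\w+)\|[^|]*\|(?P<thread>[^|]*)\|"
    r"(?P<task>[^|]*)\|(?P<module>[^\]]*)\]\s?(?P<message>.*)$"
)


def master_connection_events(lines):
    """Return (level, message) pairs logged by the master_connection module."""
    events = []
    for line in lines:
        match = XAPI_LINE.search(line)
        if match and match.group("module") == "master_connection":
            events.append((match.group("level"), match.group("message")))
    return events


# Three lines copied from the excerpt in the first post.
SAMPLE = [
    "Jun 14 20:42:06 xcp-blade4 xapi: [debug||0 |bringing up management interface "
    "D:4956e5ae7e45|xapi] Management IP address is: 10.10.x.x",
    "Jun 14 20:42:06 xcp-blade4 xapi: [error||0 |bringing up management interface "
    "D:4956e5ae7e45|master_connection] Caught Master_connection.Goto_handler",
    "Jun 14 20:42:06 xcp-blade4 xapi: [debug||0 |bringing up management interface "
    "D:4956e5ae7e45|master_connection] Sleeping 2.000000 seconds before retrying "
    "master connection...",
]

if __name__ == "__main__":
    for level, message in master_connection_events(SAMPLE):
        print(f"{level}: {message}")
```

On a real host the same function could be fed the lines of the log file instead of `SAMPLE`. It only filters entries; it does not diagnose why the connection fails.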

        • shep @olivierlambert

          @olivierlambert That was actually an accident on this pool. The pool was healthy, and I used a snapshot taken the night before. I just let it play out, but when the pool came back up, this host didn't seem to work. The part I'm not sure about is that, from a Linux perspective, the interface is there and there's nothing wrong with the comms between the hosts.

          • olivierlambert Vates 🪐 Co-Founder CEO

            On the slave, you have to force it to reconnect to the master, and the master should then pick up the slave. There are multiple commands to do it; I don't have them in mind right now, but I'm sure someone around will tell you 🙂
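
The commands were never named in this thread. The ones usually cited for this situation in the xe CLI documentation are `xe pool-emergency-reset-master master-address=<master IP>` (run on the slave) and `xe pool-recover-slaves` (run on the master). Treat that as an assumption to verify for your version, not as a confirmed fix for this case. Before forcing anything, it can help to check which master the slave thinks it has. The sketch below assumes the host records its role in `/etc/xensource/pool.conf` as either `master` or `slave:<address>`; `parse_pool_conf` and `describe` are hypothetical helper names.

```python
from pathlib import Path

# Assumed default location of the pool role file on an XCP-ng host.
POOL_CONF = Path("/etc/xensource/pool.conf")


def parse_pool_conf(text):
    """Return (role, master_address) from the contents of pool.conf.

    Assumed format: the single word "master", or "slave:<address>".
    """
    value = text.strip()
    if value == "master":
        return ("master", None)
    if value.startswith("slave:"):
        return ("slave", value[len("slave:"):])
    raise ValueError(f"unrecognised pool.conf contents: {value!r}")


def describe(text):
    """Summarise the role in one human-readable sentence."""
    role, master = parse_pool_conf(text)
    if role == "master":
        return "this host believes it is the pool master"
    return f"this host is a slave pointing at master {master}"


if __name__ == "__main__":
    if POOL_CONF.exists():
        print(describe(POOL_CONF.read_text()))
    else:
        print(f"{POOL_CONF} not found; run this on the XCP-ng host")
```

If the reported address differs from the current master's IP, that mismatch is a plausible reason for the endless retry loop in the log.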

            • shep @olivierlambert

              @olivierlambert I tried xe pool-join, but it doesn't seem to work; I just get an error about an invalid uuid. Also, when I just run xe host-list, I get a very similar message:

              The uuid you supplied was invalid.
              type: host
              uuid: 2b438638-d0e2-4bb1-8fee-b45f2a5d86aa
              
              • oryon.br

                I have the same problem. Has anyone found a solution?

                • shep @oryon.br

                  @oryon-br I didn't get any more help on this and couldn't find any solutions, so I ended up rebuilding the pool to fix the issue. Not ideal.
