XCP-ng

    Network Management lost, No Nic display Consol

    • nikade (Top contributor)

      This is a classic issue with XAPI: once you have hosts in a pool and a slave cannot reach the master, it will go crazy. Never seen this issue with standalone hosts tho.

      We usually hit this when upgrading XenServer, so we simply stopped upgrading in place and never had any issues after that. We moved to "new" versions by standing up a new pool and migrating all the VMs over to it 🙂

      • acebmxer @nikade

        @nikade Can you elaborate on that more for us? Do you create a new pool with an existing host? If so, how? Or do you pull a host, make a new pool from that host, and migrate the rest over?

        • nikade (Top contributor) @acebmxer

          @acebmxer It's not pretty, but it's failsafe. The procedure looked like this in our case:

          1. Disable HA in the "old pool"
          2. Put a host in the "old pool" into maintenance mode
          3. Reinstall that host and connect it to XOA and then patch it
          4. Create a "new pool" from that host
          5. Create a new LUN or NFS share in the SAN for "new pool" and attach it to "new pool"
          6. Live migrate VMs over from "old pool" to "new pool"
          7. Once you've freed up another host, repeat steps 2 and 3 and then join that host to "new pool". It is important that you patch it before joining it to the pool; that is done by going to Settings -> Servers in XOA and connecting to it manually.

          And then just continue until you're done. Live migration is pretty reliable nowadays, so this works pretty well, and since we had a 10G network it doesn't take as long as it used to with a 1G network.
          We did this after a major incident on our primary production site where 2 out of 4 hosts in a pool "suddenly" lost their NICs after updating them. Since then we never updated the pools again. Standalone hosts are fine tho, they never did this.

          Luckily we had 2 other pools where we could migrate the VMs to, but we couldn't really trust the updating after that.
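
For anyone who would rather script parts of the procedure above, here is a rough sketch using the `xe` CLI instead of XOA. This is a swapped-in equivalent, not what the post describes: the post does these steps through XOA. The function names (`run`, `drain_old_host`, `patch_host`, `join_new_pool`) and the host UUID, master IP, and password arguments are hypothetical placeholders. It assumes "maintenance mode" corresponds to disabling and evacuating the host, and that the new host is patched with `yum update`. It only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of running it.
# Set DRY_RUN=0 to execute for real on a host that has the xe CLI.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 1-2: disable HA on the old pool, then take one host out of service.
# $1 is the UUID of the host being freed up (placeholder).
# Assumption: HA is currently enabled; skip the first command if it is not.
drain_old_host() {
    host_uuid="$1"
    run xe pool-ha-disable
    run xe host-disable uuid="$host_uuid"
    run xe host-evacuate uuid="$host_uuid"
}

# Step 3 (patching part): update a freshly installed host before it joins.
# Assumption: run on the new host itself.
patch_host() {
    run yum update -y
}

# Step 7: join an already patched host to the new pool.
# Run on the joining host; $1 is the new pool master's IP, $2 its root password.
join_new_pool() {
    master_ip="$1"
    password="$2"
    run xe pool-join master-address="$master_ip" master-username=root master-password="$password"
}
```

Calling `drain_old_host <host-uuid>` with the default dry-run setting just lists the three commands it would run, which makes it easy to review before doing anything for real. The reinstall, the new SR, and the live migration itself are left to XOA as in the steps above.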

          • DustyArmstrong @nikade

            @nikade sorry to drag this up but, is there a particular process or methodology to avoid this in the first place? Just had it happen on two brand new hosts, I had to re-install XCP from scratch. Bit worried if I reboot one of them now for any reason this will happen again. It happened on both the pool master and the slave, network completely wiped out on both.

            Genuinely one of the most bizarre series of events I've ever experienced with server infrastructure, I could not understand what was going on until I found this thread.

            • Pilow @DustyArmstrong

              @DustyArmstrong on your slave host, do a

              # cat /etc/xensource/pool.conf
              slave:xxx.xxx.xxx.xxx
              

              You should see the IP address of the master. If not, correct it.
              The master must be pingable and reachable from the slaves' management interface in order for the network configuration to propagate correctly to the slaves.

              You can run the same command on the master host; you should see

              master
              

              If you corrected the file on the slave host, reboot it; it should come back normally.
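
The check above can be wrapped in a small script. This is only a sketch: `pool_role` and `check_master_reachable` are hypothetical helper names, and it assumes `pool.conf` holds a single line that is either `master` or `slave:<master-ip>`, as shown in this post. The `ping` options assume the Linux `ping` found on the host.

```shell
#!/bin/sh
# Print the pool role recorded in a pool.conf file.
# Output: "master", or "slave <master-ip>". Returns non-zero on anything else.
# $1 is the file to read (defaults to /etc/xensource/pool.conf).
pool_role() {
    conf="${1:-/etc/xensource/pool.conf}"
    [ -r "$conf" ] || { echo "unreadable: $conf" >&2; return 2; }
    # Take the first line and strip whitespace.
    line=$(head -n 1 "$conf" | tr -d '[:space:]')
    case "$line" in
        master)   echo "master" ;;
        slave:?*) echo "slave ${line#slave:}" ;;
        *)        echo "unexpected content: $line" >&2; return 1 ;;
    esac
}

# On a slave, ping the master address recorded in pool.conf once.
check_master_reachable() {
    role=$(pool_role "$1") || return 1
    case "$role" in
        master) echo "this host is the master"; return 0 ;;
    esac
    ip=${role#slave }
    if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
        echo "master $ip is reachable"
    else
        echo "master $ip is NOT reachable" >&2
        return 1
    fi
}
```

Running `check_master_reachable` with no argument on a slave reads the default file and reports whether the recorded master answers a ping. A ping reply only shows basic reachability, so treat it as a first check rather than proof that the pool connection is healthy.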

              • Danp (Pro Support Team) @Pilow

                @Pilow said in Network Management lost, No Nic display Consol:

                you can try the command on MASTER host, you should see

                master:ip_of_the_master

                Slight correction. On the master, you should only have

                master
                

                without the colon and IP address.

                • Pilow @Danp

                  @Danp ha, thanks for the correction. I was so certain I had seen it that I didn't check on my master.

                  • nikade (Top contributor) @DustyArmstrong

                    @DustyArmstrong said in Network Management lost, No Nic display Consol:

                    @nikade sorry to drag this up but, is there a particular process or methodology to avoid this in the first place? Just had it happen on two brand new hosts, I had to re-install XCP from scratch. Bit worried if I reboot one of them now for any reason this will happen again. It happened on both the pool master and the slave, network completely wiped out on both.

                    Genuinely one of the most bizarre series of events I've ever experienced with server infrastructure, I could not understand what was going on until I found this thread.

                    What exactly happened? Could you try to explain it in 1-2-3 steps?

                    • DustyArmstrong @nikade

                      @nikade Sure.

                      I have 2 mini machines running as XCP hosts, decided to upgrade them to newer hardware as they were struggling. I have:

                      XC1 - Brand new pool master on pool XC1
                      XC2 - Brand new pool slave on pool XC1

                      XCP1 - Old pool master on pool XCP1
                      XCP2 - Old pool slave on pool XCP1

                      Installed XCP on both the new hosts, imported to Xen Orchestra, designated the first as pool master, added the second to the pool. Performed updates, did some nominal checks and called them good. Wasn't ready to migrate my VMs yet so powered both off.

                      Came yesterday, I unplugged my old hosts and set them aside, plugged in my new hosts (this is onto my UPS supply) power and network. Plugged in my old hosts to my temporary working area to migrate VMs. Powered everything on (old then new). XO stayed online the whole time, though DNS is on a VM that went down.

                      Only a single host came up (XCP2 - slave of XCP1). XCP1 had to be rebooted 3 or 4 times before it finally came alive but at a big ping delay (4ms, usually <1ms). XC1 and XC2 never came alive, their NICs had physical activity but got nothing. Eventually plugged in a monitor (awkward so didn't do it immediately), rebooted them both and saw no network configuration on either, just empty. Found this thread and decided, as they were basically fresh, I would just re-install and start over. Eventually got everything back.

                      I am assuming based on this thread that XC2 may have come up before XC1 and thus couldn't connect so it obliterated itself. Don't know why XC1 also did the same.

                      During testing I powered both XC1 and XC2 on and off multiple times without this happening, it was only when I powered everything on after moving them that it occurred. Thought I was going nuts or had caused a major network loop somehow.

                      • nikade (Top contributor) @DustyArmstrong

                        @DustyArmstrong that's super strange. I actually have the same setup at home: 2 HP Z240 machines running XCP-ng in a small pool.
                        xcp1 is always up and running and xcp2 is powered down when I don't need it. Everything important runs on xcp1, so maybe that's the reason I don't run into these issues.
