XCP-ng

    Network Management lost, No Nic display Consol

    Management
    13 Posts 9 Posters 1.2k Views 7 Watching

    • msgerbs

      I know this is old, but I also had this issue. I updated my pool from 8.2.1 to 8.3 using the installer ISO. Both hosts came back up, I ran an update on the pool master and then on the slave, and now the slave has lost its network interfaces. In ifconfig, I see eth0, lo, and xenbr0 with no IP, along with a "xentemp" which has the IP address I assigned to the management interface. I did an xe-toolstack-restart and an emergency network reset, neither of which seemed to help. xe pif-list, network-list and all other xe commands simply hang forever.

      • nikade (Top contributor)

        This is a classic issue with XAPI: once you have hosts in a pool and the slave cannot reach the master, it will go crazy. I've never seen this issue with standalone hosts though.

        We usually had this issue when upgrading XenServer, so we simply stopped doing that and never had any issues again. We went to "new" versions by simply standing up a new pool and migrating all the VMs over to it 🙂

        • acebmxer @nikade

          @nikade Can you elaborate on that more for us? Do you create a new pool with an existing host? If so, how? Or do you pull a host, make a new pool from that host, and migrate the rest over?

          • nikade (Top contributor) @acebmxer

            @acebmxer It's not pretty, but it's failsafe. The procedure looked like this in our case:

            1. Disable HA in the "old pool"
            2. Put a host in the "old pool" into maintenance mode
            3. Reinstall that host and connect it to XOA and then patch it
            4. Create a "new pool" from that host
            5. Create a new LUN or NFS share in the SAN for "new pool" and attach it to "new pool"
            6. Live migrate VMs over from "old pool" to "new pool"
            7. Once you've freed up another host, repeat steps 2 and 3 and then join that host to "new pool". It is important that you patch it before joining it to the pool; that is done by going to Settings -> Servers in XOA and connecting to it manually.

            Then just continue until you're done. Live migration is pretty reliable nowadays, so this works pretty well, and since we had a 10G network it didn't take as long as it used to with a 1G network.
            We did this after a major incident on our primary production site where 2 out of 4 hosts in a pool "suddenly" lost their NICs after updating them. Since then we never updated the pools again. Standalone hosts are fine though; they never did this.

            Luckily we had 2 other pools where we could migrate the VMs to, but we couldn't really trust the updating after that.
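
For readers who prefer the command line, the steps in the post above could be sketched with `xe` roughly as follows. This is an assumption-laden outline, not the procedure the poster used (that was done through XOA): the functions only print commands, and every UUID, address, and the `NEW_POOL_PW` variable is a hypothetical placeholder. Reinstalling and patching the host (step 3) and creating the new storage (step 5) are left out.

```shell
#!/bin/sh
# Dry-run sketch: prints xe commands instead of running them.
# Every UUID, address and password variable is a hypothetical placeholder.

# Steps 1-2: disable HA on the old pool (only needed once), then drain a host.
# $1 = UUID of the host leaving the old pool
plan_drain_host() {
    echo "xe pool-ha-disable"
    echo "xe host-disable uuid=$1"
    echo "xe host-evacuate uuid=$1"
}

# Step 6: live migrate one VM to the new pool, mapping one VDI to the new SR.
# $1 = VM UUID, $2 = new pool master address, $3 = VDI UUID, $4 = target SR UUID
plan_migrate_vm() {
    echo "xe vm-migrate uuid=$1 live=true remote-master=$2 remote-username=root remote-password=\$NEW_POOL_PW vdi:$3=$4"
}

# Step 7: run on a reinstalled and already patched host to join the new pool.
# $1 = new pool master address
plan_join_new_pool() {
    echo "xe pool-join master-address=$1 master-username=root master-password=\$NEW_POOL_PW"
}
```

Calling `plan_drain_host <host-uuid>` and reviewing the printed commands before running any of them by hand keeps the sketch safe to try on a production pool.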

            • DustyArmstrong @nikade

              @nikade sorry to drag this up but, is there a particular process or methodology to avoid this in the first place? Just had it happen on two brand new hosts, I had to re-install XCP from scratch. Bit worried if I reboot one of them now for any reason this will happen again. It happened on both the pool master and the slave, network completely wiped out on both.

              Genuinely one of the most bizarre series of events I've ever experienced with server infrastructure, I could not understand what was going on until I found this thread.

              • Pilow @DustyArmstrong

                @DustyArmstrong On your slave host, run:

                # cat /etc/xensource/pool.conf
                slave:xxx.xxx.xxx.xxx
                

                You should see the IP address of the master. If not, correct it.
                The master must be pingable and reachable from the management interface of the slaves for the slaves to get the correct network configuration propagated.

                You can run the same command on the MASTER host, where you should see:

                master
                

                If you corrected the file on the slave host, reboot it and it should come back normally.

                • Danp (Pro Support Team) @Pilow

                  @Pilow said in Network Management lost, No Nic display Consol:

                  you can try the command on MASTER host, you should see

                  master:ip_of_the_master

                  Slight correction. On the master, you should only have

                  master
                  

                  without the colon and IP address.
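
The check described in the posts above can be sketched as a small script. It is a minimal sketch, assuming only the two file formats mentioned in this thread (`master` and `slave:<address>`); the function names and the `POOL_CONF` override are made up for illustration.

```shell
#!/bin/sh
# Sketch: report the pool role recorded in pool.conf and, for a slave,
# check that the master it points at answers a ping.
# POOL_CONF and both function names are hypothetical, not part of XCP-ng.

POOL_CONF=${POOL_CONF:-/etc/xensource/pool.conf}

# Prints "master", "slave <address>", or "unknown <raw contents>".
pool_role() {
    conf=$(tr -d '[:space:]' < "${1:-$POOL_CONF}")
    case $conf in
        master)   echo "master" ;;
        slave:?*) echo "slave ${conf#slave:}" ;;
        *)        echo "unknown $conf" ;;
    esac
}

# Returns 0 on a master, the ping result on a slave, 2 for anything else.
check_master_reachable() {
    role=$(pool_role "$1")
    case $role in
        master)   return 0 ;;
        slave\ *) ping -c 3 -W 2 "${role#slave }" > /dev/null ;;
        *)        return 2 ;;
    esac
}
```

On a slave, something like `pool_role; check_master_reachable || echo "master not reachable"` would show both the recorded address and whether it responds. A `master:<address>` line is reported as unknown, matching the correction above.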

                  • Pilow @Danp

                    @Danp Ha, thanks for the correction. I was so certain I had seen it that I didn't check on my master.

                    • nikade (Top contributor) @DustyArmstrong

                      @DustyArmstrong said in Network Management lost, No Nic display Consol:

                      @nikade sorry to drag this up but, is there a particular process or methodology to avoid this in the first place? Just had it happen on two brand new hosts, I had to re-install XCP from scratch. Bit worried if I reboot one of them now for any reason this will happen again. It happened on both the pool master and the slave, network completely wiped out on both.

                      Genuinely one of the most bizarre series of events I've ever experienced with server infrastructure, I could not understand what was going on until I found this thread.

                      What exactly happened? Could you try to explain it in 1-2-3 steps?

                      • DustyArmstrong @nikade

                        @nikade Sure.

                        I have 2 mini machines running as XCP hosts, decided to upgrade them to newer hardware as they were struggling. I have:

                        XC1 - Brand new pool master on pool XC1
                        XC2 - Brand new pool slave on pool XC1

                        XCP1 - Old pool master on pool XCP1
                        XCP2 - Old pool slave on pool XCP1

                        Installed XCP on both the new hosts, imported to Xen Orchestra, designated the first as pool master, added the second to the pool. Performed updates, did some nominal checks and called them good. Wasn't ready to migrate my VMs yet so powered both off.

                        Yesterday I unplugged my old hosts and set them aside, then plugged my new hosts into power (on my UPS) and the network. I plugged my old hosts in at my temporary working area to migrate VMs. Powered everything on (old, then new). XO stayed online the whole time, though DNS is on a VM that went down.

                        Only a single host came up (XCP2 - slave of XCP1). XCP1 had to be rebooted 3 or 4 times before it finally came alive, and then with a high ping latency (4 ms, usually <1 ms). XC1 and XC2 never came alive; their NICs showed physical activity but got no connectivity. Eventually I plugged in a monitor (awkward, so I didn't do it immediately), rebooted them both and saw no network configuration on either, just empty. I found this thread and decided that, as they were basically fresh, I would just re-install and start over. Eventually got everything back.

                        I am assuming based on this thread that XC2 may have come up before XC1 and thus couldn't connect so it obliterated itself. Don't know why XC1 also did the same.

                        During testing I powered both XC1 and XC2 on and off multiple times without this happening, it was only when I powered everything on after moving them that it occurred. Thought I was going nuts or had caused a major network loop somehow.
