Network Management lost, No Nic display Consol
-
This is a classic issue with XAPI: once you have hosts in a pool and a slave cannot reach the master, it goes crazy. I've never seen this issue with standalone hosts, though.
We usually hit this when upgrading XenServer, so we simply stopped upgrading in place and never had the issue again. We moved to "new" versions by standing up a new pool and migrating all the VMs over to it.

-
@nikade Can you elaborate on that for us? Do you create a new pool with an existing host? If so, how? Or do you pull a host, make a new pool from that host, and migrate the rest over?
-
@acebmxer It's not pretty, but it's failsafe. The procedure looked like this in our case:
- Disable HA in the "old pool"
- Put a host in the "old pool" into maintenance mode
- Reinstall that host and connect it to XOA and then patch it
- Create a "new pool" from that host
- Create a new LUN or NFS share in the SAN for "new pool" and attach it to "new pool"
- Live migrate VMs over from "old pool" to "new pool"
- Once you've freed up another host, repeat steps 2 and 3 and then join that host to "new pool". It is important that you patch it before joining it to the pool; you do that by going to Settings -> Servers in XOA and connecting to it manually.
Then just continue until you're done. Live migration is pretty reliable nowadays, so this works well, and since we had a 10G network it doesn't take as long as it used to with a 1G network.
We did this after a major incident on our primary production site where 2 out of 4 hosts in a pool "suddenly" lost their NICs after updating them. Since then we never updated the pools again. Standalone hosts are fine, though; they never did this. Luckily we had 2 other pools where we could migrate the VMs to, but we couldn't really trust the updating after that.
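For anyone who prefers the CLI to XOA, the first steps of this procedure (disable HA, put a host into maintenance mode) and the later pool join map roughly onto standard `xe` commands. The sketch below is a dry run and is not what was actually run in the thread, where everything went through XOA. It only prints the commands, and the host UUID, master address and password are placeholders (assumptions). Reinstalling, patching, creating the new SR and the cross-pool live migration are left to XOA here.

```shell
#!/bin/sh
# Dry run only: prints xe commands, never executes them.
# The UUID and address arguments are placeholders (assumptions), not values from the thread.

# Steps 1-2: disable HA, then take one host out of service in the old pool.
plan_free_host() {
    host_uuid="$1"
    echo "xe pool-ha-disable"
    # Maintenance mode = refuse new VMs, then move the running ones elsewhere.
    echo "xe host-disable uuid=$host_uuid"
    echo "xe host-evacuate uuid=$host_uuid"
}

# Later hosts: after reinstalling AND patching, join the new pool from the host itself.
plan_join_new_pool() {
    new_master_addr="$1"
    echo "xe pool-join master-address=$new_master_addr master-username=root master-password=<password>"
}

plan_free_host "<old-host-uuid>"
plan_join_new_pool "<new-master-ip>"
```

Look up the real host UUID with `xe host-list` before turning any of the printed lines into actual commands.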
-
@nikade sorry to drag this up but, is there a particular process or methodology to avoid this in the first place? Just had it happen on two brand new hosts, I had to re-install XCP from scratch. Bit worried if I reboot one of them now for any reason this will happen again. It happened on both the pool master and the slave, network completely wiped out on both.
Genuinely one of the most bizarre series of events I've ever experienced with server infrastructure, I could not understand what was going on until I found this thread.
-
@DustyArmstrong on your slave host, do a
# cat /etc/xensource/pool.conf
slave:xxx.xxx.xxx.xxx
You should see the IP address of the master. If not, correct it.
The master must be pingable and accessible from management of the slaves in order for the slaves to have correct network propagation.
You can try the command on the MASTER host, you should see
master
If you corrected the file on the slave host, reboot it, it should come back normally.
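A minimal sketch of that check, to be run on a slave: it reads the master address out of pool.conf and probes it once. The function name and the `PROBE` override are hypothetical, added only so the logic can be exercised without a network. The default probe assumes the Linux iputils `ping` (`-c` count, `-W` timeout in seconds). A successful ping only shows basic reachability, not that XAPI on the master is healthy.

```shell
#!/bin/sh
# Sketch: read the master address from a slave's pool.conf and probe it.
# PROBE is a hypothetical override so the check can be tried without a network.

check_master_reachable() {
    conf="${1:-/etc/xensource/pool.conf}"
    # pool.conf is a single line; strip stray whitespace and carriage returns.
    line="$(head -n 1 "$conf" 2>/dev/null | tr -d ' \t\r')"
    case "$line" in
        slave:?*)
            master_ip="${line#slave:}" ;;
        *)
            echo "not a slave entry: '$line'"
            return 2 ;;
    esac
    # Default probe: one ICMP echo with a 2-second timeout.
    if ${PROBE:-ping -c 1 -W 2} "$master_ip" >/dev/null 2>&1; then
        echo "master $master_ip reachable"
    else
        echo "master $master_ip NOT reachable"
        return 1
    fi
}
```

On a slave, calling `check_master_reachable` with no argument uses the path from the command above.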
-
@Pilow said in Network Management lost, No Nic display Consol:
you can try the command on MASTER host, you should see
master:ip_of_the_master
Slight correction. On the master, you should only have
master
without the colon and IP address.
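To make the two valid forms concrete, here is a small sketch that classifies a pool.conf file as `master`, `slave:<ip>`, or anything else. The function name `pool_role` is made up for illustration. It treats `master:<ip>` as malformed, following the correction above, and it does not validate the IP address itself.

```shell
#!/bin/sh
# Classify a pool.conf file: "master", "slave <ip>", "unknown" or "unreadable".
# The default path is the one quoted in this thread; pass another path to test.

pool_role() {
    conf="${1:-/etc/xensource/pool.conf}"
    if [ ! -r "$conf" ]; then
        echo "unreadable"
        return 1
    fi
    # Read the single line and strip stray whitespace and carriage returns.
    content="$(head -n 1 "$conf" | tr -d ' \t\r')"
    case "$content" in
        master)
            echo "master" ;;
        slave:?*)
            echo "slave ${content#slave:}" ;;
        *)
            # Includes "master:<ip>", which is not a valid master entry.
            echo "unknown"
            return 1 ;;
    esac
}
```

Run `pool_role` with no argument on a host to check the live file.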
-
@Danp ha, thanks for the correction. I was so certain I had seen it that I didn't check on my master.
-
@DustyArmstrong said in Network Management lost, No Nic display Consol:
@nikade sorry to drag this up but, is there a particular process or methodology to avoid this in the first place? Just had it happen on two brand new hosts, I had to re-install XCP from scratch. Bit worried if I reboot one of them now for any reason this will happen again. It happened on both the pool master and the slave, network completely wiped out on both.
Genuinely one of the most bizarre series of events I've ever experienced with server infrastructure, I could not understand what was going on until I found this thread.
What exactly happened? Could you try to explain it in 1-2-3 steps?
-
@nikade Sure.
I have 2 mini machines running as XCP hosts, decided to upgrade them to newer hardware as they were struggling. I have:
XC1 - Brand new pool master on pool XC1
XC2 - Brand new pool slave on pool XC1
XCP1 - Old pool master on pool XCP1
XCP2 - Old pool master on pool XCP2
Installed XCP on both the new hosts, imported to Xen Orchestra, designated the first as pool master, added the second to the pool. Performed updates, did some nominal checks and called them good. Wasn't ready to migrate my VMs yet so powered both off.
Come yesterday, I unplugged my old hosts and set them aside, then plugged my new hosts into power (on my UPS) and network. Plugged my old hosts in at my temporary working area to migrate VMs. Powered everything on (old then new). XO stayed online the whole time, though DNS is on a VM that went down.
Only a single host came up (XCP2 - slave of XCP1). XCP1 had to be rebooted 3 or 4 times before it finally came alive but at a big ping delay (4ms, usually <1ms). XC1 and XC2 never came alive, their NICs had physical activity but got nothing. Eventually plugged in a monitor (awkward so didn't do it immediately), rebooted them both and saw no network configuration on either, just empty. Found this thread and decided, as they were basically fresh, I would just re-install and start over. Eventually got everything back.
I am assuming based on this thread that XC2 may have come up before XC1 and thus couldn't connect so it obliterated itself. Don't know why XC1 also did the same.
During testing I powered both XC1 and XC2 on and off multiple times without this happening, it was only when I powered everything on after moving them that it occurred. Thought I was going nuts or had caused a major network loop somehow.
-
@DustyArmstrong that's super strange. I actually have the same setup at home: 2 HP Z240 machines running xcp-ng in a small pool.
xcp1 is always up and running; xcp2 is powered down when I don't need it. Everything important runs on xcp1, so maybe that's the reason I don't run into these issues.