Backups failed after changing masters?
-
I had a hardware issue, so I changed the master to another server in the pool, evacuated all the VMs, then detached the old host and removed it from the pool.
Backup jobs in XO started failing with an error about not being able to connect to the detached host: Error: connect EHOSTUNREACH 172.23.6.101:443.
Of course it's unreachable. The master was changed and the original server was removed from the pool and taken offline.
So, what did I miss? I know it's not a lot to go on, but this is all I'm seeing. Not sure where else to look.
-
Check on which IP XO is connecting to the pool.
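Once you know which address XO has recorded for the pool, a quick TCP check from the XO machine tells you whether that address still answers on port 443. This is only a rough sketch using the Python standard library; the helper name is_reachable is made up for illustration and is not part of XO.

```python
import socket


def is_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers EHOSTUNREACH, ECONNREFUSED, timeouts and DNS failures.
        return False
```

For example, is_reachable("172.23.6.101") would be expected to return False for the host in the error above, since that machine is powered off.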
-
I inadvertently fixed it, but didn't take any notes.
Changed master using XO. XO still sees all 3 hosts.
Detached the old master, powered it down, fixed the hardware issue, re-installed XCP.
When I tried to add the host back into the pool, I found that it was still listed on the servers page, but saw no way to re-connect it, so I deleted it.
Now the entire pool is missing?
*** Now I know that servers and hosts are two completely different things. Odd nomenclature, but I can deal with it.
Re-added the new master server, now the pool is back. Added the old master back and joined it to the pool, and all is good.
I'm sure I missed something when changing masters and removing the old one, but not sure what. I would think that changing masters in XO would change all the things that needed to be changed. Guess not.
-
Well crud, just read the thread from a few days ago about "backups failing if one host is down."
Reading your responses to that, I now understand what happened. Even after changing masters, XO was still connecting to the old master, which re-directed it to the new master.
When the old master went offline, XO continued to function, but when the backup job was launched, it tried to connect to the IP of the old master.
I like that XO will redirect itself when connecting to a host rather than the master, but in the event that XO changes the master, shouldn't it be able to update its own configuration?
-
No, it doesn't modify its own "server" record; it will just be redirected by a slave. This redirect info doesn't persist: every time it reconnects, it has no recollection of the pool status/members.
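To make that concrete, here is a toy model of the behaviour, not XO's actual code. Pool, ServerRecord, open_session and HostUnreachable are hypothetical names, and the model assumes any online host redirects to the current master. The point is that the stored address is never rewritten, so every reconnect starts from the old IP.

```python
from dataclasses import dataclass


class HostUnreachable(Exception):
    """Raised when the stored address does not answer (hypothetical)."""


@dataclass
class Pool:
    master: str          # address of the current pool master
    online: set[str]     # addresses of hosts that are powered on

    def connect(self, address: str) -> str:
        """Return the address that ends up serving the session."""
        if address not in self.online:
            raise HostUnreachable(address)
        # A slave answers with a redirect to the current master;
        # the master simply answers for itself.
        return self.master


@dataclass
class ServerRecord:
    address: str  # the stored "server" entry; never updated by a redirect


def open_session(record: ServerRecord, pool: Pool) -> str:
    # Every (re)connection starts from the stored address again.
    return pool.connect(record.address)
```

With a record pointing at 172.23.6.101 and the master moved to another host, open_session keeps working through the redirect for as long as 172.23.6.101 is online. Once that host is powered off, the same call raises HostUnreachable, which matches the EHOSTUNREACH seen in the backup job.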
Note this will be "fixed" in our next release: on the first valid connection to the pool, XO will store all host IPs of the pool, so it can cycle through the recorded IPs in case the main IP doesn't answer at all.
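Roughly, the idea looks like the sketch below. This only illustrates the fallback logic and is not the code that will ship. The function name connect_with_fallback and the try_connect callback are assumptions; try_connect stands in for whatever actually opens the connection and returns True on success.

```python
from typing import Callable, Iterable, Optional


def connect_with_fallback(
    main_ip: str,
    recorded_ips: Iterable[str],
    try_connect: Callable[[str], bool],
) -> Optional[str]:
    """Try main_ip first, then each recorded IP; return the first that answers."""
    # Keep the main IP at the front and skip it if it is also in the recorded list.
    candidates = [main_ip] + [ip for ip in recorded_ips if ip != main_ip]
    for ip in candidates:
        if try_connect(ip):
            return ip
    # No recorded host answered at all.
    return None
```

If the answering host is a slave, the usual redirect to the current master would then apply, so any one reachable pool member is enough to get back in.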
-