Let's Test the HA
-
@Midget said in Let's Test the HA:
So I pulled one of the sleds, I mean one of the servers, from the chassis. I have 3 hosts in the cluster and one standalone.
Standalone
- XCP-HOST1
Cluster
- XCP-HOST2 (Master)
- XCP-HOST3
- XCP-HOST4
Each host has a Debian VM on it. I pulled the sled for Host 4, and from what I can tell it was a success. The Debian VM that was on Host 4 moved to Host 3 on its own. I also noticed the XOSTOR dropped to roughly 10TB, so it noticed the drives were gone.
After checking everything, I slotted the server back in place and it rejoined the pool. I even migrated the VM back to its home server once it was part of the pool again.
When I said power failure, I meant a small-scale test to simulate what would happen if the data centre (or part of it) were to lose power.
-
When I said power failure, I meant a small-scale test to simulate what would happen if the data centre were to lose power.
I was already in the process of pulling a sled when you posted. BUT the chassis only has 2 power supplies; the individual servers don't have their own. So that wouldn't work. I mean, I guess I could power a host down individually. I'll add that to the tests as well.
-
@Midget said in Let's Test the HA:
When I said power failure, I meant a small-scale test to simulate what would happen if the data centre were to lose power.
I was already in the process of pulling a sled when you posted. BUT the chassis only has 2 power supplies; the individual servers don't have their own. So that wouldn't work. I mean, I guess I could power a host down individually. I'll add that to the tests as well.
In other words, I mean a power blackout, followed by recovery of the data centre (or part of one) from that blackout.
-
@john-c Oh, you mean literally pull the power on the entire lab? I guess I could do that. Although our DC has dual 16kVA UPSes, dual 600 amp DC plants, and dual generators. So it would take a lot for that building to go dark. But it's a valid test.
-
@Midget said in Let's Test the HA:
@john-c Oh, you mean literally pull the power on the entire lab? I guess I could do that. Although our DC has dual 16kVA UPSes, dual 600 amp DC plants, and dual generators. So it would take a lot for that building to go dark. But it's a valid test.
Also, depending on the results, the latest XOA has an API for emergency pool shutdown and resume on power failure.
-
I let the environment calm down and let things get back to normal. I gave it a few minutes and then pulled out the Master, which was XCP-HOST2.
It's been about 5 minutes, I just checked XOA, and the cluster is gone. None of the VMs, nothing. How long should master selection take? I'll give it another 10 or so minutes before slotting the server back in place.
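Rather than eyeballing XOA for those 10 minutes, the wait can be scripted. This is only a rough sketch under assumptions: it runs on a surviving pool member with the `xe` CLI available, and `xe_pool_master` and `wait_for_master` are made-up helper names, not part of XCP-ng or XOA. It only observes whether the pool answers with a master again; it does not promote one.

```python
import subprocess
import time
from typing import Callable, Optional


def xe_pool_master() -> Optional[str]:
    """Return the pool master's host UUID, or None if xe cannot answer.

    Assumption: executed on a pool member where the `xe` CLI is installed.
    """
    try:
        result = subprocess.run(
            ["xe", "pool-list", "params=master", "--minimal"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    uuid = result.stdout.strip()
    return uuid if result.returncode == 0 and uuid else None


def wait_for_master(
    probe: Callable[[], Optional[str]] = xe_pool_master,
    timeout_s: float = 600.0,   # the "another 10 or so minutes" from the post
    interval_s: float = 15.0,   # arbitrary polling interval
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll `probe` until it reports a master or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        master = probe()
        if master is not None:
            return master
        if time.monotonic() >= deadline:
            return None
        sleep(interval_s)
```

Calling `wait_for_master()` returns the master's UUID once the pool responds, or `None` if nothing came back within the timeout, at which point slotting the server back in is the next step anyway.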
EDIT
I just noticed the XOSTOR no longer exists either...
-
@Midget said in Let's Test the HA:
I let the environment calm down and let things get back to normal. I gave it a few minutes and then pulled out the Master, which was XCP-HOST2.
It's been about 5 minutes, I just checked XOA, and the cluster is gone. None of the VMs, nothing. How long should master selection take? I'll give it another 10 or so minutes before slotting the server back in place.
EDIT
I just noticed the XOSTOR no longer exists either...
That's why, when I set up my XCP-ng system, I used a separately maintained bare-metal storage server. That way VMs can recover and migrate cleanly. This is a potential failure mode of hyper-converged storage, where storage is provided on the same host(s) as the hypervisor and VMs.
The VMs cannot start if storage isn't available, but the storage is itself provided by a VM. In other words, a chicken-and-egg situation to avoid.
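To make the chicken-and-egg point concrete, here is a toy sketch that models start-up dependencies as a graph and asks for a valid start order. The component names and both dictionaries are illustrative assumptions that follow the description above; they are not a statement of how XOSTOR is actually wired internally.

```python
from graphlib import CycleError, TopologicalSorter


def start_order(depends_on):
    """Return a valid start order, or None if the dependencies form a cycle.

    `depends_on` maps each component to the set of components that must be
    running before it can start.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None


# Hypothetical model of the situation described above:
# VMs need the storage, and the storage is provided by a VM.
hyperconverged = {
    "vms": {"storage"},
    "storage": {"vms"},
}

# Hypothetical model with an external bare-metal storage server.
external = {
    "vms": {"storage"},
    "storage": {"storage_server"},
    "storage_server": set(),
}

print(start_order(hyperconverged))  # None: no component can go first
print(start_order(external))        # ['storage_server', 'storage', 'vms']
```

In the second model the storage server has no dependencies of its own, so there is always something that can start first after a blackout.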
@olivierlambert @Midget We may have discovered a potential failing of XOSTOR, and of hyper-convergence generally, while putting the lab through its paces.
-
We are not there yet: there are still some issues in XOSTOR to sort out before playing with HA. Even if in theory it should work, LINSTOR has proved problematic in some situations. So please use it, but not with HA yet.
-
Well, it appears the SSD I was using for the hypervisor died. So now I’m reinstalling XCP-ng on what was the Master, using a “new” SSD. Good thing we have no shortage of hardware in our lab lol.
-
@olivierlambert
I guess I could build a TrueNAS quick. Maybe after my vacation.