Latest posts made by dsmteam
RE: HA failover reaction time question
@olivierlambert Unfortunately, the parameters revert to their default values when I turn on HA. They might be hard-coded somewhere.
RE: HA failover reaction time question
@olivierlambert I think I found what I need in the following documentation:
https://xapi-project.github.io/features/HA/HA.html
The documentation lists various parameters, which must be the same on every host, in /etc/xensource/xhad.conf:
```xml
<parameters>
  <HeartbeatInterval>4</HeartbeatInterval>
  <HeartbeatTimeout>30</HeartbeatTimeout>
  <StateFileInterval>4</StateFileInterval>
  <StateFileTimeout>30</StateFileTimeout>
  <HeartbeatWatchdogTimeout>30</HeartbeatWatchdogTimeout>
  <StateFileWatchdogTimeout>45</StateFileWatchdogTimeout>
  <BootJoinTimeout>90</BootJoinTimeout>
  <EnableJoinTimeout>90</EnableJoinTimeout>
  <XapiHealthCheckInterval>60</XapiHealthCheckInterval>
  <XapiHealthCheckTimeout>10</XapiHealthCheckTimeout>
  <XapiRestartAttempts>1</XapiRestartAttempts>
  <XapiRestartTimeout>30</XapiRestartTimeout>
  <XapiLicenseCheckTimeout>30</XapiLicenseCheckTimeout>
</parameters>
```
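Because these values must match on every host, a quick consistency check can help while experimenting with them. Below is a minimal Python sketch, assuming you have copied each host's xhad.conf contents into strings; the helpers `load_parameters` and `diff_parameters` and the host names are hypothetical, not part of XCP-ng. It only reads the `<parameters>` block and changes nothing on the hosts.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def load_parameters(xml_text: str) -> dict[str, str]:
    """Return {parameter name: value} from the <parameters> block of an xhad.conf text.

    Assumption: <parameters> is either the root element or nested somewhere below it.
    """
    root = ET.fromstring(xml_text)
    block = root if root.tag == "parameters" else root.find(".//parameters")
    if block is None:
        raise ValueError("no <parameters> element found")
    return {child.tag: (child.text or "").strip() for child in block}


def diff_parameters(configs: dict[str, dict[str, str]]) -> dict[str, dict[str, str | None]]:
    """Return only the parameters whose values differ between hosts.

    Result maps parameter name -> {host: value}; None means the host does not set it.
    """
    names = set().union(*(params.keys() for params in configs.values()))
    mismatches: dict[str, dict[str, str | None]] = {}
    for name in sorted(names):
        values = {host: params.get(name) for host, params in configs.items()}
        if len(set(values.values())) > 1:
            mismatches[name] = values
    return mismatches


# Hypothetical excerpts, e.g. copied from each host's /etc/xensource/xhad.conf.
host_a = "<parameters><HeartbeatInterval>4</HeartbeatInterval><HeartbeatTimeout>30</HeartbeatTimeout></parameters>"
host_b = "<parameters><HeartbeatInterval>4</HeartbeatInterval><HeartbeatTimeout>20</HeartbeatTimeout></parameters>"
print(diff_parameters({"host-a": load_parameters(host_a), "host-b": load_parameters(host_b)}))
# prints {'HeartbeatTimeout': {'host-a': '30', 'host-b': '20'}}
```

The same comparison can be run on a copy of the file taken before and after enabling HA, to see exactly which edited values were reset to their defaults.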
RE: HA failover reaction time question
@Danp Oh...
Indeed, it's much faster now: down from 2:00 to 1:20 minutes.
Less than 10 seconds might be too aggressive.
This is closer to what we expect.
I can see in the GUI that when I bring a host down, the pool still takes about a minute to consider it down. Is there any way to decrease this timer further, or are there too many dependencies?
RE: HA failover reaction time question
@olivierlambert I just tried it, but there is no change in reaction time.
After googling this parameter, I found this page you wrote (small world) on the xcp-ng.org website, https://xcp-ng.org/blog/2024/08/22/xcp-ng-high-availability-a-guide/, which indicates that this timeout is meant for self-fencing in case of network/storage loss (I actually already had this page open in my browser but missed that line).
It doesn't seem to influence the restart timer in case of a full host failure.
RE: HA failover reaction time question
@olivierlambert Thanks a lot.
We have no SPOF and a full-fiber 100Gb spine/leaf network infrastructure, so I will give it a go (we are currently only on a test platform, so I can experiment as much as I need).
RE: HA failover reaction time question
@Danp Hello Danp,
No, just the standard DRS and High Availability configuration, no overkill FT.
In case of a host failure, VMs would restart within 10 seconds (at worst).
HA failover reaction time question
Hello everyone,
We are testing XCP-NG and are quite satisfied with its ease of use and functionality (we still run ESX on around 100 hosts).
However, one caveat we noticed (the same issue exists with Proxmox) is that the failover reaction time is quite long compared to ESX.
Under ESX, VMs hosted on a failed host are restarted on a different host within seconds.
With XCP-NG, it takes about 2 minutes for a VM to be restarted on a different host (an HA cluster of 3 hosts that previously ran ESX, so the physical environment is identical).
Are these delays normal? I suppose they are, judging by various videos we saw online showing this kind of reaction time.
If they are, is there some way to reduce them?
I couldn't find any information or settings in Xen Orchestra or on the hosts themselves.
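To compare reaction times between ESX and XCP-NG on the same hardware, it helps to measure them the same way each time. Below is a minimal Python sketch, assuming a service inside the VM is reachable over TCP; `tcp_probe`, `measure_outage` and the address in the usage comment are hypothetical. Start it before powering off the host: it records when the probe first fails and when it succeeds again. Note that this measures end-to-end downtime seen from the network (failure detection, VM restart and guest boot together), so it is an upper bound on the HA reaction time itself, with a resolution of roughly the polling interval.

```python
from __future__ import annotations

import socket
import time
from typing import Callable, Optional


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> Callable[[], bool]:
    """Build a probe that returns True if a TCP connection to host:port succeeds."""
    def probe() -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False
    return probe


def measure_outage(probe: Callable[[], bool], interval: float = 1.0,
                   max_wait: float = 600.0) -> Optional[float]:
    """Poll `probe` and return how many seconds it kept failing.

    Waits for the first failed probe (outage start), then for the next
    successful one (recovery). Returns None if max_wait elapses first.
    Resolution is roughly `interval` plus the probe's own timeout.
    """
    deadline = time.monotonic() + max_wait
    # Phase 1: wait until the service stops answering (host powered off).
    while probe():
        if time.monotonic() > deadline:
            return None
        time.sleep(interval)
    down_at = time.monotonic()
    # Phase 2: wait until it answers again (VM restarted on another host).
    while not probe():
        if time.monotonic() > deadline:
            return None
        time.sleep(interval)
    return time.monotonic() - down_at


# Example usage (hypothetical VM address, SSH port as the liveness check):
#   outage = measure_outage(tcp_probe("192.0.2.10", 22))
#   print(outage)
```

Polling a real service rather than just pinging the VM gives a figure closer to what users experience, since the guest OS also has to finish booting.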