HA failover reaction time question
-
@olivierlambert Thanks a lot.
We have no SPOF and a full-fiber 100Gb spine/leaf network infrastructure, so I will give it a go (we are currently only on a test platform, so I can experiment as much as I need).
-
Great, keep us posted!
-
@olivierlambert Just tried, but there is no change in reaction time.
After googling this parameter, I found this page you wrote (small world) on the xcp-ng.org website: https://xcp-ng.org/blog/2024/08/22/xcp-ng-high-availability-a-guide/
It indicates that the purpose of this timeout is self-fencing in case of loss of network/storage (I actually had this page open in my browser already but missed this line).
It doesn't seem to influence the restart timer in case of a full host failure.
-
@dsmteam Did you try disabling and then enabling HA again to be sure that the new setting was being used?
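For anyone following along, the disable/enable cycle can be scripted. This is only a minimal sketch, assuming the setting in question is the `ha-config:timeout` key described in the linked XCP-ng guide and that the `xe pool-ha-disable` / `xe pool-ha-enable` commands are run on the pool master. The helper names, the placeholder SR UUID, and the value 30 are illustrative, not taken from this thread. By default it only prints the commands and executes nothing:

```python
import subprocess


def ha_cycle_commands(heartbeat_sr_uuid, timeout_s):
    """Return the xe commands that turn HA off, then back on with a new timeout."""
    if timeout_s <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return [
        ["xe", "pool-ha-disable"],
        [
            "xe",
            "pool-ha-enable",
            f"heartbeat-sr-uuids={heartbeat_sr_uuid}",
            # Assumption: this is the timeout key discussed in the linked guide.
            f"ha-config:timeout={timeout_s}",
        ],
    ]


def run_ha_cycle(heartbeat_sr_uuid, timeout_s, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    commands = ha_cycle_commands(heartbeat_sr_uuid, timeout_s)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Placeholder UUID: replace with the UUID of your heartbeat SR.
    run_ha_cycle("<heartbeat-sr-uuid>", 30)
```

Keeping `dry_run=True` as the default is deliberate: toggling HA on a pool is disruptive, so it is safer to review the printed commands first.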
-
@Danp Oh..................
Indeed, much faster now. Down from 2:00 minutes to 1:20 minutes
Less than 10 seconds might be too aggressive.
This is closer to what we expect.
I can see in the GUI that when I bring a host down, the pool still takes a minute to consider the host down. Is there any way to decrease this timer further, or are there too many dependencies?
-
That's good progress. For the other number, let me ask around.
-
@olivierlambert I think I found what I need in the following documentation
https://xapi-project.github.io/features/HA/HA.html
Various parameters, which must be the same on every host, in /etc/xensource/xhad.conf:
    <parameters>
      <HeartbeatInterval>4</HeartbeatInterval>
      <HeartbeatTimeout>30</HeartbeatTimeout>
      <StateFileInterval>4</StateFileInterval>
      <StateFileTimeout>30</StateFileTimeout>
      <HeartbeatWatchdogTimeout>30</HeartbeatWatchdogTimeout>
      <StateFileWatchdogTimeout>45</StateFileWatchdogTimeout>
      <BootJoinTimeout>90</BootJoinTimeout>
      <EnableJoinTimeout>90</EnableJoinTimeout>
      <XapiHealthCheckInterval>60</XapiHealthCheckInterval>
      <XapiHealthCheckTimeout>10</XapiHealthCheckTimeout>
      <XapiRestartAttempts>1</XapiRestartAttempts>
      <XapiRestartTimeout>30</XapiRestartTimeout>
      <XapiLicenseCheckTimeout>30</XapiLicenseCheckTimeout>
    </parameters>
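To make it easier to check these values across hosts, or before and after toggling HA, here is a rough sketch that reads the `<parameters>` block into a dictionary and reports differences. It assumes only the fragment quoted in this post (a real xhad.conf contains more than this block), and the function names are hypothetical:

```python
import xml.etree.ElementTree as ET

# Fragment copied from the post; values are the ones quoted in this thread.
PARAMETERS_XML = """
<parameters>
  <HeartbeatInterval>4</HeartbeatInterval>
  <HeartbeatTimeout>30</HeartbeatTimeout>
  <StateFileInterval>4</StateFileInterval>
  <StateFileTimeout>30</StateFileTimeout>
  <HeartbeatWatchdogTimeout>30</HeartbeatWatchdogTimeout>
  <StateFileWatchdogTimeout>45</StateFileWatchdogTimeout>
  <BootJoinTimeout>90</BootJoinTimeout>
  <EnableJoinTimeout>90</EnableJoinTimeout>
  <XapiHealthCheckInterval>60</XapiHealthCheckInterval>
  <XapiHealthCheckTimeout>10</XapiHealthCheckTimeout>
  <XapiRestartAttempts>1</XapiRestartAttempts>
  <XapiRestartTimeout>30</XapiRestartTimeout>
  <XapiLicenseCheckTimeout>30</XapiLicenseCheckTimeout>
</parameters>
"""


def parse_parameters(xml_text):
    """Return {parameter name: integer value} from a <parameters> block."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "parameters":
        # Allow the block to be nested inside a larger document.
        root = root.find(".//parameters")
        if root is None:
            raise ValueError("no <parameters> element found")
    return {child.tag: int(child.text) for child in root}


def changed_parameters(reference, current):
    """Return {name: (reference value, current value)} for every mismatch."""
    return {
        name: (value, current.get(name))
        for name, value in reference.items()
        if current.get(name) != value
    }


if __name__ == "__main__":
    params = parse_parameters(PARAMETERS_XML)
    for name, value in params.items():
        print(f"{name} = {value}")
```

Running `changed_parameters` on the parsed output from two hosts, or from the same host before and after enabling HA, would show which values differ.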
-
Explanations here: https://github.com/xapi-project/xen-api/pull/4169
No idea how to tinker with it, but happy to hear about your experiments.
-
@olivierlambert Unfortunately, the parameters are reset to their default values when I turn on HA. They might be hard-coded somewhere.
-
@dsmteam I'm still browsing the web and various XO forum threads, but it looks like those parameters are in the .c and other precompiled files, so the XCP-ng builds are probably using those default parameters.