XCP-ng

    HA failover reaction time question

    • dsmteam @olivierlambert

      @olivierlambert Thanks a lot.
      We have no SPOF and a full-fiber 100Gb spine/leaf network infrastructure, so I will give it a go (we are currently only on a test platform, so I can experiment as much as I need 🙂).

      • olivierlambert Vates 🪐 Co-Founder CEO

        Great, keep us posted!

        • dsmteam @olivierlambert

          @olivierlambert Just tried, but there is no change in reaction time.
          After googling this parameter, I found this page you wrote (small world) on the xcp-ng.org website: https://xcp-ng.org/blog/2024/08/22/xcp-ng-high-availability-a-guide/. It indicates that the purpose of this timeout is self-fencing in case of loss of network/storage (I actually had this page open in my browser already but missed this line).
          It doesn't seem to influence the restart timer in case of a full host failure.

          • Danp Pro Support Team @dsmteam

            @dsmteam Did you try disabling and then enabling HA again to be sure that the new setting was being used?
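
A minimal sketch of the disable/enable cycle suggested above, assuming the setting in question is the HA timeout passed as `ha-config:timeout` to `xe pool-ha-enable` (as in the linked HA guide). The helper name `ha_reenable_commands` and the placeholder SR UUID are hypothetical; the script only builds and prints the commands, it does not run them.

```python
import shlex


def ha_reenable_commands(heartbeat_sr_uuid, timeout_s):
    """Build the xe command lines that turn HA off and back on with a timeout.

    heartbeat_sr_uuid: UUID of the shared SR used for the HA heartbeat.
    timeout_s: HA timeout in seconds (assumed to map to ha-config:timeout).
    """
    if timeout_s <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return [
        ["xe", "pool-ha-disable"],
        [
            "xe",
            "pool-ha-enable",
            f"heartbeat-sr-uuids={heartbeat_sr_uuid}",
            f"ha-config:timeout={timeout_s}",
        ],
    ]


if __name__ == "__main__":
    # Placeholder UUID: replace with the real heartbeat SR UUID of the pool.
    for cmd in ha_reenable_commands("<heartbeat-sr-uuid>", 30):
        print(shlex.join(cmd))
```

To actually apply the change, each command list could be passed to `subprocess.run(cmd, check=True)` on the pool master; printing first makes it easy to review what would be executed.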

            • dsmteam @Danp

              @Danp Oh...
              Indeed, much faster now: down from 2:00 minutes to 1:20 minutes.
              Less than 10 seconds might be too aggressive.
              This is closer to what we expect.
              I can see in the GUI that when I bring a host down, the pool still takes a minute to consider the host down. Is there any way to decrease this timer further, or are there too many dependencies?

              • olivierlambert Vates 🪐 Co-Founder CEO

                That's good progress 😄 For the other number, let me ask around 🙂

                • dsmteam @olivierlambert

                  @olivierlambert I think I found what I need in the following documentation:
                  https://xapi-project.github.io/features/HA/HA.html
                  It lists various parameters in /etc/xensource/xhad.conf, which must be the same on every host:

                  <parameters>
                    <HeartbeatInterval>4</HeartbeatInterval>
                    <HeartbeatTimeout>30</HeartbeatTimeout>
                    <StateFileInterval>4</StateFileInterval>
                    <StateFileTimeout>30</StateFileTimeout>
                    <HeartbeatWatchdogTimeout>30</HeartbeatWatchdogTimeout>
                    <StateFileWatchdogTimeout>45</StateFileWatchdogTimeout>
                    <BootJoinTimeout>90</BootJoinTimeout>
                    <EnableJoinTimeout>90</EnableJoinTimeout>
                    <XapiHealthCheckInterval>60</XapiHealthCheckInterval>
                    <XapiHealthCheckTimeout>10</XapiHealthCheckTimeout>
                    <XapiRestartAttempts>1</XapiRestartAttempts>
                    <XapiRestartTimeout>30</XapiRestartTimeout>
                    <XapiLicenseCheckTimeout>30</XapiLicenseCheckTimeout>
                  </parameters>
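
Since these parameters must match on every host, a small script can help confirm that. The following is a rough sketch, assuming each host's `xhad.conf` has been read into a string and contains a `<parameters>` element like the one above with no XML namespace; the function names `parse_parameters` and `diff_hosts` are hypothetical, and the sample is just the fragment quoted in this post.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the post above; a real xhad.conf contains more
# elements around <parameters> that are not shown here.
SAMPLE = """<parameters>
  <HeartbeatInterval>4</HeartbeatInterval>
  <HeartbeatTimeout>30</HeartbeatTimeout>
  <StateFileInterval>4</StateFileInterval>
  <StateFileTimeout>30</StateFileTimeout>
  <HeartbeatWatchdogTimeout>30</HeartbeatWatchdogTimeout>
  <StateFileWatchdogTimeout>45</StateFileWatchdogTimeout>
  <BootJoinTimeout>90</BootJoinTimeout>
  <EnableJoinTimeout>90</EnableJoinTimeout>
  <XapiHealthCheckInterval>60</XapiHealthCheckInterval>
  <XapiHealthCheckTimeout>10</XapiHealthCheckTimeout>
  <XapiRestartAttempts>1</XapiRestartAttempts>
  <XapiRestartTimeout>30</XapiRestartTimeout>
  <XapiLicenseCheckTimeout>30</XapiLicenseCheckTimeout>
</parameters>"""


def parse_parameters(xml_text):
    """Return {name: int} for every child of the first <parameters> element."""
    root = ET.fromstring(xml_text)
    params = root if root.tag == "parameters" else root.find(".//parameters")
    if params is None:
        raise ValueError("no <parameters> element found")
    return {child.tag: int(child.text) for child in params}


def diff_hosts(per_host):
    """Given {host: {name: value}}, return {name: {host: value}} for
    every parameter whose value is not identical on all hosts."""
    names = set().union(*(p.keys() for p in per_host.values()))
    mismatches = {}
    for name in sorted(names):
        values = {host: p.get(name) for host, p in per_host.items()}
        if len(set(values.values())) > 1:
            mismatches[name] = values
    return mismatches


if __name__ == "__main__":
    host1 = parse_parameters(SAMPLE)
    host2 = dict(host1, HeartbeatTimeout=20)  # simulated edit on one host
    print(diff_hosts({"host1": host1, "host2": host2}))
```

An empty result from `diff_hosts` means all hosts agree; running it before and after enabling HA would also show which values were reset.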
                  
                  • olivierlambert Vates 🪐 Co-Founder CEO

                    Explanations here: https://github.com/xapi-project/xen-api/pull/4169 (closed pull request #4169 in xapi-project/xen-api, opened by lindig: "Improve HA parameter derived from timeout")

                    No idea how to tinker with it, but happy to hear about your experiments 🙂

                    • dsmteam @olivierlambert

                      @olivierlambert Unfortunately, the parameters revert to their default values when I turn on HA. They might be hard-coded somewhere.

                      • dsmteam @dsmteam

                        @dsmteam I'm still browsing the web and various XO forum threads, but it looks like those parameters are set in the .c and other precompiled source files, so the XCP-ng builds are probably using those default values.
