XCP-ng

    Working out why HA activated

    • bpbp-boop

      Hi, we had an issue over the weekend: 2 of the hosts in a 5-host pool rebooted due to HA activating, and I'm trying to work out what caused it.

      We're running XCP-ng 8.1, and the heartbeat / shared storage is NFS on FreeNAS.

      On the master I see this:

      Sep 27 01:27:44 compute-013 xapi: [debug||33 ha_monitor||xapi_ha] Liveset: online c179beef-47fe-44b1-98a1-cbc6fc5dca2d [ L  A ]; c1a44772-22fd-428e-ba96-958467d836a1 [*LM A ]; e6dcb3fb-484b-494a-a232-fbb5bd1ffc5a [ L  A ]; e30cdba9-f3e8-497a-8996-d794c7c19504 [ L  A ]; 2f9779ca-6fd8-48a1-892d-12d0adc1ce86 [ L  A ];
      Sep 27 01:27:44 compute-013 xapi: [debug||33 ha_monitor||xapi_ha] Processing warnings
      Sep 27 01:27:44 compute-013 xapi: [debug||33 ha_monitor||xapi_ha] Done with warnings
      Sep 27 01:27:44 compute-013 xapi: [debug||33 ha_monitor||xapi_ha_vm_failover] restart_auto_run_vms called
      Sep 27 01:27:44 compute-013 xapi: [debug||33 ha_monitor||xapi_ha_vm_failover] Setting all VMs running or paused to Halted
      Sep 27 01:27:44 compute-013 xapi: [debug||33 ha_monitor||xapi_ha_vm_failover] Protected VMs: (huge list of VMs)
      

      And on one of the hosts that restarted, I found these weird log lines:

      Sep 27 01:28:43 compute-014 xapi: [debug||6 ha_monitor|HA monitor D:55db2d949753|xapi_ha] Liveset: online c179beef-47fe-44b1-98a1-cbc6fc5dca2d [*L  A ]; c1a44772-22fd-428e-ba96-958467d836a1 [ LM A ]; e6dcb3fb-484b-494a-a232-fbb5bd1ffc5a [ L  A ]; e30cdba9-f3e8-497a-8996-d794c7c19504 [ L  A ]; 2f9779ca-6fd8-48a1-892d-12d0adc1ce86 [ L  A ];
      Sep 27 01:28:43 compute-014 xapi: [debug||6 ha_monitor|HA monitor D:55db2d949753|xapi_ha] Processing warnings
      Sep 27 01:28:43 compute-014 xapi: [debug||6 ha_monitor|HA monitor D:55db2d949753|xapi_ha] Done with warnings
      Sep 27 01:28:43 compute-014 xapi: [debug||6 ha_monitor|HA monitor D:55db2d949753|xapi_ha] The node we think is the master is still alive and marked as master; this is OK
      Sep 27 01:28:44 compute-014 xapi: [ info||40018 UNIX /var/lib/xcp/xapi||cli] xe host-ha-xapi-healthcheck username=root password=(omitted)
      Sep 27 01:28:44 compute-014 xapi: [debug||40018 UNIX /var/lib/xcp/xapi|session.slave_local_login_with_password D:7f8d3b61600e|xapi] Add session to local storage
      Sep 27 01:28:46 compute-014 xcp-rrdd: [ info||7 ||rrdd_main] memfree has changed to 81024 in domain 21
      Sep 27 01:28:46 compute-014 xenopsd-xc: [debug||4 ||xenops_server] Received an event on managed VM ede7422e-e34f-d547-e27d-063aa20395b7
      Sep 27 01:28:46 compute-014 xenopsd-xc: [debug||4 |queue|xenops_server] Queue.push ["VM_check_state","ede7422e-e34f-d547-e27d-063aa20395b7"] onto ede7422e-e34f-d547-e27d-063aa20395b7:[  ]
      Sep 27 01:28:46 compute-014 xenopsd-xc: [debug||16 ||xenops_server] Queue.pop returned ["VM_check_state","ede7422e-e34f-d547-e27d-063aa20395b7"]
      Sep 27 01:28:46 compute-014 xenopsd-xc: [debug||16 |events|xenops_server] Task 110715 reference events: ["VM_check_state","ede7422e-e34f-d547-e27d-063aa20395b7"]
      Sep 27 01:28:46 compute-014 xenopsd-xc: [debug||16 |events|xenops_server] VM ede7422e-e34f-d547-e27d-063aa20395b7 is not requesting any attention
      Sep 27 01:28:46 compute-014 xenopsd-xc: [debug||16 |events|xenops_server] VM_DB.signal ede7422e-e34f-d547-e27d-063aa20395b7
      Sep 27 01:28:46 compute-014 xenopsd-xc: [debug||16 |events|task_server] Task 110715 completed; duration = 0
      Sep 27 01:28:46 compute-014 xenopsd-xc: [debug||16 ||xenops_server] TASK.signal 110715 (object deleted)
      Sep 27 01:28:46 compute-014 xapi: [debug||245 |org.xen.xapi.xenops.classic events D:66ada224147d|xenops] Processing event: ["Vm","ede7422e-e34f-d547-e27d-063aa20395b7"]
      Sep 27 01:28:46 compute-014 xapi: [debug||245 |org.xen.xapi.xenops.classic events D:66ada224147d|xenops] xenops event on VM ede7422e-e34f-d547-e27d-063aa20395b7
      Sep 27 01:28:46 compute-014 xenopsd-xc: [debug||196565 |org.xen.xapi.xenops.classic events D:66ada224147d|xenops_server] VM.stat ede7422e-e34f-d547-e27d-063aa20395b7
      Sep 27 01:28:46 compute-014 xapi: [debug||245 |org.xen.xapi.xenops.classic events D:66ada224147d|xenops] xenopsd event: processing event for VM ede7422e-e34f-d547-e27d-063aa20395b7
      Sep 27 01:28:46 compute-014 xapi: [debug||245 |org.xen.xapi.xenops.classic events D:66ada224147d|xenops] Supressing VM.allowed_operations update because guest_agent data is largely the same
      Sep 27 01:28:46 compute-014 xapi: [debug||245 |org.xen.xapi.xenops.classic events D:66ada224147d|xenops] xenopsd event: Updating VM ede7422e-e34f-d547-e27d-063aa20395b7 domid 21 guest_agent
      ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
      [... several more lines of NUL (^@) bytes ...]
      Sep 27 01:35:45 compute-014 xcp-rrdd: [debug||0 ||rrdd_main] Reading configuration file ..
      

      The switch they're all plugged into doesn't have anything in its logs, so I don't think any ports flapped. Is there anywhere else I can check to see what happened on the XCP-ng side?

      Thanks

      • bpbp-boop

        Oh, it looks like FreeNAS did a pool scrub around that time. I wonder if I need to adjust HA so it doesn't panic when this happens.
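
        For reference, zpool status on the FreeNAS side should show when the last scrub started and finished, which makes it easy to line up against the HA event (the pool name below is just a placeholder):

            # On the FreeNAS shell: the "scan:" line reports the last scrub's
            # completion time and repair stats.
            zpool status tank

            # Or, across all pools:
            zpool status | grep -A 1 "scan:"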

        • olivierlambert (Vates 🪐 Co-Founder & CEO)

          You can set a larger HA timeout on the heartbeat if I remember correctly.

          Also, HA might use a dedicated log file, maybe /var/log/xha.log, so there could be interesting info in there.
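
          Something like this should pull the relevant bits out (the grep pattern and the ha-configuration parameter name are from memory, so double-check on your install):

              # xhad's own log on each host:
              grep -E "won|fence|liveset|membership" /var/log/xha.log

              # Current pool-level HA settings, including any timeout override:
              xe pool-param-get uuid=$(xe pool-list --minimal) param-name=ha-configuration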

          • bpbp-boop

            Looks like there was some kind of fight 😂

            Sep 27 01:28:39 AEST 2020 [warn] Host (0) and the local host do not agre on the view to the pool membership.
            
            Sep 27 01:28:39 AEST 2020 [debug] FH: All hosts now have consistent view to the pool membership.
            **Sep 27 01:28:45 AEST 2020 [debug] FH: I have won.**
            Sep 27 01:28:45 AEST 2020 [debug] SM: Node (0) will be removed from liveset.
            

            Any idea where to change the HA timeouts, please?

            • olivierlambert (Vates 🪐 Co-Founder & CEO)

              E.g.: xe pool-ha-enable heartbeat-sr-uuids=<SR-UUID> ha-config:timeout=90
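
              Roughly: disable HA first, find the heartbeat SR UUID, then re-enable with the new timeout (the SR name below is just an example):

                  # Disable HA, then re-enable it with a 90 s heartbeat timeout.
                  xe pool-ha-disable
                  xe sr-list name-label="FreeNAS heartbeat" --minimal    # prints the SR UUID
                  xe pool-ha-enable heartbeat-sr-uuids=<SR-UUID> ha-config:timeout=90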
