Hi, we had an issue over the weekend: two of the hosts in a five-host pool rebooted after HA fenced them, and I'm trying to work out what caused it.
We're running XCP-ng 8.1, and both the HA heartbeat SR and the shared VM storage are NFS on FreeNAS.
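For context, this is roughly how I'm checking the pool's HA settings from dom0 — a sketch using the xe CLI (parameter names are from stock XCP-ng; the script just bails out if xe isn't present):

```shell
#!/bin/sh
# Sketch: dump the pool's HA configuration from dom0 with the xe CLI.
# Bails out quietly on machines without xe installed.
if ! command -v xe >/dev/null 2>&1; then
    echo "xe CLI not available"
    exit 0
fi
POOL_UUID=$(xe pool-list --minimal)
# Is HA actually enabled, and how many host failures is it planned for?
xe pool-param-get uuid="$POOL_UUID" param-name=ha-enabled
xe pool-param-get uuid="$POOL_UUID" param-name=ha-host-failures-to-tolerate
# Which VDIs hold the statefile (the NFS heartbeat SR on FreeNAS)?
xe pool-param-get uuid="$POOL_UUID" param-name=ha-statefiles
```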
On the pool master I see this:
Sep 27 01:27:44 compute-013 xapi: [debug||33 ha_monitor||xapi_ha] Liveset: online c179beef-47fe-44b1-98a1-cbc6fc5dca2d [ L A ]; c1a44772-22fd-428e-ba96-958467d836a1 [*LM A ]; e6dcb3fb-484b-494a-a232-fbb5bd1ffc5a [ L A ]; e30cdba9-f3e8-497a-8996-d794c7c19504 [ L A ]; 2f9779ca-6fd8-48a1-892d-12d0adc1ce86 [ L A ];
Sep 27 01:27:44 compute-013 xapi: [debug||33 ha_monitor||xapi_ha] Processing warnings
Sep 27 01:27:44 compute-013 xapi: [debug||33 ha_monitor||xapi_ha] Done with warnings
Sep 27 01:27:44 compute-013 xapi: [debug||33 ha_monitor||xapi_ha_vm_failover] restart_auto_run_vms called
Sep 27 01:27:44 compute-013 xapi: [debug||33 ha_monitor||xapi_ha_vm_failover] Setting all VMs running or paused to Halted
Sep 27 01:27:44 compute-013 xapi: [debug||33 ha_monitor||xapi_ha_vm_failover] Protected VMs: (huge list of VMs)
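For reference, I pulled those lines out of xapi's main log in dom0 with something like this (/var/log/xensource.log is the stock path on XCP-ng 8.1; the guard is just so it doesn't error on other machines):

```shell
#!/bin/sh
# Sketch: pull recent HA monitor activity out of xapi's log in dom0.
LOG=/var/log/xensource.log
if [ -r "$LOG" ]; then
    grep -i 'ha_monitor' "$LOG" | tail -n 50
else
    echo "no $LOG on this machine"
fi
```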
On one of the hosts that restarted, the log just stops mid-stream with a block of NUL bytes:
Sep 27 01:28:43 compute-014 xapi: [debug||6 ha_monitor|HA monitor D:55db2d949753|xapi_ha] Liveset: online c179beef-47fe-44b1-98a1-cbc6fc5dca2d [*L A ]; c1a44772-22fd-428e-ba96-958467d836a1 [ LM A ]; e6dcb3fb-484b-494a-a232-fbb5bd1ffc5a [ L A ]; e30cdba9-f3e8-497a-8996-d794c7c19504 [ L A ]; 2f9779ca-6fd8-48a1-892d-12d0adc1ce86 [ L A ];
Sep 27 01:28:43 compute-014 xapi: [debug||6 ha_monitor|HA monitor D:55db2d949753|xapi_ha] Processing warnings
Sep 27 01:28:43 compute-014 xapi: [debug||6 ha_monitor|HA monitor D:55db2d949753|xapi_ha] Done with warnings
Sep 27 01:28:43 compute-014 xapi: [debug||6 ha_monitor|HA monitor D:55db2d949753|xapi_ha] The node we think is the master is still alive and marked as master; this is OK
Sep 27 01:28:44 compute-014 xapi: [ info||40018 UNIX /var/lib/xcp/xapi||cli] xe host-ha-xapi-healthcheck username=root password=(omitted)
Sep 27 01:28:44 compute-014 xapi: [debug||40018 UNIX /var/lib/xcp/xapi|session.slave_local_login_with_password D:7f8d3b61600e|xapi] Add session to local storage
Sep 27 01:28:46 compute-014 xcp-rrdd: [ info||7 ||rrdd_main] memfree has changed to 81024 in domain 21
Sep 27 01:28:46 compute-014 xenopsd-xc: [debug||4 ||xenops_server] Received an event on managed VM ede7422e-e34f-d547-e27d-063aa20395b7
Sep 27 01:28:46 compute-014 xenopsd-xc: [debug||4 |queue|xenops_server] Queue.push ["VM_check_state","ede7422e-e34f-d547-e27d-063aa20395b7"] onto ede7422e-e34f-d547-e27d-063aa20395b7:[ ]
Sep 27 01:28:46 compute-014 xenopsd-xc: [debug||16 ||xenops_server] Queue.pop returned ["VM_check_state","ede7422e-e34f-d547-e27d-063aa20395b7"]
Sep 27 01:28:46 compute-014 xenopsd-xc: [debug||16 |events|xenops_server] Task 110715 reference events: ["VM_check_state","ede7422e-e34f-d547-e27d-063aa20395b7"]
Sep 27 01:28:46 compute-014 xenopsd-xc: [debug||16 |events|xenops_server] VM ede7422e-e34f-d547-e27d-063aa20395b7 is not requesting any attention
Sep 27 01:28:46 compute-014 xenopsd-xc: [debug||16 |events|xenops_server] VM_DB.signal ede7422e-e34f-d547-e27d-063aa20395b7
Sep 27 01:28:46 compute-014 xenopsd-xc: [debug||16 |events|task_server] Task 110715 completed; duration = 0
Sep 27 01:28:46 compute-014 xenopsd-xc: [debug||16 ||xenops_server] TASK.signal 110715 (object deleted)
Sep 27 01:28:46 compute-014 xapi: [debug||245 |org.xen.xapi.xenops.classic events D:66ada224147d|xenops] Processing event: ["Vm","ede7422e-e34f-d547-e27d-063aa20395b7"]
Sep 27 01:28:46 compute-014 xapi: [debug||245 |org.xen.xapi.xenops.classic events D:66ada224147d|xenops] xenops event on VM ede7422e-e34f-d547-e27d-063aa20395b7
Sep 27 01:28:46 compute-014 xenopsd-xc: [debug||196565 |org.xen.xapi.xenops.classic events D:66ada224147d|xenops_server] VM.stat ede7422e-e34f-d547-e27d-063aa20395b7
Sep 27 01:28:46 compute-014 xapi: [debug||245 |org.xen.xapi.xenops.classic events D:66ada224147d|xenops] xenopsd event: processing event for VM ede7422e-e34f-d547-e27d-063aa20395b7
Sep 27 01:28:46 compute-014 xapi: [debug||245 |org.xen.xapi.xenops.classic events D:66ada224147d|xenops] Supressing VM.allowed_operations update because guest_agent data is largely the same
Sep 27 01:28:46 compute-014 xapi: [debug||245 |org.xen.xapi.xenops.classic events D:66ada224147d|xenops] xenopsd event: Updating VM ede7422e-e34f-d547-e27d-063aa20395b7 domid 21 guest_agent
[several hundred NUL bytes (^@) here — the log goes blank after the 01:28:46 entry until the next line at 01:35:45, when the host came back up]
Sep 27 01:35:45 compute-014 xcp-rrdd: [debug||0 ||rrdd_main] Reading configuration file ..
The switch they're all plugged into has nothing in its logs, so I don't think any ports flapped. Is there anywhere else on the XCP-ng side I can check to see what happened?
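My guess is that the HA daemon's own log and any crash dumps are worth a look next; this is what I'd run (paths and commands assumed from a stock XCP-ng 8.x install, happy to be corrected):

```shell
#!/bin/sh
# Sketch: other dom0 spots to check (paths assumed from stock XCP-ng 8.x).
# xhad, the HA daemon, logs separately from xapi:
if [ -r /var/log/xha.log ]; then
    grep -iE 'fence|statefile|timeout' /var/log/xha.log | tail -n 50
else
    echo "no /var/log/xha.log"
fi
# Did the host leave a crash dump, or did it self-fence cleanly?
if command -v xe >/dev/null 2>&1; then
    xe host-crashdump-list
else
    echo "xe CLI not available"
fi
```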
Thanks