Right, so yeah I did just that - disabled HA and have been keeping an eye on the logs as well as general performance in the env. Glad to know my actions align with your recommendations at least 😆
There are a couple of hosts that get a lot of these sorts of messages in the kern.log:
Apr 25 08:15:33 oryx kernel: [276757.645457] vif vif-21-3 vif21.3: Guest Rx ready
Apr 25 08:15:54 oryx kernel: [276778.735509] vif vif-22-3 vif22.3: Guest Rx stalled
Apr 25 08:15:54 oryx kernel: [276778.735522] vif vif-21-3 vif21.3: Guest Rx stalled
Apr 25 08:16:04 oryx kernel: [276788.780828] vif vif-21-3 vif21.3: Guest Rx ready
Apr 25 08:16:04 oryx kernel: [276788.780836] vif vif-22-3 vif22.3: Guest Rx ready
Am I wrong to attribute this to issues within specific VMs (i.e. not a hv performance issue)?
I know one of the VMs that causes these is a very old centos 5 testing VM one of my devs use and the messages stop when it's powered down.
Is there any way to easily associate those vifs with the actual VMs they are attached to? My google-foo failed me for that.
Other than that, I noticed my nic firmware is a bit old on the X710-da2's I use so I'm going through and upgrading those with no noticeable changes. I'm fairly hesitant to re-enable HA without tracking down the root cause.