Latest posts made by jgrafton
-
RE: CPU pegged at 100% in several Rocky Linux 8 VMs without workload in guest
@jshiells I was wrong, open-vm-tools is installed on a lot of the systems we migrated. I just assumed it wasn't instead of checking. We'll remove it from all the systems, test further, and report back. Thank you for the insight!
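For anyone who wants to script the same check across many guests, here is a minimal Python sketch. It assumes an RPM-based guest such as Rocky Linux and relies only on `rpm -q` returning a non-zero exit status when a package is absent. The helper names (`is_rpm_installed`, `leftover_packages`) and the injectable `run` parameter are illustrative choices, not part of any existing tool.

```python
import subprocess


def is_rpm_installed(package, run=subprocess.run):
    """Return True if `rpm -q <package>` exits with status 0.

    `run` is injectable so the check can be exercised without rpm present.
    """
    result = run(
        ["rpm", "-q", package],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


def leftover_packages(packages, run=subprocess.run):
    """Return the subset of `packages` that is still installed."""
    return [name for name in packages if is_rpm_installed(name, run=run)]
```

Calling `leftover_packages(["open-vm-tools"])` inside a guest returns a non-empty list if the package is still present. The package list is an assumption, so extend it with whatever your migration procedure is meant to remove.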
-
RE: CPU pegged at 100% in several Rocky Linux 8 VMs without workload in guest
@olivierlambert Nothing out of the ordinary in `xl dmesg` that I can tell.
@jshiells I'm pretty sure the VMs have had the VMware tools removed, since that's part of our migration procedure, but I'll double-check.
Annoyingly, we haven't been able to get a VM to fail all day.
-
RE: CPU pegged at 100% in several Rocky Linux 8 VMs without workload in guest
@olivierlambert That was my initial thought: a PV driver in the older kernel. No process is using much CPU in the guest, though the total CPU is at 100% (when running top in the VM).
Haven't been able to get Rocky 9 to fail yet, but it can take a day or two.
-
RE: CPU pegged at 100% in several Rocky Linux 8 VMs without workload in guest
@olivierlambert We have an existing ticket (7726289) and the first suggestion was to validate the template. I created a new VM with the correct Rocky 8 template and attached the existing disk to it. Unfortunately, the problem still occurred a couple of days later.
I want to make clear that I have no problem with the support we've received. This is such an intermittent, difficult-to-diagnose problem that I wanted to see if anyone in the community had run into it.
-
CPU pegged at 100% in several Rocky Linux 8 VMs without workload in guest
We recently encountered this issue during a migration from VMware.
Unfortunately, we've had to halt our migration until we can figure out what is happening to the VMs.
When the problem occurs, the XOA interface (version 5.94.2) shows the guest VM's CPU pegged at 100%.
The spike in CPU often happens after a migration to another host within a pool or to a different pool.
Sometimes the spike in CPU occurs randomly without an accompanying host to host migration.
With the pegged CPU, the guest VM is no longer accessible in any meaningful way.
All the services running in the VM go offline and the VM is no longer pingable.
Each of our pools uses lvmohba storage with several LUNs attached to each host in the pool.
We've seen the CPU spike occur on 5 VMs so far, all running Rocky 8.10 with the latest kernel (4.18.0-553.5.1.el8_10.x86_64).
We tested several older kernel revisions, including 4.18.0-513.24.1.el8_9.x86_64, and encountered the same problem.
It seems only the primary CPU (CPU0) is pegged at 100%.
On systems with more than a single CPU, we are able to ssh (or console) into the VM, but it runs extremely slowly. The guest is effectively unusable.
Running top on the VM shows no load from processes but CPU0 is at 100%. There is no appreciable I/O on the system.
Interestingly, on the XCP-ng host, the qemu process running the VM with the pegged CPU does not have a high load itself.
The pegged CPU appears to be contained entirely within the guest.
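Since top shows no process load, a per-CPU breakdown by state (user, system, irq, softirq, steal, and so on) may help narrow down where CPU0's time is going. Below is a minimal Python sketch that samples `/proc/stat` twice inside the guest and reports the share of elapsed ticks per state. The field order follows the standard Linux `/proc/stat` layout. The function names and the one-second interval are illustrative assumptions, and this is not something we have run as part of the ticket.

```python
import time

# Order of the first eight counters on each cpuN line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat(text):
    """Map each per-CPU line ('cpu0', 'cpu1', ...) to its cumulative tick counts."""
    samples = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the aggregate 'cpu' line and non-CPU lines such as 'intr'.
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            samples[parts[0]] = dict(zip(FIELDS, values))
    return samples


def breakdown(before, after):
    """Return the percentage of elapsed ticks spent in each state."""
    deltas = {f: after.get(f, 0) - before.get(f, 0) for f in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {f: 0.0 for f in FIELDS}
    return {f: 100.0 * d / total for f, d in deltas.items()}


def sample(path="/proc/stat", interval=1.0):
    """Take two snapshots `interval` seconds apart and compare them per CPU."""
    with open(path) as fh:
        first = parse_proc_stat(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        second = parse_proc_stat(fh.read())
    return {cpu: breakdown(first[cpu], second[cpu]) for cpu in first if cpu in second}
```

Printing `sample()["cpu0"]` on an affected guest would show whether the time is attributed to system, irq, softirq, or steal rather than user processes.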
All of our XCP-ng hosts are running version 8.2.1.
All of the affected VMs are running version 8.2.0-2 of the management agent.
All the affected hosts were migrated from ESXi.
The affected VMs use a mix of UEFI and BIOS.
We've upgraded one of our problematic systems to Rocky 9, which has a 5.14 kernel, to see if the newer kernel is affected.
We have roughly 100 VMs split across two pools.
Has anyone experienced a problem similar to this?