VM's going really slow after 3 - 4 weeks
-
Evening All,
Would be grateful if anyone can offer guidance on how to fault find this issue. Or solve it :). My Google-fu so far hasn't really helped.
All VMs appear to be working great. However, over a period of about 3 - 4 weeks the VMs slow down. For example, the Windows 10 VM becomes noticeably slower.
Rebooting the physical server instantly brings the VMs back to their zippy selves.
This is pretty much an out-of-the-box installation.
I haven't found anything obviously wrong.
The only thing I have seen is that some of the CPUs get very busy when the issue presents, whereas when all appears OK the CPUs are all at around 12% usage.
I have just applied the following pool patches:
- Linux firmware 20190314
- microcode_ctl 2.1
- xen-dom0-libs 4.13.4
- xen-dom0-tools 4.13.4
- xen-hypervisor 4.13.4
- xen-libs 4.13.4
- xen-tools 4.13.4
Below is the spec of the physical and XCP setup
**Physical Server**
- CPU model: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
- GPUs: MGA G200EH
- Core (socket): 32 (2)
- Hyper-threading (SMT): Enabled
- Manufacturer info: HP (ProLiant DL380p Gen8)
- BIOS info: HP (P70)

**Network**
- HP Ethernet 1Gb 4-port 366FLR Adapter

**Storage**
- Smart Array P420i Controller
- RAID 5, 4 x ST9900805SS

**XCP-ng**
- Build Date: 2022-02-11
- Version: 8.2
- DBV: 0.0.1
- RAM: 17.4 GB available (48.0 GB total)
- Local storage: Ext, No, 36% (893.9 GB used), 2.4 TB, 2.9 TB

**Networking**
- Dedicated NIC 0 for management
- LACP for all VMs using NIC 2 & 3

**VMs**
- CheckMK - Ubuntu Bionic Beaver 18.04 (1): using 3.0 GB
- graylog: using 4.0 GB
- Windows Server 2012 R2 (64-bit) (1): using 3.0 GB
- Windows 10 (64-bit): using 5.0 GB
- Exchange: using 4.0 GB
- DC: using 4.0 GB
- XO Ubuntu Focal Fossa 20.04: using 3.0 GB

**CPU**
- Vendor: GenuineIntel
- Model: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
- Speed: 2892 MHz
Many thanks
-
Hi, it has only been about 2 weeks since the last server reboot and the VMs are already noticeably slower.
Still struggling to make any headway on how to fault find this issue....
Is Dom0 the most likely culprit, or could it be storage?
-
Hi,
Why do you think it's dom0's fault? Have you taken a look at the logs?
-
@Berrick Have your virtual machines started to use swap space?
-
Thanks for the replies.
@olivierlambert I have looked at the logs but, as mentioned, I can see nothing that indicates an issue to me. To my peers, though, I could be talking rubbish.
As all VMs appear to suffer and rebooting the physical server corrects the "issue" short term, I am thinking about what is common to all of them, hence Dom0.
As a testament to XenServer/XCP, it has pretty much just worked, and until now I have been able to find the answer to any issues that have occurred. So I haven't really learnt a whole lot about troubleshooting.
@Anonabhar No, I don't believe so. See below:
    # cat /proc/swaps
    Filename    Type       Size     Used  Priority
    /dev/sda6   partition  1048572  0     -2
    # grep Swap /proc/meminfo
    SwapCached: 0 kB
    SwapTotal: 1048572 kB
    SwapFree: 1048572 kB
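For anyone wanting to track this over the weeks rather than eyeball it once, here is a minimal sketch of the same check in Python. The function name `swap_used_kb` is made up for illustration, and the sample text is just the `grep Swap /proc/meminfo` output from the post above. The same function could be pointed at `/proc/meminfo` inside a Linux guest, assuming the guest exposes the usual `SwapTotal` and `SwapFree` fields.

```python
def swap_used_kb(meminfo_text):
    """Return swap currently in use (kB) from /proc/meminfo-style text.

    Hypothetical helper for illustration: it only relies on the
    SwapTotal and SwapFree lines being present.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]


# Sample taken from the output posted above.
SAMPLE = """SwapCached: 0 kB
SwapTotal: 1048572 kB
SwapFree: 1048572 kB"""

print(swap_used_kb(SAMPLE))  # 0, i.e. no swap in use
```

On a live system you would pass in `open("/proc/meminfo").read()` instead of `SAMPLE`, and a non-zero result that grows between reboots would be worth a closer look.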
-
Have you tried using xentop to see what the VMs are doing?
-
We ran xentop after you suggested it but don't believe it shows anything?
Just recently we had a notification of another lot of patches, which have been applied.
So now we wait!
-
This doesn't look normal to me.
Can you tell us more about this VM?
-
@Danp
I have the same strange values on my server too.
-
100% = 1 vCPU full.
-
This is from one server with one VM.
-
@olivierlambert
So this means that in the last case I have 6 vCPUs at 100% load?
-
How many vCPUs have you assigned to this VM?
-
@olivierlambert
The server has 24 cores and the VM has 20 vCPUs.
-
So (roughly and IIRC), you are using 6.23 vCPUs at 100%.
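As a rough sketch of that arithmetic in Python: xentop's CPU(%) column counts 100% per fully busy vCPU, so dividing by 100 gives the number of busy vCPUs. The reading of 623% is an assumption inferred from the 6.23 figure above, and the function names are made up for illustration.

```python
def busy_vcpus(xentop_cpu_percent):
    """Convert an xentop CPU(%) reading to fully busy vCPU equivalents.

    Rule of thumb from this thread: 100% = 1 vCPU full.
    """
    return xentop_cpu_percent / 100.0


def share_of_assigned(xentop_cpu_percent, assigned_vcpus):
    """Fraction of the VM's assigned vCPUs that the reading represents."""
    return busy_vcpus(xentop_cpu_percent) / assigned_vcpus


reading = 623.0  # assumed xentop reading, inferred from "6.23 vCPUs"
assigned = 20    # vCPUs assigned to the VM, as stated in the thread

print(round(busy_vcpus(reading), 2))                 # 6.23
print(f"{share_of_assigned(reading, assigned):.0%}")  # 31%
```

So a 20 vCPU VM showing roughly 623% is using about a third of what it has been given, which on its own does not indicate a problem.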
-
@Gheppy I was surprised to see a graylog VM using so much CPU, but maybe it's normal.
-
Yes, it is a VM with a heavily used database.
At the moment I am trying to convince them to buy an XO license for this server.
I work for a public service and I have no say when it comes to money.
-
@Danp said in VM's going really slow after 3 - 4 weeks:
> @Gheppy I was surprised to see a graylog VM using so much CPU, but maybe it's normal.

Not sure if Gheppy's DB is Graylog. Mine was, and when I searched for the reason behind the strange CPU utilization I came up with the same answer Olivier supplied.
I would like to point out, as I didn't earlier, that the Graylog CPU utilization in the xentop image I uploaded has since been fixed, so CPU utilization is much, much lower now.
However, the CPU utilization of that VM was also at those high levels right after a server reboot, so I don't think it's the answer to why all VMs slowly slow down.
-
What's the hardware behind it by the way?
-
My server is an HP DL380 G9, CPU 2 x E5-2620 v3 @ 2.40GHz, with 4 x SSD and 16 x 2.5" HDD and 128 GB RAM.
System ( XCP-ng and OS ) is on 4 x SSD RAID 10, DB is on 14 x HDD RAID 10.