Memory Consumption goes higher day by day
-
-
It would be interesting to know which process is leaking memory in the dom0.
-
In that case, it's time to run the "top" command on both hypervisor servers.
-
Attaching the htop and top output from both servers.
-
@dhiraj26683 Here is the output of the commands below:
slabtop -o -s c
cat /proc/meminfo
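For readers who want to post-process the same data, here is a minimal sketch of what `slabtop -o -s c` ranks by: the size of each slab cache, taken as number of slabs times pages per slab times page size. The function name `rank_slab_caches`, the 4 KiB page size and the sample values are assumptions made for illustration; they are not output from these servers.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual size on x86_64


def rank_slab_caches(slabinfo_text, page_size=PAGE_SIZE):
    """Return (cache name, size in bytes) pairs, largest cache first."""
    rows = []
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Skip the version line, the "# name ..." header, and blank lines.
        if line.startswith("#") or "slabdata" not in fields:
            continue
        pages_per_slab = int(fields[5])
        # "slabdata" is followed by <active_slabs> <num_slabs> <sharedavail>.
        num_slabs = int(fields[fields.index("slabdata") + 2])
        rows.append((fields[0], num_slabs * pages_per_slab * page_size))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Made-up sample in the slabinfo 2.1 layout, for illustration only.
SAMPLE = """slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
dentry 1000 1050 192 21 1 : tunables 0 0 0 : slabdata 50 50 0
buffer_head 5000 5070 104 39 1 : tunables 0 0 0 : slabdata 130 130 0
kmalloc-8k 40 40 8192 4 8 : tunables 0 0 0 : slabdata 10 10 0
"""

for name, size in rank_slab_caches(SAMPLE):
    print(f"{name:<12} {size / 1024:>8.0f} KiB")
```

On a real host the text would come from /proc/slabinfo, which usually needs root to read. A cache that keeps climbing between runs is the one worth looking at more closely.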
-
@dhiraj26683 Here is the ixgbe module info and RPM info from both servers. It's the stock driver that came along.
[13:59 xen-srv2 Dell-Drivers]$ modinfo ixgbe
filename: /lib/modules/4.19.0+1/updates/ixgbe.ko
version: 5.9.4
license: GPL
description: Intel(R) 10GbE PCI Express Linux Network Driver
author: Intel Corporation, linux.nics@intel.com
srcversion: AA8061C6A752528BD6CFE19
[13:45 xen-srv1 ~]$ modinfo ixgbe
filename: /lib/modules/4.19.0+1/updates/ixgbe.ko
version: 5.9.4
license: GPL
description: Intel(R) 10GbE PCI Express Linux Network Driver
author: Intel Corporation, linux.nics@intel.com
srcversion: AA8061C6A752528BD6CFE19
We also tried updating the ice module to the versions below:
ice-1.10.1.2.2
ice-1.12.7
The behaviour is the same, so we downloaded the ice drivers from Dell and installed the available version, given below. But it's still the same.
ice-1.11.14
-
@dhiraj26683
[14:26 xen-srv2 Dell-Drivers]$ rpm -qf /lib/modules/4.19.0+1/updates/ixgbe.ko
intel-ixgbe-5.9.4-1.xcpng8.2.x86_64
-
@dhiraj26683 We tried to find the process, but nothing could be identified as such. Only three guests are running on this server, and it has almost reached the limit. Once all the memory has gone into cache, we start getting notifications/alerts that the Control Domain Load reached 100%, and there may be service degradation.
-
Do you have any extra stuff installed in your Dom0? It's very important to know it.
-
@dhiraj26683 Thanks for replying, @olivierlambert.
Nothing as such, other than the ice drivers. We have not run any virtual GPU workstations for the last 3-4 months, so that kind of load is not present on any of our XCP hosts.
As far as I can tell, this memory issue started recently, and the only change we make is pushing patches via XOA.
Given this issue, where memory gets fully utilized (goes into cache) and notifications start about Control Domain Load reaching 100%, we have not pushed any patches for now.
-
Let me ping @psafont in case he has an idea of what could cause this.
edit: also @gduperrey, in case he has an idea of how to see what's eating all the memory.
-
@dhiraj26683 the cached memory is not used by any particular process. It is used to keep e.g. recently accessed files in memory, to avoid reading them again from disk if the need arises. The OS is trying to make good use of otherwise-unused memory in the hope of better performance, instead of letting it sit idle.
If you launch a new process that requires more memory than what's currently free, the OS should happily free old cached pages for immediate reuse.
Did you observe anything specifically wrong that led you to look at memory consumption?
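To make the distinction concrete, here is a minimal sketch that splits /proc/meminfo into the "used" and "buff/cache" buckets discussed here. The formulas follow one common accounting, similar to what `free` has traditionally reported; exact formulas vary between procps versions, so treat the numbers as approximate. The function names and the sample values are assumptions for illustration, not data from this thread.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[key.strip()] = int(parts[0])
    return values


def breakdown(mem):
    """Split memory into used / buff_cache / free, all in kB."""
    # Assumption: reclaimable slab is counted as cache, as free traditionally does.
    buff_cache = mem["Buffers"] + mem["Cached"] + mem.get("SReclaimable", 0)
    used = mem["MemTotal"] - mem["MemFree"] - buff_cache
    return {
        "used": used,
        "buff_cache": buff_cache,
        "free": mem["MemFree"],
        "available": mem.get("MemAvailable"),
    }


# Made-up sample values, for illustration only.
SAMPLE = """MemTotal:       1000000 kB
MemFree:         100000 kB
MemAvailable:    780000 kB
Buffers:          50000 kB
Cached:          600000 kB
SReclaimable:     50000 kB
"""

print(breakdown(parse_meminfo(SAMPLE)))
```

In this made-up sample most of the memory sits in buff_cache while MemAvailable stays high, which is the pattern described above: the kernel holds the pages but can hand them back when a process asks for memory.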
-
@olivierlambert I believe it's something related to the NIC drivers, as we are running network-intensive guests on both servers.
We have a third server, which runs standalone. Its config is below; only one guest runs on this host, which is XOA.
CPU - AMD Ryzen Threadripper PRO 3975WX 32-Cores 3500 MHz
Memory - 320G
Ethernet - 1G Ethernet
10G Fiber
intel-ixgbe-5.9.4-1.xcpng8.2.x86_64
XOA uses the 10G Ethernet for backup/migration operations. This host caches less memory, but it does cache some. It does not end up with all of its memory in cache, because fewer operations happen here.
-
-
@dhiraj26683 Would you like to try a newer ixgbe? We've got 5.18.6 available in our repositories.
-
@dhiraj26683 if it was used by a process it would be counted in used, not in buff/cache. Those are used by the kernel's Virtual Filesystem subsystem.
Now if your problem is that a given process fails to allocate memory while there is so much of the memory in buff/cache, then there may be something to dig in that direction, but we'll need specific symptoms to be able to help.
-
@stormi Sure, we can try that. Thank you
-
@dhiraj26683 It's available as the intel-ixgbe-alt RPM, which you can install with yum install.
However, I second Yann's comment: growing cache usage is not an issue, as long as it's reclaimed when another process needs more than what's available, and this is what should happen whenever such a need arises. Unless you have evidence of actual issues caused by this cache usage.
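One way to gather that kind of evidence is to compare two /proc/meminfo snapshots taken some time apart and see which fields grew. The sketch below does that. The function names and the sample numbers are assumptions for illustration. As a rough guide, growth in Cached or SReclaimable is memory the kernel can normally give back, while steady growth in SUnreclaim would be more suspicious.

```python
def parse_kib(text):
    """Map each /proc/meminfo field name to its integer value."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[key.strip()] = int(parts[0])
    return values


def meminfo_growth(before_text, after_text):
    """Return (field, delta) pairs for fields that changed, largest increase first."""
    before, after = parse_kib(before_text), parse_kib(after_text)
    deltas = [
        (key, after[key] - before[key])
        for key in after
        if key in before and after[key] != before[key]
    ]
    deltas.sort(key=lambda item: item[1], reverse=True)
    return deltas


# Made-up snapshots, for illustration only.
BEFORE = "MemFree: 500 kB\nCached: 100 kB\nSUnreclaim: 50 kB\nBuffers: 10 kB\n"
AFTER = "MemFree: 200 kB\nCached: 350 kB\nSUnreclaim: 100 kB\nBuffers: 10 kB\n"

for field, delta in meminfo_growth(BEFORE, AFTER):
    print(f"{field:<12} {delta:+d} kB")
```

Saving `cat /proc/meminfo` to a file once a day and feeding consecutive files to `meminfo_growth` would show where the day-by-day growth is going.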
-
@yann I can understand the buff/cache part, but this server has 1TB of physical memory and only three VMs running, with 8G, 32G and 64G of allotted memory, so eating up all the memory as cache is hard to understand. If it's getting cached, something must be using it. Not sure if that makes sense, though.
Initially both our XCP hosts had 16G of Control Domain memory. When we started to face issues and alerts, we increased it to 32G, then 64G, and then 128G, and it has been like that for a while now.
Now that we are not using vGPU, it no longer fills up within 2 days, at which point alerts would start saying the Control Domain memory reached its limit.
-