An update from my side.
I have tried the 8.0 alt kernel, the 8.2 standard kernel and the 8.2 alt kernel, and in each case the memory usage increased over time.
Below, the first increase is the 8.0 alt kernel, the second is the 8.2 standard kernel and the third is the 8.2 alt kernel.
I have another production box that has this issue, and noticed this:
[10:41 host ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7913        6445          77         210        1390         284
Swap:          1023          41         982
[10:41 host ~]# ps -ef | grep sadc | wc -l
6337
[10:41 host ~]# ps -ef | grep CROND | wc -l
6337
[10:41 host ~]# ps -ef | grep 32766
root 306 32766 0 Jan31 ? 00:00:00 /usr/lib64/sa/sadc -F -L -S DISK 1 1 -
root 32766 2898 0 Jan31 ? 00:00:00 /usr/sbin/CROND -n
I'm not sure why there are 6337 processes each for CROND and sadc, but I'm going to investigate.
The boxes that I have do not use dynamic memory (we have never used it), and we are still getting the issue.
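For anyone wanting to check their own hosts for the same pile-up, here is a minimal sketch of a per-command process count. Note that `ps -ef | grep sadc | wc -l` also counts the grep process itself, so the numbers above may be off by one. The function name `count_cmds` is my own, and it assumes one command line per row on stdin, for example from `ps -eo args=`.

```shell
# Count processes per command (first word of the command line), busiest first.
# Reads one command line per row on stdin, e.g. the output of `ps -eo args=`.
# count_cmds is a hypothetical helper name, not an existing tool.
count_cmds() {
    awk '{ n[$1]++ } END { for (c in n) print n[c], c }' | sort -rn
}
```

On a live host this could be run as `ps -eo args= | count_cmds | head`, which should put any runaway command such as CROND or sadc at the top of the list.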
Some feedback on my test box running 8.0 alternative kernel. Been running it for a week and getting this.
As you can see, there was an increase in memory usage over the first few days, but then it seemed to level off. I'll continue to do some tests, and then I intend to upgrade to 8.2 and see if I can replicate it (with both the standard and alternative kernels).
I'll provide feedback once I have it.
Gary
Still checking. Going to run for another day and see.
Should the memory usage in DOM0 be dropping as part of normal use?
I understand RAM usage going up and down just like on a normal OS, but a constant increase in memory usage doesn't seem right to me. Or am I misunderstanding how this works?
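To tell a steady climb apart from normal ups and downs, the "available" figure can be sampled at intervals. This is only a rough sketch: it assumes the `free -m` layout shown elsewhere in this thread, where "available" is the 7th field of the "Mem:" row, and the function names and the default log path are hypothetical.

```shell
# Print the "available" column (MiB) from `free -m` output read on stdin.
# Assumes the layout seen in this thread: "available" is the 7th field
# of the "Mem:" row.
mem_available() {
    awk '/^Mem:/ { print $7 }'
}

# Hypothetical logger: append one "epoch-seconds available-MiB" line per call.
# The default log path is an assumption; pass another path as the first argument.
log_mem_sample() {
    printf '%s %s\n' "$(date +%s)" "$(free -m | mem_available)" >> "${1:-/tmp/dom0-mem.log}"
}
```

Calling `log_mem_sample` every few minutes, for example from a `while true; do log_mem_sample; sleep 300; done` loop, gives a simple series to look at after a few days.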
Gary
@olivierlambert
I installed that this morning and have been running my tests for a few hours (so only a short period of time).
So far I have seen this on dom0 (via free -m):
I'm running XCP-ng 8.0
Linux cpt-dc-xen02 4.19.68 #1 SMP Fri Sep 27 10:14:57 CEST 2019 x86_64 x86_64 x86_64 GNU/Linux
@fasterfourier
Can I ask what version of XCP-ng you are running, along with the OS version and Kubernetes version? I'm still trying to work out what may be causing this.
Some feedback on some testing that I have done.
I have spun up some VMs and stress tested them with a combination of stress-ng, s-tui and iperf. Although it is slow, I can see DOM0 free memory drop over time.
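To put a number on a slow drop like this, here is a minimal sketch that compares the first and last sample in a log. It assumes a log with one "epoch-seconds available-MiB" pair per line; that format and the name `mem_drop` are my own assumptions, not something XCP-ng produces.

```shell
# Summarise how far "available" memory fell between the first and last sample.
# Input on stdin: one "epoch-seconds available-MiB" pair per line
# (a hypothetical log format, not something XCP-ng writes itself).
mem_drop() {
    awk '
        NR == 1 { t0 = $1; a0 = $2 }   # remember the first sample
                { t1 = $1; a1 = $2 }   # keep overwriting with the latest sample
        END {
            if (NR < 2 || t1 <= t0) { print "need at least two samples"; exit 1 }
            hours = (t1 - t0) / 3600
            printf "%d MiB over %.1f h (%.1f MiB/h)\n", a0 - a1, hours, (a0 - a1) / hours
        }'
}
```

For example, three samples an hour apart going from 1000 to 900 MiB available would be reported as a drop of 100 MiB over 2.0 h. A rate that stays positive across several days of samples would suggest a leak rather than normal cache behaviour.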
This is my setup - XCP-ng 8.0 standard kernel
Below are some notes on how I ran this test.
DOM0 Memory issue.txt
I'll now try different kernels and upgrade to XCP-ng 8.2 (both standard and alternative kernels) to see if I can continue to replicate the issue.
Re: Kubernetes - I'm not 100% sure that it is causing this. It just seems to be a common factor, but based on the stress testing, heavy CPU/memory/IO load seems to cause DOM0 memory usage to increase.
Good morning.
Long time lurker, first time commenter.
Just wanted to add my 2c worth to this conversation, that may assist.
We are running 37 XCP-ng servers, most on Xen 8.0, a mixture of Dell R630/640 and some odd Supermicro servers, and we have been experiencing this issue on some of them, where DOM0 runs out of memory (free -m, as in the first post, shows very little RAM left).
We see a performance impact on DOM0 (the VMs still run): just connecting to DOM0 over SSH or using xsconsole is slow. Eventually DOM0's network fails, and nothing we try restarts the networking. When DOM0's network fails, all the VMs also lose network connectivity. The only resolution is to manually stop each VM via the command line and then reboot the Xen host.
With one exception, all Xen servers that have experienced this issue have an uptime of at least 200 days, but what I find interesting is that the servers with issues also have a Kubernetes data node on them.
I assume something that Kubernetes does is causing the issue. The boxes that do not have Kubernetes on them have (with one exception) never had this issue.
I have a spare Dell R640 that I'm currently testing on. I want to create lots of VMs and generate a heap of CPU/memory/IO load on it to see if I can replicate the issue and, if I can, try the alternative kernel to see if that makes any difference.
I'll provide feedback on what I find.
Gary