Alert: Control Domain Memory Usage
- 
 Nice to see Citrix is also reaching the same conclusions. Edit: thanks @fasterfourier for your feedback!
- 
 Official Citrix update has been posted: https://support.citrix.com/article/CTX306529 
- 
 \o/ What I still find really weird is that we had reports of the issue long before Citrix did. And we had roughly 10 people affected, while Citrix got only 1 report.
- 
 Probably plenty of Citrix customers were affected, but they would rather reboot on schedule than spend months working through the support process  
- 
 haha that might be the answer indeed…
- 
 Hello, has this fix been released, or is it still to be released?
- 
 @jcastang It is being tested and you can join the effort: yum update intel-ixgbe --enablerepo=xcp-ng-testing. The results are very good, I just want a bit more feedback.
- 
 @stormi Ok, I will update one of our pools and get some results. 
- 
 @delaf Can you point me to the tool you are using to get memory graphs? (I want to check my upgraded pool.)
 I was searching in Advanced live telemetry with no luck.
- 
 Netdata will only give you the last hour. If you want longer metrics, you need to send the data to Prometheus/Grafana.
- 
 @jcastang we are using a netdata/prometheus/grafana stack. @olivierlambert you can change the retention method and keep much more data in netdata. There is also (since netdata 1.18, I think) a dbengine that allows you to store data on disk.
- 
 PS: we are not using the netdata config from "Advanced telemetry": we are installing our own netdata config. 
- 
 dbengine is a bit dangerous on dom0. There used to be a bug where it would keep growing forever, so I don't trust it anymore. 
- 
 @stormi oh I did not know that as I never use it: I only know that it exists  
- 
 @stormi Hello, a few weeks later, I can confirm that the problem is solved here by using intel-ixgbe.x86_64@5.5.2-2.1.xcpng8.1 or intel-ixgbe.x86_64@5.5.2-2.1.xcpng8.2
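 A quick way to check whether a host already runs one of the builds confirmed above could look like the sketch below. The helper names (strip_dist, version_ge, is_fixed_build) are hypothetical, and the idea that any build at or above 5.5.2-2.1 carries the fix is an assumption. The thread only confirms 5.5.2-2.1 itself.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of any XCP-ng tooling.
# Assumption: every build >= 5.5.2-2.1 contains the fix (the thread confirms
# only 5.5.2-2.1.xcpng8.1 and 5.5.2-2.1.xcpng8.2).
FIXED_BUILD="5.5.2-2.1"

# Drop the distro suffix (e.g. ".xcpng8.2") so only version-release remains.
strip_dist() {
    printf '%s\n' "${1%%.xcpng*}"
}

# Return 0 when $1 >= $2 in version order (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Return 0 when the given version-release string is at least FIXED_BUILD.
is_fixed_build() {
    version_ge "$(strip_dist "$1")" "$FIXED_BUILD"
}

# On a host: only query rpm when it exists and the package is installed.
if command -v rpm >/dev/null 2>&1 && rpm -q intel-ixgbe >/dev/null 2>&1; then
    installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' intel-ixgbe)
    if is_fixed_build "$installed"; then
        echo "intel-ixgbe ${installed}: at or above ${FIXED_BUILD}"
    else
        echo "intel-ixgbe ${installed}: older than ${FIXED_BUILD}"
    fi
fi
```

 Note that the package version only tells you what is installed. The running kernel keeps using the old module until the host is rebooted or the driver is reloaded.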
- 
 PS: I'm using these 2 scripts to list the driver versions of all interfaces across our servers:

 $ cat get_network_drivers_info.sh
 #!/bin/bash
 format="| %-13.13s | %-20.20s | %-20.20s | %-10.10s | %-7.7s | %-10.10s | %-30.30s | %-s \n"
 printf "${format}" "date" "hostname" "OS" "interface" "driver" "version" "firmware" "yum"
 printf "${format}" "----------------------------" "----------------------------" "----------------------------" "----------------------------" "----------------------------" "----------------------------" "----------------------------" "----------------------------"
 # Hosts come from the command line if given, otherwise from host.json
 if [ $# -gt 0 ]; then
     servers=($(echo ${BASH_ARGV[*]}))
 else
     servers=($(cat host.json | jq -r '.[] | .address' | egrep -v "^192.168.124.9$"))
 fi
 # Copy the template to each host, run it there, and report failures on stderr
 for line in ${servers[@]}; do
     scp get_network_drivers_info.sh.tpl ${line}:/tmp/get_network_drivers_info.sh > /dev/null 2>&1;
     ssh -n ${line} bash /tmp/get_network_drivers_info.sh 2> /dev/null;
     if [ $? -ne 0 ]; then
         echo "${line} fail" >&2
     fi
 done

 $ cat get_network_drivers_info.sh.tpl
 #!/bin/bash
 format="| %-13.13s | %-20.20s | %-20.20s | %-10.10s | %-7.7s | %-10.10s | %-30.30s | %-s \n"
 d=$(date '+%Y%m%d-%H%M')
 name=$(hostname)
 cd /sys/class/net/
 # Only physical (PCI-backed) interfaces
 for interface in $(ls -l /sys/class/net/ | awk '/\/pci/ {print $9}'); do
     version=$(ethtool -i ${interface} | awk '/^version:/ {$1=""; print}')
     firmware=$(ethtool -i ${interface} | awk '/^firmware-version:/ {$1=""; print}')
     driver=$(ethtool -i ${interface} | awk '/^driver:/ {$1=""; print}')
     YUM=$(which yum)
     if [ $? -eq 0 ]; then
         packages=$(yum list installed | awk '/ixgbe/ {print $1"@"$2}' | tr '\n' ',')
     else
         packages="NA"
     fi
     os_version=$(lsb_release -d | awk '{$1=""} 1' | sed 's/XenServer/XS/; s/ (xenenterprise)//; s/release //')
     printf "${format}" "${d}" "${name}" "${os_version}" "${interface}" "${driver}" "${version}" "${firmware}" "${packages}"
 done

 PS: the host.json file is generated via: xo-cli --list-objects type=host
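 Since the report mixes all NIC drivers, a small filter can narrow it down to the ixgbe interfaces this thread is about. This is only a sketch: ixgbe_rows is a hypothetical helper, and it assumes the pipe-delimited column order printed by the scripts above (date, hostname, OS, interface, driver, version, firmware, yum).

```shell
#!/bin/bash
# Hypothetical helper, not part of the original scripts: keep only the table
# rows whose "driver" column is ixgbe and print hostname, interface, version.
ixgbe_rows() {
    awk -F'|' '
        {
            # Each row starts with "|", so $1 is empty and the columns
            # begin at $2. Trim the padding around every field.
            for (i = 1; i <= NF; i++) gsub(/^ +| +$/, "", $i)
        }
        $6 == "ixgbe" { print $3, $5, $7 }
    '
}
```

 Usage would be something like ./get_network_drivers_info.sh | ixgbe_rows. The header and separator rows are dropped because their driver column is not "ixgbe".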
- 
 FYI, I have just published security updates today PLUS the fixed ixgbe driver as an official update to XCP-ng 8.1 and 8.2. We made it. This is the end of this huge thread. A big thank you to everyone involved in debugging the issue. And this is not a  :D. :D.
- 
 It's not solving it, but you can run echo 3 > /proc/sys/vm/drop_caches to release some of the cache again, without interfering with running processes.
 [root@host2 ~]# free -m
               total        used        free      shared  buff/cache   available
 Mem:          15958        3308         158           8       12491        2355
 Swap:          1023         177         846
 [root@host2 ~]# echo 3 > /proc/sys/vm/drop_caches
 [root@host2 ~]# free -m
               total        used        free      shared  buff/cache   available
 Mem:          15958        3308        2598          10       10051        2751
 Swap:          1023         177         846
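 To compare the numbers before and after dropping caches without reading the whole table, the Mem: line can be condensed with a small filter. This is a sketch under assumptions: mem_summary is a hypothetical helper, and it expects the column layout shown in the output above (total, used, free, shared, buff/cache, available).

```shell
#!/bin/bash
# Hypothetical helper: condense `free -m` output to the three figures that
# matter here. Assumes the column order total/used/free/shared/buff-cache/
# available, as in the output quoted above.
mem_summary() {
    awk '/^Mem:/ { printf "used=%s buff_cache=%s available=%s\n", $3, $6, $7 }'
}

# Demo: only runs where the free command is available.
if command -v free >/dev/null 2>&1; then
    free -m | mem_summary
fi
```

 Fed with the two outputs above, this gives used=3308 buff_cache=12491 available=2355 before and used=3308 buff_cache=10051 available=2751 after. The "used" figure does not move, which matches the point that dropping caches does not solve the underlying issue. Running sync first is generally advisable, since drop_caches only discards clean cache entries.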


