XCP-ng


    Solved Alert: Control Domain Memory Usage

    Compute
    • olivierlambert
      olivierlambert Vates 🪐 Co-Founder🦸 CEO 🧑‍💼 last edited by olivierlambert

      Nice to see Citrix are also getting to the same conclusions 🙂

      edit: thanks @fasterfourier for your feedback!

      • F
        fasterfourier @olivierlambert last edited by

        @olivierlambert

        Official Citrix update has been posted: https://support.citrix.com/article/CTX306529

        • olivierlambert
          olivierlambert Vates 🪐 Co-Founder🦸 CEO 🧑‍💼 last edited by

          \o/

          What I still find really weird is that we had reports of the issue long before Citrix did. And we had roughly 10 people affected, while Citrix got only 1 report 🤔

          • F
            fasterfourier @olivierlambert last edited by

            @olivierlambert

            Probably plenty of Citrix customers were affected, but they would rather reboot on schedule than spend months working through the support process 🙂

            • olivierlambert
              olivierlambert Vates 🪐 Co-Founder🦸 CEO 🧑‍💼 last edited by

              Haha, that might be the answer indeed…

              • J
                JCastang last edited by

                Hello,

                Has this fix been released, or is it still to be released?

                • stormi
                  stormi Vates 🪐 XCP-ng Team 🚀 @JCastang last edited by

                  @jcastang It is being tested and you can join the effort: yum update intel-ixgbe --enablerepo=xcp-ng-testing. The results are very good, I just want a bit more feedback.

                  • J
                    JCastang @stormi last edited by

                    @stormi Ok, I will update one of our pools and get some results.

                    • J
                      JCastang @delaf last edited by

                      @delaf Can you point me to the tool you are using to get the memory graphs? (I want to check my upgraded pool.)
                      I searched in Advanced Live Telemetry with no luck.

                      • olivierlambert
                        olivierlambert Vates 🪐 Co-Founder🦸 CEO 🧑‍💼 last edited by

                        Netdata will only give you the last hour.

                        If you want longer metrics, you need to send the data to Prometheus/Grafana.

                        • delaf
                          delaf @JCastang last edited by

                          @jcastang we are using a netdata/prometheus/grafana stack.

                          @olivierlambert you can change the retention method and keep much more data in netdata. There is also (since netdata 1.18, I think) a dbengine that allows you to store data on disk.
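
To put rough numbers on the retention trade-off: as far as I understand the Netdata documentation of that era, the in-memory database keeps about 4 bytes per stored value, and the default `history` setting is 3600 seconds, which would match the "last hour" mentioned above. Under those assumptions, here is a minimal sketch of the RAM cost of a longer history. The function name is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: estimate the bytes of RAM needed by Netdata's
# in-memory database for a given number of dimensions and seconds of history.
# Assumption: ~4 bytes per stored value, one value per dimension per second.
netdata_history_bytes() {
    local dimensions="$1" history_seconds="$2"
    echo $(( dimensions * history_seconds * 4 ))
}

# Usage:
#   netdata_history_bytes 1000 3600     # one hour of 1000 dimensions
#   netdata_history_bytes 1000 86400    # one day of 1000 dimensions
```

With these assumptions, 1000 dimensions cost about 14 MB for one hour and about 346 MB for a full day, which is worth weighing on a memory-constrained dom0.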

                          • delaf
                            delaf @delaf last edited by

                            PS: we are not using the netdata config from "Advanced telemetry": we are installing our own netdata config.

                            • stormi
                              stormi Vates 🪐 XCP-ng Team 🚀 last edited by

                              dbengine is a bit dangerous on dom0. There used to be a bug where it would keep growing forever, so I don't trust it anymore.

                              • delaf
                                delaf @stormi last edited by

                                @stormi Oh, I did not know that, as I have never used it: I only know that it exists 😉

                                • delaf
                                  delaf @delaf last edited by

                                  @stormi Hello, some weeks later, I can confirm that the problem is solved here with intel-ixgbe.x86_64@5.5.2-2.1.xcpng8.1 or intel-ixgbe.x86_64@5.5.2-2.1.xcpng8.2
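
For anyone wanting to check a host against the versions reported above, here is a minimal sketch. It assumes that GNU `sort -V` orders these version-release strings the same way RPM does, which holds for simple cases like these but is not a full RPM version comparison. The function name `ixgbe_is_fixed` is made up for illustration.

```shell
#!/bin/bash
# Lowest version-release reported as fixed in this thread (the 8.1 build).
FIXED_MIN="5.5.2-2.1.xcpng8.1"

# Hypothetical helper: succeed if the given version-release string sorts
# at or above FIXED_MIN.
# Assumption: GNU `sort -V` approximates RPM ordering for these strings.
ixgbe_is_fixed() {
    local installed="$1"
    local lowest
    lowest=$(printf '%s\n%s\n' "$FIXED_MIN" "$installed" | sort -V | head -n 1)
    [ "$lowest" = "$FIXED_MIN" ]
}

# Usage on a host:
#   ixgbe_is_fixed "$(rpm -q --qf '%{VERSION}-%{RELEASE}' intel-ixgbe)" \
#       && echo "fixed ixgbe driver installed"
```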

                                  • delaf
                                    delaf @delaf last edited by

                                    PS: I'm using these two scripts to list the driver versions of all interfaces across our servers:

                                    $ cat get_network_drivers_info.sh
                                    #!/bin/bash
                                    # Copy the template script to each host, run it there, and print one table row per interface.

                                    format="| %-13.13s | %-20.20s | %-20.20s | %-10.10s | %-7.7s | %-10.10s | %-30.30s | %-s \n"
                                    dashes="----------------------------"
                                    printf "${format}" "date" "hostname" "OS" "interface" "driver" "version" "firmware" "yum"
                                    printf "${format}" "${dashes}" "${dashes}" "${dashes}" "${dashes}" "${dashes}" "${dashes}" "${dashes}" "${dashes}"

                                    # Hosts come from the command line, or from host.json minus one excluded address.
                                    if [ $# -gt 0 ]; then
                                        servers=("$@")
                                    else
                                        mapfile -t servers < <(jq -r '.[] | .address' host.json | grep -Ev '^192\.168\.124\.9$')
                                    fi

                                    for server in "${servers[@]}"; do
                                        scp get_network_drivers_info.sh.tpl "${server}:/tmp/get_network_drivers_info.sh" > /dev/null 2>&1
                                        # -n stops ssh from reading stdin.
                                        if ! ssh -n "${server}" bash /tmp/get_network_drivers_info.sh 2> /dev/null; then
                                            echo "${server} fail" >&2
                                        fi
                                    done
                                    
                                    $ cat get_network_drivers_info.sh.tpl
                                    #!/bin/bash
                                    # Runs on each host: print one table row per PCI-backed network interface.

                                    format="| %-13.13s | %-20.20s | %-20.20s | %-10.10s | %-7.7s | %-10.10s | %-30.30s | %-s \n"
                                    d=$(date '+%Y%m%d-%H%M')
                                    name=$(hostname)

                                    # Host-wide values are computed once rather than once per interface.
                                    if command -v yum > /dev/null 2>&1; then
                                        packages=$(yum list installed | awk '/ixgbe/ {print $1"@"$2}' | tr '\n' ',')
                                    else
                                        packages="NA"
                                    fi
                                    os_version=$(lsb_release -d | awk '{$1=""} 1' | sed 's/XenServer/XS/; s/ (xenenterprise)//; s/release //')

                                    for path in /sys/class/net/*; do
                                        # Physical NICs resolve to a path under /sys/devices/pci*; skip everything else.
                                        [[ $(readlink -f "${path}") == */pci* ]] || continue
                                        interface=${path##*/}
                                        info=$(ethtool -i "${interface}")
                                        # Strip the "key: " prefix and keep the rest of the line.
                                        driver=$(awk '/^driver:/ {sub(/^[^:]*: */, ""); print}' <<< "${info}")
                                        version=$(awk '/^version:/ {sub(/^[^:]*: */, ""); print}' <<< "${info}")
                                        firmware=$(awk '/^firmware-version:/ {sub(/^[^:]*: */, ""); print}' <<< "${info}")
                                        printf "${format}" "${d}" "${name}" "${os_version}" "${interface}" "${driver}" "${version}" "${firmware}" "${packages}"
                                    done


                                    PS: the host.json file is generated with: xo-cli --list-objects type=host
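
The template script extracts the driver, version and firmware fields from `ethtool -i`. As a possible variation, the same three fields can be pulled out in a single awk pass. This is only a sketch: the function name `parse_ethtool_info` and the `|` separator are choices made here for illustration, not part of the scripts above.

```shell
#!/bin/bash
# Hypothetical helper: read `ethtool -i` output on stdin and print
# "driver|version|firmware" on one line, using a single awk pass.
parse_ethtool_info() {
    awk '
        /^driver: /           { sub(/^driver: /, "");           driver = $0 }
        /^version: /          { sub(/^version: /, "");          version = $0 }
        /^firmware-version: / { sub(/^firmware-version: /, ""); firmware = $0 }
        END { printf "%s|%s|%s\n", driver, version, firmware }
    '
}

# Usage on a host (eth0 is a placeholder interface name):
#   IFS='|' read -r driver version firmware < <(ethtool -i eth0 | parse_ethtool_info)
```

The patterns are anchored at the start of the line, so `version:` does not also match `firmware-version:`.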

                                    • stormi
                                      stormi Vates 🪐 XCP-ng Team 🚀 last edited by stormi

                                      FYI, I have just published security updates today PLUS the fixed ixgbe driver as an official update to XCP-ng 8.1 and 8.2.

                                      We made it. This is the end of this huge thread.

                                      A big thank you to everyone involved in debugging the issue.

                                      And this is not a 🐟 :D.

                                      • F
                                        frankz last edited by

                                        It doesn't solve it, but you can run

                                        echo 3 > /proc/sys/vm/drop_caches

                                        to release some of the cache again, without interfering with running processes.

                                        [root@host2 ~]# free -m
                                                      total        used        free      shared  buff/cache   available
                                        Mem:          15958        3308         158           8       12491        2355
                                        Swap:          1023         177         846
                                        [root@host2 ~]# echo 3 > /proc/sys/vm/drop_caches
                                        [root@host2 ~]# free -m
                                                      total        used        free      shared  buff/cache   available
                                        Mem:          15958        3308        2598          10       10051        2751
                                        Swap:          1023         177         846
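
The before/after comparison above can be wrapped in a small script. This is a minimal sketch, assuming the procps-ng `free` layout shown above, where buff/cache is the sixth field of the `Mem:` line. Both function names are made up for illustration. The kernel documentation recommends running `sync` first, because dirty pages cannot be dropped.

```shell
#!/bin/bash
# Hypothetical helper: print the buff/cache column from `free` output read
# on stdin. Assumes buff/cache is the 6th field of the "Mem:" line.
buff_cache() {
    awk '$1 == "Mem:" { print $6 }'
}

# Hypothetical helper: drop the caches and report how much was released.
# Must be run as root.
drop_and_report() {
    local before after
    before=$(free -m | buff_cache)
    sync                                # flush dirty pages so they become droppable
    echo 3 > /proc/sys/vm/drop_caches   # 3 = page cache plus dentries and inodes
    after=$(free -m | buff_cache)
    echo "buff/cache: ${before} MiB -> ${after} MiB (released $(( before - after )) MiB)"
}

# Usage (as root):
#   drop_and_report
```

With the figures posted above, buff/cache goes from 12491 MiB to 10051 MiB, so 2440 MiB were released.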
