XCP-ng

    Alert: Control Domain Memory Usage

      JCastang

      Hello,

      Has this fix been released, or is it still to be released?

        stormi Vates 🪐 XCP-ng Team @JCastang

        @jcastang It is being tested and you can join the effort: yum update intel-ixgbe --enablerepo=xcp-ng-testing. The results are very good, I just want a bit more feedback.

          JCastang @stormi

          @stormi Ok, I will update one of our pools and get some results.

            JCastang @delaf

            @delaf Can you point me to the tool you are using to get the memory graphs? (I want to check my upgraded pool.)
            I searched in Advanced Live Telemetry with no luck.

              olivierlambert Vates 🪐 Co-Founder CEO

              Netdata will only give you the last hour.

              If you want longer metrics, you need to send the data to Prometheus/Grafana.

                delaf @JCastang

                @jcastang we are using a netdata/prometheus/grafana stack.

                @olivierlambert you can change the retention method and keep much more data in netdata. There is also (since netdata 1.18, I think) a dbengine mode that allows you to store data on disk.

                  delaf @delaf

                  PS: we are not using the netdata config from "Advanced telemetry": we are installing our own netdata config.

                    stormi Vates 🪐 XCP-ng Team

                    dbengine is a bit dangerous on dom0. There used to be a bug where it would keep growing forever, so I don't trust it anymore.

                      delaf @stormi

                      @stormi Oh, I did not know that, as I have never used it: I only know that it exists 😉

                        delaf @delaf

                        @stormi Hello, a few weeks later, I can confirm that the problem is solved here by using intel-ixgbe.x86_64@5.5.2-2.1.xcpng8.1 or intel-ixgbe.x86_64@5.5.2-2.1.xcpng8.2
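
To check a host against the package versions quoted above, a minimal sketch could compare the installed version-release string with 5.5.2-2.1. Everything here is an assumption layered on the post: the helper name `version_ge` is hypothetical, treating 5.5.2-2.1 as the lowest fixed release is inferred from this thread only, and GNU `sort -V` only approximates RPM's own version ordering.

```shell
#!/bin/bash
# version_ge A B: succeed when version string A sorts at or after B
# under GNU `sort -V` (an approximation of RPM version comparison).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: lowest release reported as fixed in this thread.
fixed="5.5.2-2.1"

# On a real host the value could come from:
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' intel-ixgbe)
# A sample value from the post is used here so the sketch runs anywhere.
installed="5.5.2-2.1.xcpng8.2"

# Drop the distribution suffix (.xcpng8.1, .xcpng8.2) before comparing.
installed=${installed%.xcpng*}

if version_ge "${installed}" "${fixed}"; then
    echo "intel-ixgbe ${installed}: at or above ${fixed}"
else
    echo "intel-ixgbe ${installed}: older than ${fixed}"
fi
```

For an authoritative answer on a given host, `yum list installed` as used in the scripts in this thread remains the reference; the comparison above is only a convenience.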

                          delaf @delaf

                          PS: I'm using these two scripts to list the driver version of every interface across our servers:

                          $ cat get_network_drivers_info.sh
                          #!/bin/bash
                          # Print a table of NIC driver, version and firmware for each host.
                          # Usage: ./get_network_drivers_info.sh [host ...]
                          # Without arguments, host addresses are read from host.json.
                          
                          format="| %-13.13s | %-20.20s | %-20.20s | %-10.10s | %-7.7s | %-10.10s | %-30.30s | %-s \n"
                          printf "${format}" "date" "hostname" "OS" "interface" "driver" "version" "firmware" "yum"
                          printf "${format}" "----------------------------" "----------------------------" "----------------------------" "----------------------------" "----------------------------" "----------------------------" "----------------------------" "----------------------------"
                          
                          if [ $# -gt 0 ]; then
                              # "$@" keeps the arguments in order (BASH_ARGV lists them in reverse).
                              servers=("$@")
                          else
                              # Every host address from host.json, except 192.168.124.9.
                              servers=($(jq -r '.[] | .address' host.json | grep -vFx "192.168.124.9"))
                          fi
                          
                          for line in "${servers[@]}"; do
                              # Copy the per-host script, then run it remotely.
                              scp get_network_drivers_info.sh.tpl "${line}:/tmp/get_network_drivers_info.sh" > /dev/null 2>&1
                              # -n keeps ssh from reading stdin.
                              if ! ssh -n "${line}" bash /tmp/get_network_drivers_info.sh 2> /dev/null; then
                                  echo "${line} fail" >&2
                              fi
                          done
                          
                          $ cat get_network_drivers_info.sh.tpl
                          #!/bin/bash
                          # Runs on each host: print one table row per PCI network interface.
                          
                          format="| %-13.13s | %-20.20s | %-20.20s | %-10.10s | %-7.7s | %-10.10s | %-30.30s | %-s \n"
                          d=$(date '+%Y%m%d-%H%M')
                          name=$(hostname)
                          
                          # Installed ixgbe packages as name@version, or "NA" when yum is missing.
                          # These values are identical for every interface, so compute them once.
                          if command -v yum > /dev/null 2>&1; then
                              packages=$(yum list installed | awk '/ixgbe/ {print $1"@"$2}' | tr '\n' ',')
                          else
                              packages="NA"
                          fi
                          os_version=$(lsb_release -d | awk '{$1=""} 1' | sed 's/XenServer/XS/; s/ (xenenterprise)//; s/release //')
                          
                          # Physical NICs are the /sys/class/net entries whose symlink points under a PCI device.
                          for interface in $(ls -l /sys/class/net/ | awk '/\/pci/ {print $9}'); do
                              # Query ethtool once per interface and pick out the three fields.
                              info=$(ethtool -i "${interface}")
                              version=$(awk '/^version:/ {$1=""; print}' <<< "${info}")
                              firmware=$(awk '/^firmware-version:/ {$1=""; print}' <<< "${info}")
                              driver=$(awk '/^driver:/ {$1=""; print}' <<< "${info}")
                              printf "${format}" "${d}" "${name}" "${os_version}" "${interface}" "${driver}" "${version}" "${firmware}" "${packages}"
                          done
                          

                          PS: the host.json file is generated with: xo-cli --list-objects type=host

                            stormi Vates 🪐 XCP-ng Team

                            FYI, I have just published security updates today PLUS the fixed ixgbe driver as an official update to XCP-ng 8.1 and 8.2.

                            We made it. This is the end of this huge thread.

                            A big thank you to everyone involved in debugging the issue.

                            And this is not a 🐟 :D.

                              frankz

                              This does not solve the problem, but you can run

                              echo 3 > /proc/sys/vm/drop_caches

                              to release some of the cache again, without interfering with running processes.

                              [root@host2 ~]# free -m
                                            total        used        free      shared  buff/cache   available
                              Mem:          15958        3308         158           8       12491        2355
                              Swap:          1023         177         846
                              [root@host2 ~]# echo 3 > /proc/sys/vm/drop_caches
                              [root@host2 ~]# free -m
                                            total        used        free      shared  buff/cache   available
                              Mem:          15958        3308        2598          10       10051        2751
                              Swap:          1023         177         846
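
To put a number on what the two `free -m` samples above show, a small sketch can read the buff/cache column before and after and print the difference. The helper name `buff_cache_mib` is hypothetical, and the column position assumes the `free` layout shown in this post (buff/cache as the sixth field of the `Mem:` line). The values are copied from the output above, so nothing is measured live here.

```shell
#!/bin/bash
# Print the buff/cache column (MiB) from `free -m` output read on stdin.
# Assumes the column layout shown in this post.
buff_cache_mib() {
    awk '/^Mem:/ {print $6}'
}

# On a live host (as root) the two samples would be taken like this:
#   before=$(free -m | buff_cache_mib)
#   echo 3 > /proc/sys/vm/drop_caches
#   after=$(free -m | buff_cache_mib)
# Here the Mem: lines from the post are replayed instead.
before=$(printf '%s\n' "Mem: 15958 3308 158 8 12491 2355" | buff_cache_mib)
after=$(printf '%s\n' "Mem: 15958 3308 2598 10 10051 2751" | buff_cache_mib)

echo "buff/cache before: ${before} MiB"
echo "buff/cache after:  ${after} MiB"
echo "released:          $((before - after)) MiB"
```

With the figures from the post this comes to 2440 MiB, which matches the growth of the free column (158 to 2598). The used column stays at 3308, consistent with the remark that dropping caches does not solve the underlying problem.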
