XCP-ng

    Solved VM's going really slow after 3 - 4 weeks

    Compute
    • Berrick

      Evening All,

      I would be grateful if anyone can offer guidance on how to fault-find this issue. Or solve it :). The Google-fu I have used thus far hasn't really helped.

      Everything appears to be working great. However, over a period of about 3 - 4 weeks the VMs slow down. For example, the Windows 10 VM becomes noticeably slower.

      Rebooting the physical server instantly brings the VMs back to their zippy selves.

      This is pretty much an out-of-the-box installation.
      I haven't found anything obviously wrong.
      The only thing I have seen is that some of the CPUs get very busy when the issue presents, whereas when all appears OK, all CPUs sit at around 12% usage.

      I have just applied the following pool patches:

      • Linux firmware 20190314
      • microcode_ctl 2.1
      • xen-dom0-libs 4.13.4
      • xen-dom0-tools 4.13.4
      • xen-hypervisor 4.13.4
      • xen-libs 4.13.4
      • xen-tools 4.13.4

      Below is the spec of the physical and XCP setup

      **Physical Server**
      CPU model	Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
      GPUs	MGA G200EH
      Core (socket)	32 (2)
      Hyper-threading (SMT)	Enabled
      Manufacturer info	HP (ProLiant DL380p Gen8)
      BIOS info	HP (P70)
      
      **Network**
      HP Ethernet 1Gb 4-port 366FLR Adapter
      
      **Storage**
      Smart Array P420i Controller
      Raid 5
      4 x ST9900805SS
      
      **XCP-ng**
      Build Date 2022-02-11
      Version: 8.2
      DBV: 0.0.1
      
      17.4 GB RAM available (48.0 GB total)
      
      Local storage 	Ext	No	36% (893.9 GB used)	2.4 TB	2.9 TB
      
      **Networking**
      Dedicated NIC 0 for management
      LACP for all VMs using NIC 2 & 3
      
      **VMs**
      CheckMK - Ubuntu Bionic Beaver 18.04 (1): using 3.0 GB
      graylog: using 4.0 GB
      Windows Server 2012 R2 (64-bit) (1): using 3.0 GB
      Windows 10 (64-bit) : using 5.0 GB
      Exchange: using 4.0 GB
      DC: using 4.0 GB
      XO Ubuntu Focal Fossa 20.04: using 3.0 GB
      
      Vendor: GenuineIntel
      Model: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
      Speed: 2892 MHz
      

      Many thanks

      • Berrick

        Hi, so it has only been about 2 weeks since the last server reboot and already the VMs are noticeably slower.

        I am still struggling to make any headway on how to fault-find this issue...

        Is Dom0 the most likely culprit? Or could it be storage?

        • olivierlambert Vates 🪐 Co-Founder🦸 CEO 🧑‍💼

          Hi,

          Why do you think it's dom0's fault? Have you taken a look at the logs?

          • Anonabhar @Berrick

            @Berrick Have your virtual machines started to use swap space?

            • Berrick

              Thanks for the replies.

              @olivierlambert I have looked at the logs but, as mentioned, I can see nothing which to me indicates an issue. But to my peers, I could be talking rubbish.

              As all VMs appear to suffer and rebooting the physical server corrects the "issue" short term, I am thinking about what is common to all of them, hence Dom0.

              As a testament to XenServer/XCP-ng, it has pretty much just worked, and until now I have been able to find the answer to any issues that occurred. So I haven't really learnt a whole lot about troubleshooting.

              @Anonabhar No, I don't believe so. See below:

              # cat /proc/swaps
              Filename                                Type            Size    Used    Priority
              /dev/sda6                               partition       1048572 0       -2
              
              # grep Swap /proc/meminfo
              SwapCached:            0 kB
              SwapTotal:       1048572 kB
              SwapFree:        1048572 kB
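
For anyone who wants to repeat this check over time rather than eyeballing it, here is a minimal sketch in Python. It assumes the standard Linux `/proc/meminfo` layout shown above; the function name `swap_used_kb` is made up for this example, and the sample text is simply the output pasted in this post. It only reports swap for the machine it runs on (host or Linux guest).

```python
def swap_used_kb(meminfo_text: str) -> int:
    """Return swap in use (kB), computed as SwapTotal - SwapFree.

    Expects text in the /proc/meminfo format, e.g. "SwapTotal:  1048572 kB".
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]


# Sample taken from the output above (hypothetical variable name).
sample = """SwapCached:            0 kB
SwapTotal:       1048572 kB
SwapFree:        1048572 kB"""

print(swap_used_kb(sample))  # 0 kB of swap in use for this sample
```

On a live system the same function could be fed `open("/proc/meminfo").read()` and logged periodically, to see whether swap use creeps up over the 3 - 4 week window.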
              
              • mjtbrady

                Have you tried using xentop to see what the VMs are doing?

                • Berrick @mjtbrady

                  @mjtbrady

                  We ran xentop after you suggested it, but I don't believe it shows anything.

                  XentopAfterReboot.jpg

                  Just recently we had notification of another lot of patches, which have been applied.

                  So now we wait!

                  • Danp Top contributor 💪

                    This doesn't look normal to me:
                    05f778de-d7f5-40ac-a489-388327c8aff4-image.png

                    Can you tell us more about this VM?

                    • Gheppy @Danp

                      @Danp
                      I have the same strange values on my server too
                      7b9159c9-30e6-40e6-b982-54a5aab158cf-image.png

                      • olivierlambert Vates 🪐 Co-Founder🦸 CEO 🧑‍💼

                        100% = 1 vCPU full.

                        • Gheppy

                          this is from one server with one VM
                          6687a557-58b5-447a-8cd1-cd564c3387e8-image.png

                          • Gheppy @olivierlambert

                            @olivierlambert
                            So this means that in the last case I have 6 vCPUs at 100% load?

                            • olivierlambert Vates 🪐 Co-Founder🦸 CEO 🧑‍💼

                              How many vCPUs have you assigned to this VM?

                              • Gheppy @olivierlambert

                                @olivierlambert
                                The server has 24 cores and the VM has 20 vCPUs.

                                • olivierlambert Vates 🪐 Co-Founder🦸 CEO 🧑‍💼

                                  So (roughly and IIRC), you are using 6.23 vCPUs at 100%.
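
The arithmetic behind this reading can be sketched in a few lines of Python. This is only an illustration of the "100% = 1 vCPU full" rule stated earlier in the thread: the function names are made up, and the 623% input is an assumed reading chosen to match the roughly 6.23 vCPUs mentioned here, since the screenshot itself is not reproduced.

```python
def busy_vcpus(cpu_percent: float) -> float:
    """Convert an xentop-style CPU(%) value into busy vCPU equivalents.

    Rule from the thread: 100% corresponds to one vCPU fully used.
    """
    return cpu_percent / 100.0


def share_of_assigned(cpu_percent: float, assigned_vcpus: int) -> float:
    """Fraction of the VM's assigned vCPUs that the reading represents."""
    if assigned_vcpus <= 0:
        raise ValueError("assigned_vcpus must be positive")
    return busy_vcpus(cpu_percent) / assigned_vcpus


reading = 623.0  # assumed CPU(%) value, not taken from the screenshot
print(f"{busy_vcpus(reading):.2f} vCPUs busy")
print(f"{share_of_assigned(reading, 20):.0%} of 20 assigned vCPUs")
```

So a value well above 100% is not an error in itself; for a VM with 20 vCPUs, a reading in this range would mean roughly a third of its assigned vCPUs are busy.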

                                  • Danp Top contributor 💪 @Gheppy

                                    @Gheppy I was surprised to see a graylog VM using so much CPU, but maybe it's normal. 🤷

                                    • Gheppy

                                      Yes, it is a VM with a heavily used database.
                                      At the moment I am trying to convince them to buy an XO license for this server.
                                      I work for a public service and I have no say when it comes to money.

                                      • Berrick @Danp

                                        @Danp said in VM's going really slow after 3 - 4 weeks:

                                        @Gheppy I was surprised to see a graylog VM using so much CPU, but maybe it's normal. 🤷

                                        Not sure if Gheppy's DB is graylog. Mine was, and when I searched for an answer as to why the CPU utilization looked strange, I came up with the same answer Olivier supplied.

                                        I would like to point out, as I didn't earlier, that the graylog CPU utilization in the xentop image I uploaded has been fixed, so CPU utilization is much, much lower now.

                                        However, the CPU utilization of that VM was also at the high levels right after a server reboot, so I don't think it's the answer to why all the VMs gradually slow down.

                                        • olivierlambert Vates 🪐 Co-Founder🦸 CEO 🧑‍💼

                                          What's the hardware behind it by the way?

                                          • Gheppy

                                            My server is an HP DL380 G9 with 2 x E5-2620 v3 @ 2.40GHz CPUs, 4 x SSD and 16 x 2.5" HDD, and 128 GB RAM.
                                            The system (XCP-ng and OS) is on 4 x SSD in RAID 10; the DB is on 14 x HDD in RAID 10.
