XCP-ng

    Issue with VM network dropping in and out

    Category: Hardware
    38 Posts 6 Posters 6.7k Views 6 Watching
      glenlewis09

      I have 8 hosts:
      4 running on Beelink SER5 mini PCs
      4 running on Geekom A5 mini PCs

      Any VM running on the Beelinks runs perfectly.
      On any VM running on the Geekom A5s, the network goes on and off. The VNC console never disconnects, but moving files or using RDP into the VMs hosted on them runs into constant network-related issues.

      The Geekom A5s have a 2.5GbE Realtek NIC.
      The Beelinks have a 1GbE NIC from an unknown manufacturer.

      I have tried a bunch of different scenarios (possibly bad switches, cables, or switch ports) to isolate the issue, and I keep coming back to the host as the problem.

      I have installed Windows 11 on the mini PCs and the network has zero issues, so the NIC isn't bad (on all 4 mini PCs).

      I need someone's advice to help resolve this. I love using these mini PCs for my home labs and projects, and I am hoping I didn't just buy 4 machines that will never work with the Xen infrastructure.

        olivierlambert Vates 🪐 Co-Founder CEO

        Hi,

        We need a bit more context to be able to assist 🙂 Which version of XCP-ng are you using?

          glenlewis09

          Sorry about that, I rarely use forums. Normally I can figure these small issues out myself.

          All of my hosts are running XCP-ng 8.2.1
          Xen Orchestra 5.90.0

          The Geekom A5 hosts (4 hosts):
          CPU: AMD Ryzen 7 5800H (8 Cores, 16 Threads, 16MB Cache, 3.2 GHz~ 4.4 GHz)
          RAM: Dual-channel DDR4-3200 SODIMM, 64GB
          NIC: Realtek 2.5GbE

          I also have 4 Beelink hosts running:
          CPU: AMD Ryzen 7 5800H (8 Cores, 16 Threads, 16MB Cache, 3.2 GHz~ 4.4 GHz)
          RAM: Dual-channel DDR4-3200 SODIMM, 64GB
          NIC: 1GbE

          1. The VMs running on the Beelink hosts never have any issues.
            a) My only dislike is the 1GbE NIC, so I looked for a new mini PC with a built-in 2.5GbE NIC.
          2. When I migrate a VM to one of the Geekom hosts, the VM's network connection becomes unstable.
            a) I can go through Xen Orchestra, pull up the console just fine, and work with the desktop UI.
            b) When I RDP into the VM, the RDP connection drops every couple of seconds and quickly reestablishes.
            c) My SQL and IIS services are unable to maintain a connection to each other while on the new hosts.
            d) Oddly enough, my old laptop that I converted to a VM works fine on either set of hosts. It is running Windows 11.
          3. The VMs are mostly Windows Server 2019 and 2022.
          4. All the VMs have the PV drivers installed so Xen can easily manage them.
            a) Management agent 9.3.2-110
          5. All 8 of my hosts share NFS storage on a TrueNAS Core server.

          To make sure the issue wasn't the network, I completely swapped out all patch cables and the switch the hosts were connected to, and even changed the uplink ports just in case.

          My first thought was a network hardware issue, but when I connected the original Beelinks to the same ports and cables with no issues, I realized it has to be an issue on the host itself.

          I would love to figure this one out: since the Geekoms have 2.5GbE NICs, I can get more bandwidth for disk I/O and better overall performance.

          Thanks again.

            Andrew Top contributor @glenlewis09

            @glenlewis09 Does the XCP-ng host report the Ethernet link going down? Check dmesg.
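A minimal sketch of that check, assuming a POSIX shell in dom0. The hypothetical link_events function (not an XCP-ng tool) keeps only kernel messages that report a link going up or down, or a transmit watchdog timeout. The patterns follow the r8125 and kernel watchdog wording that shows up later in this thread; other drivers may word link changes differently.

```shell
#!/bin/sh
# Hypothetical helper: filter dmesg-style text on stdin down to
# link up/down messages and NETDEV WATCHDOG transmit timeouts.
link_events() {
    grep -iE 'link (up|down)|NETDEV WATCHDOG'
}

# Usage on the XCP-ng host:
#   dmesg | link_events
```

If nothing prints, no messages matching these patterns were logged since boot (or they have already rotated out of the kernel ring buffer).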

              glenlewis09
              last edited by glenlewis09

              I don't see any logs in Xen Orchestra.

              Home -> Hosts -> GLS-XENHOST08 -> Logs
              I see no logs here. Also, when I go to Network, the status always shows as connected.

                Andrew Top contributor @glenlewis09

                @glenlewis09 You need to SSH into the XCP-ng host.
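A sketch of doing this from another machine, assuming you log in as root (the prompts in this thread show a root shell) and that the machine has an OpenSSH client, which Windows 10/11 and most Linux distributions provide. The function name remote_dmesg is hypothetical, and 192.0.2.10 is a documentation placeholder for your host's management IP.

```shell
#!/bin/sh
# Hypothetical helper: run a NIC-related dmesg filter on a remote host.
#   $1 = host address or hostname
remote_dmesg() {
    ssh "root@$1" 'dmesg | grep -i eth'
}

# Usage (placeholder address):
#   remote_dmesg 192.0.2.10
# Or interactively: ssh root@192.0.2.10, then run dmesg at the host prompt.
```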

                  glenlewis09 @Andrew

                  @Andrew

                  I have found the SSH command to create the log bundle using xen-bugtool.

                  I am not a Linux guru, so I don't know how to transfer the log file from the host to my Windows client via SSH, nor do I know how to access the host's files outside of SSH.

                  [09:14 GLS-XENHOST08 bug-report]# xen-bugtool --yestoall
                  Warning: '--yestoall' argument provided, will not prompt for individual files.

                  This application will collate the Xen dmesg output, details of the
                  hardware configuration of your machine, information about the build of
                  Xen that you are using, plus, if you allow it, various logs.

                  The collated information will be saved as a .tar.bz2 for archiving or
                  sending to a Technical Support Representative.

                  The logs may contain private information, and if you are at all
                  worried about that, you should exit now, or you should explicitly
                  exclude those logs from the archive.

                  Omitting /dev/shm/metrics/xcp-rrdd-xenpm, size constraint of xcp-rrdd-plugins exceeded
                  Omitting /dev/shm/metrics/xcp-rrdd-squeezed, size constraint of xcp-rrdd-plugins exceeded
                  Omitting /dev/shm/metrics/xcp-rrdd-mem_vms, size constraint of xcp-rrdd-plugins exceeded
                  Omitting /dev/shm/metrics/xcp-rrdd-mem_host, size constraint of xcp-rrdd-plugins exceeded
                  [01/31/24 09:14:52 CST] Creating output file
                  [01/31/24 09:14:52 CST] Running commands to collect data
                  Writing tarball /var/opt/xen/bug-report/bug-report-20240131091452.tar.bz2 successful.
                  [09:15 GLS-XENHOST08 bug-report]#

                  This is the output from the SSH session. But again, I am not sure how to copy or even open the log data.

                  Sorry for the lack of experience here.
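To open the archive on the host itself, a minimal sketch assuming GNU tar in dom0: the hypothetical unpack_report function extracts the bundle into a directory of your choice (/tmp/bugreport in the usage line is an arbitrary choice). GNU tar recognizes the bzip2 compression automatically when reading an archive, so no compression flag is needed.

```shell
#!/bin/sh
# Hypothetical helper (not part of XCP-ng): unpack an xen-bugtool archive.
#   $1 = path to the .tar.bz2 written by xen-bugtool
#   $2 = directory to extract into (created if missing)
unpack_report() {
    mkdir -p "$2" || return 1
    # GNU tar detects the compression format on its own when reading
    tar -xf "$1" -C "$2"
}

# Usage on the host, with the path printed by xen-bugtool:
#   unpack_report /var/opt/xen/bug-report/bug-report-20240131091452.tar.bz2 /tmp/bugreport
#   ls -R /tmp/bugreport | less
```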

                    gskger Top contributor @glenlewis09

                    @glenlewis09 You can use WinSCP to connect to your host via SSH and browse the filesystem. WinSCP is a file manager and works in a similar way to Windows Explorer. Bonus: there is also a portable version.

                      john.c @gskger
                      last edited by john.c

                      @gskger said in Issue with VM network dropping in and out:

                      @glenlewis09 You can use WinSCP to connect to your host via SSH and browse the filesystem. WinSCP is a file manager and works in a similar way to Windows Explorer. Bonus: there is also a portable version.

                      @gskger @glenlewis09 You can also use scp directly in PowerShell on an up-to-date Windows 10 or Windows 11. You will need the OpenSSH Client feature installed (enabled), though, and PowerShell 5 (or later).
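The scp command line is the same in PowerShell and in a POSIX shell. A minimal sketch wrapping it as a shell function, where fetch_report is a hypothetical name and 192.0.2.10 is a placeholder for the host's management IP; it copies the archive into the current directory and assumes root login over SSH.

```shell
#!/bin/sh
# Hypothetical helper: copy an xen-bugtool archive from the host into the
# current directory over SSH.
#   $1 = host address (placeholder in the usage line below)
#   $2 = remote path of the archive
fetch_report() {
    scp "root@$1:$2" .
}

# Usage:
#   fetch_report 192.0.2.10 /var/opt/xen/bug-report/bug-report-20240131091452.tar.bz2
# Equivalent single line to type in PowerShell:
#   scp root@192.0.2.10:/var/opt/xen/bug-report/bug-report-20240131091452.tar.bz2 .
```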

                        glenlewis09

                        Thank you so much, that is great insight. Now that I have the logs, there are 100+ log files. Which file do you want me to look at?

                          gskger Top contributor @glenlewis09
                          last edited by gskger

                          @glenlewis09 Check out the troubleshooting guide on logs in the documentation. @Andrew was talking about the kernel message log. At the CLI, you can also type dmesg to show the kernel messages since the last boot, and narrow the result down further with dmesg | grep -i eth.
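If the bug-report bundle has already been extracted, one way to narrow 100+ files down is to search all of them for the same kind of kernel message. A sketch assuming GNU grep; find_link_logs is a hypothetical name, and it prints the path of every file containing a link up/down message or a NETDEV WATCHDOG timeout.

```shell
#!/bin/sh
# Hypothetical helper: list files under a directory that mention NIC link
# changes or transmit watchdog timeouts (case-insensitive, recursive).
#   $1 = directory the bug-report archive was extracted into
find_link_logs() {
    grep -rilE 'link (up|down)|NETDEV WATCHDOG' "$1"
}

# Usage (the directory is whatever you extracted into):
#   find_link_logs /tmp/bugreport
```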

                            Andrew Top contributor @glenlewis09

                            @glenlewis09 I'm running Windows 10 and 11 and it works correctly.

                            Are you only having problems with Windows Server 2019 and 2022?

                              Andrew Top contributor @glenlewis09

                              @glenlewis09 I don't have the same system as you (and I'm on XCP-ng 8.3), but I do have r8125 cards and a 2.5G switch. I installed Windows Server 2022 and had no problems keeping a Remote Desktop connection open and watching YouTube videos (with audio)...

                              I'll have to try it on my AMD system with the r8125 next.

                                glenlewis09
                                last edited by olivierlambert

                                @gskger said in Issue with VM network dropping in and out:

                                dmesg | grep -i eth

                                Last login: Wed Jan 31 09:11:37 2024 from 192.168.20.168
                                [13:59 GLS-XENHOST08 ~]#  dmesg | grep -i eth
                                [    1.042384] xen_netfront: Initialising Xen virtual ethernet driver
                                [    2.735978] r8125 Ethernet controller driver 9.012.03-NAPI-PTP-RSS loaded
                                [    3.823280] ACPI Error: Method parse/execution failed \_SB.UBTC._DSM, AE_NOT_FOUND (20180810/psparse-516)
                                [    3.860193] r8125 0000:02:00.0 side-2697-eth0: renamed from eth0
                                [    5.989957] r8125 0000:02:00.0 eth0: renamed from side-2697-eth0
                                [    7.086183] eth0: 0xffffc90040540000, 38:f7:cd:c6:d5:82, IRQ 177
                                [    7.114826] r8125 0000:02:00.0 eth0: registered PHC device on eth0
                                [    7.114829] r8125 0000:02:00.0 eth0: reset PHC clock
                                [    7.148004] device eth0 entered promiscuous mode
                                [    9.927534] r8125: eth0: link up
                                [133076.658774] NETDEV WATCHDOG: eth0 (r8125): transmit queue 1 timed out
                                [133076.685550] r8125 0000:02:00.0 eth0: reset PHC clock
                                [133076.706192] r8125: eth0: link down
                                [133079.932108] r8125: eth0: link up
                                [13:59 GLS-XENHOST08 ~]#
                                

                                These are the logs for the NIC
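The numbers in brackets are seconds since boot, so this excerpt shows the transmit watchdog firing about 37 hours (133076 s) after the host came up, with the link down for about 3.2 s (133079.93 minus 133076.71). A small sketch that does this arithmetic for every link-down/link-up pair, assuming POSIX awk; link_outages is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read dmesg-style lines on stdin and report how long
# each "link down" lasted until the next "link up".
link_outages() {
    awk '
        # Seconds since boot, taken from the bracketed prefix of a dmesg line
        function ts(line) {
            return substr(line, 2, index(line, "]") - 2) + 0
        }
        /link down/ { down = ts($0); seen = 1 }
        /link up/ && seen {
            printf "down at %.3f s, up after %.3f s\n", down, ts($0) - down
            seen = 0
        }
    '
}

# Usage on the host:
#   dmesg | link_outages
# For the excerpt above this prints:
#   down at 133076.706 s, up after 3.226 s
```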

                                  glenlewis09 @Andrew

                                  @Andrew

                                  It happens across all Windows OSes: Win 11 and Server 2016/2019/2022.

                                  But one of the VMs never has the issue.

                                  And if I move the VMs back to the other hosts with the 1GbE NIC, they all behave correctly.

                                  Thank you.

                                    gskger Top contributor @glenlewis09
                                    last edited by gskger

                                    @glenlewis09 Just to get more information on the NIC in your system, can you first identify the NICs with lspci | grep Ethernet (on my system it returns the ID 00:1f.6):

                                    # lspci | grep Ethernet
                                    00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (11) I219-LM
                                    

                                    and then get more details with lspci -s 00:1f.6 -vv, using the ID of your NIC:

                                    # lspci -s 00:1f.6 -vv
                                    00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (11) I219-LM
                                            Subsystem: Hewlett-Packard Company Device 8715
                                            Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
                                            Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
                                            Latency: 0
                                            Interrupt: pin A routed to IRQ 212
                                            Region 0: Memory at e1200000 (32-bit, non-prefetchable) [size=128K]
                                            Capabilities: [c8] Power Management version 3
                                                    Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                                                    Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
                                            Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                                                    Address: 00000000fee00f98  Data: 0000
                                            Kernel driver in use: e1000e
                                            Kernel modules: e1000e
                                    

                                    While this does not address your issue, it gives more insight into your setup.

                                    Edit: some typos
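The two lspci steps above can be combined so the slot ID doesn't have to be copied by hand. A sketch assuming a POSIX shell with awk; nic_details is a hypothetical name, and it runs lspci -s <ID> -vv for every device lspci lists as an Ethernet controller.

```shell
#!/bin/sh
# Hypothetical helper: print verbose lspci details for each Ethernet controller.
nic_details() {
    # First column of plain lspci output is the slot ID, e.g. 00:1f.6
    lspci | awk '/Ethernet controller/ { print $1 }' | while read -r slot; do
        lspci -s "$slot" -vv
    done
}

# Usage on the host:
#   nic_details
```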

                                      glenlewis09 @gskger
                                      last edited by glenlewis09

                                      @gskger
                                      [14:39 GLS-XENHOST08 ~]#  lspci | grep Ethernet
                                      02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
                                      [14:39 GLS-XENHOST08 ~]# ^C
                                      [14:39 GLS-XENHOST08 ~]# lspci -s 02:00.0 -vvv
                                      02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
                                              Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
                                              Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
                                              Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
                                              Latency: 0, Cache Line Size: 64 bytes
                                              Interrupt: pin A routed to IRQ 36
                                              Region 0: I/O ports at f000 [size=256]
                                              Region 2: Memory at fce00000 (64-bit, non-prefetchable) [size=64K]
                                              Region 4: Memory at fce10000 (64-bit, non-prefetchable) [size=16K]
                                              Capabilities: [40] Power Management version 3
                                                      Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                                                      Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
                                              Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
                                                      Address: 0000000000000000  Data: 0000
                                                      Masking: 00000000  Pending: 00000000
                                              Capabilities: [70] Express (v2) Endpoint, MSI 01
                                                      DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                                                              ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
                                                      DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                                                              RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                                                              MaxPayload 256 bytes, MaxReadReq 2048 bytes
                                                      DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                                                      LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
                                                              ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                                                      LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
                                                              ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                                                      LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                                                      DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via message/WAKE#
                                                      DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
                                                      LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                                                               Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                                                               Compliance De-emphasis: -6dB
                                                      LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                                                               EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
                                              Capabilities: [b0] MSI-X: Enable+ Count=32 Masked-
                                                      Vector table: BAR=4 offset=00000000
                                                      PBA: BAR=4 offset=00000800
                                              Capabilities: [d0] Vital Product Data
                                      pcilib: sysfs_read_vpd: read failed: Input/output error
                                                      Not readable
                                              Capabilities: [100 v2] Advanced Error Reporting
                                                      UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                                                      UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                                                      UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                                                      CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                                                      CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                                                      AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
                                              Capabilities: [148 v1] Virtual Channel
                                                      Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                                                      Arb:    Fixed- WRR32- WRR64- WRR128-
                                                      Ctrl:   ArbSelect=Fixed
                                                      Status: InProgress-
                                                      VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                                                              Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                                                              Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                                                              Status: NegoPending- InProgress-
                                              Capabilities: [168 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
                                              Capabilities: [178 v1] Transaction Processing Hints
                                                      No steering table available
                                              Capabilities: [204 v1] Latency Tolerance Reporting
                                                      Max snoop latency: 1048576ns
                                                      Max no snoop latency: 1048576ns
                                              Capabilities: [20c v1] L1 PM Substates
                                                      L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                                                                PortCommonModeRestoreTime=150us PortTPowerOnTime=150us
                                              Capabilities: [21c v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
                                              Kernel driver in use: r8125
                                              Kernel modules: r8125
                                      
                                      [14:40 GLS-XENHOST08 ~]#
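The PCIe link lines and the driver line are easy to lose in a dump this long; pulling them out may make it easier to compare this host with one of the Beelinks. A sketch; nic_link_summary is a hypothetical name and takes the slot ID (02:00.0 on this host).

```shell
#!/bin/sh
# Hypothetical helper: keep only the PCIe link control/status lines and the
# driver line from lspci -vv for one device.
#   $1 = slot ID as shown by lspci, e.g. 02:00.0
nic_link_summary() {
    lspci -s "$1" -vv | grep -E 'LnkCtl:|LnkSta:|Kernel driver in use'
}

# Usage on the host:
#   nic_link_summary 02:00.0
```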
                                      
                                        gskger Top contributor @glenlewis09

                                        @glenlewis09 Can you please edit your post and format the output as code (insert ``` before and after the output)? This improves readability.

                                          glenlewis09 @gskger

                                          @gskger done, thank you for the correction.

                                            gskger Top contributor @glenlewis09
                                            last edited by gskger

                                            @glenlewis09 Again, just to double-check: is your XCP-ng 8.2.1 fully up to date (does yum update return "No packages marked for update")? The refreshed 8.2.1 ISO from December 2023 contained updated drivers contributed by @Andrew, including the r8125 driver.
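A sketch of that check, assuming yum's documented check-update exit codes (100 when updates are available, 0 when there are none, 1 on error); update_status is a hypothetical name. The modinfo line in the usage comment shows the version of the r8125 module installed for the running kernel, which can be compared with the version string dmesg printed at boot.

```shell
#!/bin/sh
# Hypothetical helper: report whether yum sees pending updates.
update_status() {
    yum -q check-update >/dev/null 2>&1
    case $? in
        0)   echo "up to date" ;;
        100) echo "updates available" ;;
        *)   echo "yum error" ;;
    esac
}

# Usage on the host:
#   update_status
#   modinfo -F version r8125   # installed r8125 module version
```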
