XCP-ng

    Issue with VM network dropping in and out

    • gskgerG Offline
      gskger Top contributor @glenlewis09
      last edited by

      @glenlewis09 You can use WinSCP to connect to your host via SSH and browse the filesystem. WinSCP is a file manager and works in a similar way to Windows Explorer. Bonus: there is also a portable version.

      • J Offline
        john.c @gskger
        last edited by john.c

        @gskger said in Issue with VM network dropping in and out:

        @glenlewis09 You can use WinSCP to connect to your host via SSH and browse the filesystem. WinSCP is a file manager and works in a similar way to Windows Explorer. Bonus: there is also a portable version.

        @gskger @glenlewis09 You can also use scp directly in PowerShell on an up-to-date Windows 10 or Windows 11. You will need the OpenSSH Client feature installed (enabled), though, and PowerShell 5 or later.

        • G Offline
          glenlewis09
          last edited by

          Thank you so much, that is a great insight. Now that I have the logs, I see there are 100+ log files. Which file do you want me to look at?

          • gskgerG Offline
            gskger Top contributor @glenlewis09
            last edited by gskger

            @glenlewis09 Check out the troubleshooting guide on logs in the documentation. @Andrew was talking about the kernel message logs. At the CLI, you could also type dmesg to show the kernel messages since the last boot. You could narrow the result down further with dmesg | grep -i eth.

            • A Offline
              Andrew Top contributor @glenlewis09
              last edited by

              @glenlewis09 I'm running Windows 10 and 11 and it works correctly.

              Are you only having problems with Windows 2019 and 2022?

              • A Offline
                Andrew Top contributor @glenlewis09
                last edited by

                @glenlewis09 I don't have the same system as you (also XCP 8.3), but I do have r8125 cards and a 2.5G switch. I installed Windows Server 2022 and have no problems keeping a Remote Desktop connection open and watching YouTube videos (with audio)...

                I'll have to try it on my AMD system with the r8125 next.

                • G Offline
                  glenlewis09
                  last edited by olivierlambert

                  @gskger said in Issue with VM network dropping in and out:

                  dmesg | grep -i eth

                  Last login: Wed Jan 31 09:11:37 2024 from 192.168.20.168
                  [13:59 GLS-XENHOST08 ~]#  dmesg | grep -i eth
                  [    1.042384] xen_netfront: Initialising Xen virtual ethernet driver
                  [    2.735978] r8125 Ethernet controller driver 9.012.03-NAPI-PTP-RSS loaded
                  [    3.823280] ACPI Error: Method parse/execution failed \_SB.UBTC._DSM, AE_NOT_FOUND (20180810/psparse-516)
                  [    3.860193] r8125 0000:02:00.0 side-2697-eth0: renamed from eth0
                  [    5.989957] r8125 0000:02:00.0 eth0: renamed from side-2697-eth0
                  [    7.086183] eth0: 0xffffc90040540000, 38:f7:cd:c6:d5:82, IRQ 177
                  [    7.114826] r8125 0000:02:00.0 eth0: registered PHC device on eth0
                  [    7.114829] r8125 0000:02:00.0 eth0: reset PHC clock
                  [    7.148004] device eth0 entered promiscuous mode
                  [    9.927534] r8125: eth0: link up
                  [133076.658774] NETDEV WATCHDOG: eth0 (r8125): transmit queue 1 timed out
                  [133076.685550] r8125 0000:02:00.0 eth0: reset PHC clock
                  [133076.706192] r8125: eth0: link down
                  [133079.932108] r8125: eth0: link up
                  [13:59 GLS-XENHOST08 ~]#
                  

                  These are the logs for the NIC
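A minimal sketch for pulling only the link events out of output like the above. The function names `nic_link_events` and `count_link_downs` are hypothetical helpers, not part of XCP-ng; the patterns are taken from the messages shown in this post (`NETDEV WATCHDOG`, `link up`, `link down`). Both read kernel messages on stdin, so they work on saved log files as well as on live `dmesg` output.

```shell
#!/bin/sh
# nic_link_events: hypothetical helper, not part of XCP-ng.
# Reads kernel messages on stdin and keeps only transmit watchdog
# timeouts and link state changes.
nic_link_events() {
    grep -E 'NETDEV WATCHDOG|link (up|down)'
}

# count_link_downs: hypothetical helper that counts the "link down"
# events found on stdin.
count_link_downs() {
    grep -c 'link down'
}

# Example usage on the host (dmesg -T prints wall-clock timestamps):
#   dmesg -T | nic_link_events
#   dmesg | count_link_downs
```

In the log above, the transmit watchdog timeout at about 133076 s of uptime (roughly 37 hours) is followed immediately by a link down and, about three seconds later, a link up.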

                  • G Offline
                    glenlewis09 @Andrew
                    last edited by

                    @Andrew

                    It happens across all Windows versions: Windows 11 and Server 2016/2019/2022.

                    But one of the VMs never has the issue.

                    And if I move the VMs back to the other host with the 1 Gb NIC, all of them behave correctly.

                    Thank you.

                    • gskgerG Offline
                      gskger Top contributor @glenlewis09
                      last edited by gskger

                      @glenlewis09 Just to get more information on the NIC in your system, can you first identify the NICs with lspci | grep Ethernet (returns the ID 00:1f.6 on my system)

                      # lspci | grep Ethernet
                      00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (11) I219-LM
                      

                      and then get more details with lspci -s 00:1f.6 -vv using the ID of your NIC

                      # lspci -s 00:1f.6 -vv
                      00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (11) I219-LM
                              Subsystem: Hewlett-Packard Company Device 8715
                              Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
                              Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
                              Latency: 0
                              Interrupt: pin A routed to IRQ 212
                              Region 0: Memory at e1200000 (32-bit, non-prefetchable) [size=128K]
                              Capabilities: [c8] Power Management version 3
                                      Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                                      Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
                              Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                                      Address: 00000000fee00f98  Data: 0000
                              Kernel driver in use: e1000e
                              Kernel modules: e1000e
                      

                      While this does not address your issue, it gives more insight into your setup.
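The two steps above can be combined into one pass. This is only a sketch: `eth_pci_ids` and `show_eth_details` are hypothetical helper names, and the only tools assumed are `lspci` and `awk`.

```shell
#!/bin/sh
# eth_pci_ids: hypothetical helper. Reads `lspci` output on stdin and
# prints the PCI address (first field) of every Ethernet controller.
eth_pci_ids() {
    awk '/Ethernet controller/ { print $1 }'
}

# show_eth_details: hypothetical helper. Runs the verbose query for
# each Ethernet controller found on this host.
show_eth_details() {
    for id in $(lspci | eth_pci_ids); do
        lspci -s "$id" -vv
    done
}

# Example usage on the host:
#   show_eth_details
```

Run as root, since some capability fields are hidden from unprivileged users.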

                      Edit: some typos

                      • G Offline
                        glenlewis09 @gskger
                        last edited by glenlewis09

                        @gskger

                        [14:39 GLS-XENHOST08 ~]# lspci | grep Ethernet
                        02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
                        [14:39 GLS-XENHOST08 ~]# ^C
                        [14:39 GLS-XENHOST08 ~]# lspci -s 02:00.0 -vvv
                        02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
                                Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
                                Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
                                Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
                                Latency: 0, Cache Line Size: 64 bytes
                                Interrupt: pin A routed to IRQ 36
                                Region 0: I/O ports at f000 [size=256]
                                Region 2: Memory at fce00000 (64-bit, non-prefetchable) [size=64K]
                                Region 4: Memory at fce10000 (64-bit, non-prefetchable) [size=16K]
                                Capabilities: [40] Power Management version 3
                                        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                                        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
                                Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
                                        Address: 0000000000000000  Data: 0000
                                        Masking: 00000000  Pending: 00000000
                                Capabilities: [70] Express (v2) Endpoint, MSI 01
                                        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                                                ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
                                        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                                                RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                                                MaxPayload 256 bytes, MaxReadReq 2048 bytes
                                        DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                                        LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
                                                ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                                        LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
                                                ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                                        LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                                        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via message/WAKE#
                                        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
                                        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                                                 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                                                 Compliance De-emphasis: -6dB
                                        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                                                 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
                                Capabilities: [b0] MSI-X: Enable+ Count=32 Masked-
                                        Vector table: BAR=4 offset=00000000
                                        PBA: BAR=4 offset=00000800
                                Capabilities: [d0] Vital Product Data
                        pcilib: sysfs_read_vpd: read failed: Input/output error
                                        Not readable
                                Capabilities: [100 v2] Advanced Error Reporting
                                        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                                        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                                        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                                        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                                        CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                                        AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
                                Capabilities: [148 v1] Virtual Channel
                                        Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                                        Arb:    Fixed- WRR32- WRR64- WRR128-
                                        Ctrl:   ArbSelect=Fixed
                                        Status: InProgress-
                                        VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                                                Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                                                Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                                                Status: NegoPending- InProgress-
                                Capabilities: [168 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
                                Capabilities: [178 v1] Transaction Processing Hints
                                        No steering table available
                                Capabilities: [204 v1] Latency Tolerance Reporting
                                        Max snoop latency: 1048576ns
                                        Max no snoop latency: 1048576ns
                                Capabilities: [20c v1] L1 PM Substates
                                        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                                                  PortCommonModeRestoreTime=150us PortTPowerOnTime=150us
                                Capabilities: [21c v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
                                Kernel driver in use: r8125
                                Kernel modules: r8125
                        
                        [14:40 GLS-XENHOST08 ~]#
                        
                        • gskgerG Offline
                          gskger Top contributor @glenlewis09
                          last edited by

                          @glenlewis09 Can you please edit your post and format the output as code (insert ``` before and after the output)? This improves readability.

                          • G Offline
                            glenlewis09 @gskger
                            last edited by

                            @gskger done, thank you for the correction.

                            • gskgerG Offline
                              gskger Top contributor @glenlewis09
                              last edited by gskger

                              @glenlewis09 Again just to double check: your XCP-ng 8.2.1 is fully up-to-date (yum update returns No packages marked for update)? The refreshed 8.2.1 ISO from December 2023 contained updated drivers contributed by @Andrew, including the r8125 driver.
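For a scripted version of that check, here is a rough sketch. It relies on the documented exit status of `yum check-update` (0 when nothing is pending, 100 when updates are available); the helper names `describe_check_update` and `host_update_status` are made up for illustration.

```shell
#!/bin/sh
# describe_check_update: hypothetical helper that maps the exit status
# of `yum check-update` to a message (0 = nothing to update,
# 100 = updates available, anything else = error).
describe_check_update() {
    case "$1" in
        0)   echo "up to date" ;;
        100) echo "updates available" ;;
        *)   echo "yum reported an error" ;;
    esac
}

# host_update_status: hypothetical helper that runs the check on the
# host itself and prints the interpreted result.
host_update_status() {
    yum check-update >/dev/null 2>&1
    describe_check_update "$?"
}

# Example usage on the host:
#   host_update_status
#   modinfo r8125 | grep -i '^version'   # version of the installed r8125 module
```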

                              • G Offline
                                glenlewis09 @gskger
                                last edited by

                                @gskger said in Issue with VM network dropping in and out:

                                yum update

                                [15:25 GLS-XENHOST08 ~]# yum update
                                Loaded plugins: fastestmirror
                                Loading mirror speeds from cached hostfile
                                Excluding mirror: updates.xcp-ng.org
                                 * xcp-ng-base: mirrors.xcp-ng.org
                                Excluding mirror: updates.xcp-ng.org
                                 * xcp-ng-updates: mirrors.xcp-ng.org
                                No packages marked for update
                                [15:25 GLS-XENHOST08 ~]#
                                
                                
                                • gskgerG Offline
                                  gskger Top contributor @glenlewis09
                                  last edited by

                                  @glenlewis09 The only thing that really stands out is this error message:

                                  pcilib: sysfs_read_vpd: read failed: Input/output error
                                                  Not readable
                                  

                                  Can you please try dmesg | grep VPD and report the output (if any)?

                                  • G Offline
                                    glenlewis09 @gskger
                                    last edited by

                                    @gskger said in Issue with VM network dropping in and out:

                                    dmesg | grep VPD

                                    [15:36 GLS-XENHOST08 ~]#  dmesg | grep VPD
                                    [    5.967152] r8125 0000:02:00.0: invalid short VPD tag 00 at offset 1
                                    [15:36 GLS-XENHOST08 ~]#
                                    
                                    
                                    • A Offline
                                      Andrew Top contributor @glenlewis09
                                      last edited by

                                      @glenlewis09 @gskger

                                      I seem to be able to reproduce the Windows 2022 RDP issue on my XCP 8.2.1/AMD/r8125 box (sometimes). It does not seem to happen on my other intel systems or with other OS's.... I'll see what I can find/fix.

                                      I don't think the VPD warning is an actual problem.

                                      • G Offline
                                        glenlewis09 @Andrew
                                        last edited by

                                        @Andrew

                                        Thank you, it was driving me crazy. At first I thought my switch was bad, so I bought a new one just in case. That didn't solve it, so I thought my fiber was causing TX/RX issues and replaced it.

                                        I am glad you can somewhat reproduce the error.

                                        • gskgerG Offline
                                          gskger Top contributor @Andrew
                                          last edited by

                                          @Andrew said in Issue with VM network dropping in and out:

                                          I don't think the VPD warning is an actual problem.

                                          Yes, I don't think so either.

                                          • A Offline
                                            Andrew Top contributor @glenlewis09
                                            last edited by Andrew

                                            @glenlewis09 @gskger

                                            Welcome to using cheap commodity hardware for server class uses...

                                            It seems the r8125 acts differently on my lab test machines. I don't know why the same chip and driver behave differently. It seems tx-checksumming is forced off on one system and on by default on another. This appears to be a problem for the r8125 but not for other chipsets, and it affects some OSes but not all. It seems there must be a bug in the vendor driver code...

                                            You can make the feature change on the XCP command line using: ethtool -K eth0 tx off tso off to see if it fixes the problem. Please let me know if that fixes the Win 2022 RDP issue for you (it does for me). A host reboot will revert the change.
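To confirm the change took effect, the current offload settings can be read back with `ethtool -k` (lowercase k). The sketch below filters that output down to the two features touched above; `offload_state` is a hypothetical helper name. The `xe pif-param-set` line in the comments is an assumption that is not from this thread: it uses the PIF `other-config:ethtool-*` keys to keep the setting across reboots, and `<pif-uuid>` is a placeholder, so check the XCP-ng networking documentation before relying on it.

```shell
#!/bin/sh
# offload_state: hypothetical helper. Reads `ethtool -k <nic>` output on
# stdin and prints only the TX checksumming and TSO settings.
offload_state() {
    grep -E '^(tx-checksumming|tcp-segmentation-offload):'
}

# Example usage on the host (eth0 as in the post above):
#   ethtool -k eth0 | offload_state     # before
#   ethtool -K eth0 tx off tso off      # apply the change
#   ethtool -k eth0 | offload_state     # after: both should read "off"
#
# Assumption (not from this thread): persisting the setting via the PIF.
#   xe pif-param-set uuid=<pif-uuid> other-config:ethtool-tx=off other-config:ethtool-tso=off
```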
