XCP-ng
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Groups
    • Register
    • Login

    Issue with VM network dropping in and out

    Hardware
    38 Posts 6 Posters 6.7k Views 6 Watching
    • glenlewis09 @gskger
      last edited by glenlewis09

      @gskger

      [14:39 GLS-XENHOST08 ~]# lspci | grep Ethernet
      02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
      [14:39 GLS-XENHOST08 ~]# ^C
      [14:39 GLS-XENHOST08 ~]# lspci -s 02:00.0 -vvv
      02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
              Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
              Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
              Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
              Latency: 0, Cache Line Size: 64 bytes
              Interrupt: pin A routed to IRQ 36
              Region 0: I/O ports at f000 [size=256]
              Region 2: Memory at fce00000 (64-bit, non-prefetchable) [size=64K]
              Region 4: Memory at fce10000 (64-bit, non-prefetchable) [size=16K]
              Capabilities: [40] Power Management version 3
                      Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                      Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
              Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
                      Address: 0000000000000000  Data: 0000
                      Masking: 00000000  Pending: 00000000
              Capabilities: [70] Express (v2) Endpoint, MSI 01
                      DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                              ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
                      DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                              RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                              MaxPayload 256 bytes, MaxReadReq 2048 bytes
                      DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                      LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
                              ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                      LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
                              ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                      LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                      DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via message/WAKE#
                      DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
                      LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                               Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                               Compliance De-emphasis: -6dB
                      LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                               EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
              Capabilities: [b0] MSI-X: Enable+ Count=32 Masked-
                      Vector table: BAR=4 offset=00000000
                      PBA: BAR=4 offset=00000800
              Capabilities: [d0] Vital Product Data
      pcilib: sysfs_read_vpd: read failed: Input/output error
                      Not readable
              Capabilities: [100 v2] Advanced Error Reporting
                      UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                      UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                      UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                      CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                      CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                      AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
              Capabilities: [148 v1] Virtual Channel
                      Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                      Arb:    Fixed- WRR32- WRR64- WRR128-
                      Ctrl:   ArbSelect=Fixed
                      Status: InProgress-
                      VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                              Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                              Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                              Status: NegoPending- InProgress-
              Capabilities: [168 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
              Capabilities: [178 v1] Transaction Processing Hints
                      No steering table available
              Capabilities: [204 v1] Latency Tolerance Reporting
                      Max snoop latency: 1048576ns
                      Max no snoop latency: 1048576ns
              Capabilities: [20c v1] L1 PM Substates
                      L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                                PortCommonModeRestoreTime=150us PortTPowerOnTime=150us
              Capabilities: [21c v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
              Kernel driver in use: r8125
              Kernel modules: r8125
      
      [14:40 GLS-XENHOST08 ~]#
      
      • gskger Top contributor @glenlewis09
        last edited by

        @glenlewis09 Can you please edit your post and format the output as code (insert ``` before and after the output)? This improves readability.

        • glenlewis09 @gskger
          last edited by

          @gskger done, thank you for the correction.

          • gskger Top contributor @glenlewis09
            last edited by gskger

            @glenlewis09 Again, just to double-check: is your XCP-ng 8.2.1 fully up to date (yum update returns "No packages marked for update")? The refreshed 8.2.1 ISO from December 2023 contained updated drivers contributed by @Andrew, including the r8125 driver.

            • glenlewis09 @gskger
              last edited by

              @gskger said in Issue with VM network dropping in and out:

              yum update

              [15:25 GLS-XENHOST08 ~]# yum update
              Loaded plugins: fastestmirror
              Loading mirror speeds from cached hostfile
              Excluding mirror: updates.xcp-ng.org
               * xcp-ng-base: mirrors.xcp-ng.org
              Excluding mirror: updates.xcp-ng.org
               * xcp-ng-updates: mirrors.xcp-ng.org
              No packages marked for update
              [15:25 GLS-XENHOST08 ~]#
              
              
              • gskger Top contributor @glenlewis09
                last edited by

                @glenlewis09 The only thing that really stands out is this error message:

                pcilib: sysfs_read_vpd: read failed: Input/output error
                                Not readable
                

                Can you please try dmesg | grep VPD and report the output (if any)?

                • glenlewis09 @gskger
                  last edited by

                  @gskger said in Issue with VM network dropping in and out:

                  dmesg | grep VPD

                  [15:36 GLS-XENHOST08 ~]#  dmesg | grep VPD
                  [    5.967152] r8125 0000:02:00.0: invalid short VPD tag 00 at offset 1
                  [15:36 GLS-XENHOST08 ~]#
                  
                  
                  • Andrew Top contributor @glenlewis09
                    last edited by

                    @glenlewis09 @gskger

                    I seem to be able to reproduce the Windows Server 2022 RDP issue (sometimes) on my XCP-ng 8.2.1 / AMD / r8125 box. It does not seem to happen on my other Intel systems or with other OSes. I'll see what I can find and fix.

                    I don't think the VPD warning is an actual problem.

                    • glenlewis09 @Andrew
                      last edited by

                      @Andrew

                      Thank you, it was driving me crazy. At first I thought my switch was bad, so I bought a new one just in case. That didn't solve it, so I thought my fiber was causing TX/RX issues and replaced it.

                      I am glad you can somewhat reproduce the error.

                      • gskger Top contributor @Andrew
                        last edited by

                        @Andrew said in Issue with VM network dropping in and out:

                        I don't think the VPD warning is an actual problem.

                        Yes, I don't think so either.

                        • Andrew Top contributor @glenlewis09
                          last edited by Andrew

                          @glenlewis09 @gskger

                          Welcome to using cheap commodity hardware for server-class uses...

                          The r8125 behaves differently on my lab test machines, and I don't know why the same chip and driver would. It seems tx-checksumming is forced off on one system and on by default on another. The problem seems specific to the r8125 (not other chipsets) and affects some OSes but not all, so there must be a bug in the vendor driver code....

                          You can test the feature change from the XCP-ng host command line with: ethtool -K eth0 tx off tso off (use your r8125 port's interface name if it is not eth0). Please let me know if that fixes the Windows Server 2022 RDP issue for you (it does for me). The change is not persistent: a host reboot reverts it.
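A minimal sketch for checking whether the change took effect, assuming the r8125 port is eth0 and relying only on the standard "ethtool -k" listing, where each feature appears as "name: on" or "name: off" (sometimes followed by a note such as "[fixed]"). The helper name offload_state is hypothetical.

```shell
#!/bin/sh
# Runtime change from the post above (lost on host reboot):
#   ethtool -K eth0 tx off tso off
#
# offload_state FEATURE  (hypothetical helper)
# Reads "ethtool -k <dev>" output on stdin and prints the state ("on" or
# "off") of the named feature; returns 1 if the feature is not listed.
offload_state() {
    awk -v want="$1:" '
        $1 == want { print $2; found = 1; exit }
        END { if (!found) exit 1 }
    '
}

# Example on the host (device name eth0 is an assumption):
#   ethtool -k eth0 | offload_state tx-checksumming
#   ethtool -k eth0 | offload_state tcp-segmentation-offload
# If the ethtool -K command applied, both should print "off".
```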

                          • glenlewis09 @Andrew
                            last edited by

                            @Andrew

                            I think you found the fix. How can I apply this on the host so the setting survives each reboot?

                            So far, no network issues at all! I'm truly happy: even if I have to run this command at every boot, at least the mini PC isn't useless.
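One possible way to make the setting survive reboots, sketched under assumptions: XenServer documents PIF other-config keys such as ethtool-tx and ethtool-tso, which the host's network scripts apply when the PIF is plugged, and this sketch assumes XCP-ng 8.2/8.3 honours them the same way for this NIC. The function name persist_offloads_off is hypothetical; XE can point at a stand-in command for a dry run.

```shell
#!/bin/sh
# persist_offloads_off HOST_UUID DEVICE  (hypothetical helper)
# Stores "tx off" and "tso off" in the PIF's other-config so they are
# re-applied when the PIF is plugged (assumes the ethtool-* keys are honoured).
XE=${XE:-xe}   # set XE to a stand-in command or function for a dry run

persist_offloads_off() {
    host_uuid=$1
    dev=$2
    pif=$($XE pif-list host-uuid="$host_uuid" device="$dev" --minimal) || return 1
    if [ -z "$pif" ]; then
        echo "no PIF found for $dev on host $host_uuid" >&2
        return 1
    fi
    for key in tx tso; do
        $XE pif-param-set uuid="$pif" "other-config:ethtool-$key=off" || return 1
    done
}

# Usage on the XCP-ng host itself:
#   . /etc/xensource-inventory    # defines INSTALLATION_UUID (this host's UUID)
#   persist_offloads_off "$INSTALLATION_UUID" eth0
# The settings should take effect the next time the PIF is plugged,
# e.g. after a host reboot.
```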

                            • Andrew Top contributor @glenlewis09
                              last edited by

                              @glenlewis09 I'll look at changing the default in the XCP r8125 driver until Realtek can fix it (which may be never).

                              • glenlewis09 @Andrew
                                last edited by

                                @Andrew @gskger @olivierlambert @john-c

                                Truly, thank you so much for helping me resolve this issue.

                                I know I have to run the command on each reboot, but it is much better than having hardware I can't even use.

                                To everyone who helped me and walked me through the process, thank you 😊

                                I know you all have a lot on your plates, so taking the time to dive into this problem is awesome!

                                If you do release an update to the driver let me know so I can test it. I owe you for the help!

                                • Andrew Top contributor @glenlewis09
                                  last edited by

                                  @glenlewis09 You can download an updated test driver from my page.

                                  Just log in to the XCP-ng host(s), use wget to download the RPM files, then use yum install r8125.... to install them, and reboot.
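The download-and-install steps above can be sketched as a small function. The RPM URL is whatever the "my page" link gives for your XCP-ng release (not reproduced here), and the helper name install_driver_rpm is hypothetical.

```shell
#!/bin/sh
# install_driver_rpm URL  (hypothetical helper)
# Downloads one RPM into the current directory and installs it with yum.
install_driver_rpm() {
    url=$1
    rpm_file=${url##*/}                      # file name = last path component of the URL
    case $rpm_file in
        *.rpm) ;;
        *) echo "not an .rpm URL: $url" >&2; return 1 ;;
    esac
    wget -O "$rpm_file" "$url" || return 1   # stop if the download fails
    yum install -y "./$rpm_file"             # install from the downloaded local file
}

# Usage (take the real URL from Andrew's page), then reboot the host
# so the new module is loaded:
#   install_driver_rpm '<URL of the r8125 RPM for your release>'
```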

                                  • olivierlambert Vates 🪐 Co-Founder CEO
                                    last edited by

                                    Darn crappy hardware, I'm not surprised, but Realtek never disappoints 😆

                                    • glenlewis09 @Andrew
                                      last edited by

                                      @Andrew said in Issue with VM network dropping in and out:

                                      yum install r8125

                                      What URL should I use for the wget? I guess this lets me add a repository for yum update to download from?

                                      I tried wget https://xcp-ng.org, which was just a random guess.

                                      But that was most definitely not the correct URL. I don't think the index page will help me much lol.

                                      Again thanks for all the help on this.

                                      • gskger Top contributor @glenlewis09
                                        last edited by

                                        @glenlewis09 There is a "my page" link in Andrew's post?

                                        • glenlewis09 @gskger
                                          last edited by

                                          @gskger @Andrew

                                          I'll drink more coffee next time; I was excited and missed the "my page" link.

                                          Thank you again.

                                          • Mt_KEGan @Andrew
                                            last edited by Mt_KEGan

                                            @Andrew,

                                            I realize this is an old topic, but found it to be VERY helpful and maybe it will be for someone else.

                                            I decided to move to mini pcs (GMKTEC NucBox K6s) for my homelab/XCP-ng pool.

                                            Although my RDP was pretty solid and the VM behaved properly in my XOA interface, my network was CLEARLY not working with my Realtek 8125 NIC drivers on 8.3. SMB transfers were getting stuck (basically not working at all), and my internet speed test showed ~200 Mbps download before installing Xentools, but only 2-7 Mbps after (Xentools, not the XCP-ng pre-release guest tools). I'm sure an iperf test would have revealed the same problem with a local transfer as well. A problem, no doubt! I found that the ethtool -K eth0 tx off tso off command corrected my issue, but it doesn't persist on the host. Bummer.

                                            My interfaces basically matched the output already posted by the OP.

                                            My system is fully up to date and patched, yet yum stated my r8125-module-9.012.04-1.xcpng8.3.x86_64 was "already installed and latest version" and that there was "Nothing to do". Well, this simply isn't correct for my use case!

                                            I used Andrew's modified RPM package (note the "-2" vs. the "-1"), which he built "with TXcsum/SG/TSO disabled by default. This fixes problems with Windows Server VMs on on-board RTL8125 chips".

                                            Once I ran (as he suggested) wget https://users.ntplx.net/~andrew/xcp/r8125-module-9.012.04-2.xcpng8.3.x86_64.rpm followed by yum install r8125-module-9.012.04-2.xcpng8.3.x86_64.rpm, my problems simply went away.
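To confirm which build is actually installed (the repo's -1 or the -2 test build), rpm's query format can print just the release field. A minimal sketch; the helper names r8125_release and is_test_build are hypothetical, and the package name r8125-module is taken from the file names above.

```shell
#!/bin/sh
# r8125_release  (hypothetical helper)
# Prints the RELEASE field of the installed r8125-module package,
# e.g. "2.xcpng8.3" for r8125-module-9.012.04-2.xcpng8.3.x86_64.
r8125_release() {
    rpm -q --qf '%{RELEASE}\n' r8125-module
}

# is_test_build  (hypothetical helper)
# Succeeds only if the installed release starts with "2." (the -2 build).
is_test_build() {
    rel=$(r8125_release) || return 1   # fails if the package is not installed
    case $rel in
        2.*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Usage on the host:
#   r8125_release
#   is_test_build && echo "r8125 -2 test build installed"
```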

                                            So I post this to help others (is it just me with this problem on the whole internet?) and to pose a question to Vates: why isn't the updated package considered the most up-to-date one in your repo (in other words, the -2 rather than the -1)?

                                            I can only assume that my issue (by using cheap mini pcs...which is MY fault of course!) is limited and the team doesn't want "TXcsum/SG/TSO disabled by default" for the masses using Realtek 8125 drivers. If that is the case, makes sense. I won't pretend to be smart enough to know the implications of doing that globally for all of 8.3.

                                            Am I correct in assuming this? Is there something I'm missing?

                                            Thank you @olivierlambert, Andrew, @gskger, and everyone else associated with the Vates team for helping the homelab community (smart, because I also have a real enterprise IT sysadmin job, and if it were left to me I'd use officially supported XCP-ng instead of (cough) Hyper-V) and for providing an open source product that is developed in a smart, planned way and works beautifully. Your contributions don't go unnoticed, even though your community may have a problem or two every once in a while that we don't understand... or bring upon ourselves 😉
