    [HELP] XCP-ng 4.17.5 dom0 kernel panic — page fault in TCP stack, crashdump attached

    31 Posts 7 Posters 1.3k Views 4 Watching
      dnikola @olivierlambert

      @Andrew said in [HELP] XCP-ng 4.17.5 dom0 kernel panic — page fault in TCP stack, crashdump attached:

      @dnikola Please make sure your motherboard firmware is up to date (BIOS F30e). There are a LOT of stability issues with Intel CPUs for that board and old BIOS.

      If you still have r8125 crashes, then try a newer r8125 alt version (9.016.00) from my download page and see if it works better. I gave it a quick test and it installs and works, but YMMV... You can always uninstall it.

      OK, I will do that and report back!
      Is there a quick guide to the correct install and uninstall process, and the order in which the steps should be done?

      @Andrew said in [HELP] XCP-ng 4.17.5 dom0 kernel panic — page fault in TCP stack, crashdump attached:

      @dnikola As for the other card you listed, no, it's still an 8125 card. The single-port 10G card (from the same site) uses an AQC113 chipset; you'll need to install the atlantic-module-alt to support it. If you must have 2.5G then the Intel i225/i226 card is the other choice (not from that site).

      I appreciate all your help so far, thank you. The only 2.5G NIC currently available locally is the one with the 8125, so it was meant as "first aid", but it didn't work. Since I'll likely need to order a replacement online (one can't be found in our country), could you recommend a reliable source or a specific NIC model you'd personally suggest (eBay or wherever)? Our ISP link is 2.5 Gbps, so I'd prefer a 2.5G or faster card.

      Of course, this would be just an informal recommendation — I fully respect your experience and advice, and I completely understand it wouldn’t imply any obligation or responsibility on your part for any potential purchase issues or problems later.

      My second option is replacing the motherboard with one that has an Intel NIC (a local wholesaler has a few models in stock), which may be the fastest option:

      • ASUS PRIME Z790-A WIFI
      • MSI PRO Z790-P WIFI
      • GIGABYTE Z790 AERO G rev. 1.x
      • MSI Z790 GAMING PLUS WIFI

      Thanks again in advance — any tip would be much appreciated.

        Andrew Top contributor @dnikola

        @dnikola The AQC113 10G card (from your vendor) also supports 2.5G with the driver loaded.

          dnikola @Andrew

          Hi everyone,

          Just wanted to share a follow-up on the situation discussed earlier. The issues related to Realtek NIC and stability seem resolved for now.

          Changes Made:

          • Replaced onboard Realtek NIC with Intel X520 dual-port SFP+ card
          • Installed and verified Ubiquiti UACC-DAC-SFP10-3M cable (working fine)
          • Updated BIOS to the latest version: 1820 (May 2025)

          The system seems stable now, but if you'd like to review the full dmesg output, I've uploaded it here: https://pastebin.com/6Mgt7Nir

          Now that we have a working Intel NIC, what do you recommend regarding the onboard Realtek NIC?

          Would it be best to:

          • Disable it in BIOS?
          • Leave it enabled but unused (no cable)?
          • Leave it connected to a disabled port on the managed switch?

          We're currently leaning toward removing it from all bridges and leaving it connected to a disabled port on the managed switch, but we'd appreciate your thoughts on the cleanest and safest long-term solution.

          Thanks again to @olivier and @Andrew for all the help so far.
          If there’s any other suggestion you might have for additional tuning or validation now that we're stable, please let us know.

            bleader Vates 🪐 XCP-ng Team

            I think whatever solution suits you will work.

            Personally, if I know there are issues with it, I would tend to disable it in the BIOS, to be sure nobody tries to use it later and wastes their time. In an enterprise setting, that can be important.

            One thing to keep in mind if you keep it: if you want to add other hosts to the pool, they will need a similar network topology. So if you end up with eth0 and eth1, with your current management network on eth1, any new host should be able to have its management on eth1 as well. You may work around it with interface renaming, but that tends to get messy over time.

            That being said, I'm not sure that even disabling the Realtek NIC in the BIOS will change the interface numbering now that eth1 already exists and is configured.

            If you don't plan to add hosts to the pool, and don't have a team with people who may act on these machines in the future without being aware of this setup history, leaving it connected and disabling the port on the switch should not be an issue.
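
To illustrate the pool constraint described above, here is a minimal Python sketch that flags networks mapped to different interface names on different hosts. The host names, the network name, and the input dictionary are all hypothetical; how you collect the mapping from your hosts is left out, and this is only one way to express the check.

```python
def find_mismatched_networks(hosts):
    """Return networks that do not map to the same device on every host.

    hosts: {host_name: {network_name: device_name}} (hypothetical input shape).
    Result: {network_name: {host_name: device_name_or_None}}.
    """
    # Collect every network name seen on any host.
    networks = set()
    for mapping in hosts.values():
        networks.update(mapping)

    mismatched = {}
    for network in sorted(networks):
        # None marks a host that does not have this network at all.
        devices = {host: mapping.get(network) for host, mapping in hosts.items()}
        if len(set(devices.values())) > 1:
            mismatched[network] = devices
    return mismatched


# Hypothetical example: management sits on eth1 on one host, eth0 on the other.
EXAMPLE_HOSTS = {
    "host-a": {"management": "eth1"},
    "host-b": {"management": "eth0"},
}
```

An empty result means every network uses the same interface name on every host in the mapping.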

              dnikola @bleader

              Hello,

              We had one crash 3 hours ago.

              Here are the log files.

                Andrew Top contributor @dnikola

                @dnikola That machine is a hot mess.

                  dnikola @Andrew

                  @Andrew

                  What do you mean by this? 🙂

                  There was a backup running at the time of the restart...

                    dinhngtu Vates 🪐 XCP-ng Team @dnikola

                    @dnikola Intel, Family 6 Model 183, that's a 14th gen desktop chip, right? 16 cores and Z690 give me pause: this generation has an instability issue, especially with unlocked chips. Do you run any overclocking? (Beware of gaming BIOSes that overclock by default.)
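
For anyone who wants to read the family and model numbers off their own host, here is a minimal Python sketch that parses the `cpu family`, `model`, and `model name` fields from `/proc/cpuinfo`-style text. The sample text and its model name string are made up for illustration; on a real host you would pass in the contents of `/proc/cpuinfo` instead.

```python
def parse_cpu_signature(cpuinfo_text):
    """Return (family, model, model_name) for the first processor entry."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        if ":" not in line:
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, _, value = line.partition(":")
        # Keep only the first occurrence of each field.
        fields.setdefault(key.strip(), value.strip())
    return int(fields["cpu family"]), int(fields["model"]), fields.get("model name", "")


# Made-up sample in the usual /proc/cpuinfo layout (not taken from the logs in this thread).
SAMPLE_CPUINFO = (
    "processor\t: 0\n"
    "vendor_id\t: GenuineIntel\n"
    "cpu family\t: 6\n"
    "model\t\t: 183\n"
    "model name\t: Example CPU (hypothetical)\n"
    "\n"
    "processor\t: 1\n"
    "cpu family\t: 6\n"
    "model\t\t: 183\n"
)

family, model, name = parse_cpu_signature(SAMPLE_CPUINFO)
print(f"family {family}, model {model} ({model:#x}), name: {name}")
```

Model 183 is 0xb7 in hex, which is how some tools and tables list it.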

                      Andrew Top contributor @dnikola

                      @dnikola Almost 300 reports of "Temperature above threshold". Lots of segfaults and general protection errors in Dom0.

                      It's looking bad for the physical environment and hardware in use. Overheating, CPU issues, memory issues, USB issues, etc. They can all be related and add up to an unstable system, which is very bad for a VM server.
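
A tally like the one above can be reproduced with a short script. This is a minimal Python sketch that counts lines matching three kinds of message in a kernel log. The regular expressions are assumptions based on the wording quoted in this post, and the sample log lines are made up, so adjust both to what your own logs actually contain.

```python
import re
from collections import Counter

# Assumed patterns, based on the message wording quoted in this thread.
PATTERNS = {
    "thermal": re.compile(r"temperature above threshold", re.IGNORECASE),
    "segfault": re.compile(r"\bsegfault\b", re.IGNORECASE),
    "general_protection": re.compile(r"general protection", re.IGNORECASE),
}


def count_events(log_text):
    """Count log lines matching each pattern (a line can match more than one)."""
    counts = Counter({name: 0 for name in PATTERNS})
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Made-up sample lines for illustration only (not from the attached logs).
SAMPLE_LOG = """\
[  100.000000] CPU3: Core temperature above threshold, cpu clock throttled
[  101.000000] CPU3: Core temperature/speed normal
[  200.000000] example[1234]: segfault at 0 ip 0000000000000000 sp 0000000000000000 error 4
[  300.000000] traps: example[5678] general protection ip:0 sp:0 error:0
"""

for name, count in count_events(SAMPLE_LOG).items():
    print(f"{name}: {count}")
```

To run it against a real log, read the file into a string and pass it to `count_events`.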

                        DustinB @Andrew

                        As @Andrew said, your host itself is unhealthy. You might be able to remove the heatsink from the CPU, clean both up, and apply some new thermal paste to address the CPU overheating (if the paste is shot).

                        As for the memory issue, run a memtest on the host and see what it reports.
