XCP-ng

    Intel i40e drivers not working with X710-T2L on kernel-alt

    • Kreeblah

      I have an ASRockRack W480D4U (BIOS version L2.23, BMC version 1.02) with an Intel W-1290P in it, 4x Samsung M391A4G43AB1-CVF RAM modules, 2x Solidigm P44 Pro NVMe drives, 2x Samsung 860 Pro SATA drives, a Sparkle A310 Eco, and an Intel X710-T2L NIC (firmware version 9.50).

      XCP-ng is new for me, as I left ESXi when Broadcom killed the free version, and I haven't been entirely happy with Proxmox. Both of those solutions worked for me on this hardware, but I'm hoping that XCP-ng might be something I could use longer-term without upgrade issues.

      However, with version 8.3 LTS and the full set of update packages installed, I'm seeing erroneous CPU thermal shutdowns. I saw this previously when running FreeBSD 13.2-RELEASE on this same hardware (though before I added the A310 card), but ESXi and Proxmox never had any issues. Additionally, I never see temperatures get above the mid-60s, and I can run the CPU at 100% indefinitely on either of those systems without any problems. I also never see any temperature warnings. The host just spontaneously shuts down, and I see a CPU_THERMTRIP event in the BMC event history.

      Unfortunately, ASRockRack's support can't help me with this, so I was hoping that there might be a fix in kernel-alt that would resolve my issue. What I'm running into with that, though, is that the kernel boots just fine and picks up my NIC without issue, but never actually passes traffic on the NIC.

      I have a manual/static IP setup for the management IP in XCP-ng, and with the normal kernel, I can reach it as soon as the console indicates it's up and ready. However, with kernel-alt, it never works at all. The console indicates it's up, but I can't ping it or SSH to it, and if I try to ping the gateway from the console, it just times out (though pinging localhost works). Similarly, VMs aren't able to use the network either, from the other port on the same card.

      Is there something I need to do differently for the i40e drivers in kernel-alt to work with my NIC so I can see whether the erroneous thermal shutdown issues are resolved?
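One way to narrow this down is to record which i40e driver and firmware the port reports under each kernel, along with its link state, and compare the two. The sketch below parses `ethtool -i` output; the interface name `eth0` and the helper `ethtool_field` are assumptions for illustration, not something taken from this thread.

```shell
#!/bin/sh
# Print one field (e.g. "driver", "version", "firmware-version") from
# `ethtool -i` output supplied on stdin.
ethtool_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2 }'
}

# Assumption: the X710 management port shows up as eth0 on this host.
IFACE="${IFACE:-eth0}"

# Only query the NIC when ethtool and the interface are actually present.
if command -v ethtool >/dev/null 2>&1 && [ -e "/sys/class/net/$IFACE" ]; then
    info=$(ethtool -i "$IFACE")
    echo "driver:   $(printf '%s\n' "$info" | ethtool_field driver)"
    echo "version:  $(printf '%s\n' "$info" | ethtool_field version)"
    echo "firmware: $(printf '%s\n' "$info" | ethtool_field firmware-version)"
    # Carrier state: 1 means link detected, 0 means no link.
    echo "carrier:  $(cat "/sys/class/net/$IFACE/carrier" 2>/dev/null)"
fi
```

Running this once under the standard kernel and once under kernel-alt (from the local console, since the network is down there) should show whether the two boots load different driver versions or whether the link never comes up at all.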

      • flakpyro @Kreeblah

        @Kreeblah Not related to your i40e issue, but when you are running the standard kernel, do you see anything in the output of "xl dmesg" (run over SSH) before the system shuts down? I have seen Xen thermal-throttle CPU cores and report it there when it happens. On some SFF systems (Minisforum MS-01) I have simply adjusted the boost clocks (which you can do from inside XCP-ng) to stop it from happening. Though you mention you are not going above the mid-60s, which should not trigger any throttling.
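The check described above can be sketched as a small filter over the hypervisor log. The search pattern is a guess at likely wording, not a list of exact Xen messages, and `filter_thermal` is a hypothetical helper name.

```shell
#!/bin/sh
# Keep only log lines that mention thermal events or throttling.
# Assumption: the pattern below is a loose guess, not Xen's exact message text.
filter_thermal() {
    grep -iE 'therm|throttl|overheat'
}

# Run against the Xen log only where the xl tool exists (dom0, as root).
if command -v xl >/dev/null 2>&1; then
    xl dmesg | filter_thermal || echo "no thermal or throttling lines found"
fi
```

Since the shutdowns arrive without warning, running this periodically and saving the output to disk would give something to inspect after the next event.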

        • Kreeblah @flakpyro

          @flakpyro It's hard to say what happens right before one of these events, as they really do come out of nowhere. But, just running that command right now, I'm not seeing any sort of throttling having happened since I last booted this system about 8 hours ago. I can check from time to time to see whether anything shows up, but I really doubt it will given the temperature situation. And, checking the CPU temp readings in the IPMI, right now they're at 29C, so, nowhere near any sort of throttling or shutdown threshold.

          • gduperrey (Vates, XCP-ng Team)

            The kernel-alt is there for debugging purposes and should not be used during normal operation, especially if you want to maintain optimal performance.

            Alternative driver versions may be offered instead of driver updates. This is the case for the Intel i40e. The default version is 2.25.11-2, and its alternative (-alt) version is 2.26.8-1. This is simply a more up-to-date version of the driver.

            You can therefore try this version (which is independent of the kernel-alt). It installs over the default XCP-ng installation and therefore over the standard kernel:

            yum install intel-i40e-alt
            

            Hopefully this more up-to-date version will help you.
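After installing the package, it may be worth confirming which i40e version the system will actually load. The sketch below compares the version that `modinfo` reports against the alternative driver's version; the assumption that the module reports `2.26.8` (the package version without its `-1` release suffix) and the helper `version_starts_with` are illustrative, not confirmed in this thread.

```shell
#!/bin/sh
# Succeed when the version string in $1 starts with the expected version in $2.
version_starts_with() {
    case "$1" in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Assumption: the module version matches the package version minus the "-1" release.
EXPECTED="2.26.8"

# modinfo reads the module on disk for the running kernel; the version of the
# module already loaded can be read from /sys/module/i40e/version instead.
if command -v modinfo >/dev/null 2>&1 && ver=$(modinfo -F version i40e 2>/dev/null); then
    if version_starts_with "$ver" "$EXPECTED"; then
        echo "i40e $ver: matches the alternative driver version"
    else
        echo "i40e $ver: does not match the expected $EXPECTED"
    fi
fi
```

`rpm -q intel-i40e-alt` shows whether the package itself is installed; a reboot is the simplest way to make sure the newly installed module is the one in use.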

            • Kreeblah @gduperrey

              @gduperrey The thing is, the i40e driver works fine for me with the regular kernel. I was hoping to try kernel-alt to see whether it would resolve my erroneous CPU thermal shutdown issues (to narrow down whether it's a kernel issue or not), but then I ran into the driver issue.

              I can try that alt driver, though, and see if it works with kernel-alt for me.
