XCP-ng

    ThierryEscande

    @ThierryEscande

    Vates 🪐 XCP-ng Team

    Groups: Hypervisor & Kernel Team, Xen Guru

    Best posts made by ThierryEscande

    • RE: High Fan Speed Issue on Lenovo ThinkSystem Servers

      Issue quick summary

      Symptoms

      On some Lenovo servers, DIMM temperature readings report errors, which results in higher fan speeds.

      Known impacted systems

      • ThinkSystem SR635 V3
      • ThinkSystem SR655 V3

      Possible solutions (not viable or not recommended)

      Enable CONFIG_X86_AMD_PLATFORM_DEVICE in kernel configuration

      Enabling this driver makes the kernel initialize the hardware device correctly, and this fixed the issue. Unfortunately, it is not a viable solution.

      Main kernel package

      Ideally, enabling CONFIG_X86_AMD_PLATFORM_DEVICE in the main XCP-ng kernel would solve the issue, but this is not possible because it also enables CONFIG_PINCTRL. Enabling CONFIG_PINCTRL modifies the kernel's internal device structure, which changes the kernel ABI. Such an ABI change would break all external driver packages.

      Alternate kernel package

      CONFIG_X86_AMD_PLATFORM_DEVICE is now enabled in the kernel-alt package for XCP-ng 8.2 and 8.3, since ABI stability is not a concern for this kernel.

      However, this kernel is not meant for production use: it is less tested, and no external driver package can be used with it. Therefore, even though it fixes the issue, using the kernel-alt package is strongly discouraged.
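      A minimal sketch for checking how the two options discussed above are set in a kernel's build configuration. It assumes the configuration is installed as /boot/config-<release>, like the /boot/config-4.19.0+1 file quoted further down this page. The function name kconfig_state is hypothetical.

```shell
#!/bin/sh
# kconfig_state OPTION FILE: print the state of a Kconfig option in a kernel
# config file: "y" (built in), "m" (module), "unset" or "absent".
kconfig_state() {
    opt=$1
    file=$2
    if grep -q "^${opt}=y\$" "$file"; then
        echo y
    elif grep -q "^${opt}=m\$" "$file"; then
        echo m
    elif grep -q "^# ${opt} is not set\$" "$file"; then
        echo unset
    else
        # Option not mentioned at all (or FILE missing).
        echo absent
    fi
}

# Hypothetical usage on the running kernel (file name is an assumption):
# for opt in CONFIG_X86_AMD_PLATFORM_DEVICE CONFIG_PINCTRL; do
#     printf '%s: %s\n' "$opt" "$(kconfig_state "$opt" "/boot/config-$(uname -r)")"
# done
```

      The patterns are anchored on both ends, so CONFIG_PINCTRL does not accidentally match related options such as CONFIG_PINCTRL_AMD.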

      Current workaround

      Blacklist the i2c_designware_platform driver

      As far as I understand, without CONFIG_X86_AMD_PLATFORM_DEVICE enabled in the kernel, some hardware initialization is missing, which results in incorrect values being reported by the i2c_designware_platform driver. Preventing the kernel from probing this driver seems to solve the issue.

      Since this driver is built into the XCP-ng kernel rather than compiled as a module, it has to be blacklisted with the initcall_blacklist kernel parameter. The method (shared by Riven in this post) is as follows:

      • Edit /etc/grub-efi.cfg and append initcall_blacklist=dw_i2c_init_driver to the "module2 /boot/vmlinuz-4.19-xen" line of the XCP-ng menu entry:
      ...
      menuentry 'XCP-ng' {
              search --label --set root root-pxdcvt
              multiboot2 /boot/xen.gz dom0_mem=7568M,max:7568M watchdog ucode=scan dom0_max_vcpus=1-16 crashkernel=256M,below=4G console=vga vga=mode-0x0311
              module2 /boot/vmlinuz-4.19-xen root=LABEL=root-pxdcvt ro nolvm hpet=disable rd.auto console=hvc0 console=tty0 quiet vga=785 splash plymouth.ignore-serial-consoles initcall_blacklist=dw_i2c_init_driver
              module2 /boot/initrd-4.19-xen.img
      }
      ...
      
      • Run grub-mkconfig and reboot.
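      After rebooting, you can check that the workaround is in effect in two places: dom0's command line should contain the parameter, and the driver should be absent from sysfs. A minimal sketch, assuming the driver registers as i2c_designware under /sys/bus/platform/drivers (our assumption). Both function names are hypothetical.

```shell
#!/bin/sh
# cmdline_has_param PARAM [CMDLINE]: succeed if PARAM appears as a whole
# word on the kernel command line (defaults to dom0's /proc/cmdline).
cmdline_has_param() {
    cmdline=${2-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# driver_registered NAME [DIR]: succeed if a platform driver called NAME is
# registered (DIR defaults to /sys/bus/platform/drivers).
driver_registered() {
    [ -d "${2-/sys/bus/platform/drivers}/$1" ]
}

# Hypothetical check after the reboot:
# cmdline_has_param initcall_blacklist=dw_i2c_init_driver && echo "parameter active"
# driver_registered i2c_designware || echo "i2c_designware not registered"
```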

      Lenovo support

      Lenovo is aware of the issue and is discussing it with Citrix. For reference, the Lenovo support entry can be found here.

      posted in Hardware
      ThierryEscande
    • RE: High Fan Speed Issue on Lenovo ThinkSystem Servers

      Riven said in High Fan Speed Issue on Lenovo ThinkSystem Servers:

      grep CONFIG_X86_AMD_PLATFORM_DEVICE /boot/config-4.19.0+1
      # CONFIG_X86_AMD_PLATFORM_DEVICE is not set
      

      A rebuilt kernel with this enabled might be all that is needed.

      Hi,

      We're building kernel-alt packages for both XCP-ng 8.2 and 8.3 with CONFIG_X86_AMD_PLATFORM_DEVICE enabled. We'll let you know when they are available for testing.

      posted in Hardware
      ThierryEscande
    • RE: High Fan Speed Issue on Lenovo ThinkSystem Servers

      rmaclachlan If it's an ACPI issue, and since Lenovo doesn't seem very cooperative, one could downgrade the firmware to a working version (i.e. one that runs the fans at normal speed) and dump the ACPI tables, then upgrade to the latest firmware, dump the ACPI tables again, and compare the two dumps.

      I'm not very familiar with ACPI, but I can take a look if you share them.

      The ACPI tools should already be installed on any XCP-ng host.

      • Dump the ACPI tables in binary format.
        Do this in an empty directory, as it produces many files.
      # acpidump -b
      
      • Decompile dsdt.dat (the SSDT tables are passed with -e so that external references can be resolved)
      # iasl -e ssdt*.dat -d dsdt.dat
      
      • Repeat these steps for both firmware versions and share the resulting dsdt.dsl files.
        The files are quite large, so feel free to compress them before sharing.
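      The dump and comparison steps above can be sketched as a small shell script. A minimal sketch, assuming acpidump and iasl are in PATH and the functions are run as root. The function names dump_acpi and compare_dsdt and the directory names in the usage note are hypothetical.

```shell
#!/bin/sh
# dump_acpi DIR: dump the ACPI tables into DIR (created if needed, must be
# empty), decompile the DSDT with the SSDTs as externals, then archive DIR
# as DIR.tar.gz. Pass DIR without a trailing slash.
dump_acpi() {
    dir=$1
    mkdir -p "$dir" || return 1
    # acpidump -b writes one .dat file per table, so insist on an empty DIR.
    if [ -n "$(ls -A "$dir")" ]; then
        echo "dump_acpi: $dir is not empty" >&2
        return 1
    fi
    (
        cd "$dir" || exit 1
        acpidump -b || exit 1
        iasl -e ssdt*.dat -d dsdt.dat || exit 1
    ) || return 1
    # The .dsl output is large, so compress the whole directory.
    tar czf "$dir.tar.gz" -C "$(dirname "$dir")" "$(basename "$dir")"
}

# compare_dsdt OLD_DIR NEW_DIR: unified diff of the two decompiled DSDTs.
# Exit status follows diff: 0 if identical, 1 if they differ.
compare_dsdt() {
    diff -u "$1/dsdt.dsl" "$2/dsdt.dsl"
}
```

      For example, you could run dump_acpi /root/acpi-old on the old firmware and dump_acpi /root/acpi-new after the upgrade, then run compare_dsdt /root/acpi-old /root/acpi-new. The iasl header comment (tool version, date) will always differ, so only changes below it are meaningful.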
      posted in Hardware
      ThierryEscande
    • RE: High Fan Speed Issue on Lenovo ThinkSystem Servers

      LennertvdBerg To verify that the IPMI drivers are not interfering with the firmware, you could try blacklisting the IPMI modules.
      Create the file /etc/modprobe.d/blacklist-ipmi.conf with the following content, then reboot:

      blacklist ipmi_si
      blacklist ipmi_devintf
      blacklist ipmi_msghandler
      

      In practice, add a blacklist line for each module listed by the command lsmod | grep ipmi.
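      A minimal sketch of that rule. It assumes lsmod's usual layout: a header line, then the module name in the first column. The function name ipmi_blacklist_lines is hypothetical.

```shell
#!/bin/sh
# ipmi_blacklist_lines: read `lsmod` output on stdin and print one
# "blacklist <module>" line per loaded module whose name contains "ipmi".
ipmi_blacklist_lines() {
    # NR > 1 skips the "Module Size Used by" header line.
    awk 'NR > 1 && $1 ~ /ipmi/ { print "blacklist " $1 }'
}

# Hypothetical usage (as root), then reboot:
# lsmod | ipmi_blacklist_lines > /etc/modprobe.d/blacklist-ipmi.conf
```

      Unlike a plain grep, this only matches "ipmi" in the module-name column. A module that merely lists an IPMI module under "Used by" is therefore not blacklisted.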

      posted in Hardware
      ThierryEscande
    • RE: Could not change "-machine" by 'xe vm-param-set'

      Hi lyan,

      This might be related to a known issue with PCI passthrough of NVMe devices: the guest kernel tries to allocate more MSI-X vectors than the guest can handle. You can try increasing the number of guest IRQs with the Xen boot parameter extra_guest_irqs. The default is 64; you can increase it to 128 with:

      /opt/xensource/libexec/xen-cmdline --set-xen "extra_guest_irqs=128"
      

      The host must then be rebooted for the change to take effect.
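      After the reboot, you can check which extra_guest_irqs value Xen actually booted with and how many MSI-X vectors the NVMe device advertises by parsing the output of xl info and lspci -vv. A minimal sketch, assuming xl info prints a "xen_commandline : ..." line and lspci -vv prints an "MSI-X: Enable... Count=N" capability line. The function names are hypothetical, and 0000:01:00.0 in the usage comment is a placeholder address.

```shell
#!/bin/sh
# xen_param_value NAME: print the value of NAME=... from the "xen_commandline"
# line of `xl info` output read on stdin (prints nothing if NAME is not set).
xen_param_value() {
    sed -n 's/^xen_commandline[[:space:]]*:[[:space:]]*//p' |
        tr ' ' '\n' |
        sed -n "s/^$1=//p"
}

# msix_count: print the MSI-X table size ("Count=N") from `lspci -vv` output
# for a single device, read on stdin.
msix_count() {
    sed -n 's/.*MSI-X: .*Count=\([0-9]*\).*/\1/p'
}

# Hypothetical usage in dom0:
# xl info | xen_param_value extra_guest_irqs
# lspci -vv -s 0000:01:00.0 | msix_count
```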

      posted in Compute
      ThierryEscande

    Latest posts made by ThierryEscande

    • RE: High Fan Speed Issue on Lenovo ThinkSystem Servers

      The kernel-alt package with CONFIG_X86_AMD_PLATFORM_DEVICE enabled is available for testing.

      For now, it is only available for XCP-ng 8.2.1.

      Use yum to install it:

      yum --enablerepo=xcp-ng-testing install kernel-alt
      

      Reboot and select "XCP-ng kernel-alt 4.19.265" in the GRUB menu.
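      Before rebooting, you can list the menu entry titles to confirm that the kernel-alt entry is present. A minimal sketch, assuming menu entries are written as menuentry '<title>', as in the XCP-ng menu entry excerpt earlier on this page. The function name grub_menu_entries is hypothetical.

```shell
#!/bin/sh
# grub_menu_entries FILE: print the title of each menuentry defined with a
# single-quoted title in a GRUB configuration file, one per line.
grub_menu_entries() {
    sed -n "s/^[[:space:]]*menuentry '\([^']*\)'.*/\1/p" "$1"
}

# Hypothetical usage on a UEFI host (path taken from the workaround above):
# grub_menu_entries /etc/grub-efi.cfg
# After rebooting, uname -r shows which kernel actually booted.
```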

      posted in Hardware
      ThierryEscande
    • RE: High Fan Speed Issue on Lenovo ThinkSystem Servers

      LennertvdBerg No update on that, sorry. As I said before, it will be hard to tell what's going on without feedback from Lenovo.

      posted in Hardware
      ThierryEscande
    • RE: High Fan Speed Issue on Lenovo ThinkSystem Servers

      rmaclachlan Thanks a lot. Unfortunately, I did not find any evidence in the ACPI tables of what could be wrong.

      The problem does not seem to come from the IPMI devices, as there are no changes in that area.

      So without help from Lenovo, it will be difficult for us to go further. If you manage to get Lenovo involved one way or another, we will be happy to collaborate and help.

      posted in Hardware
      ThierryEscande
    • RE: High Fan Speed Issue on Lenovo ThinkSystem Servers

      rmaclachlan Thanks for the files. I did not see anything obvious at first glance.

      I forgot to ask for the SSDT files too. Could you do the same with these files?

      iasl -d ssdt*.dat
      

      (Hopefully you kept the old firmware's files somewhere; otherwise, don't bother downgrading again and just share the new firmware's SSDT files.)

      posted in Hardware
      ThierryEscande
    • RE: High Fan Speed Issue on Lenovo ThinkSystem Servers

      And for those on XCP-ng 8.3, it might be worth giving Xen 4.17 a try. You'll find all the necessary information on how to upgrade in the thread "Xen 4.17 on XCP-ng 8.3".

      posted in Hardware
      ThierryEscande
    • RE: High Fan Speed Issue on Lenovo ThinkSystem Servers

      Although the result will probably be the same as with XCP-ng 8.2, the kernel-alt package is now also available for 8.3.

      The steps are the same as in my previous post, except for the repository URL in xcp-ng.repo:

      [xcp-ng-tescande]
      name=XCP-ng 8.3 tescande User Repository
      baseurl=https://koji.xcp-ng.org/repos/user/8/8.3/tescande2/x86_64/
      enabled=0
      gpgcheck=0
      priority=1
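      Adding that section can be scripted so that it is not appended twice. A minimal sketch, assuming the repo file is /etc/yum.repos.d/xcp-ng.repo. The function name add_tescande_repo is hypothetical. Because the section has enabled=0, it presumably has to be enabled explicitly at install time, for example with yum --enablerepo=xcp-ng-tescande install kernel-alt. This is our assumption, by analogy with the 8.2 instructions above.

```shell
#!/bin/sh
# add_tescande_repo FILE: append the [xcp-ng-tescande] section shown above to
# a yum repo file, unless a section with that name is already present.
add_tescande_repo() {
    file=$1
    if grep -q '^\[xcp-ng-tescande\]' "$file" 2>/dev/null; then
        return 0   # already there; do not add a duplicate section
    fi
    # The leading '' writes a blank line to separate it from existing sections.
    printf '%s\n' '' \
        '[xcp-ng-tescande]' \
        'name=XCP-ng 8.3 tescande User Repository' \
        'baseurl=https://koji.xcp-ng.org/repos/user/8/8.3/tescande2/x86_64/' \
        'enabled=0' \
        'gpgcheck=0' \
        'priority=1' >> "$file"
}

# Hypothetical usage (as root):
# add_tescande_repo /etc/yum.repos.d/xcp-ng.repo
```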
      
      posted in Hardware
      ThierryEscande
    • RE: High Fan Speed Issue on Lenovo ThinkSystem Servers

      RIX_IT Thanks for your feedback.

      Can you please share the output of dmesg and xl dmesg?

      posted in Hardware
      ThierryEscande