@olivierlambert is it already known in which update/release this problem will be solved?
Posts
-
RE: Epyc VM to VM networking slow
-
RE: High Fan Speed Issue on Lenovo ThinkSystem Servers
@bleader I've done exactly the same on my ThinkSystem SR665 V3, BMC Version 3.20 (Build ID: KAX334O), UEFI Version 4.20 (Build ID: KAE120J), LXPM Version 4.12 (Build ID: GNL114G), and it worked:
# Found which driver to blacklist from kernel init
fgrep i2c /boot/System.map-4.19.0+1 | grep init

[10:10 xcp-ng-host1 ~]# fgrep i2c /boot/System.map-4.19.0+1 | grep init
ffffffff815172f0 T drm_i2c_encoder_init
ffffffff815574c0 T __regmap_init_i2c
ffffffff81557510 T __devm_regmap_init_i2c
ffffffff815d0700 t i2c_dw_init_master
ffffffff815d14f0 t i2c_dw_init_slave
ffffffff81ea9b40 r __ksymtab_drm_i2c_encoder_init
ffffffff81eb0570 r __ksymtab___devm_regmap_init_i2c
ffffffff81eb0838 r __ksymtab___regmap_init_i2c
ffffffff81edc1a9 r __kstrtab_drm_i2c_encoder_init
ffffffff81edffcd r __kstrtab___devm_regmap_init_i2c
ffffffff81edffe4 r __kstrtab___regmap_init_i2c
ffffffff8248156d t i2c_init
ffffffff82481b65 t dw_i2c_init_driver
ffffffff8255fe48 t __initcall_i2c_init2
ffffffff8255ffa8 t __initcall_dw_i2c_init_driver4
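The useful rows in that listing are the `__initcall_` symbols: the value passed to `initcall_blacklist=` is the bare function name, i.e. the symbol without the `__initcall_` prefix and without the trailing initcall level (`dw_i2c_init_driver`, not `__initcall_dw_i2c_init_driver4`). A minimal Python sketch of that lookup, assuming the 4.19-era symbol naming `__initcall_<function><level>` seen above; the function name `initcalls_from_system_map` is hypothetical:

```python
import re

# Assumption: 4.19-era naming, where an initcall <fn> registered at level <N>
# produces the symbol __initcall_<fn><N>. The level is a digit 0-7, optionally
# followed by "s" for the *_initcall_sync variants, or the words early/rootfs.
INITCALL_RE = re.compile(r"^__initcall_(\w+?)(early|rootfs|[0-7]s?)$")


def initcalls_from_system_map(text, keyword=""):
    """Return (function, level) pairs for initcall symbols in System.map text.

    Only functions whose name contains `keyword` are returned.
    """
    found = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _address, _sym_type, symbol = parts
        match = INITCALL_RE.match(symbol)
        if match and keyword in match.group(1):
            # group(1) is the name initcall_blacklist= expects
            found.append((match.group(1), match.group(2)))
    return found
```

Fed the contents of /boot/System.map-4.19.0+1 with `keyword="i2c"`, it returns `i2c_init` (level 2) and `dw_i2c_init_driver` (level 4); the second is the name used in the blacklist.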
In /etc/grub-efi.cfg, I appended initcall_blacklist=dw_i2c_init_driver to the dom0 kernel line (module2 /boot/vmlinuz-4.19-xen):
terminal_input serial console
terminal_output serial console
set default=0
set timeout=5
menuentry 'XCP-ng' {
    search --label --set root root-pxdcvt
    multiboot2 /boot/xen.gz dom0_mem=7568M,max:7568M watchdog ucode=scan dom0_max_vcpus=1-16 crashkernel=256M,below=4G console=vga vga=mode-0x0311
    module2 /boot/vmlinuz-4.19-xen root=LABEL=root-pxdcvt ro nolvm hpet=disable rd.auto console=hvc0 console=tty0 quiet vga=785 splash plymouth.ignore-serial-consoles initcall_blacklist=dw_i2c_init_driver
    module2 /boot/initrd-4.19-xen.img
}
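Editing the module2 line by hand works; when the same change has to be repeated on several hosts, a small helper can apply it idempotently. A minimal Python sketch, assuming the dom0 kernel is loaded by a line starting with `module2 /boot/vmlinuz` as in the config above; `add_dom0_kernel_param` is a hypothetical name. XCP-ng also documents `/opt/xensource/libexec/xen-cmdline --set-dom0` for adding dom0 parameters (for example in its PCI passthrough instructions), which is probably the safer route on a real host.

```python
def add_dom0_kernel_param(cfg_text, param, kernel_prefix="module2 /boot/vmlinuz"):
    """Append `param` to every dom0 kernel line of a grub config text.

    Assumes the dom0 kernel line starts with `kernel_prefix` (after
    indentation). Lines that already carry the parameter are left alone,
    so applying the function twice gives the same result.
    """
    out = []
    for line in cfg_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(kernel_prefix) and param not in stripped.split():
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"
```

Typical use would be to back up /etc/grub-efi.cfg, read it, pass the text through `add_dom0_kernel_param(text, "initcall_blacklist=dw_i2c_init_driver")` and write the result back.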
Then I ran grub-mkconfig and rebooted:
grub-mkconfig
I first checked the current system temperatures and fan speeds (fans configured at a fixed RPM):
[10:17 xcp-ng-host1 ~]# ipmitool sdr | grep -i temp
Ambient Temp | 30 degrees C | ok
Exhaust Temp | 34 degrees C | ok
CPU 1 Temp | 39 degrees C | ok
CPU 2 Temp | no reading | ns
DIMM 1 Temp | no reading | ns
DIMM 2 Temp | no reading | ns
DIMM 3 Temp | no reading | ns
DIMM 4 Temp | no reading | ns
DIMM 5 Temp | no reading | ns
DIMM 6 Temp | 0 degrees C | ok
DIMM 7 Temp | 0 degrees C | ok
DIMM 8 Temp | no reading | ns
DIMM 9 Temp | no reading | ns
DIMM 10 Temp | no reading | ns
DIMM 11 Temp | no reading | ns
DIMM 12 Temp | no reading | ns
DIMM 13 Temp | no reading | ns
DIMM 14 Temp | no reading | ns
DIMM 15 Temp | no reading | ns
DIMM 16 Temp | no reading | ns
DIMM 17 Temp | no reading | ns
DIMM 18 Temp | no reading | ns
DIMM 19 Temp | no reading | ns
DIMM 20 Temp | no reading | ns
DIMM 21 Temp | no reading | ns
DIMM 22 Temp | no reading | ns
DIMM 23 Temp | no reading | ns
DIMM 24 Temp | no reading | ns
PCIe 1 OverTemp | 0x00 | ok
PCIe 2 OverTemp | 0x00 | ok
PCIe 3 OverTemp | 0x00 | ok
OCP OverTemp | 0x00 | ok
[10:18 xcp-ng-host1 ~]# ipmitool sdr | grep -i fan
Fan Mismatch | 0x00 | ok
Fan 1 Front Tach | 6642 RPM | ok
Fan 2 Front Tach | 6642 RPM | ok
Fan 3 Front Tach | 6724 RPM | ok
Fan 4 Front Tach | 6560 RPM | ok
Fan 5 Front Tach | 6642 RPM | ok
Fan 6 Tach | 0 RPM | ok
Fan 1 Rear Tach | 6225 RPM | ok
Fan 2 Rear Tach | 6150 RPM | ok
Fan 3 Rear Tach | 6300 RPM | ok
Fan 4 Rear Tach | 6150 RPM | ok
Fan 5 Rear Tach | 6300 RPM | ok
Sys Fan Pwr | 18 Watts | ok
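In the listing above, most DIMM sensors report "no reading | ns" (presumably empty slots), while DIMM 6 and DIMM 7 report 0 degrees C with status ok, which is the odd case worth watching. A minimal Python sketch that picks those out of `ipmitool sdr` output; the helper names `parse_sdr` and `zero_temp_sensors` are hypothetical, and the parser assumes the three-column "name | value | status" layout shown here:

```python
def parse_sdr(text):
    """Split `ipmitool sdr` lines ("name | value | status") into tuples."""
    readings = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) == 3:  # skips shell prompts and blank lines
            readings.append(tuple(fields))
    return readings


def zero_temp_sensors(readings):
    """Temperature sensors reporting exactly 0 degrees C with status "ok".

    Sensors showing "no reading | ns" (presumably empty DIMM slots) are
    ignored; a sensor claiming 0 degrees C while "ok" is the suspicious case.
    """
    return [name for name, value, status in readings
            if "Temp" in name and value == "0 degrees C" and status == "ok"]
```

The text can come from `subprocess.check_output(["ipmitool", "sdr"], universal_newlines=True)`; on the first listing above, `zero_temp_sensors` flags DIMM 6 Temp and DIMM 7 Temp.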
I then rebooted the XClarity BMC controller, and the new readings are:
[10:23 xcp-ng-host1 ~]# ipmitool sdr | grep -i temp
Ambient Temp | 30 degrees C | ok
Exhaust Temp | 34 degrees C | ok
CPU 1 Temp | 40 degrees C | ok
CPU 2 Temp | no reading | ns
DIMM 1 Temp | no reading | ns
DIMM 2 Temp | no reading | ns
DIMM 3 Temp | no reading | ns
DIMM 4 Temp | no reading | ns
DIMM 5 Temp | no reading | ns
DIMM 6 Temp | 38 degrees C | ok
DIMM 7 Temp | 37 degrees C | ok
DIMM 8 Temp | no reading | ns
DIMM 9 Temp | no reading | ns
DIMM 10 Temp | no reading | ns
DIMM 11 Temp | no reading | ns
DIMM 12 Temp | no reading | ns
DIMM 13 Temp | no reading | ns
DIMM 14 Temp | no reading | ns
DIMM 15 Temp | no reading | ns
DIMM 16 Temp | no reading | ns
DIMM 17 Temp | no reading | ns
DIMM 18 Temp | no reading | ns
DIMM 19 Temp | no reading | ns
DIMM 20 Temp | no reading | ns
DIMM 21 Temp | no reading | ns
DIMM 22 Temp | no reading | ns
DIMM 23 Temp | no reading | ns
DIMM 24 Temp | no reading | ns
PCIe 1 OverTemp | 0x00 | ok
PCIe 2 OverTemp | 0x00 | ok
PCIe 3 OverTemp | 0x00 | ok
OCP OverTemp | 0x00 | ok
[10:26 xcp-ng-host1 ~]# ipmitool sdr | grep -i fan
Fan Mismatch | 0x00 | ok
Fan 1 Front Tach | 8528 RPM | ok
Fan 2 Front Tach | 8446 RPM | ok
Fan 3 Front Tach | 8446 RPM | ok
Fan 4 Front Tach | 8610 RPM | ok
Fan 5 Front Tach | 8446 RPM | ok
Fan 6 Tach | 0 RPM | ok
Fan 1 Rear Tach | 7950 RPM | ok
Fan 2 Rear Tach | 7950 RPM | ok
Fan 3 Rear Tach | 8025 RPM | ok
Fan 4 Rear Tach | 7950 RPM | ok
Fan 5 Rear Tach | 7875 RPM | ok
Sys Fan Pwr | 24 Watts | ok
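To compare the two fan snapshots above numerically rather than by eye, the RPM values can be averaged per fan group. A minimal Python sketch with hypothetical helper names `fan_rpms` and `mean_rpm`, assuming the "name | N RPM | status" layout of `ipmitool sdr`:

```python
def fan_rpms(sdr_text):
    """Map fan name -> RPM from `ipmitool sdr` output, skipping 0 RPM fans."""
    rpms = {}
    for line in sdr_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 3 or not fields[1].endswith(" RPM"):
            continue  # prompts, "Fan Mismatch", "Sys Fan Pwr" (Watts), etc.
        rpm = int(fields[1].split()[0])
        if rpm > 0:  # "Fan 6 Tach" reads 0 RPM in both snapshots
            rpms[fields[0]] = rpm
    return rpms


def mean_rpm(rpms, keyword):
    """Average RPM of the fans whose name contains `keyword` (e.g. "Front")."""
    values = [rpm for name, rpm in rpms.items() if keyword in name]
    if not values:
        raise ValueError("no fans matching " + repr(keyword))
    return sum(values) / len(values)
```

For the two listings above this gives 6,642 vs 8,495.2 RPM for the front fans and 6,225 vs 7,950 RPM for the rear fans, an increase of roughly 28% in both groups after the BMC reboot.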
-
RE: High Fan Speed Issue on Lenovo ThinkSystem Servers
@LennertvdBerg Lenovo didn't want to provide any support. However, they have just published a new UEFI/BIOS; I'm not sure whether it will fix things.
-
RE: High Fan Speed Issue on Lenovo ThinkSystem Servers
@RIX_IT I've opened a ticket today as well, hoping they'll realise it would benefit all parties if they helped solve this.
-
RE: High Fan Speed Issue on Lenovo ThinkSystem Servers
@ThierryEscande Has anyone made any progress on this? @Riven, do you have contact details at Lenovo we could use regarding this?
-
RE: ISO modification with additional RPM for NIC
@stormi I’ll be back on Wednesday (I'm on a short holiday now). I’ll try your advice and see how it works.
-
RE: ISO modification with additional RPM for NIC
@stormi I thought it would be convenient to have everything in one ISO, as that makes installation easy. But I can check these options as well. So you recommend extracting the ISO to a separate USB drive and loading the drivers from there?
-
RE: High Fan Speed Issue on Lenovo ThinkSystem Servers
@olivierlambert is there a way, after GRUB, to step through the boot process and see where it goes wrong?
-
RE: ISO modification with additional RPM for NIC
@stormi Hi, some help would be welcome. I still haven’t found a solution.
-
RE: High Fan Speed Issue on Lenovo ThinkSystem Servers
@ThierryEscande I've updated to Xen 4.17, and it seems the upgrade went fine:
host : xcp-ng-test1
release : 4.19.0+1
version : #1 SMP Wed Jan 24 17:19:11 CET 2024
machine : x86_64
nr_cpus : 64
max_cpu_id : 63
nr_nodes : 1
cores_per_socket : 32
threads_per_core : 2
cpu_mhz : 3245.126
hw_caps : 178bf3ff:7efa320b:2e500800:244037ff:0000000f:f1bf97a9:00405fce:00000780
virt_caps : pv hvm hvm_directio pv_directio hap gnttab-v1 gnttab-v2
total_memory : 130850
free_memory : 121721
sharing_freed_memory : 0
sharing_used_memory : 0
outstanding_claims : 0
free_cpus : 0
xen_major : 4
xen_minor : 17
xen_extra : .3-3
xen_version : 4.17.3-3
xen_caps : xen-3.0-x86_64 hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : credit
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset : $Format:%H$, pq ???
xen_commandline : dom0_mem=7568M,max:7568M watchdog ucode=scan dom0_max_vcpus=1-16 crashkernel=256M,below=4G console=vga vga=mode-0x0311
cc_compiler : gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-1)
cc_compile_by : mockbuild
cc_compile_domain : [unknown]
cc_compile_date : Wed Feb 28 10:12:19 CET 2024
build_id : 9a011a28e29a21a7643376b36aec959253587d42
xend_config_format : 4
However, the issues with the fan speeds and missing memory temperature readings still persist.
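The running hypervisor version is in the xen_version field, so the upgrade can also be confirmed programmatically. Note that xen_commandline is the hypervisor's own command line (it matches the multiboot2 line of the grub config); dom0 kernel parameters such as initcall_blacklist=dw_i2c_init_driver appear in dom0's /proc/cmdline instead, so they will never show up in that field. A minimal Python sketch; `parse_xl_info` and `cmdline_has` are hypothetical names:

```python
def parse_xl_info(text):
    """Turn "key : value" lines (as printed above) into a dict.

    Splits on the first colon only, because values such as the kernel
    build timestamp contain colons themselves.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def cmdline_has(cmdline, param):
    """True if `param` appears as a whole word in a kernel command line."""
    return param in cmdline.split()
```

For example, `parse_xl_info(text)["xen_version"].startswith("4.17")` confirms the upgrade, and `cmdline_has(open("/proc/cmdline").read(), "initcall_blacklist=dw_i2c_init_driver")` checks whether dom0 actually booted with the blacklist.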
-
RE: High Fan Speed Issue on Lenovo ThinkSystem Servers
@ThierryEscande I'm having difficulty installing the kernel-alt package on my system. I am currently using XCP-ng 8.3.0-beta2. It appears that there might be a problem with updategrub.py. Any guidance on how to resolve this would be greatly appreciated.
I've updated xcp-ng.repo, and this is my output of yum --enablerepo=xcp-ng-tescande install kernel-alt:
[20:12 xcp-ng-test1 ~]# yum --enablerepo=xcp-ng-tescande install kernel-alt
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Excluding mirror: updates.xcp-ng.org
 * xcp-ng-base: mirrors.xcp-ng.org
Excluding mirror: updates.xcp-ng.org
 * xcp-ng-updates: mirrors.xcp-ng.org
Resolving Dependencies
--> Running transaction check
---> Package kernel-alt.x86_64 0:4.19.309-1.0.lenovotest.2.xcpng8.3 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package      Arch     Version                               Repository       Size
================================================================================
Installing:
 kernel-alt   x86_64   4.19.309-1.0.lenovotest.2.xcpng8.3    xcp-ng-tescande  30 M

Transaction Summary
================================================================================
Install  1 Package

Total download size: 30 M
Installed size: 154 M
Is this ok [y/d/N]: y
Downloading packages:
kernel-alt-4.19.309-1.0.lenovotest.2.xcpng8.3.x86_64.rpm | 30 MB 00:00:01
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : kernel-alt-4.19.309-1.0.lenovotest.2.xcpng8.3.x86_64    1/1
/var/tmp/rpm-tmp.l9rbzO: line 9: /opt/xensource/bin/updategrub.py: No such file or directory
warning: %post(kernel-alt-4.19.309-1.0.lenovotest.2.xcpng8.3.x86_64) scriptlet failed, exit status 127
Non-fatal POSTIN scriptlet failure in rpm package kernel-alt-4.19.309-1.0.lenovotest.2.xcpng8.3.x86_64
  Verifying  : kernel-alt-4.19.309-1.0.lenovotest.2.xcpng8.3.x86_64    1/1

Installed:
  kernel-alt.x86_64 0:4.19.309-1.0.lenovotest.2.xcpng8.3
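The output shows that the package files were installed and only the %post scriptlet failed, with exit status 127 (the shell's "command not found" status), because /opt/xensource/bin/updategrub.py does not exist on this 8.3 beta host. Whatever that script was meant to do (judging by its name, presumably registering the new kernel in the boot menu) therefore did not happen. To catch this kind of non-fatal failure in long yum logs, a minimal Python sketch; `scriptlet_problems` is a hypothetical name and the regular expressions assume the exact message formats shown above:

```python
import re

# Both patterns assume the message formats printed by yum/rpm above.
MISSING_RE = re.compile(
    r"^(?P<script>\S+): line \d+: (?P<path>\S+): No such file or directory$")
FAILED_RE = re.compile(
    r"^Non-fatal (?P<phase>\w+) scriptlet failure in rpm package (?P<pkg>\S+)$")


def scriptlet_problems(yum_output):
    """Return (failed scriptlets, missing files) reported in yum/rpm output.

    failed: list of (phase, package) tuples, e.g. ("POSTIN", "kernel-alt-...")
    missing: list of paths that a scriptlet tried to run but could not find
    """
    failed, missing = [], []
    for line in yum_output.splitlines():
        line = line.strip()
        match = FAILED_RE.match(line)
        if match:
            failed.append((match.group("phase"), match.group("pkg")))
        match = MISSING_RE.match(line)
        if match:
            missing.append(match.group("path"))
    return failed, missing
```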
-
RE: High Fan Speed Issue on Lenovo ThinkSystem Servers
@ThierryEscande When I run
lsmod | grep ipmi
I get the following results:
ipmi_si 65536 0
ipmi_devintf 20480 0
ipmi_msghandler 61440 2 ipmi_devintf,ipmi_si
So I created the file with
vi /etc/modprobe.d/blacklist-ipmi.conf
and added the following:
blacklist ipmi_si
blacklist ipmi_devintf
blacklist ipmi_msghandler
I saved the file and rebooted the system using shutdown -r now. However, I still don't see the memory temperatures in Xclarity, and the server's fans are still running at over 13,000 RPM. The system is running XCP-NG 8.3 beta 2 with kernel 4.19.0+1.
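A `blacklist` entry in /etc/modprobe.d only tells modprobe to ignore a module's aliases, i.e. it stops automatic loading; the module can still be loaded if something requests it by name (for example a service running `modprobe ipmi_si`), or if it is loaded from an initramfs generated before the file existed. A common way to block explicit loads as well is an `install <module> /bin/false` line in the same file. Either way, it is worth checking after the reboot whether the modules really stayed unloaded. A minimal Python sketch that reads /proc/modules; `loaded_modules` and `still_loaded` are hypothetical names:

```python
BLACKLIST = ("ipmi_si", "ipmi_devintf", "ipmi_msghandler")


def loaded_modules(proc_modules_text):
    """Map module name -> modules using it, from /proc/modules content.

    Each line looks like "name size refcount users state address", where
    `users` is "-" or a comma-separated list with a trailing comma.
    """
    modules = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        name, users = fields[0], fields[3]
        modules[name] = [] if users == "-" else [u for u in users.split(",") if u]
    return modules


def still_loaded(modules, blacklist=BLACKLIST):
    """Blacklisted modules that are nevertheless present."""
    return [name for name in blacklist if name in modules]
```

Running `still_loaded(loaded_modules(open("/proc/modules").read()))` after the reboot returns an empty list only if none of the three IPMI modules was loaded.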
-
RE: ISO modification with additional RPM for NIC
@stormi could you maybe advise what I'm doing wrong?
-
RE: ISO modification with additional RPM for NIC
@Danp, UPDATED: I tried booting with an alternate kernel in XCP-NG 8.2.1 and XCP-NG 8.3 beta 2, but it didn't load the Mellanox ConnectX-6 Lx 10/25GbE drivers.
Yes, I've read the documentation about creating a custom ISO and have detailed my procedure above. The only part I'm unsure about is this:
"you need to add new RPMs not just replace existing ones, they need to be pulled by another existing RPM as dependencies. If there's none suitable, you can add the dependency to the xcp-ng-deps RPM."
I couldn’t figure out how to carry out this step.
-
ISO modification with additional RPM for NIC
I'm fairly new to XCP-NG and would like to build a custom ISO for XCP-NG where I can add an additional RPM for a Mellanox ConnectX-6 Lx 10/25GbE SFP28. The problem is that I don't have other NICs installed and I can't install XCP-NG 8.2.1 because it detects during installation that there's no NIC in the system. I can install XCP-NG 8.3 beta 2 as the drivers are included there. So, I would like to include the drivers for the Mellanox in the ISO so that during installation the process will automatically detect it and I can run the installation.
In xcp-ng-8.3.0-beta2, there's an additional mellanox-mlnxen-5.4_1.0.3.0-4.xcpng8.3.x86_64.rpm in the Packages/ directory. In xcp-ng-8.2.1-20231130, there is no mellanox-mlnxen*.rpm at all. I found two Mellanox RPMs on Koji:
- mellanox-mlnxen-alt-5.4_1.0.3.0-1.xcpng8.2.x86_64.rpm (https://koji.xcp-ng.org/buildinfo?buildID=2620)
- mellanox-mlnxen-alt-5.9_0.5.5.0-1.1.xcpng8.2.x86_64.rpm (https://koji.xcp-ng.org/buildinfo?buildID=2868)
I tried following the instructions in the XCP-NG ISO modification documentation.
First, I extracted the ISO using the following commands:
mkdir tmpmountdir/
mount -o loop filename.iso tmpmountdir/ # as root
cp -a tmpmountdir/. iso
umount tmpmountdir/ # as root
chmod a+w iso/ -R
Then, I used wget to download the RPMs into the Packages/ directory. After this, I regenerated repodata/ using the following commands (remember to install createrepo-c first):
sudo apt install createrepo-c
rm repodata/ -rf
createrepo_c . -o .
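Whether createrepo_c actually picked up the downloaded RPMs can be checked by reading the regenerated metadata: repodata/repomd.xml points to the "primary" metadata file, which lists every package in the repository. A minimal Python sketch, assuming the primary file is gzip- or xz-compressed or plain XML (other compressions such as zstd are not handled); `package_names` and its helpers are hypothetical names. Being listed there only proves the RPM is in the repository; as the ISO modification documentation notes, the installer still has to pull it in as a dependency of an existing RPM.

```python
import gzip
import lzma
import os
import xml.etree.ElementTree as ET

REPO_NS = "{http://linux.duke.edu/metadata/repo}"
COMMON_NS = "{http://linux.duke.edu/metadata/common}"


def primary_location(repo_root):
    """Relative path of the primary metadata file named in repomd.xml."""
    tree = ET.parse(os.path.join(repo_root, "repodata", "repomd.xml"))
    for data in tree.getroot().iter(REPO_NS + "data"):
        if data.get("type") == "primary":
            return data.find(REPO_NS + "location").get("href")
    raise LookupError("no primary metadata in repomd.xml")


def open_metadata(path):
    """Open a metadata file, decompressing gzip or xz by file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    if path.endswith(".xz"):
        return lzma.open(path, "rb")
    if path.endswith(".xml"):
        return open(path, "rb")
    raise ValueError("unsupported metadata compression: " + path)


def package_names(repo_root):
    """Sorted, de-duplicated package names listed in the repository."""
    href = primary_location(repo_root)
    with open_metadata(os.path.join(repo_root, href)) as fh:
        root = ET.parse(fh).getroot()
    return sorted({pkg.findtext(COMMON_NS + "name")
                   for pkg in root.iter(COMMON_NS + "package")})
```

Run from the directory that contains iso/, `[n for n in package_names("iso") if n.startswith("mellanox")]` should list the added Mellanox packages.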
Finally, I built the ISO using the instructions given in the XCP-NG documentation:
#OUTPUT=/path/to/destination/iso/file # change me
OUTPUT=/home/xcp-ng/new_iso/xcp-ng-8.2.1-20231130-mod.iso
VERSION=8.2 # change me
genisoimage -o $OUTPUT -v -r -J --joliet-long -V "XCP-ng $VERSION" -c boot/isolinux/boot.cat -b boot/isolinux/isolinux.bin \
    -no-emul-boot -boot-load-size 4 -boot-info-table -eltorito-alt-boot -e boot/efiboot.img -no-emul-boot .
isohybrid --uefi $OUTPUT
However, when I use this ISO, the Mellanox ConnectX-6 Lx drivers do not load during installation.
Also, I have seen on the Nvidia website that new drivers for the ConnectX-6 Lx are available for Citrix XenServer Host 8.2, in version mlnx-en-23.10-2.1.3.1-xenserver8.2-x86_64.
So my questions are:
- What am I doing wrong with building the ISO and including the RPMs?
- Is it possible to include mlnx-en-23.10-2.1.3.1-xenserver8.2-x86_64 for XCP-NG 8.2?
- What steps do I need to take, and how?
-
RE: High Fan Speed Issue on Lenovo ThinkSystem Servers
@rmaclachlan Thanks. I'm also unsure how we can determine what in the OS is causing this issue. Are there other installations or modifications we could try to help isolate the problem, such as another Linux distribution with the same kernel, to check whether it's kernel-related? @gduperrey or @olivierlambert, any suggestions on how we can help the team identify this?
-
RE: High Fan Speed Issue on Lenovo ThinkSystem Servers
@olivierlambert could you help us by providing the module for the Xen kernel that @Gheppy is talking about?
-
RE: High Fan Speed Issue on Lenovo ThinkSystem Servers
@Gheppy I've just reinstalled xcp-ng-8.3.0-beta2 after my Ubuntu experiment and installed lm_sensors. The output is indeed:
Driver `to-be-written':
  * ISA bus, address 0xcc0
    Chip `IPMI BMC KCS' (confidence: 8)

Note: there is no driver for IPMI BMC KCS yet.
Check http://www.lm-sensors.org/wiki/Devices for updates.

No modules to load, skipping modules configuration.

Unloading i2c-dev... OK
Unloading cpuid... OK
The complete output is:
What will be the solution for this?
-
RE: High Fan Speed Issue on Lenovo ThinkSystem Servers
@Gheppy I just installed Ubuntu 22.04.4 LTS with kernel 5.15.0-102-generic to test whether there could be something like a 'vendor lock'. Under Ubuntu, I can see my memory temperatures, and all my fan speeds are around 6,000 RPM. So it really seems to be an issue between XCP-ng and Lenovo.