@ThierryEscande
Can confirm.
Applied the workaround to both servers and upgraded to the latest XCC (KAX338N) & UEFI (KAE122D).
Temps are displayed correctly and the servers are whisper quiet.
A huge thanks to anyone involved!
Thanks again for your effort!
That gives at least a little hope that it will be fixed in the future.
Perhaps @olivierlambert could give us some insights into whether the changes can or will be implemented once they're fixed on XenServer's side?
I did also find this:
XenServer 8 cannot acquire the DIMM temperature and fan speed is too fast on the AMD platform - Lenovo ThinkSystem
Looks like Lenovo finally acknowledged that there is a problem. They even provided a workaround for it.
I'll see if I can test it. It might take some time because we already have our servers running in production, we just couldn't wait any longer for a solution.
@DeOccultist
That's pretty much what I was expecting from Lenovo at this point. Sad, but thanks a lot for trying!
You can find the older versions here:
Scroll all the way to the bottom to find the button that says "Show Previous Versions":
@LennertvdBerg I just tested the new UEFI combined with the new XCC Firmware. Also installed all pending XCP-ng updates while being at it.
The XCC firmware didn't make any difference, but it should still be updated, as Lenovo addressed quite a few CVEs in it.
However, the new UEFI is still not the solution to our problem, just as @Riven already figured by the changelogs. Fans are still sitting at around 12-13k RPMs. Also the DIMM Temps are still not shown in XCC.
@LennertvdBerg
We already tried getting into contact with Lenovo a while ago. But like I already stated, they weren't able to escalate the ticket because of the unsupported OS. That's the same response that @Riven got.
Maybe you could open a ticket with Lenovo as well and point them to this thread. Let's see if it helps when more people report this issue. Otherwise we seem to be pretty much out of luck.
We now have one of our two servers in production running the old UEFI. The sound level is not great, but bearable.
Still definitely far from an optimal solution.
@ThierryEscande
Of course. See the attached files:
What I also did notice was that networking isn't working at all with the 4.19.309 kernel. Settings were the same as before and I couldn't even ping the gateway. I tried an emergency reset and reconfiguring, but still no luck. Upon booting with the stable kernel, it worked again as expected. But that's an issue for another day, I guess.
@ThierryEscande
I've upgraded to the kernel-alt 4.19.309 while still using the old UEFI (kae110k 1.41). Currently the IPMI modules are not blacklisted in the modprobe config.
The fan speeds remain stuck at 9k RPMs, consistent with both the stable and previous alt kernel versions.
I also attempted to upgrade again to the newest UEFI (kae118m 4.11) but didn't notice any discernible difference in fan behavior - here it's still around 13k RPMs.
The ipmitool output also didn't change from the one @rmaclachlan provided.
If there are any other suggestions for testing or specific logs you'd like me to provide, please let me know.
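When comparing fan speeds across kernel and UEFI combinations (9k vs. 13k RPMs above), it can help to extract the RPM readings from `ipmitool sdr type Fan` in a consistent way. The sketch below is only an illustration, assuming the usual pipe-separated SDR layout (`name | sensor id | status | entity | reading`); the sample lines and sensor names are hypothetical, not output captured from the servers in this thread.

```python
import re
from typing import Dict

# Matches a reading such as "12600 RPM" in the last SDR column.
RPM_RE = re.compile(r"(\d+(?:\.\d+)?)\s*RPM", re.IGNORECASE)


def parse_fan_rpms(sdr_output: str) -> Dict[str, float]:
    """Map sensor name -> RPM from pipe-separated `ipmitool sdr` style output.

    Assumes each line looks like "<name> | <id> | <status> | <entity> | <reading>".
    Lines without an RPM reading (e.g. "No Reading") are skipped.
    """
    rpms: Dict[str, float] = {}
    for line in sdr_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 2:
            continue  # not an SDR row
        match = RPM_RE.search(fields[-1])
        if match:
            rpms[fields[0]] = float(match.group(1))
    return rpms


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = (
        "Fan1 Front Tach | 30h | ok | 29.1 | 12600 RPM\n"
        "Fan1 Rear Tach  | 31h | ok | 29.1 | 11900 RPM\n"
        "Fan2 Front Tach | 32h | ns | 29.2 | No Reading"
    )
    fans = parse_fan_rpms(sample)
    print(fans)
    print("max RPM:", max(fans.values()))
```

On a host, the real output could be fed in with something like `ipmitool sdr type Fan > fans.txt` and then passing the file contents to `parse_fan_rpms`, which makes it easier to post comparable numbers for each firmware/kernel combination.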
Just did that on 8.2 (kernel-alt.x86_64 0:4.19.265-1.xcpng8.2), not the testing one yet.
Got a few errors on startup, not related afaik, but still fyi:
That didn't make a difference as it seems:
I'm on UEFI 1.41, could try the most recent somewhat later once people leave their offices.
@LennertvdBerg As mentioned, XCC firmware doesn't seem to change anything regarding the fan speed. So we're sticking with the latest version for now. You could try rolling back to UEFI 1.41 (Build ID: KAE110K) as well and see if it makes a difference.
Hey,
Just wanted to provide an update from our end as well.
We've conducted tests with various versions of the xClarity Controller firmware / UEFI.
Lenovo seems to be onto something, as they recently released a new version with the following changelog: (XClarity Controller Firmware 2.40 KAX326G).
They've also released a new UEFI in the meantime. However, the fan speed issue persists despite these updates.
We attempted to obtain support from Lenovo, but they were unable to escalate the ticket because XCP-ng is considered unsupported.
What seems to be working for us, as suggested by @rmaclachlan, is UEFI version 1.41 (Build ID: KAE110K).
With this version, the fan speeds have decreased to around 9k RPMs, which, while still slightly high, is within acceptable sound levels.
That's really not the best option, as they have addressed a few CVEs since this release, but at least we can start setting up the server without getting angry calls all the time.
The XClarity Controller firmware doesn't seem to have any impact on the fan speed at all.
@rmaclachlan, have you found a solution in the meantime?
@bleader, I followed your suggestion and updated our installation from 8.2 to 8.3 and went through the steps outlined in the post you mentioned.
Unfortunately, there were no changes regarding our fan speed issue - as expected.
Also, I installed RHEL 8.8 (Kernel 4.18) and Ubuntu Server 18.06 (Kernel 5.4), both officially supported by Lenovo. In both cases, the fans ran at a comfortable 5000 RPMs. It appears that the problem is specific to XCP.
I've dug through the logs but haven't found anything really obvious.
In the kernel log, I noticed a couple of warnings:
Additionally, the mcelog service failed, and I found a warning when checking journalctl.
Given that we're running a beta build, I'm uncertain if these issues are related, or if they even matter, but I wanted to share this information in case it may be helpful.
Thank you all for the input!
I'll be experimenting with other Linux distributions officially supported by Lenovo to see if the issue persists.
I'm not really optimistic about receiving assistance from Lenovo regarding XCP, but maybe if we are able to reproduce elsewhere, they will point us in the right direction.
Rest assured, I'll keep you updated on any progress made.
Hi everyone,
We recently bought two new servers, specifically the Lenovo ThinkSystem SR665 V3 running single AMD EPYC 9174F processors, and deployed XCP 8.2.1 on them. However, upon booting into XCP, we encountered an issue with excessively high fan speeds (around 16k RPMs), resulting in unbearable noise levels.
The servers are sitting in our climate-controlled server room at about 20°C, with CPU temperatures around 35°C at idle. Considering these conditions, the fan speeds shouldn't be ramping up so aggressively.
To troubleshoot, I installed Windows Server 2019 directly on the bare metal to check if the issue persisted, but within Windows, the servers operate quietly without any noticeable fan noise.
We've ensured that both BIOS and firmware are up to date, and XCP has been updated using yum. Additionally, we've attempted to adjust performance settings within the BIOS, but to no avail.
What's strange is that we've been running five other Lenovo servers with XCP for years without encountering similar issues. The other models include SR630s and SR650s, all Intel-based.
Has anyone else encountered something like this before? Are there any methods for controlling fan speeds within XCP or other options to solve this?
Thanks in advance.