@KFC-Netearth Surprising, as it is still .477, but maybe the patch was backported. Thanks for telling us!

Posts made by bleader
-
RE: Live migrate of Rocky Linux 8.8 VM crashes/reboots VM
-
RE: Updates announcements and testing
@TodorPetkov Yes, we do not know yet when this update will be released on the XenServer side, but it will be published on the XCP-ng side too.
What has been released so far suffers from the same issue as the one described in your link.
If I'm not mistaken:
- the linux-firmware update fixes the Zenbleed issue
- the kernel patch works around the case where the updated firmware is not in use, by disabling features via a control register; the previous patch disabled too much
- if you're using the updated firmware, this workaround will not be used, and therefore the updated patch is not critical
You can check that you're running the right microcode version via:
journalctl -k --grep=microcode
Without the -k, you should be able to see previous boots and verify that the patch_level= value has changed. I'm unsure which version to expect there, as we do not have a Zen 2 machine at hand to test this.
We will indeed provide an update later, likely not as a dedicated update, but along with other fixes.
I hope that properly answers your question!
-
RE: Live migrate of Rocky Linux 8.8 VM crashes/reboots VM
Not entirely sure, but that may be related to what's happening on the Red Hat side of things.
-
RE: Live migrate of Rocky Linux 8.8 VM crashes/reboots VM
Makes sense. I was hoping they would go with a -488 on update... Sorry to hear it's not the case.
-
RE: kswapd0: page allocation failure under high load
When you say
under high load
do you mean the export is creating the heavy load, or is there something else in your setup creating a heavy load at the same time?
As there is nfs4 in the call trace, I would guess you're doing an export from XO, i.e. downloading the export through HTTP, and that the VM is originally on a shared SR. Is that right?
The bottom call trace goes kswapd -> shrink_slab, so I would venture it is trying to free up memory. As per my first question: if there is something else putting a heavy load on the host, this could be expected; if not, it could be a new leak, or higher memory consumption while doing the export for one reason or another. But I doubt it's a leak, as I would expect that to end in an OOM rather than this kind of warning.
If you're sure the setup, as well as the load outside of the export, is similar, it could indeed be something from the updates.
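If you want more data points, here is a rough sketch of what I would watch while the export runs (the interval is arbitrary):

# Overall memory and reclaim activity:
vmstat -w 5
# Slab usage over time, since the trace goes through shrink_slab:
watch -n 5 'grep -E "^(Slab|SReclaimable|SUnreclaim)" /proc/meminfo'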
-
RE: Live migrate of Rocky Linux 8.8 VM crashes/reboots VM
So, after our investigations, we were able to pinpoint the issue.
It seems to happen on most RHEL-derivative distributions when migrating from 8.7 to 8.8. As suggested, the bug is in the kernel.
Starting with 4.18.0-466.el8, the patch "x86/idt: Annotate alloc_intr_gate() with __init" is integrated, and it creates the issue: the patch "x86/xen: Split HVM vector callback setup and interrupt gate allocation", which should have been integrated as well, is missing.
The migration to 8.8 moves you to 4.18.0-477.* versions, which also raise this issue; that is what you reported.
We found that the 4.18.0-488 kernel available in CentOS 8 Stream integrates the missing patch, and it does indeed work when installed manually.
Your report helped us identify and reproduce the issue, which allowed us to provide a call stack to the Xen developers. Roger Pau Monné then quickly found that this patch was missing, and we were able to determine which kernel RPM versions integrate it and when the fix landed.
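For reference, a quick way to check whether a guest is on an affected kernel (version boundaries as described above):

# Running kernel:
uname -r
# All installed kernels:
rpm -q kernel
# Affected: 4.18.0-466.el8 through 4.18.0-477.*; the missing patch is integrated in 4.18.0-488.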
This means the issue has been identified on the Red Hat side, and it is now a matter of having an updated kernel in derivative distributions like Rocky and Alma.
-
RE: Weird kern.log errors
As it is, it doesn't ring a bell for me.
What I see from the first log is the segfault on blktap and in xcp-rrdd-xenpm; likely that was while writing to a disk. In all cases, it is a xen_mc_flush() call.
Given it happens on a single machine, I would venture it could be related to the disk controller, or the disk itself. You could have a look at a dmidecode output to see if the controllers are the same as on other machines (sometimes there are small discrepancies between supposedly identical hardware), and check the drives with smartctl for any health issues. But especially as you were on RAID1 originally, I doubt an issue with the drives themselves would lead to such a problem...
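For example, something along these lines on each machine (the device path is just an example; adapt it to your setup):

# Dump hardware info, then diff the files between machines:
dmidecode > /tmp/dmi-$(hostname).txt
# Quick health verdict for a drive:
smartctl -H /dev/sda
# Full SMART attributes (look at reallocated/pending sectors):
smartctl -a /dev/sda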