one server (268) with 4.19.19-184.108.40.206.xcpng8.1: no more problem!
Yeah, we need to be sure that this is a stable kernel; the memory leak seems to have been introduced somewhere after this version.
@appollonius I think it will depend on the hardware settings. Can you check whether you have an option to change the mode in the motherboard settings?
I don't have physical hardware access to validate this, as I mostly use nested virtualization, but the XCP-ng /boot has the files required to support both UEFI and BIOS mode.
Maybe @stormi can confirm.
Yes, all drivers are stock kernel modules for kernel-alt. It would be interesting to see the behavior by disabling override. I think we can try both: first check whether the downgraded kernel shows the same symptoms, and then try disabling the updated drivers.
@olivierlambert @delaf what we know from kmemleak so far is that it only scans for and reports unreferenced objects. If a kernel module or the kernel itself is still holding (referencing) the memory, the leak may not show up. We are evaluating other options to find this.
kernel-alt is closer to upstream, so either this issue is known and already fixed upstream, or it might have been introduced by kernel updates.
The oldest kernel available is 4.19.19-220.127.116.11.xcpng8.1. Is it possible to install it and see if the issue repeats?
After the system has been running for some time, the user can run
# echo scan > /sys/kernel/debug/kmemleak
and then
# cat /sys/kernel/debug/kmemleak
to see if there are any unreferenced objects floating in memory.
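The two commands above could be wrapped in a small helper so the check is easy to repeat and the results easy to compare between kernels. This is only a rough sketch: it assumes debugfs is mounted at /sys/kernel/debug, that the running kernel has kmemleak enabled, and that it is run as root. The function names and the report file name are made up for illustration.

```shell
#!/bin/sh
# Sketch only. Assumes debugfs is mounted at /sys/kernel/debug and that the
# running kernel was built with kmemleak support. Must be run as root.
KMEMLEAK="${KMEMLEAK:-/sys/kernel/debug/kmemleak}"

# Count the reports in a saved kmemleak dump.
# Each report starts with a line beginning "unreferenced object".
count_leaks() {
    grep -c '^unreferenced object' "$1"
}

# Trigger a scan, save the report to a file (hypothetical default name),
# and print how many unreferenced objects were reported.
kmemleak_check() {
    report="${1:-kmemleak-report.txt}"
    if [ ! -e "$KMEMLEAK" ]; then
        echo "kmemleak interface not found at $KMEMLEAK" >&2
        return 1
    fi
    echo scan > "$KMEMLEAK" || return 1
    cat "$KMEMLEAK" > "$report" || return 1
    echo "unreferenced objects reported: $(count_leaks "$report")"
}

# Example (as root): kmemleak_check /root/kmemleak-$(uname -r).txt
```

Saving one report per kernel version makes it easier to compare the downgraded kernel against the current one. Keep in mind that memory which is still referenced will not appear in either report.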
CephFS is working nicely, but the update deleted my previous secret in /etc, so I had to reinstall the extra packages, recreate the SR, and then move the virtual disks back across and refresh
Were you not able to attach the pre-existing SR on CephFS? Depending on that, I'll take a look at the documentation or the driver.
@vegarnilsen Thanks, you got the correct one.
Can you share the output of # modinfo bnx2x?
One of them will be loaded in above order depending on its presence.
@vegarnilsen Ok, that was helpful.
Can you try installing broadcom-bnxt-en-alt.x86_64 and report your observations? You would need a reboot.
@vegarnilsen can you share the output of # dmesg and # lsmod? We may have to try a different version of the driver to fix this. Maybe also # rpm -qa | grep bnx.
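To make it easier to share all of this in one go, the commands requested in this thread (dmesg, lsmod, modinfo bnx2x and rpm -qa | grep bnx) could be collected into a single file. A rough sketch follows; the function names and the default report file name are hypothetical, and a command that is missing or fails is simply noted in the report.

```shell
#!/bin/sh
# Sketch only: gather the driver details asked for above into one report file.

# Print a header, run the given command, and note it if the command fails.
section() {
    title="$1"; shift
    echo "== $title =="
    "$@" 2>&1 || echo "(failed or not available: $title)"
    echo
}

# Write all sections to the given file (hypothetical default: nic-report.txt)
# and print the file name.
collect_nic_info() {
    out="${1:-nic-report.txt}"
    {
        section "rpm -qa | grep bnx" sh -c 'rpm -qa | grep bnx'
        section "modinfo bnx2x" modinfo bnx2x
        section "lsmod" lsmod
        section "dmesg" dmesg
    } > "$out"
    echo "$out"
}

# Example (as root): collect_nic_info /root/nic-report.txt
```

Running it once before and once after changing the driver package gives two reports that can be diffed.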
@Appollonius they are still pretty much the same. So this means there is no issue with the system booting (with or without the GPU), and it must be something to do with the network config.
Can you check whether /var/log/xensource.log has anything indicating a service failure or a network start failure?
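One way to narrow the log down is to filter it for lines that mention both an error and something network related. This is only a sketch: the keywords are my guess at what a network start failure might look like in the log, not a known message format, so adjust them as needed. The function name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: the keyword lists below are assumptions, not known log formats.
XENSOURCE_LOG="${XENSOURCE_LOG:-/var/log/xensource.log}"

# Print the last 50 lines that mention an error or failure together with
# a network-related keyword.
find_network_errors() {
    log="${1:-$XENSOURCE_LOG}"
    grep -iE 'error|fail' "$log" \
        | grep -iE 'network|pif|bridge' \
        | tail -n 50
}

# Example: find_network_errors /var/log/xensource.log
```

If this prints nothing, it only means the guessed keywords did not match, so it is still worth looking at the log around the boot time by hand.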