XCP-ng 8.0.0 Release Candidate
Tracked down the necessary patch to make ipmi work again.
*When executing a command like:
modprobe ipmi_si ports=0xffc0e3 type=bt
The system would get an oops.
The trouble here is that ipmi_si_hardcode_find_bmc() is called before ipmi_si_platform_init(), but initialization of the hard-coded device creates an IPMI platform device, which won't be initialized yet.
The real trouble is that hard-coded devices aren't created with any device, and the fixup is done later. So do it right, create the hard-coded devices as normal platform devices.
This required adding some new resource types to the IPMI platform code for passing information required by the hard-coded device
and adding some code to remove the hard-coded platform devices on module removal.
To enforce the "hard-coded devices passed by the user take priority over firmware devices" rule, some special code was added to check and see if a hard-coded device already exists.*
The patch was backported to 4.19.37. I used the backported version from here: https://github.com/raspberrypi/linux/commit/6bba17f6bce39e46fdf8c0fe190bdc3f57ef8f8f
Once applied to the 4.19.19 sources,
modprobe ipmi_si type=kcs ports=0xca2
ipmitool sensor
works. I tried it with a Debian USB system with pristine 4.19.19 sources.
Is it worth the effort to prepare the patch for the XCP-ng kernel "kernel-4.19.19-18.104.22.168.xcpng8.0.x86_64.rpm"?
I assume it has to be signed by Microsoft to make it work with UEFI Secure Boot.
Thanks. We could indeed include it in an experimental kernel package. There is no Microsoft signature on our kernel, so we're free to change it if we trust the change. You should also file this very detailed bug report at https://github.com/xcp-ng/xcp/issues so that we can track its status there, and also at https://bugs.xenserver.org to let the XenServer developers know and maybe get their feedback on the fix.
We should also check if that patch brought regressions and subsequent commits fixed that afterwards.
I did a grep for ipmi: and ipmi_si: over the kernel 4.19 ChangeLogs:
ChangeLog-4.19.31: ipmi_si: fix use-after-free of resource->name
ChangeLog-4.19.31: Fixes: 93c303d2045b ("ipmi_si: Clean up shutdown a bit")
ChangeLog-4.19.33: ipmi_si: Fix crash when using hard-coded device
ChangeLog-4.19.37: ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier
ChangeLog-4.19.37: Fixes: 77f8269606bf ("ipmi: fix use-after-free of user->release_barrier.rda")
ChangeLog-4.19.44: ipmi: ipmi_si_hardcode.c: init si_type array to fix a crash
ChangeLog-4.19.45: ipmi:ssif: compare block number correctly for multi-part return messages
ChangeLog-4.19.45: Fixes: 7d6380cd40f79 ("ipmi:ssif: Fix handling of multi-part return messages")
The patch I mentioned earlier was applied in 4.19.33; the patch from 4.19.44 looks like a follow-up fix for a related issue.
Yesterday I also tried copying the whole drivers/char/ipmi folder from 4.19.57 to 4.19.19. The kernel compiled successfully and the ipmi modules also worked. It was just a quick test and needs to be compared to the above list of patches.
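The changelog search described above can be sketched in Python. This is only an illustration: the function name `find_ipmi_commits` and the `sample` data are made up for the example, and only the commit subjects are taken from the thread. Unlike a plain grep, the pattern is anchored to the start of the subject, so the "Fixes:" lines are not reported.

```python
import re

# Matches subject lines such as "ipmi_si: ..." or "ipmi:ssif: ...".
IPMI_SUBJECT = re.compile(r"^\s*(ipmi(?:_si)?:.*)$")


def find_ipmi_commits(changelogs):
    """Return (release, subject) pairs for subjects starting with ipmi: or ipmi_si:.

    `changelogs` maps a release name (e.g. "4.19.33") to the text of its
    ChangeLog file.
    """
    hits = []
    for release, text in changelogs.items():
        for line in text.splitlines():
            match = IPMI_SUBJECT.match(line)
            if match:
                hits.append((release, match.group(1).strip()))
    return hits


# Tiny stand-in for real ChangeLog files (hypothetical layout).
sample = {
    "4.19.33": "commit ...\n\n    ipmi_si: Fix crash when using hard-coded device\n",
    "4.19.44": "    ipmi: ipmi_si_hardcode.c: init si_type array to fix a crash\n",
    "4.19.45": (
        "    ipmi:ssif: compare block number correctly for multi-part return messages\n"
        "    net: unrelated change\n"
    ),
}

for release, subject in find_ipmi_commits(sample):
    print(release, subject)
```

With real ChangeLog files, the dictionary would be filled by reading each file from disk instead of using inline strings.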
Patching only for 1 issue would leave other issues - just waiting to be found.
The Linux kernel gets weekly fix updates on the stable and longterm branches. New features are added in new versions. I think at XCP we can follow the 4.19 longterm branch and keep pushing kernel updates on a regular basis.
I have tracked it up to 4.19.48 at link, and in 2 instances I found that applying the next patch on top of the existing patches needs certain modifications (e.g. 37-pre and 47-pre).
Following Linux longterm in XCP also means that going back to CH updates will not be possible. But I assume CH will take updates from Linux longterm anyway.
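To make the "follow the longterm branch step by step" idea concrete, here is a minimal Python sketch that lists the point-release steps between two versions of the same branch. The helper names `parse_version` and `update_steps` are invented for this example, and it assumes plain three-part versions such as 4.19.19. Versions are compared as integer tuples, because a plain string sort would place 4.19.100 before 4.19.19.

```python
def parse_version(text):
    """Turn '4.19.19' into (4, 19, 19) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def update_steps(current, target):
    """List (from, to) point-release steps between two versions of one branch."""
    cur = parse_version(current)
    tgt = parse_version(target)
    if len(cur) != 3 or len(tgt) != 3:
        raise ValueError("expected versions of the form X.Y.Z")
    if cur[:2] != tgt[:2]:
        raise ValueError("versions are not on the same longterm branch")
    if tgt[2] < cur[2]:
        raise ValueError("target is older than the current version")
    major, minor = cur[:2]
    return [
        (f"{major}.{minor}.{n}", f"{major}.{minor}.{n + 1}")
        for n in range(cur[2], tgt[2])
    ]


steps = update_steps("4.19.19", "4.19.48")
print(len(steps), "steps, first:", steps[0], "last:", steps[-1])
```

Each step is one place where a locally carried patch may need rework, which matches the observation that some steps required modifications.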
@r1 I think we will provide the latest 4.19.x kernel as an alternate kernel for those who need it and keep the Citrix one by default.
Update: @r1 I built an RPM of your kernel with the Citrix DDK 8.0.0 (and a suitable zfs-kmod package). I installed both on the G7 Microserver and ipmi_si works fine again. So thank you for providing the sources on GitHub!
Is there any way to run the installer without a GUI?
On my Ryzen 2400G it won't load: it boots up, and after choosing the installer the screen is just blank. I tried the same image in a VM and it starts the installer, but with some graphics.
Addon: OK, I changed 8192M to 2048M in grub.cfg and was able to install the system. Everything is working fine. Now I have a question: I did a yum update and the system told me that there is a new kernel with a newer xcp-ng:
kernel x86_64 4.19.19-5.0.9.xcpng8.0 xcp-ng-base 30 M
xcp-ng-release x86_64 8.0.0-12 xcp-ng-base 15 k
xcp-ng-release-config x86_64 8.0.0-12 xcp-ng-base 350 k
xenserver-firstboot noarch 1.0.11-1.1.xcpng8.0 xcp-ng-base 20 k
After installing and rebooting, it still shows that the kernel is 4.19.0. Any idea how to fix this? (Not sure if xcp-ng itself updated to 8.0.0-12; in xsconsole it shows as 8.0.0.)
The new kernel has the same version, except that it has been patched for a security issue. All good.
There's no such thing as XCP-ng 8.0.0-12 either. 12 is the release number of the xcp-ng-release RPM package, version is 8.0.0.
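The version/release distinction can be illustrated with a short Python sketch that splits the strings shown in the yum output above. The helper name `split_version_release` is made up for this example. It relies on the RPM convention that neither the version nor the release contains a hyphen, so the last hyphen separates the two.

```python
def split_version_release(evr):
    """Split an RPM 'version-release' string at the last hyphen."""
    version, _, release = evr.rpartition("-")
    if not version:
        raise ValueError(f"no release part in {evr!r}")
    return version, release


for evr in ("8.0.0-12", "4.19.19-5.0.9.xcpng8.0"):
    version, release = split_version_release(evr)
    print(f"{evr}: version={version} release={release}")
```

For xcp-ng-release this gives version 8.0.0 and release 12, which is why the product itself is still XCP-ng 8.0.0.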
Those having the 8.0 RC1 running, does live migration from shared storage to local storage (on the same host that is currently running the VM) work for you?
Today I tried to upgrade from version 7.6 to version 8.0 RC. Sadly, the installer did not start.
In normal upgrade mode the installer just hangs with a black screen, and when I tried the installation in safe mode the installer halted with the message:
(XEN)[ 34.0279] Hardware Dom0 halted: halting machine
I have uploaded an image of the boot screen in safe mode:
Today I found out that my xcp was restarted at 3:51 AM. There is no crash dump in xcp center. Where do I look to find out what happened?
@dave-opc what's the XCP-ng version?
Did you see -
I thought it was obvious, since I am asking in the 8.0 RC1 thread, that I have this version.
@dave-opc Making sure that the version is up to date.
Did you find anything in
@EliasSeccom Does ALT + arrow keys (Left/Right) show you any error messages? On one of the screens, there is a shell available.
If it's a hang that is not letting you do anything, you can attach a serial console and see if it outputs any stack traces.
The kern log file shows that it booted at that time; at least I don't see any errors. (file attached)
It looks like it was just reset and started booting, but I doubt there was a power outage, as my xcp is powered by an Ippon Back Basic 1050.
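When checking a kernel log for an unexpected reboot, it can help to look at what was logged right before each boot. The following Python sketch finds the "Linux version" banner that the kernel prints at the start of every boot and returns the lines preceding it. The function name `boots_with_context` and the sample log lines are hypothetical and are not taken from the attached file.

```python
def boots_with_context(log_text, context=3):
    """Return (preceding_lines, boot_line) for every kernel boot banner found."""
    lines = log_text.splitlines()
    result = []
    for index, line in enumerate(lines):
        # The kernel prints its "Linux version ..." banner once per boot.
        if "Linux version" in line:
            start = max(0, index - context)
            result.append((lines[start:index], line))
    return result


# Hypothetical excerpt; a real kern.log will look different.
sample = (
    "Jun 20 03:40:01 host kernel: [86400.123456] some routine message\n"
    "Jun 20 03:51:12 host kernel: [    0.000000] Linux version 4.19.0+1 (hypothetical build)\n"
    "Jun 20 03:51:12 host kernel: [    0.000000] Command line: (elided)\n"
)

for before, boot in boots_with_context(sample):
    print("boot:", boot)
    for line in before:
        print("  before:", line)
```

If the lines before the banner show no shutdown or panic messages, that is consistent with a hard reset rather than an orderly reboot.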
@dave-opc To learn more at the next incident, you may add
/etc/grub.cfg file against the default kernel boot entry.
I've got another issue with RC1:
I have 2 XCP-ng servers: the 1st at home on 8.0 RC1 and the 2nd remote on 7.6.
When I had 7.6 at home (before the upgrade), my continuous replication task in XOA was working fine, copying the VM to the home server every night.
After upgrading to 8.0 RC1, XOA starts copying. I can see [importing....] in xcp center, but after transferring the file, XOA ends with an error
I thought I had a problem with the iSCSI drive connected to xcp, but I also tried copying to my main SR (NVMe) and got the same error.