Help building kernel & kernel-alt, please
I want to try my hand at building kernels with updated drivers to resolve issues around hardware that's too modern for the 4.19 LTS kernel.
I've been using the Docker scripts from the xcp-ng-build-env repo to build both the Xen kernel and the xapi parts, but simply using the analogous run.py command with "kernel" or "kernel-alt" won't work for those parts.
I've been trying to find some instructions on how to rebuild the kernel, but somehow they have eluded me...
Could someone please point me in the proper direction?
Question for @stormi
No matter if I am building xen, xapi or the kernels, there is one issue for which I'd like some help:
Just before the build finishes, all the sources used for the build get deleted, even when using the --no-exit option to stay in the container.
E.g. the final lines of the kernel build are like this:
Processing files: python2-perf-alt-4.19.227-1.xcpng8.2.x86_64
Executing(%license): /bin/sh -e /var/tmp/rpm-tmp.kWRZph
+ umask 022
+ cd /home/builder/rpmbuild/BUILD
+ cd kernel-4.19.19
+ LICENSEDIR=/home/builder/rpmbuild/BUILDROOT/kernel-alt-4.19.227-1.xcpng8.2.x86_64/usr/share/licenses/python2-perf-alt-4.19.227
+ export LICENSEDIR
+ /usr/bin/mkdir -p /home/builder/rpmbuild/BUILDROOT/kernel-alt-4.19.227-1.xcpng8.2.x86_64/usr/share/licenses/python2-perf-alt-4.19.227
+ cp -pr COPYING /home/builder/rpmbuild/BUILDROOT/kernel-alt-4.19.227-1.xcpng8.2.x86_64/usr/share/licenses/python2-perf-alt-4.19.227
+ exit 0
Provides: gitsha(ssh://email@example.com/XS/linux.pg.git) = cb3c28f7e8213ef44e5c06369b577a18b86af291 gitsha(ssh://firstname.lastname@example.org/XSU/linux-stable.git) = dffbba4348e9686d6bf42d54eb0f2cd1c4fb3520 python2-perf-alt python2-perf-alt = 4.19.227-1.xcpng8.2 python2-perf-alt(x86-64) = 4.19.227-1.xcpng8.2
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires: libc.so.6()(64bit) libc.so.6(GLIBC_2.14)(64bit) libc.so.6(GLIBC_2.2.5)(64bit) libc.so.6(GLIBC_2.3)(64bit) libc.so.6(GLIBC_2.3.4)(64bit) libc.so.6(GLIBC_2.4)(64bit) libc.so.6(GLIBC_2.7)(64bit) libc.so.6(GLIBC_2.8)(64bit) libpthread.so.0()(64bit) libpthread.so.0(GLIBC_2.2.5)(64bit) libpython2.7.so.1.0()(64bit) python(abi) = 2.7 rtld(GNU_HASH)
Conflicts: python2-perf
Processing files: kernel-alt-debuginfo-4.19.227-1.xcpng8.2.x86_64
Provides: kernel-alt-debuginfo = 4.19.227-1.xcpng8.2 kernel-alt-debuginfo(x86-64) = 4.19.227-1.xcpng8.2
Requires(rpmlib): rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(CompressedFileNames) <= 3.0.4-1
Checking for unpackaged file(s): /usr/lib/rpm/check-files /home/builder/rpmbuild/BUILDROOT/kernel-alt-4.19.227-1.xcpng8.2.x86_64
Wrote: /home/builder/rpmbuild/SRPMS/kernel-alt-4.19.227-1.xcpng8.2.src.rpm
Wrote: /home/builder/rpmbuild/RPMS/x86_64/kernel-alt-4.19.227-1.xcpng8.2.x86_64.rpm
Wrote: /home/builder/rpmbuild/RPMS/x86_64/kernel-alt-headers-4.19.227-1.xcpng8.2.x86_64.rpm
Wrote: /home/builder/rpmbuild/RPMS/x86_64/kernel-alt-devel-4.19.227-1.xcpng8.2.x86_64.rpm
Wrote: /home/builder/rpmbuild/RPMS/x86_64/perf-alt-4.19.227-1.xcpng8.2.x86_64.rpm
Wrote: /home/builder/rpmbuild/RPMS/x86_64/python2-perf-alt-4.19.227-1.xcpng8.2.x86_64.rpm
Wrote: /home/builder/rpmbuild/RPMS/x86_64/kernel-alt-debuginfo-4.19.227-1.xcpng8.2.x86_64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.iAlDiX
+ umask 022
+ cd /home/builder/rpmbuild/BUILD
+ cd kernel-4.19.19
+ /usr/bin/rm -rf /home/builder/rpmbuild/BUILDROOT/kernel-alt-4.19.227-1.xcpng8.2.x86_64
+ exit 0
and my problem is with the %clean section, which removes the patched source code, which I'd really love to read, because it's not available as plain source in any repository, only as a mix of upstream source repos and Vates patch files.
I've been trying to find out how I can prevent the %clean section from being executed as part of rpmbuild, but I've failed to find where this final /usr/bin/rm -rf [...] is coming from or how to suppress it.
AFAIK after the build the BUILD directory is not cleaned up.
Thanks Stormi, that's true and it does seem to include the patched source...
So I wonder (just a bit): What's the difference between the two?
But I suspect it's simply some rpmbuild magic...
BUILD is populated at %prep stage: tarball extracted then patches applied.
Then that's where the build (%build section of the spec file) happens.
You can control what steps you want to execute with options given to
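As a rough illustration of the stage control described above, here is a minimal sketch that assembles an rpmbuild command line for a given stage. It assumes you invoke rpmbuild by hand inside the build container; the helper name `rpmbuild_argv` and the spec path `SPECS/kernel-alt.spec` are hypothetical, and `--noclean` is assumed to be supported by the rpmbuild version in the container. Note that in the log above, %clean only removes BUILDROOT, not BUILD.

```python
# rpmbuild "run up to stage X" flags, as documented in rpmbuild(8).
STAGE_FLAGS = {
    "prep": "-bp",     # unpack sources and apply patches into BUILD, then stop
    "build": "-bc",    # run %prep and %build, then stop
    "install": "-bi",  # run %prep, %build and %install, then stop
    "binary": "-bb",   # build the binary packages
    "all": "-ba",      # build binary and source packages
}


def rpmbuild_argv(spec, stage="prep", skip_clean=False):
    """Return the argument list for running rpmbuild up to `stage`.

    `spec` is a path to a spec file (hypothetical in the example below).
    With skip_clean=True, --noclean is added so the %clean stage is skipped.
    """
    argv = ["rpmbuild", STAGE_FLAGS[stage]]
    if skip_clean:
        argv.append("--noclean")
    argv.append(spec)
    return argv


# Only extract and patch the sources, so the patched tree can be read in BUILD.
print(" ".join(rpmbuild_argv("SPECS/kernel-alt.spec", stage="prep")))

# Full build, but leave BUILDROOT in place afterwards.
print(" ".join(rpmbuild_argv("SPECS/kernel-alt.spec", stage="all", skip_clean=True)))
```

Stopping after %prep is probably the quickest way to get the patched kernel tree without waiting for a full build.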
@abufrejoval What are you working on adding? I too have run into hardware that's too new for 4.19 but works in 5.10 (like Debian 11 with Xen 4.14).
I'm messing around with all sorts of things:
Loss of video output on Gen10/11 iGPUs and Ryzen 3 (Cezanne) iGPU during the Xen-Dom0 handover (may be a grub issue with the UEFI frame buffer driver). It's on hold, for lack of time and because Xen doesn't make hacking boot stuff any easier and it would take me weeks I do not have to get deep enough. Also, apart from the missing console, the machines work just fine, after I transplant the installed system from the NUC8 to the other targets.
Lack of IOMMU support on my Ryzen 9 5950X with an Nvidia RTX 2080ti. It's been judged a BIOS issue here, but it works just fine with KVM and VMware. I've been going through the code, but unless I get kernel debugging going during the boot phase with a serial console, there is little chance of tracking down what's going on. From the logs alone, the code simply can't find the IOMMU device, or rather the data structures that describe it, but it's a ton of barely readable, deeply cascaded spaghetti of #define 'function calls'... written by an AMD guy, so XenSource will most likely point fingers, not invest in a work-around. Funny thing there is that this very system had been the first to run XCP-ng in my lab, using a nested setup with VMware Workstation on Windows 2019 as a base....
Support for RealTek r8156 USB3 2.5 Gbit Ethernet adapters: I use those on Pentium Silver J5005 based passive mini-ITX machines with oVirt and want to transition them to something still alive. I got a version of the driver that compiles and works just fine, but XenSource uses all kinds of tricks to rename NICs to be consistent across a pool that might have widely different (and in the case of USB NICs, dynamic) device names assigned to NICs. Currently the Citrix code at the base of interface-rename cannot deal with NICs that aren't connected (directly) to the PCI bus. It doesn't look for USB devices and thus the bridge creation and the overlay network stuff just fails to use the device. I guess the only sensible thing to do is to open a ticket at XenSource and see if Andrew Cooper, who seems to have written all the xcp Python bindings, will incorporate USB NICs ...which I doubt, given the giant amount of extra support trouble hotplugging NICs might bring about, when there is no revenue in this space. Would be an interesting test of XenSource collaboration dynamics to see if an xcp-ng based addition would be accepted upstream.
If you're confident hacking XenSource Python2 library scripts, have a look at PCIDevices (line 259), where it's using lspci -mn to find NICs.
I've also been looking a bit at support for the newer Intel NICs, which are built into my NUC10 and NUC11 devices, which aren't supported by the 4.19 kernel e1000e device driver. Again, it's not a priority for me, because I am using TB3 connected Aquantia 10GBase-T NICs for these faster NUCs with NVMe storage, as they are just the better match and literally zero trouble, if you disable the onboard NICs before installing.
The technical evolution in the mobile/desktop based edge appliance space currently is at a pace that completely overwhelms the XenSource roadmap, even Linux itself in many ways, because only Windows support sells that hardware. It's a bit of a nasty turn on NUCs, which for many years were a nicely conservative platform with great Linux support and plenty of efficient power for the home lab.
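To make the USB NIC point above more concrete, here is a minimal sketch of why a scan based on `lspci -mn` misses USB adapters, and how sysfs could tell the two cases apart. This is not the actual interface-rename code: the function names, the sample `lspci -mn` lines and the `sysfs_root` parameter are all assumptions made for illustration.

```python
import os
import shlex


def pci_nics(lspci_mn_output):
    """Return PCI devices whose class is 02xx (network controller).

    Expects text in the machine-readable numeric format of `lspci -mn`:
    slot, then quoted class, vendor and device IDs, then optional
    -rXX / -pXX flags and the subsystem IDs.
    """
    nics = []
    for line in lspci_mn_output.splitlines():
        # Drop the optional revision / prog-if flags such as -r10 or -p30.
        fields = [f for f in shlex.split(line) if not f.startswith("-")]
        if len(fields) >= 4 and fields[1].startswith("02"):
            nics.append({"slot": fields[0], "vendor": fields[2], "device": fields[3]})
    return nics


def nic_bus(iface, sysfs_root="/sys"):
    """Return the bus an interface hangs off ("pci", "usb", ...), or None.

    Follows /sys/class/net/<iface>/device/subsystem; purely virtual
    interfaces have no such link, hence the None case.
    """
    link = os.path.join(sysfs_root, "class", "net", iface, "device", "subsystem")
    if not os.path.exists(link):
        return None
    return os.path.basename(os.path.realpath(link))


# Made-up sample: a USB host controller (class 0c03) and one Ethernet NIC (0200).
SAMPLE = '''00:14.0 "0c03" "8086" "a36d" -r10 -p30 "8086" "2074"
00:1f.6 "0200" "8086" "15bb" -r10 "8086" "2074"'''

for nic in pci_nics(SAMPLE):
    print(nic["slot"], nic["vendor"], nic["device"])
```

In this sample the USB host controller shows up only as class 0c03, so a NIC plugged into it never appears in the PCI list; a check along the lines of `nic_bus` is one possible way to detect such interfaces.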
@abufrejoval I built the XCP 8.2 packages for the newer I225 and E1000e ethernet so that's solved (maybe, give it a try). As for the video in the new NUC, I got one and see the same video issue. I updated GRUB and that did not help. I booted newer Xen/Linux (debian 11) and it works. Seems to be an issue with the EFIFB detection on the new UEFI only machines. 4.19 does not detect it under Xen so there is no console video support. The older UEFI/CSM and legacy BIOS machines are fine. I'm still playing with the kernel. I use both live XCP Dom0 and the build container in a VM to compile new code.
The NUCs and other small machines update hardware often with commodity components so drivers fall behind quickly. They are not servers so will never have the same support but they are great as home/dev/lab machines.
It's not hard to move to a new version of Xen and Linux kernel but then you lose the huge invaluable support from upstream Citrix. So until they update, XCP is stuck.
Thanks for your response!
In the meantime I've finally found the hints on how to work around the USB NIC renaming issues, both on the forum (even directed at me, but somehow not read) and by Eric Eikrem on his site, so I'll try that next to make the r8156 2.5Gbit USB3 NICs work (got lots of those) on the Atom boxes.
I'm not touching the NUCs (for igb/e1000 testing) at the moment, because I need them very stable to play with LINSTOR without a VGA console.
Just to illustrate: for weeks my NUC10 would disappear from network after a couple of days without issues and even if it was still visibly running (normal HDD LED activity) nothing but a hard reset would bring it back online. Just couldn't understand what was going on and if it was some type of hardware issue with the box (just out of warranty).
In the end it was one of the myriad of BIOS settings, could have been 'modern standby' or ASPM, which was reactivated after a firmware update and caused these problems days later.
Just a quick word to say I'm glad to see you experiment with all this. That kind of hardware is not within Citrix' target as far as I can tell but we would be pleased to support it in XCP-ng. However there's only so many hours in a day and we don't have all the hardware to test ourselves, so it's great to see the community make this move forward.
If test results for igc and the newer e1000e are positive, maybe it's time to make them additional/alternate driver packages in XCP-ng 8.2's repositories? @Andrew, would you like to contribute them to our git repositories? There will be a few changes to be done to the spec files to comply with the kernel module policy but nothing impossible. Actually, in my eyes the most important part is not contributing them once, but rather maintaining them afterwards when users report issues, need updates, etc. I would of course walk you through the process.
@abufrejoval I'm interested in most of what you're trying to do. For USB NIC support, I'm all for supporting it and I'm convinced we can upstream it rather easily (to Citrix, not XenSource which was bought by Citrix in 2007) if we can provide an elegant enough solution. We can also have XCP-ng specific patches to the software, when we really want to support something that Citrix won't. But we'd try to upstream it first anyway and we often manage to.
However, the solution will likely have to come from us, and when I say "us" it's more likely to happen with the help of the community members who own and need this kind of device.
A first useful step would also be to write an up to date guide to using USB NICs at https://xcp-ng.org/docs/networking.html
@stormi I'm happy to support XCP and community for these drivers and add them to the repository. I do have the hardware on hand so I can test changes and work on updates. Let me know what changes you need. It would be nice to have them during the install since they may be required network drivers.