XCP-ng

    what is kernel-alt?

    Development · 8 Posts · 3 Posters · 1.2k Views
    petr.bena:

      Hello,

      Because of a number of bugs and issues in the current dom0 kernel, I am backporting the ceph code from kernel 4.19.295 to https://github.com/xcp-ng-rpms/kernel - it works great, but while I was upgrading the kernel on my test cluster, I noticed there is a package named kernel-alt which contains 4.19.265.

      What is that kernel? Is it stable? Does it contain all the usual patches? Which repo does it come from?
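      As an aside, the two releases in play here (4.19.265 in kernel-alt, 4.19.295 as the backport source) are easy to mix up when compared as strings. A minimal, generic Python sketch for comparing dotted kernel release strings numerically; this is not an XCP-ng tool, and it assumes plain dotted releases without suffixes such as the "+1" seen in the kernel logs later in this thread:

```python
def kernel_tuple(release: str) -> tuple:
    """Split a dotted kernel release like '4.19.265' into an int tuple."""
    return tuple(int(part) for part in release.split("."))

def newer(a: str, b: str) -> str:
    """Return whichever of two plain dotted release strings is newer."""
    return a if kernel_tuple(a) >= kernel_tuple(b) else b

# 4.19.295 (the backport source) is newer than kernel-alt's 4.19.265.
# Tuple comparison stays correct even when component widths differ
# (e.g. 4.19.9 vs 4.19.10), where a naive string comparison would not.
print(newer("4.19.265", "4.19.295"))  # prints "4.19.295"
```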

    olivierlambert (Vates 🪐 Co-Founder & CEO):

        Hi,

        This is documented: https://docs.xcp-ng.org/installation/hardware/#alternate-kernel

    petr.bena:

      Great. Anyway, here is the patch I made: https://github.com/xcp-ng-rpms/kernel/pull/9

      I am still testing it, but it compiles and works fine on my XCP-ng lab. It's going to take a while before I can confirm it fixes the issues I was encountering, though, as they were rather rare.

      (Pull request #9 in xcp-ng-rpms/kernel: "ceph code from kernel 4.19.295", opened by benapetr; status: closed.)

    olivierlambert (Vates 🪐 Co-Founder & CEO):

            Adding @stormi

      I think it would be more reasonable to start patching on 8.3, because it seems to be a lot of modification for an LTS. Anyway, we'll take a look and get back to you; please do the same regarding Ceph stability in your own tests! Thanks 🙂

    stormi (Vates 🪐 XCP-ng Team):

      Updating the driver on 8.3 would indeed be an option, if we can establish that patching it alone, without any other changes in the kernel, is enough and doesn't introduce regressions over the current driver.

      Another option would be to package the newer driver as a separate ceph-module-alt RPM (or ceph-modules-alt if there are several kernel drivers).
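      To illustrate that second option, a separate module package might look roughly like the fragment below. This is a hypothetical sketch only: the ceph-module-alt name comes from the suggestion above, while the version, the %{kver} macro and the install paths are assumptions, not an actual XCP-ng spec.

```spec
# Hypothetical spec fragment; names, versions and paths are assumptions.
# Assumed dom0 kernel release (matching the "4.19.0+1" seen in the logs):
%define kver 4.19.0+1
Name:           ceph-module-alt
Version:        4.19.295
Release:        1%{?dist}
Summary:        Backported ceph/libceph kernel modules (sketch)
License:        GPLv2
Requires:       kernel-uname-r = %{kver}

%description
Newer ceph.ko and libceph.ko, backported from kernel 4.19.295, installed
under extra/ alongside the stock in-tree modules.

%files
/lib/modules/%{kver}/extra/ceph.ko
/lib/modules/%{kver}/extra/libceph.ko
```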

    petr.bena:

                @olivierlambert

                Hello, I can confirm that the patches I made bring significant stability improvements. I again faced Ceph-related kernel crashes, but only on XCP-ng hosts that aren't patched. For example, this is one of the bugs I am hitting on an unpatched kernel:

                [Fri Oct 13 11:10:32 2023] libceph: osd1 up
                [Fri Oct 13 11:10:34 2023] libceph: osd1 up
                [Fri Oct 13 11:10:39 2023] libceph: osd7 up
                [Fri Oct 13 11:10:40 2023] libceph: osd7 up
                [Fri Oct 13 11:10:41 2023] WARNING: CPU: 6 PID: 32615 at net/ceph/osd_client.c:554 request_reinit+0x128/0x150 [libceph]
                [Fri Oct 13 11:10:41 2023] Modules linked in: btrfs xor zstd_compress lzo_compress raid6_pq zstd_decompress xxhash rbd tun ebtable_filter ebtables ceph libceph rpcsec_gss_krb5 nfsv4 nfs fscache bnx2fc(O) cnic(O) uio fcoe libfcoe libfc scsi_transport_fc bonding bridge 8021q garp mrp stp llc dm_multipath ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter intel_powerclamp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc dm_mod aesni_intel aes_x86_64 crypto_simd cryptd glue_helper sg ipmi_si ipmi_devintf ipmi_msghandler video backlight acpi_power_meter nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc ip_tables x_tables rndis_host cdc_ether usbnet mii hid_generic usbhid hid raid1 md_mod sd_mod ahci libahci xhci_pci igb(O) libata
                [Fri Oct 13 11:10:41 2023]  ixgbe(O) xhci_hcd scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_mod ipv6 crc_ccitt
                [Fri Oct 13 11:10:41 2023] CPU: 6 PID: 32615 Comm: kworker/6:19 Tainted: G        W  O      4.19.0+1 #1
                [Fri Oct 13 11:10:41 2023] Hardware name: Supermicro Super Server/X12STH-LN4F, BIOS 1.2 06/23/2022
                [Fri Oct 13 11:10:41 2023] Workqueue: ceph-msgr ceph_con_workfn [libceph]
                [Fri Oct 13 11:10:41 2023] RIP: e030:request_reinit+0x128/0x150 [libceph]
                [Fri Oct 13 11:10:41 2023] Code: 5d 41 5e 41 5f c3 48 89 f9 48 c7 c2 b1 77 83 c0 48 c7 c6 96 ad 83 c0 48 c7 c7 98 5b 85 c0 31 c0 e8 ed a8 b9 c0 e9 37 ff ff ff <0f> 0b e9 41 ff ff ff 0f 0b e9 60 ff ff ff 0f 0b 0f 1f 84 00 00 00
                [Fri Oct 13 11:10:41 2023] RSP: e02b:ffffc90045b67b88 EFLAGS: 00010202
                [Fri Oct 13 11:10:41 2023] RAX: 0000000000000002 RBX: ffff8881c6704f00 RCX: ffff8881f27a10e0
                [Fri Oct 13 11:10:41 2023] RDX: ffffffff00000002 RSI: ffff8881c7e97448 RDI: ffff8881c7d5b780
                [Fri Oct 13 11:10:41 2023] RBP: ffff8881c6704700 R08: ffff8881c7e97450 R09: ffff8881c7e97450
                [Fri Oct 13 11:10:41 2023] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881c7d5b780
                [Fri Oct 13 11:10:41 2023] R13: fffffffffffffffe R14: 0000000000000000 R15: 0000000000000001
                [Fri Oct 13 11:10:41 2023] FS:  0000000000000000(0000) GS:ffff8881f2780000(0000) knlGS:0000000000000000
                [Fri Oct 13 11:10:41 2023] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
                [Fri Oct 13 11:10:41 2023] CR2: 00007f7ef7bf2000 CR3: 0000000136dbc000 CR4: 0000000000040660
                [Fri Oct 13 11:10:41 2023] Call Trace:
                [Fri Oct 13 11:10:41 2023]  send_linger+0x55/0x200 [libceph]
                [Fri Oct 13 11:10:41 2023]  ceph_osdc_handle_map+0x4e7/0x6b0 [libceph]
                [Fri Oct 13 11:10:41 2023]  dispatch+0x2ff/0xbc0 [libceph]
                [Fri Oct 13 11:10:41 2023]  ? read_partial_message+0x265/0x810 [libceph]
                [Fri Oct 13 11:10:41 2023]  ? ceph_tcp_recvmsg+0x6f/0xa0 [libceph]
                [Fri Oct 13 11:10:41 2023]  ceph_con_workfn+0xa51/0x24f0 [libceph]
                [Fri Oct 13 11:10:41 2023]  ? xen_hypercall_xen_version+0xa/0x20
                [Fri Oct 13 11:10:41 2023]  ? xen_hypercall_xen_version+0xa/0x20
                [Fri Oct 13 11:10:41 2023]  ? __switch_to_asm+0x34/0x70
                [Fri Oct 13 11:10:41 2023]  ? xen_force_evtchn_callback+0x9/0x10
                [Fri Oct 13 11:10:41 2023]  ? check_events+0x12/0x20
                [Fri Oct 13 11:10:41 2023]  process_one_work+0x165/0x370
                [Fri Oct 13 11:10:41 2023]  worker_thread+0x49/0x3e0
                [Fri Oct 13 11:10:41 2023]  kthread+0xf8/0x130
                [Fri Oct 13 11:10:41 2023]  ? rescuer_thread+0x310/0x310
                [Fri Oct 13 11:10:41 2023]  ? kthread_bind+0x10/0x10
                [Fri Oct 13 11:10:41 2023]  ret_from_fork+0x1f/0x40
                [Fri Oct 13 11:10:41 2023] ---[ end trace 1ac50e4ca0f4e449 ]---
                [Fri Oct 13 11:10:50 2023] libceph: osd4 up
                [Fri Oct 13 11:10:50 2023] libceph: osd4 up
                [Fri Oct 13 11:10:51 2023] WARNING: CPU: 11 PID: 3500 at net/ceph/osd_client.c:554 request_reinit+0x128/0x150 [libceph]
                [Fri Oct 13 11:10:51 2023] Modules linked in: btrfs xor zstd_compress lzo_compress raid6_pq zstd_decompress xxhash rbd tun ebtable_filter ebtables ceph libceph rpcsec_gss_krb5 nfsv4 nfs fscache bnx2fc(O) cnic(O) uio fcoe libfcoe libfc scsi_transport_fc bonding bridge 8021q garp mrp stp llc dm_multipath ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter intel_powerclamp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc dm_mod aesni_intel aes_x86_64 crypto_simd cryptd glue_helper sg ipmi_si ipmi_devintf ipmi_msghandler video backlight acpi_power_meter nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc ip_tables x_tables rndis_host cdc_ether usbnet mii hid_generic usbhid hid raid1 md_mod sd_mod ahci libahci xhci_pci igb(O) libata
                [Fri Oct 13 11:10:51 2023]  ixgbe(O) xhci_hcd scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_mod ipv6 crc_ccitt
                [Fri Oct 13 11:10:51 2023] CPU: 11 PID: 3500 Comm: kworker/11:16 Tainted: G        W  O      4.19.0+1 #1
                [Fri Oct 13 11:10:51 2023] Hardware name: Supermicro Super Server/X12STH-LN4F, BIOS 1.2 06/23/2022
                [Fri Oct 13 11:10:51 2023] Workqueue: ceph-msgr ceph_con_workfn [libceph]
                [Fri Oct 13 11:10:51 2023] RIP: e030:request_reinit+0x128/0x150 [libceph]
                [Fri Oct 13 11:10:51 2023] Code: 5d 41 5e 41 5f c3 48 89 f9 48 c7 c2 b1 77 83 c0 48 c7 c6 96 ad 83 c0 48 c7 c7 98 5b 85 c0 31 c0 e8 ed a8 b9 c0 e9 37 ff ff ff <0f> 0b e9 41 ff ff ff 0f 0b e9 60 ff ff ff 0f 0b 0f 1f 84 00 00 00
                [Fri Oct 13 11:10:51 2023] RSP: e02b:ffffc900461d7b88 EFLAGS: 00010202
                [Fri Oct 13 11:10:51 2023] RAX: 0000000000000002 RBX: ffff8881c62b6d00 RCX: 0000000000000000
                [Fri Oct 13 11:10:51 2023] RDX: ffff8881c59c0740 RSI: ffff888137a50200 RDI: ffff8881c59c04a0
                [Fri Oct 13 11:10:51 2023] RBP: ffff8881c62b6b00 R08: ffff8881f17c2e00 R09: ffff8881f162ba00
                [Fri Oct 13 11:10:51 2023] R10: 0000000000000000 R11: 000000000000cb1b R12: ffff8881c59c04a0
                [Fri Oct 13 11:10:51 2023] R13: fffffffffffffffe R14: 0000000000000000 R15: 0000000000000001
                [Fri Oct 13 11:10:51 2023] FS:  0000000000000000(0000) GS:ffff8881f28c0000(0000) knlGS:0000000000000000
                [Fri Oct 13 11:10:51 2023] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
                [Fri Oct 13 11:10:51 2023] CR2: 00007fa6188896c8 CR3: 00000001b7014000 CR4: 0000000000040660
                [Fri Oct 13 11:10:51 2023] Call Trace:
                [Fri Oct 13 11:10:51 2023]  send_linger+0x55/0x200 [libceph]
                [Fri Oct 13 11:10:51 2023]  ceph_osdc_handle_map+0x4e7/0x6b0 [libceph]
                [Fri Oct 13 11:10:51 2023]  dispatch+0x2ff/0xbc0 [libceph]
                [Fri Oct 13 11:10:51 2023]  ? read_partial_message+0x265/0x810 [libceph]
                [Fri Oct 13 11:10:51 2023]  ? ceph_tcp_recvmsg+0x6f/0xa0 [libceph]
                [Fri Oct 13 11:10:51 2023]  ceph_con_workfn+0xa51/0x24f0 [libceph]
                [Fri Oct 13 11:10:51 2023]  ? check_preempt_curr+0x84/0x90
                [Fri Oct 13 11:10:51 2023]  ? ttwu_do_wakeup+0x19/0x140
                [Fri Oct 13 11:10:51 2023]  process_one_work+0x165/0x370
                [Fri Oct 13 11:10:51 2023]  worker_thread+0x49/0x3e0
                [Fri Oct 13 11:10:51 2023]  kthread+0xf8/0x130
                [Fri Oct 13 11:10:51 2023]  ? rescuer_thread+0x310/0x310
                [Fri Oct 13 11:10:51 2023]  ? kthread_bind+0x10/0x10
                [Fri Oct 13 11:10:51 2023]  ret_from_fork+0x1f/0x40
                [Fri Oct 13 11:10:51 2023] ---[ end trace 1ac50e4ca0f4e44a ]---
                [Fri Oct 13 11:11:00 2023] rbd: rbd1: no lock owners detected
                [Fri Oct 13 11:11:07 2023] rbd: rbd1: no lock owners detected
                

                Patched hosts don't show any of these; their logs just say:

                [Fri Oct 13 11:11:47 2023] libceph: osd1 up
                [Fri Oct 13 11:11:54 2023] libceph: osd7 up
                [Fri Oct 13 11:12:05 2023] libceph: osd4 up
                

                And this is only the less severe crash: I am sometimes facing crashes that make Ceph completely inaccessible on unpatched hypervisors, requiring a host reboot.

                I strongly recommend these patches to anyone who is using Ceph with XCP-ng.
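                For reference, the difference between the two logs above can be checked mechanically. A small Python sketch (generic, not an XCP-ng utility; the pattern and sample lines are taken from the traces in this post) that flags the libceph osd_client warnings in dmesg-style output:

```python
import re

# Matches lines like the request_reinit warning shown in the trace above.
WARN_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at net/ceph/osd_client\.c")

def ceph_warnings(dmesg_lines):
    """Return the dmesg lines containing a libceph osd_client warning."""
    return [line for line in dmesg_lines if WARN_RE.search(line)]

unpatched = [
    "[Fri Oct 13 11:10:39 2023] libceph: osd7 up",
    "[Fri Oct 13 11:10:41 2023] WARNING: CPU: 6 PID: 32615 at "
    "net/ceph/osd_client.c:554 request_reinit+0x128/0x150 [libceph]",
]
patched = ["[Fri Oct 13 11:11:47 2023] libceph: osd1 up"]

print(len(ceph_warnings(unpatched)))  # prints 1
print(len(ceph_warnings(patched)))    # prints 0
```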

    olivierlambert (Vates 🪐 Co-Founder & CEO):

                  @stormi so it sounds reasonable to integrate it into 8.3 then?

    stormi (Vates 🪐 XCP-ng Team):

                     @olivierlambert I think so. In any case, ceph.ko isn't a core module for XCP-ng, so I don't think there's a high risk in patching it, especially with patches coming from upstream kernel.org.

                     Regarding the initial question, XCP-ng 8.3 will also ship a newer kernel-alt. However, I don't recommend it for production, because it is a lot less tested in the context of XCP-ng.
