XCP-ng

    Slow boot on Rocky Linux 10 with the latest kernel

    • poddingue Vates 🪐 @henri9813

      I'm not familiar with the Rocky 10 kernel internals, but a spinlock storm on a vCPU during boot often comes from how the guest kernel handles its paravirtual clock/timer with the host, rather than anything Rocky-specific.
      Could you share your XCP-ng version, and whether an older RL10 kernel (or an RL9 one) boots normally on the same VM? That would tell us if it's genuinely the latest kernel that regressed.
      If it only happens on that kernel and it's hard to test elsewhere, a mention to @Team-Hypervisor-Kernel might help, since they can check whether it reproduces on their side.

      • dinhngtu Vates 🪐 XCP-ng Team @poddingue

        @poddingue FWIW I've seen the same on FC44 as well, but I don't yet know where it came from.

        • TeddyAstie Vates 🪐 XCP-ng Team Xen Guru

          I can reproduce this on Fedora 44 and Alpine Linux (6.18.22-0-virt),
          but it doesn't occur on Debian 13 (6.12).

          • acebmxer @TeddyAstie

            @TeddyAstie

            From a Debian 13 cloud-init image. It happens on every reboot from a fresh image.
            Screenshot_20260607_064937.png

            Added more vCPUs.
            Screenshot_20260607_065144.png

            • TeddyAstie Vates 🪐 XCP-ng Team Xen Guru @acebmxer

              @acebmxer which kernel version do you have in your Debian guest (uname -a)?

              • acebmxer @TeddyAstie

                @TeddyAstie

                This is on a fresh install of Debian 13 deployed from the XO Hub: 6.12.38+deb13-amd64. I do not see this behavior on Ubuntu.

                After updating to 6.12.90+deb13.1-amd64, it still happens.
                Screenshot_20260607_071338.png

                Screenshot_20260607_071824.png

                • TeddyAstie Vates 🪐 XCP-ng Team Xen Guru @acebmxer

                  @acebmxer I don't observe the same issue on Debian 13 Cloud-Init (both 6.12.38+deb13-amd64 and updated 6.12.90+deb13.1-amd64).

                  It does still take some time to boot (especially while loading the ramdisk), but that is unrelated to this PV spinlock issue and is mostly a "BIOS guest" issue.
                  That said, I'm testing on an Intel machine.

                  • henri9813 @TeddyAstie

                    Hello,

                    To give more details about my case:

                    • XCP-ng: 8.3.0
                    • CPU: AMD EPYC 4464P 12-Core Processor
                    • Kernel: 6.12.0-211.18.1.el10_2.x86_64

                    • acebmxer

                      I was going to suggest it might be an AMD issue. I can try later on work hosts that are Intel.

                      • MajorP93 @TeddyAstie

                        @TeddyAstie These spinlock events that cause slow boot also happen on my Debian VMs. Also AMD Epyc CPU in my case.

                        I noticed that this only happens for UEFI enabled VMs. VMs booting in BIOS mode do not have this issue.

                        • TeddyAstie Vates 🪐 XCP-ng Team Xen Guru @MajorP93

                          @MajorP93 can you give the kernel versions of all the affected and non-affected guests?
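
A minimal sketch for gathering those details inside a Linux guest, assuming a POSIX shell and a readable /proc/cpuinfo; the function name guest_summary is made up for illustration. It reports the kernel release, whether the guest booted via UEFI or BIOS (by checking for /sys/firmware/efi), and the CPU vendor string:

```shell
#!/bin/sh
# Print the guest facts discussed in this thread on a single line.
# guest_summary is a hypothetical helper name, not an existing tool.
guest_summary() {
    kernel=$(uname -r)
    # /sys/firmware/efi only exists when the kernel was booted through UEFI.
    if [ -d /sys/firmware/efi ]; then
        firmware=UEFI
    else
        firmware=BIOS
    fi
    # On x86, vendor_id is e.g. AuthenticAMD or GenuineIntel.
    # Falls back to "unknown" when the field or the file is missing.
    vendor=$(awk -F': ' '/^vendor_id/ {print $2; exit}' /proc/cpuinfo 2>/dev/null || true)
    printf 'kernel=%s firmware=%s cpu_vendor=%s\n' "$kernel" "$firmware" "${vendor:-unknown}"
}

guest_summary
```

Running it on one affected and one unaffected guest gives two lines that can be pasted side by side into a reply.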

                          • olivierlambert Vates 🪐 Co-Founder CEO

                            I can reproduce on Debian 13 with stock kernel and a Ryzen 5 7600.

                            • olivierlambert Vates 🪐 Co-Founder CEO

                              I'm currently bisecting various kernel builds; I think I'm close to finding the culprit.

                              • olivierlambert Vates 🪐 Co-Founder CEO

                                Luckily I have enough cores to build a kernel in 2 minutes, because I've already had to build a LOT of them to find the culprit 😅

                                • olivierlambert Vates 🪐 Co-Founder CEO

                                  • 6.12.90 -> bad
                                  • 6.12.45 -> bad
                                  • 6.12.22 -> bad
                                  • 6.12.11 -> bad
                                  • 6.12.5 -> bad

                                  BUT 6.12.2 is GOOD! 😓 Almost there!

                                  edit: now 6.12.3 is also good. Getting close…

                                  edit: since 6.12.4 is good, then the issue is within 6.12.5. Investigating now.
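
The narrowing-down above is essentially a binary search over an ordered list of kernel versions. A rough sketch, assuming GNU sort -V is available and that every version before the first bad one is good; is_bad is a hypothetical stand-in that simply encodes the results reported in this post, where a real run would build and boot each kernel:

```shell
#!/bin/sh
# Hypothetical stand-in for "build this kernel, boot the guest, check for slow boot".
# It only encodes the outcome reported in this post: 6.12.5 and later are bad.
is_bad() {
    # sort -V orders version strings; if 6.12.5 sorts first, then $1 >= 6.12.5.
    [ "$(printf '%s\n' "6.12.5" "$1" | sort -V | head -n 1)" = "6.12.5" ]
}

# Print the first bad version from an ascending list of versions.
# Assumes all good versions come before all bad ones and the last one is bad.
first_bad() {
    lo=1     # 1-based index of the lowest remaining candidate
    hi=$#    # 1-based index of the highest remaining candidate
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        eval "candidate=\${$mid}"
        if is_bad "$candidate"; then
            hi=$mid              # the first bad version is at mid or earlier
        else
            lo=$(( mid + 1 ))    # the first bad version is after mid
        fi
    done
    eval "printf '%s\n' \"\${$lo}\""
}

first_bad 6.12.2 6.12.3 6.12.4 6.12.5 6.12.11 6.12.22 6.12.45 6.12.90
```

With the eight versions listed, the search needs three checks (6.12.5, 6.12.3, 6.12.4) and prints 6.12.5, matching the conclusion above.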

                                  • olivierlambert Vates 🪐 Co-Founder CEO

                                    Found the culprit, made & tested a patch that works.

                                    Recap at https://notes.vates.tech/share/v5jtq0iytw/p/slow-hvm-boot-on-linux-6-12-5wrvOvZKJ7

                                    I will let the rest of my team try to get it fixed upstream.

                                    • MajorP93 @olivierlambert

                                      @olivierlambert Awesome work on tracking down the issue!
                                      Very nice, detailed, technical writeup.

                                      • olivierlambert Vates 🪐 Co-Founder CEO

                                        Ping @Team-Hypervisor-Kernel for reference.

                                        • TeddyAstie Vates 🪐 XCP-ng Team Xen Guru

                                          @majorp93 @henri9813 @acebmxer
                                          Do you observe the same behavior after setting this for the VM?

                                          xe vm-param-add uuid=$UUID param-name=platform tsc_mode=2
                                          xe vm-param-add uuid=$UUID param-name=platform nomigrate=true
                                          

                                          (Beware: you lose live migration support by doing this. You can revert these changes with the matching vm-param-remove, e.g. xe vm-param-remove uuid=$UUID param-name=platform param-key=nomigrate)
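
A small dry-run sketch of applying and reverting these settings, assuming a POSIX shell; the helper name platform_workaround and the all-zero UUID are placeholders. It only prints the xe commands (the same ones quoted above, plus the matching vm-param-remove for tsc_mode) so they can be reviewed before being run on the host:

```shell
#!/bin/sh
# Dry-run helper: prints the xe commands instead of executing them.
# platform_workaround is a hypothetical name used for illustration only.
platform_workaround() {
    action=$1   # "apply" or "revert"
    uuid=$2     # UUID of the VM to change
    case "$action" in
        apply)
            echo "xe vm-param-add uuid=$uuid param-name=platform tsc_mode=2"
            echo "xe vm-param-add uuid=$uuid param-name=platform nomigrate=true"
            ;;
        revert)
            echo "xe vm-param-remove uuid=$uuid param-name=platform param-key=tsc_mode"
            echo "xe vm-param-remove uuid=$uuid param-name=platform param-key=nomigrate"
            ;;
        *)
            echo "usage: platform_workaround apply|revert <vm-uuid>" >&2
            return 1
            ;;
    esac
}

# Placeholder UUID; substitute the real VM UUID.
platform_workaround apply "00000000-0000-0000-0000-000000000000"
platform_workaround revert "00000000-0000-0000-0000-000000000000"
```

Printing first keeps the change reviewable; once the output looks right it can be piped to sh on the host, with the VM shut down and started again afterwards so the new platform settings are picked up.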

                                          • MajorP93 @TeddyAstie

                                            @TeddyAstie said:

                                            param-name=platform nomigrate=true

                                            Hi @teddyastie, thanks for working on this.

                                            As per policy, I am not allowed to test these parameters in production, so I had to create a small test setup to try your settings.

                                            I deployed a Debian 13 VM via Cloud-Init on an XCP-ng test host using the official Debian 13 cloud image.

                                            After deploying the VM I had the issue of slow boot.

                                            After shutting the VM down, applying the settings that you just sent, and starting it again, I can say that you are on the right track!

                                            In my case the boot time is completely normal now and on par with Debian 13 VMs that use BIOS instead of UEFI (for booting).

                                            As this is a workaround that disables live migration, it is not an option for production environments, but it is good to have a workaround available anyway!

                                            Do you think it is possible to fix this at the hypervisor level while keeping live migration etc. enabled, or do we have to wait for an upstream fix in the Linux kernel tree?
