XCP-ng

    Slow boot on rocky linux 10 latest kernel

      dinhngtu Vates 🪐 XCP-ng Team @poddingue

      @poddingue FWIW I've seen the same on FC44 as well, but I don't yet know where it came from.

        TeddyAstie Vates 🪐 XCP-ng Team Xen Guru

        I can reproduce this on Fedora 44 and Alpine Linux (6.18.22-0-virt).
        It doesn't occur on Debian 13 (6.12), though.

          acebmxer @TeddyAstie

          @TeddyAstie

          This is from a Debian 13 cloud-init image. It happens on every reboot from a fresh image.
          Screenshot_20260607_064937.png

          Added more vCPUs:
          Screenshot_20260607_065144.png

            TeddyAstie Vates 🪐 XCP-ng Team Xen Guru @acebmxer

            @acebmxer which kernel version do you have in your Debian guest (uname -a)?

              acebmxer @TeddyAstie

              @TeddyAstie

              This is on a fresh install of Debian 13 deployed from XO Hub, kernel 6.12.38+deb13-amd64. I do not see this behavior on Ubuntu.

              After updating to 6.12.90+deb13.1-amd64, it still happens.
              Screenshot_20260607_071338.png

              Screenshot_20260607_071824.png

                TeddyAstie Vates 🪐 XCP-ng Team Xen Guru @acebmxer

                @acebmxer I don't observe the same issue on Debian 13 Cloud-Init (both 6.12.38+deb13-amd64 and updated 6.12.90+deb13.1-amd64).

                It still takes some time to boot (especially when loading the ramdisk), but that is not related to this PV spinlock issue; it is mostly a "BIOS guest" issue.
                Note that I'm testing on an Intel machine.

                  henri9813 @TeddyAstie

                  Hello,

                  To give more details about my case:

                  • XCP-ng: 8.3.0
                  • CPU: AMD EPYC 4464P 12-Core Processor
                  • Kernel: 6.12.0-211.18.1.el10_2.x86_64

                    acebmxer

                    I was going to suggest it might be an AMD issue. I can try later on work hosts that are Intel.

                      MajorP93 @TeddyAstie

                      @TeddyAstie These spinlock events that cause slow boot also happen on my Debian VMs, and my hosts also have AMD EPYC CPUs.

                      I noticed that this only happens for UEFI-enabled VMs. VMs booting in BIOS mode do not have this issue.

                        TeddyAstie Vates 🪐 XCP-ng Team Xen Guru @MajorP93

                        @MajorP93 can you give the kernel versions of all the affected and unaffected guests?

                          olivierlambert Vates 🪐 Co-Founder CEO

                          I can reproduce on Debian 13 with stock kernel and a Ryzen 5 7600.

                            olivierlambert Vates 🪐 Co-Founder CEO

                            I'm currently bisecting various kernel builds; I think I'm close to finding the culprit.

                              olivierlambert Vates 🪐 Co-Founder CEO

                              Luckily I have enough cores to build a kernel in 2 minutes, because I've already had to build a LOT of them to find the culprit 😅

                                olivierlambert Vates 🪐 Co-Founder CEO

                                • 6.12.90 -> bad
                                • 6.12.45 -> bad
                                • 6.12.22 -> bad
                                • 6.12.11 -> bad
                                • 6.12.5 -> bad

                                BUT 6.12.2 is GOOD! 😓 Almost there!

                                edit: now 6.12.3 is also good. Getting close…

                                edit: since 6.12.4 is good, the issue was introduced in 6.12.5. Investigating now.
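The narrowing-down above can be sketched as a binary search over an ordered list of kernel versions. This is only an illustration: `first_bad`, `is_bad` and `nth` are hypothetical helpers, and `is_bad` simply replays the verdicts reported in this post instead of building and booting a kernel. It also assumes the newest version in the list is known to be bad and that good versions never follow a bad one.

```shell
#!/bin/sh
# Versions tested in the thread, oldest to newest.
VERSIONS="6.12.2 6.12.3 6.12.4 6.12.5 6.12.11 6.12.22 6.12.45 6.12.90"

# Hypothetical stand-in for "build this kernel, boot the guest, time the boot".
# Here it only replays the results posted above.
is_bad() {
  case "$1" in
    6.12.2|6.12.3|6.12.4) return 1 ;;  # reported good
    *) return 0 ;;                     # reported bad
  esac
}

# Print the N-th (1-based) of the remaining arguments.
nth() {
  shift "$1"
  printf '%s\n' "$1"
}

# Binary search for the first bad version.
# Invariant: entry number hi is bad, every entry before lo is good.
first_bad() {
  set -- $VERSIONS
  lo=1
  hi=$#
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi) / 2 ))
    if is_bad "$(nth "$mid" "$@")"; then
      hi=$mid
    else
      lo=$(( mid + 1 ))
    fi
  done
  nth "$lo" "$@"
}

first_bad   # prints 6.12.5
```

With eight candidate versions this needs three builds instead of up to eight, which matters less with a 2-minute build but adds up on slower machines.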

                                  olivierlambert Vates 🪐 Co-Founder CEO

                                  Found the culprit, made & tested a patch that works.

                                  Recap at https://notes.vates.tech/share/v5jtq0iytw/p/slow-hvm-boot-on-linux-6-12-5wrvOvZKJ7

                                  I will let the rest of my team try to get it fixed upstream.

                                    MajorP93 @olivierlambert

                                    @olivierlambert Awesome work on tracking down the issue!
                                    Very nice, detailed, technical writeup.

                                      olivierlambert Vates 🪐 Co-Founder CEO

                                      Ping @Team-Hypervisor-Kernel for reference.

                                        TeddyAstie Vates 🪐 XCP-ng Team Xen Guru

                                        @majorp93 @henri9813 @acebmxer
                                        Do you observe the same behavior after setting this for the VM?

                                        xe vm-param-add uuid=$UUID param-name=platform tsc_mode=2
                                        xe vm-param-add uuid=$UUID param-name=platform nomigrate=true
                                        

                                        (Beware: you lose live migration support by doing this. You can undo these changes with the matching vm-param-remove, e.g. xe vm-param-remove uuid=$UUID param-name=platform param-key=nomigrate.)
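To keep the apply and undo steps paired, here is a small dry-run sketch. `workaround_cmds` is a hypothetical helper that only prints the xe commands for review; it does not run them. The removal of `tsc_mode` is an assumption that follows the same `param-key` pattern given above for `nomigrate`.

```shell
#!/bin/sh
# Print (do not execute) the xe commands that apply or revert the workaround.
# Usage: workaround_cmds apply|revert <vm-uuid>
workaround_cmds() {
  action=$1
  uuid=$2
  case "$action" in
    apply)
      printf 'xe vm-param-add uuid=%s param-name=platform tsc_mode=2\n' "$uuid"
      printf 'xe vm-param-add uuid=%s param-name=platform nomigrate=true\n' "$uuid"
      ;;
    revert)
      # tsc_mode key assumed by analogy with the nomigrate example above.
      printf 'xe vm-param-remove uuid=%s param-name=platform param-key=tsc_mode\n' "$uuid"
      printf 'xe vm-param-remove uuid=%s param-name=platform param-key=nomigrate\n' "$uuid"
      ;;
    *)
      echo "usage: workaround_cmds apply|revert <vm-uuid>" >&2
      return 1
      ;;
  esac
}
```

For example, `workaround_cmds revert "$UUID"` lists the two removal commands, which can be pasted on the host once they look right.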

                                          MajorP93 @TeddyAstie

                                          @TeddyAstie said:

                                          param-name=platform nomigrate=true

                                          Hi @teddyastie, thanks for working on this.

                                          Per policy, I am not allowed to test these parameters in production, so I created a small test setup to try your settings.

                                          I deployed a Debian 13 VM via Cloud-Init on an XCP-ng test host using the official Debian 13 cloud image.

                                          After deploying the VM I had the issue of slow boot.

                                          After shutting the VM down, applying the settings that you just sent and starting it again I can say that you are on the right track!

                                          In my case the boot time is completely normal now and on par with Debian 13 VMs that use BIOS instead of UEFI (for booting).

                                          Since this workaround disables live migration, it is not an option for production environments, but it is good to have a workaround available anyway.

                                          Do you think it is possible to fix this at the hypervisor level while keeping live migration etc. enabled, or do we have to wait for an upstream fix in the Linux kernel tree?

                                            TeddyAstie Vates 🪐 XCP-ng Team Xen Guru @MajorP93

                                            @MajorP93 said:
                                            Do you think it is possible to fix this on hypervisor level while still having live migration etc. enabled or do we have to wait for an upstream fix within Linux kernel tree?

                                            Yes, it's possible to fix it at the hypervisor level (Invariant TSC in the guest), but it's quite a bit of work that still needs to be done. A Linux upstream fix for the underlying bug should hopefully come at some point.
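For anyone who wants to see what their guest currently reports, here is a rough sketch that checks the CPU flags Linux normally sets when the CPU advertises an invariant TSC (`constant_tsc` and `nonstop_tsc`). The helper names `has_flag` and `tsc_report` are hypothetical, and treating these two flags as the indicator is an assumption on my part, not something confirmed in this thread.

```shell
#!/bin/sh
# Return success if flag $1 appears as a whole word in the flag list $2.
has_flag() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report whether a /proc/cpuinfo-style flag list suggests an invariant TSC.
tsc_report() {
  if has_flag constant_tsc "$1" && has_flag nonstop_tsc "$1"; then
    echo "invariant TSC advertised"
  else
    echo "invariant TSC not advertised"
  fi
}

# Inside a Linux guest, feed it the flags of the first CPU.
if [ -r /proc/cpuinfo ]; then
  tsc_report "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
fi
```

Comparing the output with and without the `tsc_mode=2` setting could show whether the platform change is what the guest kernel actually sees.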
