XCP-ng

    Live migrate of Rocky Linux 8.8 VM crashes/reboots VM

      stormi Vates 🪐 XCP-ng Team @Weppel

      @Weppel It looks like you wrote KVM instead of Xen in the title.

        Weppel @stormi

        @stormi Thanks for noticing, my bad

          stormi Vates 🪐 XCP-ng Team

          Adding @bleader to the discussion. He's trying to debug the issue.

          It looks like a simple VM suspend + resume also crashes the VM. Can you confirm, @Weppel?
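For anyone who wants to repeat the suspend + resume test from the host, here is a minimal sketch. It assumes the standard `xe vm-suspend` and `xe vm-resume` commands are run on a pool host. The helper names and the `dry_run` flag are hypothetical and only there for illustration, and the VM UUID is a placeholder you would supply yourself.

```python
import subprocess
from typing import List


def xe_command(action: str, vm_uuid: str) -> List[str]:
    """Build an xe CLI invocation such as: xe vm-suspend uuid=<vm_uuid>."""
    if action not in ("vm-suspend", "vm-resume"):
        raise ValueError(f"unsupported action: {action}")
    return ["xe", action, f"uuid={vm_uuid}"]


def suspend_resume(vm_uuid: str, dry_run: bool = True) -> List[List[str]]:
    """Suspend, then resume, a VM.

    With dry_run=True (the default) nothing is executed; the commands
    that would be run are only returned, so they can be reviewed first.
    """
    commands = [
        xe_command("vm-suspend", vm_uuid),
        xe_command("vm-resume", vm_uuid),
    ]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if xe reports a failure.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `suspend_resume("<vm-uuid>", dry_run=False)` on the host would run both steps in order. Whether the guest survives the resume still has to be checked in the VM itself, for example by looking at its uptime or console.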

            Weppel @stormi

            @stormi Confirmed

              bleader Vates 🪐 XCP-ng Team

              So, after our investigations, we were able to pinpoint the issue.

              It seems to happen on most RHEL-derivative distributions when upgrading from 8.7 to 8.8. As suggested, the bug is in the kernel.

              Starting with 4.18.0-466.el8, the patch "x86/idt: Annotate alloc_intr_gate() with __init" is integrated, and it triggers the issue. The companion patch "x86/xen: Split HVM vector callback setup and interrupt gate allocation" should have been integrated as well, but it is missing.

              The upgrade to 8.8 moves you to the 4.18.0-477.* kernels, which are also affected. That is what you reported.

              We found that the 4.18.0-488 kernel available in CentOS Stream 8 integrates the missing patch, and it does indeed work when installed manually.
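To make the version ranges above easier to check on a guest, here is a rough sketch that classifies an EL8 kernel release string using only the two build numbers mentioned in this post (-466 and -488). The function names and the returned labels are made up for illustration. The check looks only at the base build number, so it cannot tell whether a later -477.x update carries a backport of the missing patch.

```python
import platform
import re
from typing import Optional, Tuple

# Build numbers taken from the post above: the offending patch arrived in
# -466, and -488 carries the missing companion patch.
FIRST_AFFECTED_BUILD = 466
FIRST_FIXED_BUILD = 488


def parse_el8_build(release: str) -> Optional[Tuple[int, ...]]:
    """Return the numeric fields after '4.18.0-', e.g. (477, 15, 1).

    Returns None if the string does not look like an EL8 4.18.0 kernel.
    """
    match = re.match(r"^4\.18\.0-(\d+(?:\.\d+)*)\.el8", release)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def classify(release: str) -> str:
    """Classify a kernel release string by its base build number only."""
    build = parse_el8_build(release)
    if build is None:
        return "unknown"
    if build[0] < FIRST_AFFECTED_BUILD:
        return "not affected"
    if build[0] >= FIRST_FIXED_BUILD:
        return "fixed"
    # Assumption: anything from -466 up to (not including) -488 lacks the
    # companion patch unless it was backported, which this cannot detect.
    return "likely affected"


def classify_running_kernel() -> str:
    """Classify the kernel of the machine this script runs on."""
    return classify(platform.release())
```

Run inside the VM, `classify_running_kernel()` gives a first hint. A suspend + resume test remains the real confirmation.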

              Your report helped us identify and reproduce the issue, which allowed us to provide a call stack to the Xen devs. Roger Pau Monné then quickly found that this patch was missing, and we were able to work out which kernel RPM versions integrated the offending patch and when the fix was integrated.

              This means the issue was identified on the RH side, and it is now a matter of getting an updated kernel into derivative distributions like Rocky and Alma.

                Weppel @bleader

                @bleader Thank you very much for the quick discovery of this, impressive work! I'm glad I could help!

                • olivierlambert marked this topic as a question
                • olivierlambert marked this topic as solved
                  Weppel

                  FYI, this is not fixed yet in the latest EL kernel, 4.18.0-477.15.1.el8_8.x86_64.

                    olivierlambert Vates 🪐 Co-Founder CEO

                    Thanks for keeping us posted…

                    Hopefully things will be fixed at some point. Maybe Red Hat is focused on doing other things right now…

                      bleader Vates 🪐 XCP-ng Team

                      Makes sense. I was hoping they would go with a -488 in an update... Sorry to hear it's not the case 😞

                        qnx @bleader

                        @bleader Unfortunately, I think we're stuck on -477 until EL 8.9 comes out 😕

                        It's frustrating that it's taking them so long to fix this, especially when it seems the bug was caused by human error to begin with.

                          bleader Vates 🪐 XCP-ng Team

                          Not entirely sure, but that may be related to what's happening on the Red Hat side of things 😕

                            KFC-Netearth

                            I just installed the latest kernel from Rocky, and a live migration seems to work on the 2 dev servers I have tried so far:
                            4.18.0-477.21.1.el8_8.x86_64 #1 SMP Tue Aug 8 21:30:09 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

                              bleader Vates 🪐 XCP-ng Team @KFC-Netearth

                              @KFC-Netearth Surprising, as it is still -477, but maybe the patch was backported. Thanks for telling us!

                                Weppel @KFC-Netearth

                                @KFC-Netearth

                                The Rocky Linux bugtracker indeed mentions it's mostly fixed, but there are still some kernel errors present: https://bugs.rockylinux.org/view.php?id=3565#c4293
