XCP-ng

    Issue after latest host update

    57 Posts 9 Posters 8.8k Views 9 Watching
    • RealTehreal @john.c

      @john-c
      Model: FUJITSU FUTRO S740/D3544-A1
      BIOS: V5.0.0.13 R1.13.0 for D3544-A1x (09/23/2022)

      • john.c @RealTehreal

        @RealTehreal said in Issue after latest host update:

        @john-c
        Model: FUJITSU FUTRO S740/D3544-A1
        BIOS: V5.0.0.13 R1.13.0 for D3544-A1x (09/23/2022)

        Thanks, that will help. It makes it possible to identify any issues specific to that device, as well as to its CPU and that CPU's functions and features, especially its instruction set capabilities.

        • RealTehreal @john.c

          @john-c All such information should be available in the dmesg file in post: https://xcp-ng.org/forum/post/74791

          Any ideas on how to revert the update? I would really like to have the setup running again. It may be "just" a home lab, but I was still using it (at least semi-) productively...

          • RealTehreal

            I'd even be fine with using only two machines and keeping one of them offline for further testing.

            • RealTehreal

              For reference: I have now decided to use a less intrusive approach and changed the default boot entry in the GRUB config to the working failover entry. I'll now try to get the pool up again.
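Changing the default boot entry as described above comes down to editing the GRUB2 `set default=` line. A minimal Python 3 sketch that does this on the config text, assuming the file uses the standard `set default=N` syntax with entries counted from 0; the helper name is hypothetical, and the location of grub.cfg differs between BIOS and UEFI installs, so check it on your own host and keep a backup.

```python
import re

# Matches a line such as "set default=0" (GRUB2 syntax), keeping any indentation.
_DEFAULT_LINE = re.compile(r"^([ \t]*set[ \t]+default=).*$", re.MULTILINE)


def set_grub_default(cfg_text: str, entry: int) -> str:
    """Return cfg_text with the first `set default=...` line pointing at `entry`.

    Hypothetical helper: only the first matching line is changed, so configs
    that set the default in several places need a manual look.
    """
    new_text, count = _DEFAULT_LINE.subn(
        lambda m: m.group(1) + str(entry), cfg_text, count=1
    )
    if count == 0:
        raise ValueError("no 'set default=' line found")
    return new_text
```

Write the result back to the file and reboot; the new default only matters at the next boot.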

              • olivierlambert Vates 🪐 Co-Founder CEO

                What's the CPU in this machine? I'd suspect a microcode update issue, then.

                • olivierlambert Vates 🪐 Co-Founder CEO

                  Could be related: https://xcp-ng.org/forum/topic/8736/wyse-5070-vm-won-t-booting-after-update-bios-1-27

                  • RealTehreal @olivierlambert

                    @olivierlambert The following is from /proc/cpuinfo:
                    Intel(R) Celeron(R) J4105 CPU @ 1.50GHz

                    True enough regarding the Wyse topic. I'll try reverting only the microcode update and see what happens.

                    • olivierlambert Vates 🪐 Co-Founder CEO

                      @RealTehreal said in Issue after latest host update:

                      Intel(R) Celeron(R) J4105 CPU @ 1.50GHz

                      Another Gemini Lake… So it's clearly related.
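To check for a Gemini Lake CPU without looking up the model name, one option is to read the `vendor_id`, `cpu family` and `model` fields of /proc/cpuinfo: Gemini Lake (Goldmont Plus) parts report family 6, model 122 (0x7A), the value the Linux kernel names INTEL_FAM6_ATOM_GOLDMONT_PLUS. A minimal Python 3 sketch with hypothetical function names; it only identifies the CPU family and says nothing about which microcode revision is loaded.

```python
from typing import Dict, List

# (vendor_id, cpu family, model) reported by Gemini Lake / Goldmont Plus CPUs.
GEMINI_LAKE = ("GenuineIntel", 6, 0x7A)


def parse_cpuinfo(text: str) -> List[Dict[str, str]]:
    """Split /proc/cpuinfo text into one dict of 'key: value' fields per logical CPU."""
    cpus: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():  # a blank line ends one CPU's block
            if current:
                cpus.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def is_gemini_lake(cpu: Dict[str, str]) -> bool:
    """True if this CPU block reports the Gemini Lake vendor/family/model triple."""
    signature = (
        cpu.get("vendor_id"),
        int(cpu.get("cpu family", -1)),
        int(cpu.get("model", -1)),
    )
    return signature == GEMINI_LAKE
```

For example, `any(is_gemini_lake(c) for c in parse_cpuinfo(open("/proc/cpuinfo").read()))` should be True on a J4105 host.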

                      • RealTehreal @olivierlambert

                        @olivierlambert Yep, I can confirm that in this case the microcode update is the culprit, too.

                        I just downgraded
                        microcode_ctl-2.1-26.xs28.1.xcpng8.2.x86_64
                        to
                        microcode_ctl-2.1-26.xs26.2.xcpng8.2.x86_64

                        and it's working again. Man, what a mess.

                        • RealTehreal @RealTehreal

                          @RealTehreal
                          Step-by-step instructions, in case someone else has the same issue:

                          1. Run yum history list to get the transaction ID of the last update.

                          2. Run yum history info <ID>, with <ID> being the ID from step 1, to list the updates done in that transaction. The interesting part for me was:

                          Updated microcode_ctl-2:2.1-26.xs26.2.xcpng8.2.x86_64
                          Update                2:2.1-26.xs28.1.xcpng8.2.x86_64

                          3. Run yum downgrade microcode_ctl-2:2.1-26.xs26.2.xcpng8.2.x86_64 to downgrade to the previous version. Note that you have to specify the older version in this command.

                          4. Wait until it's done, reboot, test, and pray it works again.

                          This is just a workaround! Microcode updates are important security and/or functional updates. Downgrading can lead to security issues.
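Steps 2 and 3 boil down to taking the package named on the `Updated` line (the version installed before the transaction) and passing that exact string to yum downgrade. A minimal Python 3 sketch that extracts it from the text output of yum history info; it assumes the `Updated <old package>` / `Update <new version>` layout quoted above, the function name is hypothetical, and it only prints the command so you can review it before running anything.

```python
from typing import Dict


def previous_versions(history_info: str) -> Dict[str, str]:
    """Map package name -> full pre-update package string from `yum history info` text.

    Assumes lines of the form 'Updated <name>-[<epoch>:]<version>-<release>.<arch> [@repo]'.
    The matching 'Update <new version>' lines are ignored.
    """
    result: Dict[str, str] = {}
    for line in history_info.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "Updated":
            old = fields[1]
            if ":" in old:
                # 'microcode_ctl-2:2.1-...' -> drop everything from the epoch on.
                name = old.split(":", 1)[0].rsplit("-", 1)[0]
            else:
                # No epoch shown: strip the trailing '-<version>-<release>.<arch>'.
                name = old.rsplit("-", 2)[0]
            result[name] = old
    return result


if __name__ == "__main__":
    # Based on the excerpt quoted in this thread.
    sample = (
        "    Updated microcode_ctl-2:2.1-26.xs26.2.xcpng8.2.x86_64\n"
        "    Update                2:2.1-26.xs28.1.xcpng8.2.x86_64\n"
    )
    for old in previous_versions(sample).values():
        print("yum downgrade " + old)
```

On the sample above this prints the same command as step 3.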

                          • RealTehreal @olivierlambert

                            @olivierlambert Thank you very much for pointing out the real issue.

                            • RealTehreal

                              What should happen now? Who should be informed about this issue with the microcode update? Is it an XCP-ng issue, a Linux issue, or an Intel issue? Thanks in advance for clarifying.

                              • andyhhp Xen Guru @RealTehreal

                                @RealTehreal It's an Intel issue, but while this is enough to show that there is an issue, it's not enough to figure out what is wrong.

                                Sadly, a VM falling into a busy loop can have many causes. It's clearly on the (v)BSP prior to starting the (v)APs, which is why only a single CPU is ever spinning.

                                Can you switch to the debug hypervisor (change the /boot/xen.gz symlink to point at the -d suffixed hypervisor) and then capture xl dmesg after trying to boot one VM? Depending on how broken things are, we might see some diagnostics.
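The symlink change described above can be sketched in Python 3 as follows. It assumes /boot/xen.gz is a symlink to a file ending in `.gz` and that the debug build sits next to it with `-d` inserted before `.gz` (for example `xen-<version>.gz` and `xen-<version>-d.gz`); check the actual file names in /boot first. The helper name is hypothetical. The new link is created under a temporary name and renamed over the old one, so /boot/xen.gz never goes missing.

```python
import os


def switch_to_debug_xen(link: str = "/boot/xen.gz") -> str:
    """Repoint `link` at the '-d' (debug) build of its current target; return the new target.

    Assumes the current target ends in '.gz' and the debug build has the same
    name with '-d' inserted before '.gz'.
    """
    target = os.readlink(link)
    if target.endswith("-d.gz"):
        return target  # already pointing at the debug build
    if not target.endswith(".gz"):
        raise ValueError(f"unexpected symlink target: {target}")
    debug_target = target[: -len(".gz")] + "-d.gz"
    # Relative targets are resolved against the directory holding the link.
    debug_path = os.path.join(os.path.dirname(link), debug_target)
    if not os.path.exists(debug_path):
        raise FileNotFoundError(debug_path)
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(debug_target, tmp)  # create the new link under a temporary name...
    os.replace(tmp, link)          # ...then atomically rename it over the old link
    return debug_target
```

The hypervisor is only loaded at boot, so reboot afterwards, reproduce the VM hang, collect xl dmesg, and point the link back at the non-debug build when done.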

                                Could you also try running XTF as described here: https://xcp-ng.org/forum/post/57804 It's a long shot, but if it does happen to stumble on the issue, it will be orders of magnitude easier to debug than some miscellaneous breakage in the middle of OVMF.

                                • RealTehreal @andyhhp

                                  @andyhhp Sure thing. I'll just need some time, as I can only do such things in my free time.

                                  • nikade Top contributor @RealTehreal

                                    @RealTehreal said in Issue after latest host update:

                                    @RealTehreal
                                    Step-by-step instructions, in case someone else has the same issue:

                                    1. Run yum history list to get the transaction ID of the last update.

                                    2. Run yum history info <ID>, with <ID> being the ID from step 1, to list the updates done in that transaction. The interesting part for me was:

                                    Updated microcode_ctl-2:2.1-26.xs26.2.xcpng8.2.x86_64
                                    Update                2:2.1-26.xs28.1.xcpng8.2.x86_64

                                    3. Run yum downgrade microcode_ctl-2:2.1-26.xs26.2.xcpng8.2.x86_64 to downgrade to the previous version. Note that you have to specify the older version in this command.

                                    4. Wait until it's done, reboot, test, and pray it works again.

                                    This is just a workaround! Microcode updates are important security and/or functional updates. Downgrading can lead to security issues.

                                    Thanks for sharing the resolution. I'm sure it will help someone else in the future.

                                    • john.c @olivierlambert

                                      @olivierlambert said in Issue after latest host update:

                                      @RealTehreal said in Issue after latest host update:

                                      Intel(R) Celeron(R) J4105 CPU @ 1.50GHz

                                      Another Gemini Lake… So it's clearly related.

                                      I had already found this out (its code name), but unfortunately things got busy, so I was unable to check the microcode notes or post it to the forum. I found it without using cat /proc/cpuinfo.

                                      I took the CPU listed on this web page (https://www.fujitsu.com/uk/products/computing/pc/thin-clients/futro-s740/), then looked up the Intel Celeron J4105 on Intel ARK, which revealed its code name along with a wealth of other useful information (https://ark.intel.com/content/www/us/en/ark/products/128989/intel-celeron-j4105-processor-4m-cache-up-to-2-50-ghz.html).

                                      • andyhhp Xen Guru @RealTehreal

                                        @RealTehreal In addition to the XTF testing, could you also please try (with the bad microcode) booting Xen with spec-ctrl=no-verw on the command line, and seeing whether that changes the behaviour of your regular VMs? Please capture xl dmesg from this run too.
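Putting an option such as spec-ctrl=no-verw on the Xen command line means appending it to the Xen line of the GRUB entry you boot. A minimal Python 3 sketch operating on grub.cfg text; it assumes the Xen line is a GRUB2 `multiboot2 /boot/xen.gz ...` command (confirm this in your own config first), the helper name is hypothetical, and the dom0 kernel (`module2`) lines are left untouched.

```python
def add_xen_option(cfg_text: str, option: str) -> str:
    """Append `option` to every `multiboot2 .../xen.gz ...` line that lacks this exact option."""

    def patch(line: str) -> str:
        fields = line.split()
        is_xen_line = (
            len(fields) >= 2
            and fields[0] == "multiboot2"
            and fields[1].endswith("xen.gz")
        )
        if is_xen_line and option not in fields[2:]:
            return line.rstrip() + " " + option
        return line

    # split/join on "\n" keeps the original line structure, including a trailing newline.
    return "\n".join(patch(ln) for ln in cfg_text.split("\n"))
```

Running it twice is harmless because an option that is already present is not added again. The change takes effect at the next boot, after which xl dmesg can be captured as requested; remove the option again once testing is done.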

                                        • stormi Vates 🪐 XCP-ng Team

                                          Doc about XTF testing: https://docs.xcp-ng.org/project/development-process/tests/#test-the-xen-hypervisor-itself

                                          • RealTehreal

                                            I'll do the testing on the weekend.
