XCP-ng

    Applied recent patches ... Now getting CPU errors

    Category: Compute (19 posts, 5 posters, 1.7k views)
    • jcdick1, replying to @stormi

      @stormi I guess I don't know how to do that. Using the console "Resource Pool Configuration" -> "Designate a new pool master" results in a "This host is not a Pool member" message for both nodes.

      Edit: I found an older post with the xe commands for an emergency master transition.

      • jcdick1, replying to @stormi

        @stormi I got a new master, and recovered the other slave. I did a fresh reinstall of the 8.2.0 ISO and the host came up just fine. So there's definitely something in the new code that breaks these machines.

          • stormi (Vates 🪐 XCP-ng Team)

          Since reverting to the previous kernel did not solve your issue, I suspect this might be due to the microcode update.

          But this microcode comes directly from Intel, so it's surprising (although not impossible: they do break things from time to time).

          If it's the microcode, this would also mean that you had not updated in a long time, as the recent update train did not contain any microcode update.

          You could try updating everything except the microcode. Add something like this (untested) to the [main] section of /etc/yum.conf:

          exclude=microcode_ctl*
          

          And then update everything else.

          • jcdick1, replying to @stormi

            @stormi I'll try that. If I update via the CLI, I think I can use "yum update --exclude='microcode_ctl*'" and see what happens. If I remember correctly, it reported 22 missing patches when I started this process with the patches released last week. The fresh install now reports 43 missing patches.

            • jcdick1, replying to @stormi

              @stormi Just as an FYI, the latest patches (20190314-2.xcp) don't seem to have an issue. The system patched and booted up just fine, with no problems bringing up the CPUs or anything like that.

              • stormi (Vates 🪐 XCP-ng Team), replying to @jcdick1

                @jcdick1 They only update the microcode for AMD Family 17h and 19h CPUs 🙂

                • jcdick1, replying to @stormi

                  @stormi Yeah, I spoke too soon. Two of my hosts came right up fine after the patches. The third has been in a boot loop with CPU panics and fatal page faults.

                  • Danp (Pro Support Team), replying to @jcdick1

                    @jcdick1 Could it be the same problem described here?

                    • jcdick1, replying to @Danp

                      @Danp No, but I seem to have figured it out. It's a weird one, considering the platform.

                      These are HP DL360s with fully licensed iLOs. But the CPU errors go away if I physically connect a keyboard to the server. If I reboot while only monitoring through the iLO remote console, I get the errors. If I go to the machines and connect a USB keyboard before the reboot, then go back to my workstation, do everything through XO and watch via the remote console, they come up fine. As for my earlier post about two hosts coming up fine, I remembered that I had coincidentally switched the KVM to them.

                      So, just an FYI for anyone who might have the same problem: plug in a keyboard.

                      • Andrew (Top contributor), replying to @jcdick1

                        @jcdick1 I have almost the same servers and everything has been fine (without keyboards)...

                        HP DL360p G8s with E5-2680 v2 CPUs, and also some with E5-2680 (not v2). I did have problems with one machine, but it was a hardware failure (CPU fault), not XCP-ng.

                        These HP G8 machines are about a decade old now, so hardware issues are not a surprise.
