XCP-ng
    Posts
    • RE: Issue after latest host update

      @andyhhp said in Issue after latest host update:

      @RealTehreal Sorry to keep adding to the list of diagnostics, but everything here will help. After you've tried the other options, could you try this:

      If the XTF testing shows any XTF test looping, use that single test, otherwise use your regular VM. Get one VM into the looping state. Check xl list to confirm that you've only got Domain-0 and the one other VM, and note its domid (the "ID" column).

      In dom0, run xentrace to capture a system trace. It's looping so the dump file is going to be large, but it also means that you can CTRL-C as quickly as you can on the shell and it will be fine (a few hundred milliseconds of samples will almost certainly be enough).

      Anyway, run xentrace -D -e 0x0008f000 xentrace.dmp and then give me the resulting xentrace.dmp file. If you're interested in what's in it, you can decode it using xenalyze -a xentrace.dmp |& less.

      Then, run xen-hvmctx $domid two or three times, and share the output of all.

      I sent you a pm.
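
      The capture steps quoted above could be scripted roughly as follows. This is only a sketch: the function names and output file names are made up for illustration, and it assumes that xentrace stops cleanly on SIGINT (the signal CTRL-C sends) and that xl, xentrace and xen-hvmctx are on root's PATH in dom0.

```shell
#!/usr/bin/env bash
# Sketch of the quoted capture steps; source it and run as root in dom0.

# Read `xl list` output on stdin and print the domid of the only guest.
# Columns are Name, ID, Mem, VCPUs, State, Time(s); the ID is counted from
# the right so that a name containing spaces does not shift it.
guest_domid() {
    awk 'NR > 1 && $1 != "Domain-0" { id = $(NF - 4); n++ }
         END { if (n != 1) exit 1; print id }'
}

# Hypothetical wrapper: trace briefly, then dump the HVM context three times.
capture_loop_diagnostics() {
    local domid i
    domid=$(xl list | guest_domid) || {
        echo "need exactly one guest besides Domain-0" >&2
        return 1
    }
    # SIGINT after one second stands in for pressing CTRL-C by hand
    # (assumption: xentrace flushes and exits on SIGINT).
    timeout -s INT 1 xentrace -D -e 0x0008f000 xentrace.dmp || true
    for i in 1 2 3; do
        xen-hvmctx "$domid" > "hvmctx-$i.txt"
        sleep 1
    done
}
```

      With the looping VM as the only running guest, calling capture_loop_diagnostics should leave xentrace.dmp and three hvmctx-N.txt files in the current directory.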

      posted in XCP-ng
      RealTehreal
    • RE: Issue after latest host update

      @andyhhp said in Issue after latest host update:

      @RealTehreal It's an Intel issue, but while this is enough to show that there is an issue, it's not enough to figure out what is wrong.

      Sadly, a VM falling into a busy loop can be one of many things. It's clearly on the (v)BSP prior to starting (v)APs, hence why it's only ever a single CPU spinning.

      Can you switch to using the debug hypervisor (change the /boot/xen.gz symlink to point at the -d suffixed hypervisor), and then capture xl dmesg after trying to boot one VM. Depending on how broken things are, we might see some diagnostics.

      Could you also try running xtf as described here: https://xcp-ng.org/forum/post/57804 It's a long-shot, but if it does happen to stumble on the issue, then it will be orders of magnitude easier to debug than something misc broken in the middle of OVMF.

      First things first: here is some information.

      xl dmesg with the debug hypervisor, the bad microcode, and after trying to run a VM: xl_dmesg_bad_microcode.txt

      xtf short: xtf_short.txt

      xtf long: xtf_long.txt

    • RE: Issue after latest host update

      I'll do the testing on the weekend.

    • RE: Issue after latest host update

      @andyhhp Sure thing. I'll just need some time, as I can only do such things in my free time.

    • RE: Issue after latest host update

      What should happen now? Who should be informed about this issue with the microcode update? Is it still an XCP-ng issue, a Linux issue, or an Intel issue? Thank you in advance for clarification.

    • RE: Issue after latest host update

      @olivierlambert Thank you very much for pointing out the real issue.

    • RE: Issue after latest host update

      @RealTehreal
      Step-by-step instructions, in case someone else has the same issue:

      1. Run yum history list to get the transaction ID of the last update.

      2. Run yum history info <id>, with <id> being the transaction ID from step 1, to list the packages updated in that transaction. The interesting part for me was:

      Updated microcode_ctl-2:2.1-26.xs26.2.xcpng8.2.x86_64
      Update                2:2.1-26.xs28.1.xcpng8.2.x86_64

      3. Run yum downgrade microcode_ctl-2:2.1-26.xs26.2.xcpng8.2.x86_64 to downgrade to the previous version. The package has to be given with the older version, i.e. the one on the "Updated" line.

      4. Wait until it's done, reboot, test, and pray it works again.

      This is just a workaround! Microcode updates are important security and/or functional updates. Downgrading can lead to security issues.
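
      The lookup in steps 2 and 3 can be sketched as a small helper. The function names are hypothetical, and the parsing assumes the "Updated <name>-<epoch>:<version>-<release>.<arch>" line layout shown above, so check the value before letting yum act on it.

```shell
#!/usr/bin/env bash
# Read `yum history info <id>` output on stdin and print the pre-update
# package spec (taken from the "Updated" line) for the package named in $1.
previous_package() {
    awk -v pkg="$1" '$1 == "Updated" && index($2, pkg "-") == 1 { print $2; exit }'
}

# Hypothetical wrapper: downgrade one package to its pre-transaction version.
# $1 = transaction ID from `yum history list`, $2 = package name.
downgrade_to_previous() {
    local old
    old=$(yum history info "$1" | previous_package "$2")
    [ -n "$old" ] || { echo "no Updated entry for $2" >&2; return 1; }
    yum downgrade "$old"
}
```

      For the case above, that would be downgrade_to_previous <id> microcode_ctl followed by a reboot. The same caveat applies: this only automates the workaround.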

    • RE: Issue after latest host update

      @olivierlambert Yep, I can confirm that in this case the microcode update is the culprit, too.

      I just downgraded
      microcode_ctl-2.1-26.xs28.1.xcpng8.2.x86_64
      to
      microcode_ctl-2.1-26.xs26.2.xcpng8.2.x86_64

      and it's working again. Man, what a mess.

    • RE: Issue after latest host update

      @olivierlambert Following info from /proc/cpuinfo:
      Intel(R) Celeron(R) J4105 CPU @ 1.50GHz

      True enough, regarding the Wyse topic. I'll try reverting only the microcode update and see what happens.

    • RE: Issue after latest host update

      For reference: I have now decided to use a less intrusive approach and changed the default boot entry in the GRUB config to the working fallback entry. I will now try to get the pool up again.
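
      A rough sketch of that change, for anyone wanting to do the same. It assumes a grub.cfg with single-quoted menuentry titles, no submenus, and a "set default=<index>" line; the helper names are hypothetical, and the location of grub.cfg differs between BIOS and UEFI installs, so locate and back up the file first.

```shell
#!/usr/bin/env bash
# Read a grub.cfg on stdin and print each menuentry title with its index,
# assuming single-quoted titles and no submenus.
list_boot_entries() {
    awk -F"'" '/^[[:space:]]*menuentry / { printf "%d: %s\n", n++, $2 }'
}

# Read a grub.cfg on stdin and print it with `set default=` changed to $1.
with_default_entry() {
    sed -E "s/^([[:space:]]*set default=).*/\1$1/"
}
```

      Running list_boot_entries < grub.cfg shows which index belongs to the older Xen entry; with_default_entry <index> < grub.cfg > grub.cfg.new produces an edited copy that can be reviewed before it replaces the original.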

    • RE: Issue after latest host update

      I'd even be fine with using only two machines and keeping one of them offline for further testing.

    • RE: Issue after latest host update

      @john-c All such information should be available in the dmesg file in post: https://xcp-ng.org/forum/post/74791

      Any ideas on how to revert the update? I would really like to have the setup running again. It may be "just" a home lab, but I was still using it (at least semi-) productively...

    • RE: Issue after latest host update

      @john-c
      Model: FUJITSU FUTRO S740/D3544-A1
      BIOS: V5.0.0.13 R1.13.0 for D3544-A1x (09/23/2022)

    • RE: Issue after latest host update

      @olivierlambert I finally made some progress. And it really seems to be update related.

      I took one of the hosts and plugged a display and keyboard into it. When booting up, I can choose to use an older version of Xen from the boot menu. Doing so makes VMs work again.

      Culprit: Xen 4.13.5-9.39 (current default)
      Working: Xen 4.13.4-9.19.1 (which I can choose from boot menu)

      All three hosts are Fujitsu Futro S740 thin clients.

    • RE: Issue after latest host update

      @olivierlambert I didn't change anything, at least. Just yum update and it went down the flush.

    • RE: Issue after latest host update

      @olivierlambert That's what I'm doing, to make sure it's not a network-related issue.

    • RE: Issue after latest host update

      @john-c I already took a look at dmesg and /var/log/xensource.log (I crawled through >1k log lines) and couldn't find anything revealing. The NFS server is unrelated because, as stated before, I currently only use the hosts' local storage to rule out possible external issues.

    • RE: Issue after latest host update

      @nikade said in Issue after latest host update:

      I can't really understand what happened, to be honest; I've done this many times without issues.
      What can you see in the console tab of the VM when you start it? Or in the stats tab?

      I can't see anything, because XOA itself is inaccessible, since it's a VM. And VMs won't start into a usable state.

    • RE: Issue after latest host update

      @nikade @john-c I'm not sure... how do I elaborate? At least, I can ssh into the hosts and never disconnect.

    • RE: Issue after latest host update

      xentop shows XOA consuming 100.0 CPU (%), meaning one core. But quick deployment is stuck at "almost there", until it times out. The VM is still consuming one CPU core, while not being accessible.
