XCP-ng

    Posts by andyhhp

    • RE: Question on CPU masking with qemu and xen

      For documentation purposes, there's a more general step: "Any VM you can shut down, do."

      Live Migration is great for VMs which need to stay up, but it's not free, or even cheap. You will finish sooner if you shut down the VMs you don't need, migrate fewer things, and then (re)boot everything at the end.

      posted in Compute
      andyhhp
    • RE: Question on CPU masking with qemu and xen

      @cg said in Question on CPU masking with qemu and xen:

      In the early days (~XenServer 6) it had to be done manually

      Yes, and I rewrote it entirely in XenServer 7 because doing it manually was absurd.

      tl;dr, for your case:

      1. Add the Gen12s to the pool
      2. Migrate remaining VMs off the Gen9s
        2a. Any VMs which can't migrate for feature reasons: reboot first, then migrate
      3. Remove the Gen9s from the pool
      4. Reboot all VMs

      The longer answer:

      When Xen boots, it calculates what it can offer to guests, feature-wise. This takes into account the CPU, firmware settings, errata, command line parameters, etc. This feature information is made available to the toolstack/xapi to work with. On a per-VM basis, Xen knows the features that the guest was given. Different VMs can have different configurations, even if they're running on the same host.

      An individual VM's features are fixed for its entire uptime (including across migrations). The only point at which the features can safely change is when the VM reboots. All the migration safety checks are performed as "is the featureset this VM saw at boot compatible with the destination host it's trying to run on?".
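      A minimal sketch of that check, assuming featuresets are modelled as plain Python sets of feature names. The feature names and the can_migrate helper are hypothetical; Xen and xapi actually work with featureset bitmaps, and the footnote below explains why the real comparison is not a pure subset test.

```python
# Illustrative model only: featuresets as sets of hypothetical feature names.
# Real Xen/xapi featuresets are bitmaps, and some features are deliberately
# not compared as a plain subset.


def can_migrate(vm_boot_features: frozenset, dest_host_features: frozenset) -> bool:
    """True if everything the VM saw at boot is also offered by the destination host."""
    # The VM's featureset is fixed for its whole uptime, so the check always
    # uses the set it was given at boot, never a recalculated one.
    return vm_boot_features <= dest_host_features


if __name__ == "__main__":
    older_host = frozenset({"sse4_2", "avx2"})
    newer_host = frozenset({"sse4_2", "avx2", "avx512f"})

    vm_booted_with_older = older_host   # saw only the older feature level
    vm_booted_with_newer = newer_host   # saw avx512f at boot

    print(can_migrate(vm_booted_with_older, newer_host))  # True
    print(can_migrate(vm_booted_with_newer, older_host))  # False: avx512f would vanish
```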

      At the pool level, Xapi always dynamically calculates the "pool level", i.e. the common subset[*] of features that allows a VM to migrate anywhere in the pool. Importantly, this is recalculated as pool members join and leave the pool, including when a pool member reboots (it leaves temporarily, then rejoins, and its feature information may have changed across the reboot, e.g. after changing a firmware or command line setting).

      When a VM boots, it is given the "pool level" by default, meaning that it should be able to migrate anywhere in the pool as the pool existed when the VM booted. If you subsequently add a new host to the pool, the pool level may drop; already-running VMs will then be unable to migrate to this new host, but will still be able to migrate to the other pool members.

      As you remove members from the pool, the pool level may rise, e.g. if you remove the only host that was lacking a certain feature. The final reboot in your case is to allow the VMs to start using the Gen10 feature baseline, now that it's no longer "levelled down" for compatibility with the Gen9s.

      ~Andrew

      [*] While subset is the intuitive way to think of this operation, it's not actually a subset in the mathematical sense. Some features behave differently to maintain safety for the VM.
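      The pool-level behaviour can be sketched the same way. This is a simplified model that assumes a plain set intersection stands in for xapi's levelling (which, per the footnote, is not exactly a subset operation); the host names, feature names and the pool_level helper are all hypothetical. It walks through the tl;dr steps: add the new hosts, migrate off and remove the old ones, then reboot.

```python
# Simplified model of pool levelling: the pool level is the intersection of
# every current member's featureset. Feature and host names are hypothetical.
OLDER = frozenset({"sse4_2", "avx2"})
NEWER = frozenset({"sse4_2", "avx2", "avx512f"})


def pool_level(hosts: dict) -> frozenset:
    """Features common to all current pool members (empty pool -> empty set)."""
    if not hosts:
        return frozenset()
    return frozenset.intersection(*hosts.values())


if __name__ == "__main__":
    pool = {"old-1": OLDER, "old-2": OLDER}
    vm = pool_level(pool)                          # a VM booted now gets the older level

    pool.update({"new-1": NEWER, "new-2": NEWER})  # 1. add the new hosts
    print(pool_level(pool) == OLDER)               # True: still levelled down

    assert vm <= pool["new-1"]                     # 2. running VMs can move to the new hosts
    del pool["old-1"], pool["old-2"]               # 3. remove the old hosts

    print(pool_level(pool) == NEWER)               # True: the pool level has risen
    print(vm == OLDER)                             # True: running VMs keep their boot-time set

    vm = pool_level(pool)                          # 4. a reboot picks up the new level
    print(sorted(vm))                              # ['avx2', 'avx512f', 'sse4_2']
```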

      posted in Compute
      andyhhp
    • RE: Non-server CPU compatibility - Ryzen and Intel

      Xen has no awareness of 3D V-Cache. All 16 cores are considered equal. Your vCPU may be on a 3D V-Cache core one millisecond, then on a non-3D V-Cache core the next.

      If you really want to alter this, you can pin your VM to one group of cores or the other.

      However, do not make the mistake of thinking of some of these cores as "performance cores" and the others as not. The ones with 3D V-Cache will outperform the others on a wide variety of workloads, despite not being able to turbo as high.
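      If you do want to pin, one way on XCP-ng is the VM's VCPUs-params:mask parameter, set with xe vm-param-set. The sketch below only builds that command line. Which physical CPU numbers belong to the 3D V-Cache CCD is an assumption here (CPUs 0-15 in the example) that you should verify on your own host, e.g. with xenpm get-cpu-topology, and pin_command is a hypothetical helper.

```python
import shlex


def pin_command(vm_uuid: str, cpus) -> str:
    """Build an `xe vm-param-set` command restricting a VM's vCPUs to `cpus`."""
    cpu_list = sorted(set(cpus))
    if not cpu_list:
        raise ValueError("need at least one physical CPU")
    # The mask is a comma-separated list of physical CPU numbers.
    mask = ",".join(map(str, cpu_list))
    return shlex.join(["xe", "vm-param-set", f"uuid={vm_uuid}", f"VCPUs-params:mask={mask}"])


if __name__ == "__main__":
    # Assumption: physical CPUs 0-15 are the 3D V-Cache CCD on this host.
    print(pin_command("YOUR-VM-UUID", range(16)))
```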

      posted in Compute
      andyhhp
    • RE: XCP-ng 8.3 with VM crashing

      As I said before, this is looking like a buggy CPU, and you've proved it: a week with no incidents when CPU8 is excluded.

      posted in Hardware
      andyhhp
    • RE: Diagnosing frequent crashes on host

      @the_jest OK, so it's a logical bug in Linux. Have you updated the dom0 kernel recently? Can you revert to the older build and see if that changes the behaviour?

      posted in XCP-ng
      andyhhp
    • RE: Diagnosing frequent crashes on host

      @the_jest said in Diagnosing frequent crashes on host:

      but I figured I'd mention it. (Also, "Shot down" should be "Shut down".)

      "Shot down" is correct. It is the past tense of "shoot down", because the companion message you get when something goes wrong is "Failed to shoot down $CPUS", which is the single most valuable print message I've ever inserted into the code.

      @the_jest said in Diagnosing frequent crashes on host:

      I've looked at /var/crash, but there's so much stuff there I don't know where to start,

      The snippet of xen.log you've posted suggests it's a Linux kernel crash, so look at dom0.log, right at the end.
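      To get to the end of the right dom0.log quickly, a small sketch like the following may help. It assumes each crash under /var/crash has its own subdirectory containing a dom0.log, and simply picks the most recently modified one; the helper names are hypothetical.

```python
from __future__ import annotations

from collections import deque
from pathlib import Path


def newest_dom0_log(crash_root: str = "/var/crash") -> Path | None:
    """Most recently modified <crash_root>/*/dom0.log, or None if there is none."""
    root = Path(crash_root)
    if not root.is_dir():
        return None
    logs = list(root.glob("*/dom0.log"))
    return max(logs, key=lambda p: p.stat().st_mtime, default=None)


def tail(path: Path, n: int = 50) -> list[str]:
    """Last n lines of a file, without holding the whole file in memory."""
    with path.open(errors="replace") as f:
        return list(deque(f, maxlen=n))


if __name__ == "__main__":
    log = newest_dom0_log()
    if log is None:
        print("no dom0.log found under /var/crash")
    else:
        print(f"--- last lines of {log} ---")
        print("".join(tail(log)), end="")
```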

      posted in XCP-ng
      andyhhp
    • RE: XCP-ng 8.3 with VM crashing

      @AlbertK Thanks. There's no nested-virt configured there.

      I have to admit this is looking more and more like a buggy CPU. Memory corruption is a possibility, but this is a clearly corrupt field in the middle of otherwise sane-looking fields in the VMCB.

      Do you have any other identical systems? Can you swap this CPU out for another one to see what happens?

      posted in Hardware
      andyhhp
    • RE: XCP-ng 8.3 with VM crashing

      @AlbertK None of those commands are relevant in a Xen system. You want xe vm-param-list uuid=$VM

      posted in Hardware
      andyhhp
    • RE: XCP-ng 8.3 with VM crashing

      @AlbertK That looks suspiciously like you've enabled nested virt in the VM. Can you confirm whether you have or not?

      posted in Hardware
      andyhhp
    • RE: Gpu passthrough on Asrock rack B650D4U3-2L2Q will not work

      @steff22 said in Gpu passthrough on Asrock rack B650D4U3-2L2Q will not work:

      what kind of magic have you put in the last 7 patches?

      You've got a very recent AMD processor, so it's probably this fix https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=86001b3970fea4536048607ea6e12541736c48e1 from upstream.

      posted in Hardware
      andyhhp
    • RE: Issue after latest host update

      @mgigirey said in Issue after latest host update:

      @andyhhp Any plans to update the intel-microcode for XCP-ng 8.3? latest know version working in my setup is intel-microcode-20231009-1.xcpng8.3.noarch.rpm

      I am not an XCP-ng developer. You'll have to ask @stormi for that.

      posted in XCP-ng
      andyhhp
    • RE: XCP-ng 8.3 betas and RCs feedback 🚀

      @eb-xcp said in XCP-ng 8.3 betas and RCs feedback 🚀:

      Edit: Confirmed; after enabling execution disable option within bios, installer booted without issues and the install is currently ongoing.

      That is a bug. Xen is supposed to be able to detect this case and re-activate NX on its own.

      The EFI path in your screenshot doesn't have the logic to re-activate it. IIRC, we weren't sure whether it was needed, because surely no EFI system would still be using Pentium 4 compatibility. Clearly that reasoning was wrong, and it's fairly easy to adjust.

      However, fixing that path won't fix the normal MB2 path, which does have the logic to re-activate and should have been able to cope fine.
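      The detection described above boils down to a simple decision, sketched here as a pure function. This is an illustration using the architectural bit positions from the Intel SDM (the XD Bit Disable control in IA32_MISC_ENABLE and the NX flag in CPUID leaf 0x80000001 EDX), not Xen's actual boot code, which does this in early assembly; the reactivate_nx helper is hypothetical.

```python
# Architectural constants from the Intel SDM.
MSR_IA32_MISC_ENABLE = 0x1A0          # MSR holding the XD Bit Disable control
MISC_ENABLE_XD_DISABLE = 1 << 34      # bit 34: firmware has disabled NX
CPUID_80000001_EDX_NX = 1 << 20       # bit 20 of CPUID 0x80000001 EDX: NX available


def reactivate_nx(misc_enable: int, cpuid_edx: int) -> tuple:
    """Decide how to treat IA32_MISC_ENABLE so that NX becomes usable.

    Returns (value to write back, whether a write is needed).
    """
    if cpuid_edx & CPUID_80000001_EDX_NX:
        return misc_enable, False                      # NX already visible
    if misc_enable & MISC_ENABLE_XD_DISABLE:
        # Firmware turned NX off: clear the disable bit, after which the NX
        # flag should reappear in CPUID.
        return misc_enable & ~MISC_ENABLE_XD_DISABLE, True
    return misc_enable, False                          # no NX to recover
```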

      What system do you have?

      posted in News
      andyhhp
    • RE: XCP-ng 8.3 betas and RCs feedback 🚀

      @flakpyro If Singlewire have already fixed the bug, then just do whatever is necessary to update the VM and be done with it.

      That screenshot of grub raises far more questions than it answers, and I doubt we want to get into any of them.

      posted in News
      andyhhp
    • RE: XCP-ng 8.3 betas and RCs feedback 🚀

      @flakpyro

      This is ultimately a bug in Linux. A range of Linux kernels did something unsafe on kexec which worked most of the time, but only by luck. (Specifically: holding a 64-bit value in a register while passing through 32-bit mode, and expecting it to still be intact later; both Intel and AMD document this as model-specific behaviour that must not be relied upon.)

      A side effect of a security fix in Xen (https://xenbits.xen.org/xsa/advisory-454.html) is that a VM depending on this behaviour now fails reliably.

      Linux fixed the bug years ago, but one distro managed to pick it up.

      Ideally, get Singlewire to fix their kernel. Failing that, adjust the VM's kernel command line to remove any ,low or ,high suffix from the crashkernel= option, because IIRC that was the underlying way to tickle the bug.
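      As a sketch of that command-line edit, assuming the kernel arguments are available as one whitespace-separated string (for example the value of GRUB_CMDLINE_LINUX in /etc/default/grub on many distros), the hypothetical helper below drops the ,high and ,low suffixes. A ,low argument only makes sense next to a ,high one, so in that case this sketch removes it rather than leaving a second crashkernel= reservation; that is a design choice of the sketch, not something from the original advice.

```python
def strip_crashkernel_suffix(cmdline: str) -> str:
    """Remove ,high / ,low suffixes from crashkernel= arguments.

    Whitespace between arguments is normalised to single spaces.
    """
    args = cmdline.split()
    # Is there a crashkernel= argument other than a ",low" one?
    has_main = any(a.startswith("crashkernel=") and not a.endswith(",low") for a in args)
    out = []
    for arg in args:
        if arg.startswith("crashkernel="):
            if arg.endswith(",low"):
                if has_main:
                    continue              # only paired with ,high; drop it
                arg = arg[: -len(",low")]
            elif arg.endswith(",high"):
                arg = arg[: -len(",high")]
        out.append(arg)
    return " ".join(out)


if __name__ == "__main__":
    print(strip_crashkernel_suffix("ro quiet crashkernel=256M,high crashkernel=72M,low"))
    # -> ro quiet crashkernel=256M
```

      After changing the command line, regenerate the GRUB configuration the way the VM's distro expects, reboot the VM, and confirm where the crash kernel actually landed.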

      The property you need to end up with is that /proc/iomem shows the Crash kernel range below the 4G boundary, because the handover logic from one kernel to the other simply didn't work correctly if the new kernel was above 4G.
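      A minimal sketch for checking that property, assuming the usual /proc/iomem layout of "start-end : name" lines with inclusive hexadecimal addresses. Read the file as root: unprivileged readers see every address as zero, which the sketch treats as an error. The helper names are hypothetical.

```python
import re

FOUR_GIB = 1 << 32
_CRASH_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*Crash kernel\s*$")


def crash_kernel_ranges(iomem_text: str) -> list:
    """All (start, end) pairs of 'Crash kernel' entries, end inclusive."""
    ranges = []
    for line in iomem_text.splitlines():
        m = _CRASH_LINE.match(line)
        if m:
            ranges.append((int(m.group(1), 16), int(m.group(2), 16)))
    return ranges


def crash_kernel_below_4g(iomem_text: str) -> bool:
    """True if a crash kernel is reserved and every range ends below 4 GiB."""
    ranges = crash_kernel_ranges(iomem_text)
    if any(start == end == 0 for start, end in ranges):
        raise ValueError("addresses are hidden; read /proc/iomem as root")
    return bool(ranges) and all(end < FOUR_GIB for _, end in ranges)


if __name__ == "__main__":
    with open("/proc/iomem") as f:
        print("Crash kernel below 4G:", crash_kernel_below_4g(f.read()))
```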

      posted in News
      andyhhp
    • RE: Any way to know what features will be CPU masked before adding a host to a pool?

      Intel Xeon E5-2683 v4 CPUs vs E5-2697 v4 CPUs

      You are correct. These are adjacent rows in the SKU table; they've got the same core count and differ only by 500MHz in frequency. As far as software is concerned, they're basically identical.

      posted in Compute
      andyhhp
    • RE: Oops! We removed busybox

      I suggest using this as a learning opportunity. Look at the RPM log and see what depends on busybox, and therefore what (else) got uninstalled in order to keep the dependencies satisfied.

      (Hint: you uninstalled all of Xapi, which is why nothing works.)
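      For example, assuming the CentOS 7 style /var/log/yum.log used in the XCP-ng 8.x dom0, where removal lines read "<timestamp> Erased: <package>", a short script can list everything that was removed. The helper name is hypothetical; yum history info and rpm -q --whatrequires busybox are other ways to get at the same information.

```python
from pathlib import Path

MARKER = " Erased: "


def erased_packages(log_text: str) -> list:
    """Package names from 'Erased:' entries, in the order they were logged."""
    return [line.split(MARKER, 1)[1].strip()
            for line in log_text.splitlines() if MARKER in line]


if __name__ == "__main__":
    text = Path("/var/log/yum.log").read_text(errors="replace")
    for pkg in erased_packages(text):
        print(pkg)
```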

      posted in XCP-ng
      andyhhp
    • RE: Dell Wyse FW update breaks VM booting; console frozen; TianoCore/EDK2 related?

      @rubberhose I've got a fix from Intel, and @stormi has packaged it.

      yum update microcode_ctl --enablerepo=xcp-ng-testing should get you microcode_ctl-2.1-26.xs29.2.xcpng8.2 which has the fixed microcode for this issue in it.

      When you've got that installed, it should be safe to update back to the latest firmware.

      posted in Compute
      andyhhp
    • RE: Wyse 5070 VM won't booting after update bios 1.27

      @t-chamberlain I've got a fix from Intel, and @stormi has packaged it.

      yum update microcode_ctl --enablerepo=xcp-ng-testing should get you microcode_ctl-2.1-26.xs29.2.xcpng8.2 which has the fixed microcode for this issue in it.

      posted in Hardware
      andyhhp
    • RE: Issue after latest host update

      @RealTehreal I've got a fix from Intel, and @stormi has packaged it.

      yum update microcode_ctl --enablerepo=xcp-ng-testing should get you microcode_ctl-2.1-26.xs29.2.xcpng8.2 which has the fixed microcode for this issue in it.

      posted in XCP-ng
      andyhhp
    • RE: Issue after latest host update

      @RealTehreal Thank you very much for that information. I'll follow up with Intel.

      In the short term, I'd recommend just using the old microcode.

      posted in XCP-ng
      andyhhp