XCP-ng
    TeddyAstie

    @TeddyAstie

    Vates 🪐 XCP-ng Team Xen Guru
    Location France

    Hypervisor & Kernel Team Xen Guru Vates 🪐 XCP-ng Team

    Best posts made by TeddyAstie

    • RE: USB + GPU pass-though issue

      @gb.123 said in XCP-ng 8.3 updates announcements and testing:

      Here is the summary:

      If USB Keyboard & Mouse is passed-through along-with GPU:
      The GPU gets stuck in D3 state (on Shutdown/Restart of VM) (Classic GPU reset problem)

      If no vUSB is passed but GPU is passed through:
      The GPU works correctly and resets correctly (on Shutdown/Restart of VM)

      I have no clue what vUSB may change regarding GPU passthrough.

      When I run :

      $> lspci
      Extract of Output (Partial):

      07:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15b8
      

      However, this controller does not show up when I run :
      xe pci-list

      Is it a bug that lspci & xe pci-list have different number of devices ?

      How can I pass this controller since xe pci-list does not show it so I can't get the UUID ?
      Will kernel parameters (like XCP-ng 8.2) work in this case ?

      Question for @Team-XAPI-Network regarding the filtering on PCI IDs.
      I don't think XAPI allows using arbitrary BDF, but I may be wrong.

      Is it safe to run on XCP-ng host ?

       echo 1 > /sys/bus/pci/rescan
      

      (I'm trying to find a way where the PCI card is reset by the host without complete reboot, though I am aware that the above command will not reset it.)

      Probably, but it is not going to change anything, as the device does not completely leave Dom0 when it is passed through.
      FYI, a function-level reset is systematically performed by Xen when doing PCI passthrough, so your device should be reset before entering another guest (aside from reset bugs like the one you may be hitting).

      Also is it advisable to use :

      xl pci-assignable-add 07:00.0
      

      in XCP-ng 8.3 ? or is this method deprecated ?

      I don't think XAPI supports this PCI passthrough approach.
      This command dynamically removes a device from Dom0 and puts it into the "quarantine domain", so that it is ready to be passed through.

      XAPI currently takes a different approach: the set of "passthrough-able" devices is defined at boot time through the xen-pciback.hide kernel parameter, which has the same effect but only at boot.
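
      As an illustration of the boot-time approach, here is a minimal sketch that builds a xen-pciback.hide value from a list of PCI addresses. It assumes the usual syntax of one parenthesised domain:bus:device.function entry per device; the helper names (normalize_bdf, pciback_hide_value) are hypothetical, and the sketch only formats a string, it does not touch the boot configuration or show how XAPI manages the parameter.

```python
import re

# PCI address, with an optional 4-digit domain: [dddd:]bb:dd.f
_BDF_RE = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def normalize_bdf(bdf: str) -> str:
    """Return the address as lowercase dddd:bb:dd.f (domain defaults to 0000)."""
    m = _BDF_RE.match(bdf.strip())
    if m is None:
        raise ValueError(f"not a PCI address: {bdf!r}")
    domain, bus, dev, fn = m.groups()
    return f"{(domain or '0000').lower()}:{bus.lower()}:{dev.lower()}.{fn}"


def pciback_hide_value(bdfs) -> str:
    """Build the value part of xen-pciback.hide=... (hypothetical helper)."""
    return "".join(f"({normalize_bdf(b)})" for b in bdfs)


# The USB controller from the lspci extract in this thread.
print("xen-pciback.hide=" + pciback_hide_value(["07:00.0"]))
```

      The resulting string would then have to be added to the Dom0 kernel command line and the host rebooted for it to take effect.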

      posted in News
    • Xen ERMS Patch - Call for performance testing

      Hello !

      I am looking to get some feedback and evaluation on a performance-related patch for Xen (XCP-ng 8.3 only).
      This patch changes the memcpy implementation of Xen to use the "ERMS variant" (aka REP MOVSB) instead of the current REP MOVSQ+B implementation.
      This is expected to perform better on the vast majority of Intel CPUs and modern AMD ones (Zen3+), but may perform worse on some older AMD CPUs.

      This change may impact the performance of PV drivers (especially network).

      You can find more details regarding this proposed change in : https://github.com/xcp-ng-rpms/xen/pull/54
      This change may be reworked in the future to better take into account the specifics of each CPU (e.g. by checking for the presence of the ERMS flag).
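
      To see whether a given machine advertises ERMS before testing, a rough check from Linux could look like the sketch below. It is an assumption-laden illustration, not part of the patch: it relies on the Linux /proc/cpuinfo "flags" line (where the feature is named erms), the function name has_erms is made up, and the flags visible from Dom0 or a guest may be filtered by the hypervisor. Xen itself would query CPUID directly (leaf 7, sub-leaf 0, EBX bit 9).

```python
def has_erms(cpuinfo_text: str) -> bool:
    """Return True if the first 'flags' line of a cpuinfo dump lists 'erms'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return "erms" in value.split()
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("ERMS advertised:", has_erms(f.read()))
    except OSError:
        print("cannot read /proc/cpuinfo (not a Linux system?)")
```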

      🚧 Keep in mind that this patched version is experimental and not officially supported. 🚧

      Installation :

      # Download repo file for XCP-ng 8.3
      wget https://koji.xcp-ng.org/repos/user/8/8.3/xcpng-users.repo -O /etc/yum.repos.d/xcpng-users.repo
      
      # Installing the patched Xen packages (you should see `.erms` packages)
      yum update --enablerepo=xcp-ng-tae1
      

      You can revert the changes by downgrading the Xen packages to the ones in the default repos.

      yum downgrade --disablerepo=xcp-ng-tae1 "xen-*"
      
      Pull request (draft): "Use ERMS variant for memcpy" #54, opened by TSnake41 in xcp-ng-rpms/xen

      posted in Development
    • RE: Wide VMs on XCP-ng

      @plaidypus I don't know a lot about NUMA on Xen, but we have a section in the docs about it:
      https://docs.xcp-ng.org/compute/#numa-affinity

      There is also other documentation on the subject:
      https://xapi-project.github.io/new-docs/toolstack/features/NUMA/index.html
      There was also a design session about NUMA at the latest Xen Summit: https://youtu.be/KoNwEYMlhyU?list=PLQMQQsKgvLnvjRgDnb-5T51e1kGHgs1SO

      posted in XCP-ng
    • RE: XCP-ng 8.3 & AMD Firepro S7150x2

      @tuxen said (https://xcp-ng.org/forum/topic/3652/no-free-virtual-function-found-vgpu-s7150/4?_=1731502751059)

      After some digging, could be the case of a GPU firmware being incompatible with UEFI. Do you have any spare server for testing XCP-ng boot in legacy/BIOS with this GPU?

      Perhaps this is the issue?

      posted in Hardware
    • RE: XCP-ng 8.3 & AMD Firepro S7150x2

      @ohajek

      Nov 13 11:30:21 xen03 kernel: [10188.720655] AMD IOMMUv2 driver by Joerg Roedel jroedel@suse.de
      Nov 13 11:30:21 xen03 kernel: [10188.720656] AMD IOMMUv2 functionality not available on this system
      

      This is expected: the Dom0 kernel (Linux) is not supposed to access the IOMMU when it is already used by Xen. To check whether AMD-Vi is working, you need to look at the output of xl dmesg instead.

      I took a quick look at kern_gim_compiled.txt, and it looks like it timed out somewhere:

      Oct 23 20:49:32 xen03 kernel: [   80.657394]        gim error:(wait_cmd_complete:2387)  wait_cmd_complete -- time out after 0.003004460 sec
      Oct 23 20:49:32 xen03 kernel: [   80.657408]        gim error:(wait_cmd_complete:2390)   Cmd = 0x17, Status = 0x0, cmd_Complete=0
      

      3 ms looks like a short timeout to me, but aside from that, it looks like a driver (gim) or hardware issue.

      posted in Hardware
    • RE: XCP-ng 8.3 updates announcements and testing

      @abudef Note that even with this update, nested virtualization is still not really supported in XCP-ng 8.3.
      It is there, and you can enable it at your own risk. It broke due to some change in XAPI (even though the Xen hypervisor had "support" for it).
      It was never actually removed from the Xen hypervisor: it was marked experimental in Xen 4.13 (used in XCP-ng 8.2) and still is in Xen 4.17. Nothing really changed, so it still has the same issues and limitations.

      The current state of nested virtualization in Xen is quite clumsy, and there are plans to redo it properly from the ground up, without taking shortcuts and with proper tests to back it.

      Aside from that, after some experiments, it seems that it is mostly nested EPT that is incomplete/buggy, so your L1 hypervisor should not rely on it. You should add hap=0 to the Xen command line of the nested XCP-ng. Beware that this implies a pretty large performance hit, but I had more consistent results with it.
      I am quite surprised that Windows works while Linux doesn't; maybe it is somehow related to PV drivers?

      posted in News
    • RE: Early testable PVH support

      @hoh said in Early testable PVH support:

      Then I tried to get rid of the (fake) UEFI magic.

      Well, it is actually a full standard UEFI implementation, but that works in PVH instead of HVM.

      I thought it should work to just change the PV-bootloader to pygrub. Calling pygrub on the disk image works fine and is able to extract the images and args

      # pygrub -l alpine.img
      Using <class 'grub.GrubConf.Grub2ConfigFile'> to parse /boot/grub/grub.cfg
      title: Alpine, with Linux virt
        root: None
        kernel: /boot/vmlinuz-virt
        args: root=UUID=4c6dcb06-20ff-4bcf-be4d-cb399244c4c6 ro  rootfstype=ext4 console=hvc0
        initrd: /boot/initramfs-virt
      

      But starting the VM fails. It looks like it starts but then immediately something calls force shutdown, I'll dive deeper into the logs later.

      But setting everything manually actually works. If extract the kernel and initrd to dom-0 and configure

      PV-kernel=/var/lib/xcp/guest/kernel
      PV-ramdisk=/var/lib/xcp/guest/ramdisk
      PV-args="root=/dev/xvda1 ro rootfstype=ext4 console=hvc0"
      

      it boots and I looks pretty much the same as with the pvh-ovmf magic. So perhaps the idea to use pygrub is wrong.

      I don't know how well pygrub is supported nowadays, especially since PV support was deprecated in XCP-ng 8.2 and then completely dropped in XCP-ng 8.3, with the pv-shim (pv-in-pvh) being the only remaining (but not endorsed) way of booting some PV guests today.

      In my tests, pygrub was very clunky and rarely worked as I expected. In practice (and in what the upstream Xen Project mostly uses), it has been replaced by pvgrub/pvhgrub and pvh-ovmf (OvmfXen), which are more reliable and less problematic security-wise (they run in the guest rather than in Dom0).
      (To use pvhgrub, you need to set a pvhgrub binary as the kernel, as is done with pvh-ovmf. The binary is grub-i386-xen_pvh.bin, which is packaged by some distros, e.g. in Alpine Linux's grub-xenhost.)

      posted in Development
    • RE: PCI Passthrough of QAT adapter IQA89601G1P5

      PCIe AER needs proper PCIe, which in practice requires a Q35 chipset in the guest (or some other guest type or PCI passthrough method).

      Q35 support is currently a work in progress.

      posted in Compute
    • RE: Google Coral TPU PCIe Passthrough Woes

      I think it is the same kind of MSI-X/PBA issue that may be partially fixed by https://gitlab.com/xen-project/xen/-/commit/b2cd07a0447bfa25e96ae13e190225b61a3670cb

      However, with this device, the MSI-X vector table and the PBA are in the same page (vector table at 46800 and PBA at 46068), which is treated a bit differently:

      If PBA lives on the same page, discard writes and log a message.
      Technically, writes outside of PBA could be allowed, but at this moment
      the precise location of PBA isn't saved, and also no known device abuses
      the spec in this way (at least yet).
      

      But according to the DKMS driver, Coral does appear to abuse the spec in this way, by having more than just the MSI-X table and the PBA on a single page:
      https://github.com/google/gasket-driver/blob/main/src/apex_driver.c#L103-L140
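
      The "same page" observation can be checked with a little arithmetic. The sketch below assumes that the two offsets quoted above are hexadecimal byte offsets into the same BAR and that pages are 4 KiB (as on x86); the names page_index and same_page are made up for the illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86


def page_index(offset: int) -> int:
    """Index of the page that contains the given byte offset."""
    return offset // PAGE_SIZE


def same_page(a: int, b: int) -> bool:
    """True if both byte offsets fall within the same page."""
    return page_index(a) == page_index(b)


table_offset = 0x46800  # MSI-X vector table offset quoted above (assumed hex)
pba_offset = 0x46068    # PBA offset quoted above (assumed hex)

print(hex(page_index(table_offset)), hex(page_index(pba_offset)))
print("same page:", same_page(table_offset, pba_offset))
```

      Under these assumptions both offsets land in page 0x46, which is the case the quoted commit message describes.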

      posted in Compute
    • RE: Guest receiving passthrough SATA controllers does not see attached drives

      Hello @hvm,

      Can you give the output of xl dmesg in XCP-ng and of dmesg in the guest that has the issues ?
      I have the impression that something is going wrong with reserved regions related to the SATA controller.

      posted in Compute

    Latest posts made by TeddyAstie

    • RE: Low end devices , share your experiences

      Unikraft would be a good fit for RAM-constrained devices: it makes it possible to have useful VMs with 32-64 MB each.

      posted in Share your setup!
    • RE: VM UUID via dmidecode does not match VM ID in xen-orchestra

      @deefdragon said in VM UUID via dmidecode does not match VM ID in xen-orchestra:

      Out of curiosity, I dumped the DMI into a bin and opened it up in a hex editor.

      I am seeing ASCII of the ID, but also a variant encoded in binary. In both cases, its formatted as 0b08f477-491a-a982-23c4-d224723624ea.

      I believe the ASCII version is the one that gets populated into the serial number as it comes after ASCII encoded versions of the 3 lines above it in the decode.

      In SMBIOS 2.8, the UUID is supposed to be encoded as little endian (i.e. like a Microsoft GUID), yet it is written as big endian instead. So when Linux generates the UUID string from the SMBIOS table, it interprets the bytes as little endian, which causes this mismatch.

      SMBIOS 2.4 (which appears to use big-endian UUIDs) is what is supposed to be used, but for some reason something in XCP-ng's UEFI support forces it to SMBIOS 2.8.

      So the binary UUID is the same; it is just interpreted with a different endianness because of an accidental format change.
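
      The mismatch can be reproduced with Python's standard uuid module. This is only an illustration of the byte-order effect using the UUID from this thread; it says nothing about where in the firmware the format change happens.

```python
import uuid

# VM UUID as quoted in this thread.
vm_uuid = uuid.UUID("0b08f477-491a-a982-23c4-d224723624ea")

# The 16 raw bytes in big-endian (network) order, i.e. the order in which
# the UUID ends up in the SMBIOS table according to the explanation above.
raw = vm_uuid.bytes

# A reader that treats those same bytes as little endian (GUID style) swaps
# the byte order of the first three fields and leaves the last 8 bytes as-is.
as_little_endian = uuid.UUID(bytes_le=raw)

print("stored      :", vm_uuid)
print("read as LE  :", as_little_endian)
```

      The second line printed is 77f4080b-1a49-82a9-23c4-d224723624ea, the value the guest reported in /sys/devices/virtual/dmi/id/product_uuid.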

      posted in Infrastructure as Code
    • RE: Large "steal time" inside VMs but host CPU is not overloaded

      @lovvel From a software standpoint, this is a 16-core CPU and, AFAICT, Xen doesn't account for these slight differences between cores.

      To be fair, it is not really easy to know in practice whether a 3D V-Cache core will be faster than a non-3D V-Cache one for a specific workload.

      posted in Compute
    • RE: VM UUID via dmidecode does not match VM ID in xen-orchestra

      @deefdragon can you provide us the output of dmidecode in the guest ?

      posted in Infrastructure as Code
    • RE: VM UUID via dmidecode does not match VM ID in xen-orchestra
      deef@k31-w-3bfbbe:~$ sudo cat /sys/devices/virtual/dmi/id/product_serial
      0b08f477-491a-a982-23c4-d224723624ea
      deef@k31-w-3bfbbe:~$ sudo cat /sys/devices/virtual/dmi/id/product_uuid
      77f4080b-1a49-82a9-23c4-d224723624ea
      deef@k31-w-3bfbbe:~$ sudo cat /sys/hypervisor/uuid
      0b08f477-491a-a982-23c4-d224723624ea
      

      It looks like an endianness issue (0b08f477 vs 77f4080b).

      posted in Infrastructure as Code
    • RE: Early testable PVH support

      @hoh said in Early testable PVH support:

      @TeddyAstie said in Early testable PVH support:

      ... PV-kernel=/var/lib/xcp/guest/pvh-ovmf.elf

      Works fine. But IIUC, direct kernel boot should work as well. I tried setting pygrub, the VM loads the kernel and starts but then immediately stops. Any idea what's wrong?

      What are you trying to boot ?

      posted in Development
    • RE: [HELP] XCP-ng 4.17.5 dom0 kernel panic — page fault in TCP stack, crashdump attached

      cc @andrew

      It looks like an issue with https://github.com/xcp-ng-rpms/r8125-module, though I am not completely sure what is going on or why the page table suddenly becomes invalid.

      posted in XCP-ng
    • RE: PCI Passthorugh INTERNAL_ERROR

      Not a Xen issue.
      This seems to be a configuration issue (the output of /opt/xensource/libexec/xen-cmdline --get-dom0 may help) causing an issue in XAPI (@Team-XAPI-Network).

      Maybe crashing in xapi/pciops.ml#L71-L80 or xapi/xapi_pci_helpers.ml#L179-L207.

      posted in Management