
    TeddyAstie

    @TeddyAstie

    Vates 🪐 XCP-ng Team Xen Guru
    Location France


    Best posts made by TeddyAstie

    • RE: AMD 'Barcelo' passthrough issues - any success stories?

      @DustyArmstrong said:

      @TeddyAstie yarp.

      My bad, the VM has it as 00:08.0 but on the host it's actually 06:00.0, I just didn't think about the specifics of your request!

      06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Barcelo (rev c1) (prog-if 00 [VGA controller])
      	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 1636
      	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
      	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
      	Interrupt: pin A routed to IRQ 38
      	Region 0: Memory at d0000000 (64-bit, prefetchable) [size=256M]
      	Region 2: Memory at e0000000 (64-bit, prefetchable) [size=2M]
      	Region 4: I/O ports at d000 [size=256]
      	Region 5: Memory at fca00000 (32-bit, non-prefetchable) [size=512K]
      	Capabilities: [48] Vendor Specific Information: Len=08 <?>
      	Capabilities: [50] Power Management version 3
      		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
      		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
      	Capabilities: [64] Express (v2) Legacy Endpoint, MSI 00
      		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
      			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
      		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
      			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
      			MaxPayload 256 bytes, MaxReadReq 512 bytes
      		DevSta:	CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
      		LnkCap:	Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
      			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
      		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
      			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
      		LnkSta:	Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
      		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
      		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
      		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
      			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
      			 Compliance De-emphasis: -6dB
      		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
      			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
      	Capabilities: [a0] MSI: Enable- Count=1/4 Maskable- 64bit+
      		Address: 0000000000000000  Data: 0000
      	Capabilities: [c0] MSI-X: Enable- Count=4 Masked-
      		Vector table: BAR=5 offset=00042000
      		PBA: BAR=5 offset=00043000
      	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
      	Capabilities: [270 v1] #19
      	Capabilities: [2a0 v1] Access Control Services
      		ACSCap:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
      		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
      	Capabilities: [2b0 v1] Address Translation Service (ATS)
      		ATSCap:	Invalidate Queue Depth: 00
      		ATSCtl:	Enable-, Smallest Translation Unit: 00
      	Capabilities: [2c0 v1] Page Request Interface (PRI)
      		PRICtl: Enable- Reset-
      		PRISta: RF- UPRGI- Stopped+
      		Page Request Capacity: 00000100, Page Request Allocation: 00000000
      	Capabilities: [2d0 v1] Process Address Space ID (PASID)
      		PASIDCap: Exec+ Priv+, Max PASID Width: 10
      		PASIDCtl: Enable- Exec- Priv-
      	Capabilities: [400 v1] #25
      	Capabilities: [410 v1] #26
      	Capabilities: [440 v1] #27
      	Kernel driver in use: pciback
      
      

      thanks.

      So basically, there is a more annoying issue: the device doesn't even have a ROM BAR. In this case, the VBIOS is likely in the VFCT ACPI table of the host (which the guest can't see), and it needs to be injected as a "fake" ROM BAR for the guest to behave properly.

      That's doable on its own, but it's quite tricky to integrate (for example, you would need to extract the VBIOS from VFCT using external tools).
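
      As a rough sketch of how one might check for this situation (the helper names has_expansion_rom and has_vfct_table are made up for illustration, and the sysfs path is the usual Linux location, which Dom0 may or may not expose on a given host):

```shell
# Report whether `lspci -vv` output (read from stdin) lists an expansion ROM.
# has_expansion_rom is a hypothetical helper name, not an existing tool.
has_expansion_rom() {
    if grep -q 'Expansion ROM at'; then
        echo "rom-bar-present"
    else
        echo "no-rom-bar"
    fi
}

# Report whether a VFCT ACPI table is exposed by the kernel.
# $1 = ACPI tables directory (assumption: defaults to the usual sysfs path).
has_vfct_table() {
    tables_dir="${1:-/sys/firmware/acpi/tables}"
    if [ -e "$tables_dir/VFCT" ]; then
        echo "vfct-present"
    else
        echo "vfct-absent"
    fi
}
```

      Typical use would be `lspci -vv -s 06:00.0 | has_expansion_rom` followed by `has_vfct_table`. This only tells you where the VBIOS might be; it does not extract or inject anything.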

      I just discussed this with Xen/AMD people, and there are known issues regarding PCI passthrough of integrated AMD GPUs (not specific to Xen, AFAIU). There are some projects exploring alternative approaches to bring AMD GPUs to VMs (virtio-gpu native context), which is where the current focus is.

      posted in Hardware
    • RE: USB + GPU pass-though issue

      @gb.123 said in XCP-ng 8.3 updates announcements and testing:

      Here is the summary:

      If USB Keyboard & Mouse is passed-through along-with GPU:
      The GPU gets stuck in D3 state (on Shutdown/Restart of VM) (Classic GPU reset problem)

      If no vUSB is passed but GPU is passed through:
      The GPU works correctly and resets correctly (on Shutdown/Restart of VM)

      I have no clue what vUSB may change regarding GPU passthrough.

      When I run :

      $> lspci
      Extract of Output (Partial):

      07:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15b8
      

      However, this controller does not show up when I run :
      xe pci-list

      Is it a bug that lspci & xe pci-list have different number of devices ?

      How can I pass this controller since xe pci-list does not show it so I can't get the UUID ?
      Will kernel parameters (like XCP-ng 8.2) work in this case ?

      Question for @Team-XAPI-Network regarding the filtering on PCI IDs.
      I don't think XAPI allows using arbitrary BDF, but I may be wrong.

      Is it safe to run on XCP-ng host ?

       echo 1 > /sys/bus/pci/rescan
      

      (I'm trying to find a way where the PCI card is reset by the host without complete reboot, though I am aware that the above command will not reset it.)

      Probably. But it's not going to change anything, as the device doesn't completely leave Dom0 when passed through.
      FYI, Xen systematically performs a function-level reset when doing PCI passthrough, so your device should be reset before entering another guest (aside from reset bugs like the one you may have).
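
      To see what the device itself advertises, a minimal sketch could look like the following. The helper names supports_flr and has_sysfs_reset are hypothetical, and this only reflects what lspci and the Dom0 kernel report, not what Xen actually does during passthrough:

```shell
# Report whether `lspci -vv` output (read from stdin) advertises
# Function Level Reset in the PCIe DevCap line ("FLReset+").
supports_flr() {
    if grep -q 'FLReset+'; then
        echo "flr"
    else
        echo "no-flr"
    fi
}

# Report whether the kernel exposes a reset hook for a device in sysfs.
# $1 = full BDF such as 0000:07:00.0
# $2 = sysfs devices directory (assumption: defaults to the usual path)
has_sysfs_reset() {
    devices_dir="${2:-/sys/bus/pci/devices}"
    if [ -e "$devices_dir/$1/reset" ]; then
        echo "reset-available"
    else
        echo "no-reset"
    fi
}
```

      For example, `lspci -vv -s 07:00.0 | supports_flr` and `has_sysfs_reset 0000:07:00.0`. A device that prints "no-flr" has to rely on another reset method, which is where reset bugs tend to show up.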

      Also is it advisable to use :

      xl pci-assignable-add 07:00.0
      

      in XCP-ng 8.3 ? or is this method deprecated ?

      I don't think XAPI supports this PCI passthrough approach.
      This command dynamically removes a device from Dom0 and puts it into the "quarantine domain", so that it is ready to be passed through.

      XAPI currently achieves the same result at boot time, by defining the set of "passthrough-able" devices through the xen-pciback.hide kernel parameter.
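
      For illustration, here is a small sketch that builds the value of that parameter from a list of BDFs. The function name pciback_hide_arg is made up, and it assumes PCI domain 0000 when none is given:

```shell
# Build a xen-pciback.hide=... argument from one or more PCI addresses.
# pciback_hide_arg is a hypothetical helper, not part of XCP-ng.
pciback_hide_arg() {
    arg="xen-pciback.hide="
    for bdf in "$@"; do
        case "$bdf" in
            *:*:*) full="$bdf" ;;        # domain already present (e.g. 0000:07:00.0)
            *)     full="0000:$bdf" ;;   # assumption: default to PCI domain 0000
        esac
        arg="$arg($full)"
    done
    echo "$arg"
}
```

      `pciback_hide_arg 07:00.0` prints `xen-pciback.hide=(0000:07:00.0)`. The XCP-ng documentation describes how to apply such a value to the Dom0 kernel command line (followed by a reboot); check it for the exact procedure on your version.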

      posted in News
    • Xen ERMS Patch - Call for performance testing

      Hello !

      I am looking to get some feedback and evaluation on a performance-related patch for Xen (XCP-ng 8.3 only).
      This patch changes the memcpy implementation of Xen to use the "ERMS variant" (aka REP MOVSB) instead of the current REP MOVSQ+B implementation.
      This is expected to perform better on the vast majority of Intel CPUs and modern AMD ones (Zen3+), but may perform worse on some older AMD CPUs.

      This change may impact the performance of PV drivers (especially network).

      You can find more details regarding this proposed change in : https://github.com/xcp-ng-rpms/xen/pull/54
      This change may be reworked in the future to better take into account the specificities of each CPU (e.g. checking for the presence of the ERMS flag).
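
      If you want to know whether your CPU reports ERMS before testing, a minimal sketch is shown below. The helper name has_erms is hypothetical; it relies on the "erms" flag name Linux uses in /proc/cpuinfo, and the flags seen from Dom0 may not exactly match what Xen itself sees:

```shell
# Report whether the first "flags" line of a cpuinfo file contains "erms".
# $1 = cpuinfo file (assumption: defaults to /proc/cpuinfo)
has_erms() {
    cpuinfo="${1:-/proc/cpuinfo}"
    if grep -m1 '^flags' "$cpuinfo" | grep -qw 'erms'; then
        echo "erms"
    else
        echo "no-erms"
    fi
}
```

      Running `has_erms` on the host prints "erms" or "no-erms"; including that in your feedback makes the benchmark results easier to interpret.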

      🚧 Keep in mind that this patched version is experimental and not officially supported. 🚧

      Installation :

      # Download repo file for XCP-ng 8.3
      wget https://koji.xcp-ng.org/repos/user/8/8.3/xcpng-users.repo -O /etc/yum.repos.d/xcpng-users.repo
      
      # Installing the patched Xen packages (you should see `.erms` packages)
      yum update --enablerepo=xcp-ng-tae1
      

      You can revert the changes by downgrading the Xen packages to the ones in the default repos.

      yum downgrade --disablerepo=xcp-ng-tae1 "xen-*"
      
      TSnake41 opened this pull request in xcp-ng-rpms/xen

      draft Use ERMS variant for memcpy #54

      posted in Development
    • RE: 89 vulnerabilities in XAPI / Citrix XenServer

      Xen Project covered this as XSA-489.

      posted in Development
    • RE: Wide VMs on XCP-ng

      @plaidypus I don't know a lot about NUMA on Xen, but we have a section in the docs about it:
      https://docs.xcp-ng.org/compute/#numa-affinity

      There is also other documentation on the subject:
      https://xapi-project.github.io/new-docs/toolstack/features/NUMA/index.html
      There was also a design session on NUMA at the latest Xen Summit: https://youtu.be/KoNwEYMlhyU?list=PLQMQQsKgvLnvjRgDnb-5T51e1kGHgs1SO

      posted in XCP-ng
    • RE: The Lowest Priority Bug Ever? (/etc/udev/rules.d/z10-xen-vcpu-hotplug.rules)

      The rule is oddly written, and may conflict with a similar one that already exists in the distro (hence it may not be useful to begin with).

      The modern generic rule for vCPU hotplug is the following, which would be preferable to the current z10-xen-vcpu-hotplug.rules:
      ACTION=="add", SUBSYSTEM=="cpu", ATTR{online}=="0", ATTR{online}="1"
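
      What that rule does can be approximated by hand, which may help when checking a guest where no such rule is installed. This is only a sketch of the effect, not the udev rule itself; the function name online_offline_cpus is made up, and the sysfs directory is a parameter so it can be tried on a copy first:

```shell
# Bring online every CPU whose sysfs "online" attribute currently reads 0.
# $1 = CPU sysfs directory (assumption: defaults to /sys/devices/system/cpu)
online_offline_cpus() {
    cpu_dir="${1:-/sys/devices/system/cpu}"
    for f in "$cpu_dir"/cpu[0-9]*/online; do
        [ -e "$f" ] || continue          # glob did not match anything
        if [ "$(cat "$f")" = "0" ]; then
            echo 1 > "$f"                # same write the udev rule performs
            echo "onlined ${f%/online}"
        fi
    done
}
```

      Run as root inside the guest after hot-adding vCPUs. CPUs that are already online are left untouched.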

      posted in XCP-ng
    • RE: Execute pre-freeze and post-thaw

      @dcskinner @olivierlambert

      You can read key/values from the xenstore, and write some (from VM to outside), but you cannot write values "in live" from outside the VM to the inside.

      It is possible, but XAPI doesn't provide an interface for it.

      do the guest tools quiesce the filesystems before snapshotting?

      Tools are aware of a snapshot so you don't have blocks in flight.

      do the guest tools quiesce the filesystems before snapshotting?

      Guest kernels are aware, since they are the ones performing a "suspend" at the toolstack's request (and thus quiesce filesystems). The "tools", however, can only observe after the fact that the system has been suspended, by measuring side effects; they cannot orchestrate it.

      That's because the suspend/resume operation doesn't actually come from the "guest tools", but from the kernel drivers. So userland tools have no say in it.

      posted in Backup
    • RE: NVMe SSD not found when installing

      Hello,

      Make sure Intel VMD is disabled (this is Intel's hardware RAID feature, and it doesn't currently work on XCP-ng; you probably don't need it unless you are looking to set up a RAID). We have found that some modern platforms enable it by default (which also causes issues with Windows).
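
      If you have a shell with lspci available on that machine, a quick way to spot an active VMD controller might look like this. The helper name detect_vmd is hypothetical, and it relies on the device name from the PCI ID database, so a "not visible" result is a hint rather than proof that VMD is off:

```shell
# Report whether `lspci` output (read from stdin) lists an Intel
# Volume Management Device controller.
detect_vmd() {
    if grep -qi 'Volume Management Device'; then
        echo "vmd-visible"
    else
        echo "vmd-not-visible"
    fi
}
```

      Use it as `lspci | detect_vmd`. If it prints "vmd-visible", look for the VMD setting in the firmware setup and disable it.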

      posted in Hardware
    • RE: XCP-ng 8.3 & AMD Firepro S7150x2

      @tuxen said (https://xcp-ng.org/forum/topic/3652/no-free-virtual-function-found-vgpu-s7150/4?_=1731502751059)

      After some digging, could be the case of a GPU firmware being incompatible with UEFI. Do you have any spare server for testing XCP-ng boot in legacy/BIOS with this GPU?

      Perhaps this is the issue?

      posted in Hardware
    • RE: XCP-ng 8.3 & AMD Firepro S7150x2

      @ohajek

      Nov 13 11:30:21 xen03 kernel: [10188.720655] AMD IOMMUv2 driver by Joerg Roedel jroedel@suse.de
      Nov 13 11:30:21 xen03 kernel: [10188.720656] AMD IOMMUv2 functionality not available on this system
      

      This is expected, Dom0 Kernel (Linux) is not supposed to access the IOMMU when it is already used by Xen. To check if AMD-Vi is working, you need to check xl dmesg instead.
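
      A minimal sketch of that check is shown below. The helper name xen_iommu_status is made up, and it assumes the Xen boot log contains the usual "I/O virtualisation enabled" line; the exact wording can differ between Xen versions, and the message may be gone if the log ring buffer has wrapped:

```shell
# Report whether the Xen log (read from stdin) says the IOMMU is enabled.
xen_iommu_status() {
    if grep -q 'I/O virtualisation enabled'; then
        echo "iommu-enabled"
    else
        echo "iommu-not-reported"
    fi
}
```

      Use it as `xl dmesg | xen_iommu_status`. On AMD hosts, `xl dmesg | grep -i 'AMD-Vi'` is also worth reading in full.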

      I took a quick look at kern_gim_compiled.txt, and it looks like it timed out somewhere:

      Oct 23 20:49:32 xen03 kernel: [   80.657394]        gim error:(wait_cmd_complete:2387)  wait_cmd_complete -- time out after 0.003004460 sec
      Oct 23 20:49:32 xen03 kernel: [   80.657408]        gim error:(wait_cmd_complete:2390)   Cmd = 0x17, Status = 0x0, cmd_Complete=0
      

      3 ms looks like a short timeout to me, but aside from that, it looks like a driver (gim) or hardware issue.

      posted in Hardware

    Latest posts made by TeddyAstie

    • RE: Slow boot on rocky linux 10 latest kernel

      @MajorP93 said:
      Do you think it is possible to fix this on hypervisor level while still having live migration etc. enabled or do we have to wait for an upstream fix within Linux kernel tree?

      Yes it's possible to fix it on the hypervisor level (Invariant TSC in guest), but it's quite a bit of work that still needs to be done. A Linux upstream fix for the underlying bug should come at some point hopefully.

      posted in Compute
    • RE: Slow boot on rocky linux 10 latest kernel

      @majorp93 @henri9813 @acebmxer
      Do you observe the same behavior after setting this for the VM ?

      xe vm-param-add uuid=$UUID param-name=platform tsc_mode=2
      xe vm-param-add uuid=$UUID param-name=platform nomigrate=true
      

      (Beware: you lose live migration support by doing this. You can revert these changes with the matching vm-param-remove, e.g. xe vm-param-remove uuid=$UUID param-name=platform param-key=nomigrate)
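
      To avoid forgetting one of the two keys when reverting, here is a small dry-run sketch that only prints the matching removal commands (print_tsc_revert is a made-up helper name; nothing is executed):

```shell
# Print the xe commands that remove both platform keys set for the test.
# $1 = VM UUID
print_tsc_revert() {
    uuid="$1"
    for key in tsc_mode nomigrate; do
        echo "xe vm-param-remove uuid=$uuid param-name=platform param-key=$key"
    done
}
```

      Call it as `print_tsc_revert "$UUID"`, review the two lines it prints, then run them on the host once you are done testing.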

      posted in Compute
    • RE: Slow boot on rocky linux 10 latest kernel

      @MajorP93 can you give the kernel version of all the affected vs non-affected guests ?

      posted in Compute
    • RE: Slow boot on rocky linux 10 latest kernel

      @acebmxer I don't observe the same issue on Debian 13 Cloud-Init (both 6.12.38+deb13-amd64 and the updated 6.12.90+deb13.1-amd64).

      It still takes some time to boot (especially when loading the ramdisk), but that's not related to this PV spinlock issue; it is mostly a "BIOS guest" related issue.
      Note that I'm testing on an Intel machine.

      posted in Compute
    • RE: Slow boot on rocky linux 10 latest kernel

      @acebmxer which kernel version do you have in your Debian guest (uname -a)?

      posted in Compute
    • RE: Slow boot on rocky linux 10 latest kernel

      Can reproduce on Fedora 44 and Alpine Linux (6.18.22-0-virt).
      But doesn't occur on Debian 13 (6.12).

      posted in Compute
    • RE: Several errors on boot

      The i2c error seems related to (I guess) an RGB controller that is presumably controlled by something else (or maybe not usable or not plugged in). It's harmless unless you are looking to configure RGB (and I don't think you want to do that anyway).

      The EFI_MEMMAP warning is probably because Dom0 doesn't see UEFI mappings, as it relies on a different method to make UEFI calls. This is expected, and the warning doesn't indicate a problem.

      The last error is likely related to the first one.

      what I will miss /won't be able to do vs a user that has not these messages? (otherwise, why they would be raised)

      Nothing meaningful.

      if these messages /failures have an impact on the time during which the machine is booting? (it seems the load process is hanging for about a minute)

      No.

      if these messages /failures are officially documented somewhere?

      For EFI_MEMMAP, XenServer has an article on it stating the same thing: https://support.citrix.com/external/article/CTX331542/citrix-hypervisor-82-efi-efimemmap-is-n.html

      how can I resolve these failures, because failures are failures... even if "one should not worry about them"?

      You may be able to hide them, for example by blacklisting i2c and such, but TBF it's not worth the time.

      posted in Hardware
    • RE: (Windows) guest IPv6 address doesn't collapse zeroes -> Long IPv6 addresses

      Do you have multiple guest agents in the VM (e.g. the Citrix and XCP-ng ones) that may step on each other for this IP?

      posted in Xen Orchestra