XCP-ng

    bleader

    @bleader

    Vates 🪐 XCP-ng Team 🚀

    Reputation: 7, Profile views: 5, Posts: 8, Followers: 0, Following: 0

    Best posts made by bleader

    • RE: Live migrate of Rocky Linux 8.8 VM crashes/reboots VM

      So, after our investigations, we were able to pinpoint the issue.

      It seems to happen on most RHEL derivative distributions when upgrading from 8.7 to 8.8. As suggested, the bug is in the kernel.

      Starting with 4.18.0-466.el8, the patch "x86/idt: Annotate alloc_intr_gate() with __init" is integrated and triggers the issue. The patch "x86/xen: Split HVM vector callback setup and interrupt gate allocation", which should have been integrated alongside it, is missing.

      Upgrading to 8.8 moves you to the 4.18.0-477.* kernels, which are also affected; that is what you reported.

      We found that the 4.18.0-488 kernel available in CentOS 8 Stream includes the missing patch, and it does indeed work when installed manually.

      Your report helped us identify and reproduce the issue, which allowed us to provide a call stack to the Xen developers. Roger Pau Monné then quickly found that this patch was missing, and we were able to determine which kernel RPM versions included the problematic patch and when the fix was integrated.

      This means the issue has been identified on the Red Hat side, and it is now a matter of getting an updated kernel into derivative distributions like Rocky and Alma.
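      As a rough aid before live migrating an EL8 VM, here is a minimal Python sketch that classifies a kernel release string (as returned by uname -r) against the build numbers above. It assumes every build from 466 up to, but not including, 488 lacks the fix. Builds between 477 and 488 were not checked here, and a backport can make an affected-looking build work, so treat the result as a hint rather than a verdict. The function name classify_el8_kernel is hypothetical.

```python
import platform
import re

# Build numbers from the post above; treating them as range boundaries is an assumption.
FIRST_AFFECTED_BUILD = 466  # "x86/idt: Annotate alloc_intr_gate() with __init" is integrated
FIRST_FIXED_BUILD = 488     # CentOS 8 Stream build that includes the missing Xen patch

# EL8 kernels look like "4.18.0-477.10.1.el8_8.x86_64"; capture the build number.
_EL8_RELEASE = re.compile(r"^4\.18\.0-(\d+)\.")


def classify_el8_kernel(release: str) -> str:
    """Classify an EL8 kernel release as 'unaffected', 'likely-affected', 'fixed' or 'unknown'."""
    match = _EL8_RELEASE.match(release)
    if match is None:
        return "unknown"  # not a 4.18.0-<build> EL8 kernel
    build = int(match.group(1))
    if build < FIRST_AFFECTED_BUILD:
        return "unaffected"
    if build < FIRST_FIXED_BUILD:
        # A vendor backport can still make such a build work, so this is only a hint.
        return "likely-affected"
    return "fixed"


if __name__ == "__main__":
    release = platform.release()  # same value as `uname -r`
    print(release, "->", classify_el8_kernel(release))
```

      Run inside the guest you plan to migrate, it prints the verdict for the running kernel.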

      posted in Compute
    • RE: Epyc VM to VM networking slow

      Hello guys,

      I'll be the one investigating this further; we're trying to compile a list of CPUs and their behavior. First, thank you for your reports and tests: they are already very helpful and have given us some insight.

      Setup

      If some of you can help us cover more ground, that would be awesome. Here is what an ideal test setup would look like, so that everyone is on the same page:

      • An AMD host, obviously 🙂
        • yum install iperf ²
      • 2 VMs on the same host, with the distribution of your choice¹
        • each with 4 cores if possible
        • 1 GB of RAM should be enough if you don't have a desktop environment to load
        • iperf2²

      ¹: it seems some recent kernels provide a slight boost, but in any case performance is pretty low for such high-grade CPUs.
      ²: iperf3 is single-threaded. The -P option establishes multiple connections but processes all of them in a single thread, so once it reaches 100% CPU usage it won't get much faster, and it won't help identify scaling on such a CPU. For example, on a Ryzen 5 7600 processor we get about the same low performance, but using multiple threads does scale, which does not seem to be the case for EPYC Zen1 CPUs.

      Tests

      • do not disable mitigations for now: that only affects the kernel side, mitigations remain active in Xen, in my testing it doesn't seem to help much, and it would multiply the number of result combinations
      • for each test, run xentop on the host and try to get an idea of the top values for each domain while the test is running
      • run iperf -s on VM1 and let it run (do not pass -P X: this would make the server stop after X connections have been established)
      • tests:
        • vm2vm 1 thread: on VM2, run iperf -c <ip_VM1> -t 60, note the result for v2v 1 thread
        • vm2vm 4 threads: on VM2, run iperf -c <ip_VM1> -t 60 -P4, note the result for v2v 4 threads
        • host2vm 1 thread: on the host, run iperf -c <ip_VM1> -t 60, note the result for h2v 1 thread
        • host2vm 4 threads: on the host, run iperf -c <ip_VM1> -t 60 -P4, note the result for h2v 4 threads
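      To keep runs comparable between testers, the four client-side commands above can be generated from one place. This is a small Python sketch, not an official tool. build_iperf_commands and the test labels are hypothetical names. It assumes the iperf2 binary is called iperf and is on the PATH of the machine each command runs on (VM2 for v2v, the host for h2v), and it needs Python 3.8+ for shlex.join.

```python
import shlex

DURATION_S = 60  # -t 60, as in the protocol above


def build_iperf_commands(server_ip: str, duration: int = DURATION_S) -> dict:
    """Map each test label to the iperf2 client argv used in the protocol."""
    base = ["iperf", "-c", server_ip, "-t", str(duration)]
    return {
        "v2v 1T": list(base),           # run on VM2
        "v2v 4T": base + ["-P4"],       # run on VM2
        "h2v 1T": list(base),           # run on the host (dom0)
        "h2v 4T": base + ["-P4"],       # run on the host (dom0)
    }


if __name__ == "__main__":
    # 192.0.2.10 is a documentation placeholder for VM1's address.
    for label, argv in build_iperf_commands("192.0.2.10").items():
        print(f"{label}: {shlex.join(argv)}")
```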

      Report template

      Here is an example report template:

      • Host:
        • cpu:
        • number of sockets:
        • cpu pinning: yes (detail) / no (use automated setting)
        • xcp-ng version:
        • output of xl info -n, especially the cpu_topology section, in a code block
      • VMs:
        • distrib & version
        • kernel version
      • Results:
        • v2v 1 thread: throughput / cpu usage from xentop³
        • v2v 4 threads: throughput / cpu usage from xentop³
        • h2v 1 thread: throughput / cpu usage from xentop³
        • h2v 4 threads: throughput / cpu usage from xentop³

      ³: I note the maximum I see while the test is running, in vm-client/vm-server/host order.
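      One way to get these maximums without watching the screen is to record xentop in batch mode on the host during a run (for example xentop -b -d 1 -i 70 > xentop.log for a 60-second test) and extract the peak CPU(%) per domain afterwards. The Python sketch below assumes each data line holds the domain name, the six-character STATE column, CPU(sec) and then CPU(%). It locates STATE so that domain names containing spaces still parse. peak_cpu_per_domain is a hypothetical helper, not part of the Xen tools.

```python
import re
import sys
from collections import defaultdict

# xentop's STATE column is six flag characters (d, s, b, c, p, r or '-').
_STATE = re.compile(r"^[dsbcpr-]{6}$")


def peak_cpu_per_domain(lines):
    """Return {domain name: highest CPU(%) seen} from xentop batch-mode output lines."""
    peaks = defaultdict(float)
    for line in lines:
        tokens = line.split()
        # Locate the STATE column so that domain names containing spaces still work.
        state_idx = next((i for i, t in enumerate(tokens) if _STATE.match(t)), None)
        if state_idx is None or state_idx == 0 or len(tokens) < state_idx + 3:
            continue  # header, blank or unexpected line
        name = " ".join(tokens[:state_idx])
        try:
            cpu_pct = float(tokens[state_idx + 2])  # columns: STATE, CPU(sec), CPU(%)
        except ValueError:
            continue
        peaks[name] = max(peaks[name], cpu_pct)
    return dict(peaks)


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as log:
        for name, peak in sorted(peak_cpu_per_domain(log).items()):
            print(f"{name}: {peak:.0f}%")
```

      You still map the domain names to the vm-client/vm-server/host order yourself when filling in the template.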

      What was tested

      Mostly for information, here are a few things I tried that did not seem to improve performance.

      • disabling the mitigations for various security issues at host and VM boot time using kernel boot parameters: noibrs noibpb nopti nospectre_v2 spectre_v2_user=off spectre_v2=off nospectre_v1 l1tf=off nospec_store_bypass_disable no_stf_barrier mds=off mitigations=off. Note that this won't disable them at the Xen level, as there are patches that enable the fixes for the related hardware with no flags to disable them.
      • disabling AVX by passing noxsave in the kernel boot parameters, as there is a known issue on Zen CPUs that prevents boosting when a core is under heavy AVX load; still no change.
      • pinning: I tried using a single "node" in case the memory controllers are separate, avoiding the "threads" on the same core, and spreading the load across nodes. Although this seems to give a slight boost, it is still far from what we should expect from such CPUs.
      • XCP-ng 8.2 and 8.3-beta1: 8.3 seems a tiny bit faster but tends to jitter a bit more, so I would not deem that relevant either.
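      When comparing runs with and without these boot parameters, it helps to confirm what the guest kernel actually reports. Here is a minimal Python sketch, assuming a Linux guest recent enough to expose /sys/devices/system/cpu/vulnerabilities/. As noted above, this only reflects the kernel's view, not the mitigations applied by Xen. The helper names are hypothetical.

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def mitigation_status(vuln_dir: Path = VULN_DIR) -> dict:
    """Return {vulnerability name: status line} as reported by the running kernel."""
    if not vuln_dir.is_dir():
        return {}  # older kernel or non-Linux system
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(vuln_dir.iterdir())
        if entry.is_file()
    }


def still_mitigated(status: dict) -> list:
    """Names whose status starts with 'Mitigation:', i.e. not disabled by boot parameters."""
    return [name for name, line in status.items() if line.startswith("Mitigation:")]


if __name__ == "__main__":
    status = mitigation_status()
    for name, line in status.items():
        print(f"{name:30} {line}")
    print("still mitigated:", still_mitigated(status) or "none")
```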

      I have not tested it myself, but @nicols tried VMware on the same machine that gives him about 3 Gbps on XCP-ng (as we all see): it reached ~25 Gbps single-threaded and about 40 Gbps with 4 threads, and Proxmox reached about 21.7 Gbps (I assume single-threaded). Both are much more in line with what I would expect this hardware to produce.

      @JamesG tested Windows and Debian guests and got about the same results.

      Although we do get a small boost by increasing threads (or connections in the case of iperf3), it is still far from what we see on other setups with VMware or Proxmox.

      Although Olivier's pool with a Zen4 desktop CPU scales a lot better than the EPYCs when increasing the number of threads, it still does not give us the expected single-thread results for such powerful CPUs (we do not even reach VMware's single-thread performance with 4 threads).

      Although @Ajmind-0's tests show a difference between Debian versions, even the Debian 11 results are still not on par with expectations.

      Disabling AVX only improved things on my home FX CPU. FX CPUs are known not to have real "threads": the 2 threads of a core share a compute unit, so this makes sense (not shown in the table).

      It seems that memcpy in glibc is not related to the issue: dd if=/dev/zero of=/dev/null has decent performance on these machines (1.2-1.3 GBytes/s). It's worth keeping in mind that both the kernel and Xen have their own memcpy implementations, which could play a small role in filling the ring buffer in iperf, but I don't think the libc memcpy() is at play here.

      Tests table

      I'll update this table with new results, or maybe repost it in a later post.

      Throughputs are in Gbit/s, noted as G for shorter table entries.

      CPU usages are (VM client/VM server/dom0) percentages as shown in xentop.

      | user | CPU | family | market | v2v 1T | v2v 4T | h2v 1T | h2v 4T | notes |
      |---|---|---|---|---|---|---|---|---|
      | vates | FX-8320E | Piledriver | desktop | 5.64 G (120/150/220) | 7.5 G (180/230/330) | 9.5 G (0/110/160) | 13.6 G (0/300/350) | not a Zen CPU, no boost |
      | vates | EPYC 7451 | Zen1 | server | 4.6 G (110/180/250) | 6.08 G (180/220/300) | 7.73 G (0/150/230) | 11.2 G (0/320/350) | no boost |
      | vates | Ryzen 5 7600 | Zen4 | desktop | 9.74 G (70/80/100) | 19.7 G (190/260/300) | 19.2 G (0/110/140) | 33.9 G (0/310/350) | Olivier's pool, no boost |
      | nicols | EPYC 7443 | Zen3 | server | 3.38 G (?) | | | | iperf3 |
      | nicols | EPYC 7443 | Zen3 | server | 2.78 G (?) | 4.44 G (?) | | | iperf2 |
      | nicols | EPYC 7502 | Zen2 | server | similar ^ | similar ^ | | | iperf2 |
      | JamesG | EPYC 7302P | Zen2 | server | 6.58 G (?) | | | | iperf3 |
      | Ajmind-0 | EPYC 7313P | Zen3 | server | 7.6 G (?) | 10.3 G (?) | | | iperf3, Debian 11 |
      | Ajmind-0 | EPYC 7313P | Zen3 | server | 4.4 G (?) | 3.07 G (?) | | | iperf3, Debian 12 |
      | vates | EPYC 9124 | Zen4 | server | 1.16 G (16/17/??⁴) | 1.35 G (20/25/??⁴) | N/A | N/A | not XCP-ng: Xen 4.18-rc + SUSE 15 |
      | vates | EPYC 9124 | Zen4 | server | 5.70 G (100/140/200) | 10.4 G (230/250/420) | 10.7 G (0/120/200) | 15.8 G (0/320/380) | no boost |
      | vates | Ryzen 9 5950X | Zen3 | desktop | 7.25 G (30/35/60) | 16.5 G (160/210/300) | 17.5 G (0/110/140) | 27.6 G (0/270/330) | no boost |

      ⁴: xentop on this host shows 3200% on dom0 all the time. Profiling does not seem to show anything actually using the CPU, but this may be related to the extremely poor performance.

      last updated: 2023-11-29 16:46
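      To put numbers on the thread-scaling comparison discussed above, here is a small Python sketch that computes the 4-thread/1-thread throughput ratio for the fully populated XCP-ng rows of the table. The figures are copied from the table. The ratio is simply the 4T value divided by the 1T value, computed separately for v2v and h2v.

```python
# (v2v 1T, v2v 4T, h2v 1T, h2v 4T) in Gbit/s, copied from the fully populated XCP-ng rows.
RESULTS = {
    "FX-8320E (Piledriver)": (5.64, 7.5, 9.5, 13.6),
    "EPYC 7451 (Zen1)": (4.6, 6.08, 7.73, 11.2),
    "Ryzen 5 7600 (Zen4)": (9.74, 19.7, 19.2, 33.9),
    "EPYC 9124 (Zen4)": (5.70, 10.4, 10.7, 15.8),
    "Ryzen 9 5950X (Zen3)": (7.25, 16.5, 17.5, 27.6),
}


def scaling(results):
    """Return {cpu: (v2v 4T/1T ratio, h2v 4T/1T ratio)} rounded to two decimals."""
    return {
        cpu: (round(v4 / v1, 2), round(h4 / h1, 2))
        for cpu, (v1, v4, h1, h4) in results.items()
    }


if __name__ == "__main__":
    # Sort by v2v scaling, weakest first.
    for cpu, (v2v, h2v) in sorted(scaling(RESULTS).items(), key=lambda kv: kv[1][0]):
        print(f"{cpu:22} v2v x{v2v:.2f}  h2v x{h2v:.2f}")
```

      With these rows, the EPYC 7451 and the FX gain about 1.3x in v2v when going from 1 to 4 threads. The Ryzen 5 7600 and Ryzen 9 5950X gain roughly 2x or more.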

      All help is welcome! Those of you whose tests I have already integrated in the table don't need to rerun them: it looks like following the exact protocol and providing more data won't make much of a difference, and I don't want to waste your time!

      Thanks again to all of you for your insight and patience. It looks like this is going to be a deep rabbit hole; I'll do my best to get to the bottom of it as soon as possible.

      posted in Compute
    • RE: Updates announcements and testing

      @TodorPetkov Yes. For now we do not know when this update will be released on the XenServer side, but it will be published on the XCP-ng side too.

      What has been released so far suffers from the same issue as described in your link.

      If I'm not mistaken:

      • the linux-firmware update fixes the Zenbleed issue
      • the kernel patch works around the case where the updated firmware is not used by disabling features via the control register, and the previous version of that patch disabled too much
      • if you're using the updated firmware, this workaround is not used, so the updated patch is not critical.

      You can check that you're running the right microcode version with:

      journalctl -k --grep=microcode
      

      Without -k you should be able to see previous boots and make sure the patch_level= value has changed. I'm unsure which version to expect there, as we do not have Zen2 hardware at hand to test this.
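      To compare boots more easily, a minimal Python sketch can list the distinct patch_level= values found in saved journal output, for example after journalctl --grep=microcode > microcode.log. It assumes lines containing patch_level=0x..., as printed by the AMD microcode loader. The helper patch_levels is a hypothetical name.

```python
import re
import sys

# AMD microcode loader messages contain e.g. "patch_level=0x0830107a".
_PATCH_LEVEL = re.compile(r"patch_level=(0x[0-9a-fA-F]+)")


def patch_levels(lines):
    """Return the distinct patch_level values (lowercased) in order of first appearance."""
    seen = []
    for line in lines:
        for value in _PATCH_LEVEL.findall(line):
            value = value.lower()
            if value not in seen:
                seen.append(value)
    return seen


if __name__ == "__main__" and len(sys.argv) == 2:
    # Expects a file saved with e.g. `journalctl --grep=microcode > microcode.log`.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        levels = patch_levels(log)
    print("patch levels seen:", ", ".join(levels) or "none")
    if len(levels) > 1:
        print("patch_level changed at least once across the logged boots")
```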

      We will indeed provide an update later, likely not as a dedicated update but bundled with other fixes.

      I hope that properly answers your question!

      posted in News
    • RE: Weird kern.log errors

      It doesn't ring a bell for me as is.

      What I see in the first log is a segfault in blktap and in xcp-rrdd-xenpm, likely while writing to a disk. In all cases, it happens in a xen_mc_flush() call.

      Given it happens on a single machine, I would venture it could be related to the disk controller or the disk itself. You could have a look at dmidecode to see whether the controllers are the same as on the other machines (sometimes there are small discrepancies between supposedly identical hardware), and check the drives with smartctl for any health issues. That said, especially as you were originally on RAID1, I doubt an issue with the drives themselves would lead to this kind of error...
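      For the hardware comparison, one option is to save dmidecode output on the suspect host and on a known-good host, then diff the two files. Here is a minimal Python sketch using the standard difflib module. The file names are placeholders. Per-unit fields such as serial numbers and UUIDs always differ, so the hypothetical hardware_diff helper drops those lines. That filtering is a heuristic meant to keep real discrepancies, such as BIOS versions, visible.

```python
import difflib
import sys

# Fields that legitimately differ between identical machines; filtering them is a heuristic.
NOISY_FIELDS = ("Serial Number", "UUID", "Asset Tag")


def _keep(line: str) -> bool:
    return not any(field in line for field in NOISY_FIELDS)


def hardware_diff(good_lines, suspect_lines, good_name="good", suspect_name="suspect"):
    """Unified diff of two dmidecode dumps, ignoring per-unit identifiers."""
    good = [line.rstrip("\n") for line in good_lines if _keep(line)]
    suspect = [line.rstrip("\n") for line in suspect_lines if _keep(line)]
    return list(difflib.unified_diff(good, suspect, fromfile=good_name,
                                     tofile=suspect_name, lineterm=""))


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as a, open(sys.argv[2]) as b:
        print("\n".join(hardware_diff(a, b, sys.argv[1], sys.argv[2])))
```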

      posted in Compute

    Latest posts made by bleader

    • RE: Live migrate of Rocky Linux 8.8 VM crashes/reboots VM

      @KFC-Netearth Surprising, as it is still .477, but maybe the patch was backported. Thanks for telling us!

      posted in Compute
    • RE: Live migrate of Rocky Linux 8.8 VM crashes/reboots VM

      Not entirely sure, but that may be related to what's happening on the Red Hat side of things 😕

      posted in Compute
    • RE: Live migrate of Rocky Linux 8.8 VM crashes/reboots VM

      Makes sense. I was hoping they would go with -488 in the update... Sorry to hear that's not the case 😞

      posted in Compute
    • RE: kswapd0: page allocation failure under high load

      When you say

      under high load

      Do you mean the export itself is creating a heavy load, or is something else in your setup creating heavy load at the same time?

      As there is nfs4 in the call trace, I would guess you're doing an export from XO, so downloading the export over HTTP, and that the VM is originally on a shared SR. Is that right?

      The bottom call trace goes kswapd -> shrink_slab, so I would venture it is trying to free up memory. As in my first question, if something else is putting heavy load on the host, this could be expected. If not, it could be a new leak, or higher memory consumption during the export for one reason or another. But I doubt it is a leak, as I would expect that to end in an OOM rather than this kind of warning.

      If you're sure the setup, as well as the load outside of the export, is the same as before, it can indeed be something from the updates.

      posted in Compute