XCP-ng


    ronan-a

    @ronan-a

    Vates 🪐 XCP-ng Team 🚀

    Reputation: 42 · Profile views: 91 · Posts: 20 · Followers: 1 · Following: 0

    Vates 🪐 XCP-ng Team 🚀 Global Moderator 👮

    Best posts made by ronan-a

    • RE: XCP-ng team is growing

      Hi everyone!

      Two years ago, I worked on many features of the Xen Orchestra project, like the delta backup algorithm, load balancing, ...
      After all this time, I'm back to contribute to XCP-ng. 🙂

      Currently, I'm working on performance improvements (VM migration, storage...) and on new SMAPIv3 plugins.
      And maybe in the future on other cool stuff. 😉

      Ronan

      posted in News
    • RE: Dev diaries #1: Analyzing storage perf (SMAPIv3)

      qemu-dp: context and parameters

      Here are some new charts. Before reading them, make sure you understand the overall QCOW2 image structure. (See: https://events.static.linuxfound.org/sites/events/files/slides/kvm-forum-2017-slides.pdf)

      [Charts: ioping.png, random.png, sequential.png]

      More explicit labels 😉 (see the illustrative QEMU options sketch after this list):

      • ext4-ng: qemu-dp with default parameters (O_DIRECT and no-flush)
      • ext4-ng (VHD): tapdisk with VHD (no O_DIRECT + timeout)
      • ext4-ng (Buffer/Flush): no O_DIRECT + flush allowed
      • Cache A: L2-Cache=3MiB
      • Cache B: L2-Cache=6.25MiB
      • Cache C: Entry-Size=8KiB
      • Cache D: Entry-Size=64KiB + no O_DIRECT + flush allowed
      • Cache E: L2-Cache=8MiB + Entry-size=8KiB + no O_DIRECT + flush allowed
      • Cache F: L2-Cache=8MiB + Entry-size=8KiB + no O_DIRECT
      • Cache G: L2-Cache=8MiB + Entry-size=8KiB
      • Cache H: L2-Cache=8MiB + Entry-size=16KiB + no O_DIRECT
      • Cache I: L2-Cache=16MiB + Entry-size=8KiB + no O_DIRECT
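
      To make these labels concrete, here is a rough sketch of how such settings map onto QEMU/qemu-dp block options (an illustration only, not the exact command line used by SMAPIv3; the file name is hypothetical, and this roughly corresponds to the Cache F configuration):

      # cache.direct=off = no O_DIRECT; l2-cache-size / l2-cache-entry-size = the L2 cache knobs above
      qemu-system-x86_64 \
        -blockdev driver=file,node-name=file0,filename=/srv/sr/disk.qcow2,cache.direct=off \
        -blockdev driver=qcow2,node-name=disk0,file=file0,l2-cache-size=8M,l2-cache-entry-size=8192 \
        ...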

      These results were obtained with an Optane (NVMe) drive. We can see better random write performance with the F configuration than with the default qemu-dp parameters, and the ioping latency is not so bad. But it's still not sufficient compared to tapdisk.

      So, as said in the previous message, it's now time to find the bottleneck in the qemu-dp process. 😉

      posted in News
    • RE: Dev diaries #1: Analyzing storage perf (SMAPIv3)

      qemu-dp/tapdisk and CPU Usage per function call

      Thanks to flame graphs. 🙂 (See: http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html)
      Analysis and improvements in a future message!
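
      For reference, flame graphs like these are typically produced with perf plus Brendan Gregg's FlameGraph scripts, roughly like this (the qemu-dp process name and the sampling duration are just an example):

      perf record -F 99 -g -p "$(pidof qemu-dp)" -- sleep 60
      perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > qemu-dp.svg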

      [Flame graphs: qemu (out.png), tapdisk (tapdisk.png)]

      posted in News
    • Dev diaries #1: Analyzing storage perf (SMAPIv3)

      SMAPIv3: results and analysis

      After some investigation, it was discovered that the SMAPIv3 is not THE perfect storage interface. Here are some charts to analyze:

      [Charts: chart3.png, chart1.png, chart2.png]

      Yeah, there are many storage types:

      • lvm, ext (well known)
      • ext4 (a storage type added to SMAPIv1)
      • ext4-ng (a new storage type added to SMAPIv3 for this benchmark, and surely available in the future)
      • xfs-ng (same idea but for XFS)

      You can notice the usage of RAID0 with ext4-ng, but it's not important for the moment.

      Let's focus on the performance of ext4-ng/xfs-ng! How can we explain these poor results?! By default, the SMAPIv3 plugins added by Citrix (like gfs2/filebased) use qemu-dp. It is a fork of QEMU, and it is a substitute for the tapdisk/VHD environment, used to improve performance and remove some limitations like the maximum size supported by the VHD format (2 TB). QEMU supports QCOW2 images to break this limitation.
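
      As a quick illustration of that size limit, qemu-img can create a QCOW2 image well beyond 2 TB, which the VHD format cannot offer (the path and size here are just an example):

      qemu-img create -f qcow2 /srv/sr/big-disk.qcow2 4T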

      So, the performance problem of SMAPIv3 seems related to qemu-dp. And yes... the results of the ext4-ng VHD and ext4-ng VHDv1 plugins are very close to the SMAPIv1 measurements:

      • The ext4-ng VHDv1 plugin uses the O_DIRECT flag + a timeout like the SMAPIv1 implementation.
      • The ext4-ng VHD plugin does not use the O_DIRECT flag.

      Next, to check for a potential bottleneck in the qemu-dp process, two RAID0 arrays were set up (one with 2 disks and another with 4), and it is interesting to see good usage of the physical disks! There is one qemu process for each disk of our VM, and the disk usage is similar to the performance observed in the Dom0.
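
      For the record, a RAID0 array like the ones used here can be built in the Dom0 with mdadm, roughly like this (the device names are hypothetical):

      mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1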

      For the future

      The SMAPIv3/qemu-dp pair is not entirely a problem:

      • Good scaling is visible in the RAID0 benchmark.
      • It's easy to add a new storage type to SMAPIv3. (There are two plugin types, Volume and Datapath, automatically detected when added to the system; see the layout sketch after this list. See: https://xapi-project.github.io/xapi-storage/#learn-architecture)
      • The QCow2 format is a good alternative to break the size limitation of the VHD images.
      • A RAID0 on SMAPIv1 does not improve I/O performance, contrary to qemu-dp.
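
      To give an idea of what 'automatically detected' means, here is roughly where xapi-storage-script plugins live on an XCP-ng host (an approximate layout; the directory and script names are examples, see the link above for the authoritative list):

      /usr/libexec/xapi-storage-script/volume/org.xen.xapi.storage.<sr-type>/
          SR.create  SR.attach  SR.detach  Volume.create  Volume.destroy  ...
      /usr/libexec/xapi-storage-script/datapath/<datapath-name>/
          Datapath.attach  Datapath.activate  Datapath.deactivate  Datapath.detach  ...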

      Next steps:

      • Understand how qemu-dp is called (context, parameters, ...).
      • Find the bottleneck in the qemu-dp.
      • Find a solution to improve the performance.
      posted in News
    • RE: XCP-ng 8.0.0 Beta now available!

      @peder Fixed! This fix will be available (as soon as possible) in a future xcp-emu-manager package.

      posted in News
    • RE: SMAPIv3 - Feedback & Bug reports

      @swivvle You can create an ext4 SR like this:

      xe sr-create type=ext4-ng name-label=sr-test device-config:device=/dev/sda3
      

      For a basic filebased SR:

      xe sr-create type=filebased name-label=sr-test2 device-config:file-uri=/root/sr-folder
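
      If it helps, you can then check that the SR exists with a standard xe query (generic usage, reusing the name above):

      xe sr-list name-label=sr-test2 params=uuid,name-label,type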
      

      Don't hesitate if you have other questions.

      posted in Development

    Latest posts made by ronan-a

    • RE: Updates announcements and testing

      @JeffBerntsen I think I will release a new linstor RPM to override the sm testing package. The current one is: sm-2.30.7-1.2.0.linstor.1.xcpng8.2.x86_64.rpm. For the moment, you can downgrade if you want. 🙂

      posted in News
    • RE: Updates announcements and testing

      @JeffBerntsen What's your sm version? I suppose you updated your hosts and you don't have the right one.

      Please send me the output of: rpm -qa | grep sm-. 🙂

      posted in News
    • RE: Create VM Error SR_BACKEND_FAILURE_1200, No such Tapdisk.

      @geoffbland No problem. 🙂 It's the first time I've seen this error with tapdisk (and it's even more surprising to have it on this type of SR...).

      It had all been working fine and I had not changed anything on the share - it just stopped working

      In this case, maybe there was a problem with XAPI, a lock on the device, or something else. It's not easy to find the cause without remote access. Don't hesitate to ping us if this problem comes back. 😉

      posted in Xen Orchestra
    • RE: Create VM Error SR_BACKEND_FAILURE_1200, No such Tapdisk.

      Also:

      Jun  8 22:39:00 XCPNG02 SM: [11473] ['/usr/sbin/tap-ctl', 'open', '-p', '11505', '-m', '5', '-a', 'aio:/var/run/sr-mount/ec87c10e-1499-c1c5-cf3f-c234062bb459/ubuntu-22.04-live-server-amd64.iso', '-R']
      Jun  8 22:39:00 XCPNG02 SM: [11473]  = 13
      Jun  8 22:39:00 XCPNG02 SM: [11473] ['/usr/sbin/tap-ctl', 'close', '-p', '11505', '-m', '5', '-t', '30']
      Jun  8 22:39:00 XCPNG02 SM: [11473]  = 0
      Jun  8 22:39:00 XCPNG02 SM: [11473] ['/usr/sbin/tap-ctl', 'detach', '-p', '11505', '-m', '5']
      Jun  8 22:39:01 XCPNG02 SM: [11473]  = 0
      Jun  8 22:39:01 XCPNG02 SM: [11473] ['/usr/sbin/tap-ctl', 'free', '-m', '5']
      Jun  8 22:39:01 XCPNG02 SM: [11473]  = 0
      

      There is an error during the tapdisk open call: Permission denied (errno 13).
      Are you sure you can correctly access the data of your SR?
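
      To double-check, you could verify from the Dom0 that the SR mount is readable, for example (reusing the path from the log above):

      ls -ld /var/run/sr-mount/ec87c10e-1499-c1c5-cf3f-c234062bb459
      dd if=/var/run/sr-mount/ec87c10e-1499-c1c5-cf3f-c234062bb459/ubuntu-22.04-live-server-amd64.iso of=/dev/null bs=1M count=1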

      The last exception is raised in blktap2.py:

      try:
          tapdisk = cls.__from_blktap(blktap)
          node = '/sys/dev/block/%d:%d' % (tapdisk.major(), tapdisk.minor)
          util.set_scheduler_sysfs_node(node, 'noop')
          return tapdisk
      except:
          TapCtl.close(pid, minor)
          raise
      
      posted in Xen Orchestra
    • RE: Create VM Error SR_BACKEND_FAILURE_1200, No such Tapdisk.

      @geoffbland You can downgrade your sm version on each host using:

      yum downgrade sm-2.30.6-1.1.xcpng8.2.x86_64
      

      But I'm not sure if your problem is related to the sm linstor version.

      posted in Xen Orchestra
    • RE: RunX: tech preview

      @bc-23 You don't have the patched RPMs because there is a new hotfix in the 8.2 and 8.2.1 versions on the main branch. So the current xenopsd package version is greater than the runx one... We must build a new version of the runx packages on our side to correct this issue. We will fix that. 😉

      posted in News
    • RE: RunX: tech preview

      @bc-23 What's your xenopsd version? We haven't updated the modified xenopsd package to support runx with XCP-ng 8.2.1. It is possible that you are using the latest packages without the right patches. ^^"

      So please confirm this issue using rpm -qa | grep xenops. 🙂

      posted in News
    • RE: RunX: tech preview

      @theaeon said in RunX: tech preview:

      Oh now that's interesting. Turns out the containers (both archlinux and the one i just created) are exiting w/ error 143. They're getting sigterm'ed from somewhere.

      It's related to how we terminate the VM process: it's a wrapper, not the real process that manages the VM. But we shouldn't show this exit code to users, as it's not the real one. I will create an issue on our side, thanks for the feedback. 🙂

      posted in News
    • RE: RunX: tech preview

      @theaeon said in RunX: tech preview:

      For what its worth, the podman logs archlinux command from above is w/o debug. I didn't quite realize it vanishing immediately was intended behavior though, tells you how versed I am in containers.
      I'll try setting up the matrixdotorg/mjolnir thing again now that I have a command I know is working on runc.

      Yeah, by default the archlinux image executes the bash command, so when the container starts, bash is launched and dies right after. The VM is then stopped. This behavior is the same on Docker with this image. In interactive mode it's not the case, but we still have to implement that mode for a next runx version.
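
      For illustration, this is what interactive mode looks like with a conventional runtime; the plan is for runx to behave the same way once it supports it:

      podman run --rm -it archlinux bash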

      posted in News
    • RE: RunX: tech preview

      @theaeon Like I said, we must change how arguments are parsed in the runx script, so please avoid additional params like --log-devel for now. 😉

      posted in News