XCP-ng

    Posts

    • RE: XCP-ng team is growing

      Hi everyone!

      Two years ago, I worked on many features in the Xen Orchestra project, like the delta backup algorithm, load balancing, ...
      After all this time, I'm back to contribute to XCP-ng. 🙂

      Currently, I'm working on performance improvements (VM migration, storage...) and on new SMAPIv3 plugins.
      And maybe in the future on other cool stuff. 😉

      Ronan

      posted in News
      ronan-a
    • RE: Dev diaries #1: Analyzing storage perf (SMAPIv3)

      qemu-dp: context and parameters

      Here are some new charts. Make sure you understand the overall QCOW2 image structure. (See: https://events.static.linuxfound.org/sites/events/files/slides/kvm-forum-2017-slides.pdf)

      ioping.png
      random.png
      sequential.png

      More explicit labels 😉:

      • ext4-ng: qemu-dp with default parameters (O_DIRECT and no-flush)
      • ext4-ng (VHD): tapdisk with VHD (no O_DIRECT + timeout)
      • ext4-ng (Buffer/Flush): no O_DIRECT + flush allowed
      • Cache A: L2-Cache=3MiB
      • Cache B: L2-Cache=6.25MiB
      • Cache C: Entry-Size=8KiB
      • Cache D: Entry-Size=64KiB + no O_DIRECT + flush allowed
      • Cache E: L2-Cache=8MiB + Entry-size=8KiB + no O_DIRECT + flush allowed
      • Cache F: L2-Cache=8MiB + Entry-size=8KiB + no O_DIRECT
      • Cache G: L2-Cache=8MiB + Entry-size=8KiB
      • Cache H: L2-Cache=8MiB + Entry-size=16KiB + no O_DIRECT
      • Cache I: L2-Cache=16MiB + Entry-size=8KiB + no O_DIRECT

      These results were obtained with an Optane (NVMe) drive. Random write performance is better with the F configuration than with the default qemu-dp parameters, and the ioping latency is not bad. But it's still not sufficient compared to tapdisk.
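      For reference, here is roughly what configuration F corresponds to when expressed with upstream QEMU options (the actual qemu-dp invocation used by the datapath plugin is not shown in this post, so treat this as an illustrative sketch; l2-cache-size, l2-cache-entry-size, cache.direct and cache.no-flush are the upstream names of these knobs):

      # Sketch with upstream QEMU syntax, not the real qemu-dp command line:
      # L2-Cache=8MiB, Entry-size=8KiB, no O_DIRECT (cache.direct=off), flushes still disabled.
      qemu-system-x86_64 ... \
        -drive file=/path/to/disk.qcow2,format=qcow2,if=none,id=vdisk0,l2-cache-size=8M,l2-cache-entry-size=8k,cache.direct=off,cache.no-flush=on \
        -device virtio-blk-pci,drive=vdisk0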

      So, as said in the previous message, it's time to find the bottleneck in the qemu-dp process. 😉

      posted in News
      ronan-a
    • RE: Dev diaries #1: Analyzing storage perf (SMAPIv3)

      qemu-dp/tapdisk and CPU Usage per function call

      Thanks to flamegraph. 🙂 (See: http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html)
      Analysis and improvements in a future message!
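      The post doesn't show how these graphs were captured; a typical recipe with perf and Brendan Gregg's FlameGraph scripts looks like this (process name, sampling rate and duration are only examples):

      # Sample the qemu-dp (or tapdisk) process for 30 seconds, then fold the stacks into an SVG.
      perf record -F 99 -g -p "$(pgrep -of qemu-dp)" -- sleep 30
      perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > out.svg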

      qemu

      out.png

      tapdisk

      tapdisk.png

      posted in News
      ronan-a
    • Dev diaries #1: Analyzing storage perf (SMAPIv3)

      SMAPIv3: results and analysis

      After some investigation, it was discovered that the SMAPIv3 is not THE perfect storage interface. Here are some charts to analyze:

      chart3.png

      chart1.png

      chart2.png

      Yeah, there are many storage types:

      • lvm, ext (well known)
      • ext4 (storage type added in SMAPIv1)
      • ext4-ng (a new storage type added in SMAPIv3 for this benchmark, and surely available in the future)
      • xfs-ng (same idea but for XFS)

      You may notice the use of RAID0 with ext4-ng, but that's not important for the moment.

      Let's focus on the performance of ext4-ng/xfs-ng! How can we explain these poor results?! By default, the SMAPIv3 plugins added by Citrix, like gfs2/filebased, use qemu-dp. It is a fork of QEMU, and it replaces the tapdisk/VHD environment in order to improve performance and remove some limitations, like the maximum size supported by the VHD format (2 TB). QEMU supports QCOW2 images to break this limitation.
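      As a quick illustration (not taken from the benchmark itself), a QCOW2 image can be created well beyond VHD's 2 TB ceiling with qemu-img; the path and size below are arbitrary:

      qemu-img create -f qcow2 /mnt/sr/big-disk.qcow2 4T
      qemu-img info /mnt/sr/big-disk.qcow2    # reports format, virtual size and cluster size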

      So, the performance problem of SMAPIv3 seems related to qemu-dp. And yes... You can see the results of the ext4-ng VHD and ext4-ng VHDv1 plugins: they are very close to the SMAPIv1 measurements:

      • The ext4-ng VHDv1 plugin uses the O_DIRECT flag + a timeout like the SMAPIv1 implementation.
      • The ext4-ng VHD plugin does not use the O_DIRECT flag.

      Next, to validate a potential bottleneck in the qemu-dp process, two RAID0 arrays were set up (one with 2 disks and another with 4), and it is interesting to see good usage of the physical disks! There is one qemu process for each disk in our VM, and the disk usage is similar to the performance observed in the Dom0.
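      For context, a 2-disk RAID0 like the one used here can be assembled in the Dom0 with mdadm; device names are placeholders and this is not necessarily the exact setup used for the benchmark:

      mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
      mkfs.ext4 /dev/md0    # an ext4-ng SR can then be created on top of /dev/md0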

      For the future

      The SMAPIv3/qemu-dp pair is not entirely a problem:

      • Good scaling is visible in the RAID0 benchmark.
      • It's easy to add a new storage type in SMAPIv3. (There are two plugin types, Volume and Datapath, automatically detected when added to the system; see the sketch after this list and https://xapi-project.github.io/xapi-storage/#learn-architecture)
      • The QCOW2 format is a good alternative to break the size limitation of VHD images.
      • A RAID0 on SMAPIv1 does not improve I/O performance, contrary to qemu-dp.
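      As mentioned in the second bullet, Volume and Datapath plugins are discovered automatically; on an XCP-ng host they live under the xapi-storage-script directories (paths from the xapi-storage project; the plugin names shown are only examples):

      ls /usr/libexec/xapi-storage-script/volume/     # Volume plugins, e.g. org.xen.xapi.storage.ext4-ng
      ls /usr/libexec/xapi-storage-script/datapath/   # Datapath plugins, e.g. qdisk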

      Next steps:

      • Understand how qemu-dp is called (context, parameters, ...).
      • Find the bottleneck in the qemu-dp.
      • Find a solution to improve the performance.
      posted in News
      ronan-a
    • RE: XOSTOR hyperconvergence preview

      ⚠ UPDATE AND IMPORTANT INFO ⚠

      I am updating the LINSTOR packages in our repositories.
      This update fixes many issues, especially regarding HA.

      However, this update is not compatible with LINSTOR SRs that are already configured, so it is necessary to DELETE the existing SRs before installing it.
      We exceptionally allow ourselves to force a reinstallation during this beta, as long as we haven't officially released a production version.
      In theory, this should not happen again.

      To summarize:
      1 - Uninstall any existing LINSTOR SR.
      2 - Install the new sm package, "sm-2.30.7-1.3.0.linstor.3.xcpng8.2.x86_64", on all hosts in use (a yum sketch follows this list).
      3 - Reinstall the LINSTOR SR.
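      A minimal sketch of step 2, assuming the LINSTOR beta repository is named xcp-ng-linstor (adjust to the repository actually configured by the installation script):

      yum clean metadata --enablerepo=xcp-ng-linstor
      yum update sm --enablerepo=xcp-ng-linstor    # should pull sm-2.30.7-1.3.0.linstor.3.xcpng8.2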

      Thank you ! 🙂

      posted in XOSTOR
      ronan-a
    • RE: XOSTOR on 8.3?

      @fatek Just for information, the current version 8.3 is not usable without major problems. However, I recently rebased all the LINSTOR sm changes from XCP-ng 8.2 to 8.3 in a new package, sm-3.2.3-1.7.xcpng8.3.x86_64.rpm, and the driver tests passed without too many problems. This RPM should be available during the month of October. Even after its release, we consider that it is not stable enough for production use until we have enough user feedback (but of course this new RPM is synchronized with all the fixes and improvements of version 8.2).

      EDIT: Released on October 25, we originally planned to wait a bit. 🙂

      posted in XOSTOR
      ronan-a
    • RE: XOSTOR hyperconvergence preview

      @Maelstrom96 Well, there is no simple helper to do that using the CLI.

      So you can create a new node using:

      linstor node create --node-type Combined <NAME> <IP>
      

      Then you must evacuate the old node to preserve the replication count:

      linstor node evacuate <OLD_NAME>
      

      Next, you can change your hostname and restart the services on each host:

      systemctl stop linstor-controller
      systemctl restart linstor-satellite
      

      Finally you can delete the node:

      linstor node delete <OLD_NAME>
      

      After that, you must recreate the diskless resources if necessary. Run linstor advise r to see the commands to execute (shown in full below).
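      For clarity, that command in full (r is just the short alias for resource):

      linstor advise resource    # prints the linstor commands needed to restore the expected replica count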

      posted in XOSTOR
      ronan-a
    • RE: XOSTOR hyperconvergence preview

      @BHellman The first post has a FAQ that I update each time I meet users with a common/recurring problem. 😉

      posted in XOSTOR
      ronan-a
    • RE: XOSTOR hyperconvergence preview

      @gb-123 You can use this command:

      linstor resource-group modify xcp-sr-linstor_group_thin_device --place-count <NEW_COUNT>
      

      You can confirm the resource group to use with:

      linstor resource-group list
      

      Ignore the default group named DfltRscGrp and take the second one.

      Note: Don't use a replication count greater than 3.

      posted in XOSTOR
      ronan-a
    • RE: XCP-ng 8.0.0 Beta now available!

      @peder Fixed! This fix will be available (as soon as possible) in a future xcp-emu-manager package.

      posted in News
      ronan-a
    • RE: Xen Orchestra Load Balancer - turning on hosts

      @berish-lohith Just FYI, I created a card in our backlog; I don't see too many blocking points to implementing it correctly. 😉

      posted in Xen Orchestra
      ronan-a
    • RE: Unable to enable HA with XOSTOR

      @dslauter You can test the new RPMs using the testing repository; FYI: sm-3.2.3-1.14.xcpng8.3 and http-nbd-transfer-1.5.0-1.xcpng8.3.
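      A minimal sketch, assuming the standard XCP-ng testing repository name (xcp-ng-testing):

      yum update sm http-nbd-transfer --enablerepo=xcp-ng-testing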

      posted in Advanced features
      ronan-a
    • RE: Unable to enable HA with XOSTOR

      @dslauter Just for your information, I will update http-nbd-transfer + sm in a few weeks. I fixed many issues regarding HA activation in 8.3, caused by a bad migration of specific Python code from version 2 to version 3.

      posted in Advanced features
      ronan-a
    • RE: XOSTOR on 8.3?

      @fatek Use the XOA method directly; it correctly installs the dependencies and is safer regarding disk selection. 😉

      posted in XOSTOR
      ronan-a
    • RE: XOSTOR hyperconvergence preview

      @ferrao said in XOSTOR hyperconvergence preview:

      May I ask now a licensing issue: if we upgrade to Vates VM, does the deployment mode on the first message is considered supported or everything will need to be done again from XOA?

      Regarding XOSTOR support licenses: in general, we prefer our users to use a trial license through XOA. And if they are interested, they can subscribe to a commercial license.
      To be more precise: the manual steps in this thread are still valid to configure a LINSTOR SR; there is no difference from the XOA commands. However, if you wish to subscribe to a support license for a pool without XOA or a trial license, we are quite strict about requiring the infrastructure to be in a stable state.

      posted in XOSTOR
      ronan-a
    • RE: XOSTOR hyperconvergence preview

      @Maelstrom96 said in XOSTOR hyperconvergence preview:

      However, after rebooting the master, it seems like the SR doesn't want to allow any disk migration, and manual Scan are failing.

      What's the status of these commands on each host?

      systemctl status linstor-controller
      systemctl status linstor-satellite
      systemctl status drbd-reactor
      mountpoint /var/lib/linstor
      drbdsetup events2
      

      Also please share your SMlog files. 🙂

      posted in XOSTOR
      ronan-a
    • RE: XOSTOR hyperconvergence preview

      @fatek No. I removed this param, it's useless now.

      posted in XOSTOR
      ronan-a
    • RE: XOSTOR hyperconvergence preview

      @Maelstrom96 The XCP-ng 8.3 LINSTOR version is not updated often, and we are totally focused on the stable 8.2 version.
      As a reminder, XCP-ng 8.3 is still in beta, so we can't write documentation now for updating LINSTOR between these versions, because we still have important issues to fix and improvements to add that could impact and/or invalidate a migration process.

      posted in XOSTOR
      ronan-a
    • RE: XOSTOR hyperconvergence preview

      @Maelstrom96 We must update our documentation for that; this will probably require executing commands manually during an upgrade.

      posted in XOSTOR
      ronan-a
    • RE: XOSTOR hyperconvergence preview

      @TheiLLeniumStudios said in XOSTOR hyperconvergence preview:

      Is there a way to remove Linstor / XOSTOR entirely?

      If you destroyed the SR, you can remove the LINSTOR packages and force a reinstallation of a stable sm package using yum (see the sketch below).
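      A possible sequence, to be adapted to what is actually installed (the package selection and the downgrade step are assumptions, not an official procedure):

      rpm -qa | grep -iE 'linstor|drbd'    # list the LINSTOR/DRBD related packages first
      yum remove <PACKAGES_LISTED_ABOVE>
      yum downgrade sm                     # one way to get back to the stock sm package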

      I've been experimenting a little bit with the latest update and it looks like it takes a really really long time to run VMs on the shared SR created by XOSTOR and mass VM creation (tested with 6 VMs using terraform)

      What do you mean? What's the performance problem? When the VM starts? During execution?

      also fails with "code":"TOO_MANY_STORAGE_MIGRATES","params":["3"]

      Do you migrate with XO? This exception is normally handled by it, and a new migration is restarted a few moments later in this case.

      Or reprovision it? I removed the SR from the XOA but the actual disk still seems to have the filesystem intact.

      Did you just execute an xe sr-forget command on the SR? In that case the volumes are not removed; xe sr-destroy must be used to remove them. So you can execute xe sr-introduce and then xe sr-destroy to clean your hosts (a sketch follows).
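      A sketch of that clean-up, with placeholder values (the SR type is linstor in this thread; check xe help sr-introduce for the full parameter list):

      xe sr-introduce uuid=<SR_UUID> type=linstor name-label=<SR_NAME> shared=true content-type=user
      xe sr-destroy uuid=<SR_UUID>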

      Also I was running the XOSTOR as the HA SR. Maybe that caused it to become really slow and also I had the affinity set to disabled

      I'm not sure (but HA was buggy before the last update).

      I'm also seeing this in the xcp-rrdd-plugins.log:

      No relation with LINSTOR here. 😉

      posted in XOSTOR
      ronan-a