Best posts made by Swen
-
RE: XOSTOR hyperconvergence preview
@gb-123 The pool master does not need to be the linstor controller. Try this command on the other hosts; it will only work on the node which is the linstor controller.
-
RE: XOSTOR hyperconvergence preview
@ronan-a If you want me to test some of your fixes, please don't hesitate.
Latest posts made by Swen
-
RE: XOSTOR hyperconvergence preview
@gb-123 The pool master does not need to be the linstor controller. Try this command on the other hosts; it will only work on the node which is the linstor controller.
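For context, the linstor client only answers on the host running the active controller, so one way to locate that node (assuming the default service name) is:
systemctl status linstor-controller   # running only on the controller node
linstor node list                     # succeeds only where the controller is reachable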
-
RE: XOSTOR hyperconvergence preview
@olivierlambert did you already have the chance to test your new hardware?
We did some more benchmarking with only bonded 40G interface.
We used the following fio command:
fio --name=a --direct=1 --bs=1M --iodepth=32 --ioengine=libaio --rw=write
- on bare-metal (OS: Ubuntu 22 LTS) we are able to reach 3100 MB/s
- on a VM installed on xcp-ng we are able to reach 1200 MB/s
- on dom0 we are able to reach 1300 MB/s when creating a new linstor volume and using /dev/drbd directly (see the sketch after this list)
- on dom0 when using the LVM volume without DRBD we are able to reach 1500 MB/s
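A minimal sketch of that dom0 test, assuming a throwaway linstor resource (the resource name, size and the /dev/drbd1000 minor are placeholders):
linstor resource-definition create perf-test
linstor volume-definition create perf-test 100G
linstor resource create perf-test --auto-place 2
fio --name=a --direct=1 --bs=1M --iodepth=32 --ioengine=libaio --rw=write --filename=/dev/drbd1000 --size=100G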
And btw, it looks like tapdisk is our bottleneck in dom0, as you suggested before, because it is a single-threaded process and our CPU reached its limit.
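One way to confirm this (a sketch using standard tools): watch the per-thread CPU usage of the tapdisk processes in dom0 while fio runs; a single thread pinned at ~100% points at tapdisk as the ceiling:
top -H -p $(pgrep -d, tapdisk)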
From the numbers above, the performance inside the VM does not look as bad as we thought at the beginning. The only question we have at this moment is why we are "losing" over half of the performance between a bare-metal installation and testing the same storage from within dom0.
Is this expected behavior?
-
RE: XOSTOR hyperconvergence preview
@olivierlambert just to be sure: we also used your recommended fio parameters, with the exact same results. We used fio from inside a VM, not from inside dom0. My comments regarding waits inside the VM and no waits in dom0 were just additional information.
I am aware of possible bottlenecks like latency, the SSDs and others, but in our case we can rule them out. The reason is that we double our speed when switching from the 10G to the 40G interface while the rest is the exact same configuration. As far as I can see, this looks like xcp-ng is the bottleneck and is limiting the bandwidth of the interface in some way. Even the numbers you provided are not really good performance numbers. Did you get more bandwidth than 8 Gbit/s over the linstor interface?
We are going to install Ubuntu on the same servers and install linstor on it, to test our infrastructure on bare metal without any hypervisor and see whether it is xcp-ng related or not.
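A raw TCP test between two hosts can also separate the network from the storage stack; a sketch with iperf3 (its availability in dom0 and the address are assumptions):
iperf3 -s                                # on the first host
iperf3 -c <ip of first host> -P 4 -t 30  # on the second host, 4 parallel streams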
-
RE: XOSTOR hyperconvergence preview
@olivierlambert thx for the feedback! I do not get how you see that you reach 20G on your NIC. Can you please explain it? I see that you reach 2600MiB/s in read, but this is more likely on the local disk, isn't it? What I can see in our lab environment is that, for whatever reason, we do not get more than around 8 Gbit/s through a 40G interface and 4 Gbit/s through a 10G interface, and therefore we do not get any good performance out of the storage repository. I am unable to find the root cause of this. Do you have any idea where to look? I can see high waits in the OS of the VM, but no waits inside dom0 of any node.
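A simple way to compare those waits on both sides while the benchmark runs (assuming the sysstat package is available):
iostat -x 1   # run inside the VM and inside dom0; compare %util and await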
-
RE: XOSTOR hyperconvergence preview
hi @ronan-a,
we did some performance testing with the latest version and we ran into a bottleneck we are unable to identify in detail. Here is our setup:
Dell R730
CPU: 2x Intel E5-2680v4
RAM: 384GB
Storage: 2x NVMe Samsung PM9A3 3.84TB via U.2 PCIe 3 x16 Extender Card
NICs: 2x 10G Intel, 2x 40G Intel
We have 3 servers with the same configuration and installed them as a cluster with a replica count of 2.
xcp-ng 8.2 with the latest patches is installed. All servers are using the same switch (2x QFX5100-24Q, configured as a virtual chassis). We are using an LACP bond on the 40G interfaces.
When using the 10G interfaces (xcp-ng is using those as management interfaces) for linstor traffic, we run into a cap on the NIC bandwidth of around 4 Gbit/s (500 MB/s).
When using the bonded 40G interfaces, the cap is around 8 Gbit/s (1000 MB/s).
Only 1 VM is installed on the pool. We are using Ubuntu 22.04 LTS with the latest updates, installed from ISO using the template for Ubuntu 20.04.
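To rule out a link-level problem, the negotiated speed and active slaves of the bond can be checked in dom0 (a sketch, assuming the bond device is named bond0):
cat /proc/net/bonding/bond0   # shows the active slaves and their speeds
ethtool bond0 | grep Speed    # aggregate speed reported for the bond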
Here is the fio command we are using:
fio --name=a --direct=1 --bs=1M --iodepth=32 --ioengine=libaio --rw=write --filename=/tmp/test.io --size=100G
I would expect far more, because we do not hit any known bottleneck of the interfaces, the NVMe drives or the PCIe slot. Am I missing something? Is this expected performance? If not, any idea what the bottleneck is? Does anybody have some data we can compare with?
regards,
Swen
-
RE: XOSTOR hyperconvergence preview
@ronan-a did you test some linstor vars like:
'DrbdOptions/auto-diskful': Makes a resource diskful if it was continuously diskless primary for X minutes
'DrbdOptions/auto-diskful-allow-cleanup': Allows this resource to be cleaned up after toggle-disk + resync is finished
Thx for your feedback!
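For reference, these should be settable per resource definition roughly like this (a sketch; the resource name and the values are placeholders):
linstor resource-definition set-property <resource> DrbdOptions/auto-diskful 30
linstor resource-definition set-property <resource> DrbdOptions/auto-diskful-allow-cleanup true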
-
RE: XOSTOR hyperconvergence preview
@olivierlambert thx for the quick reply! Does close mean days, weeks or months?
-
RE: XOSTOR hyperconvergence preview
@olivierlambert can you please provide an update, or better a roadmap, regarding the implementation of linstor in xcp-ng? I find it hard to understand what status this project is in at the moment. As you know, we are really looking forward to using it in production with our CloudStack installation. Thx for any news.
-
RE: XOSTOR hyperconvergence preview
@TheiLLeniumStudios It should be possible, but it is not recommended. You can end up in a split-brain scenario.
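If you try it anyway, keep an eye on the DRBD connection states; a split brain typically shows up as StandAlone/Connecting resources (a sketch, run in dom0 on the controller node):
drbdadm status
linstor resource list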
-
RE: XOSTOR hyperconvergence preview
@andersonalipio said in XOSTOR hyperconvergence preview:
Hello all,
I got it working just fine so far in all lab tests, but one thing I couldn't find in here or in other posts is how to use a dedicated storage network other than the management network. Can it be modified? In this lab, we have 2 hosts with 2 network cards each: one for mgmt and external VM traffic, and the second should be exclusive to storage, since it is way faster.
We are using a separate network in our lab. What we do is this:
- get the node list from the running controller via
linstor node list
- take a look at the node interface list via
linstor node interface list <node name>
- modify each node's interface via
linstor node interface modify <node name> default --ip <ip>
- check the addresses again via
linstor node list
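For illustration, with hypothetical node names and a dedicated storage subnet, the modify step could look like this:
linstor node interface modify xcp-host1 default --ip 10.10.10.1
linstor node interface modify xcp-host2 default --ip 10.10.10.2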
Hope that helps!