-
@ronan-a another thing I found is that linstor occupies more storage than expected. I created the SR with the 'thin' option and 2 VMs, each with a 50GB disk. XCP-ng Center is showing me
238.7 GB used of 2.6 TB total (150 GB allocated)
I would not have expected that! I would have expected less than 100 GB used and allocated.
-
@Swen Could you list the VDIs of your linstor SR please?
-
@ronan-a sure, do you mean the output of xe vdi-list?
-
@Swen Yes, because this allocation value is indeed surprising.
-
[16:30 xcp-test1 ~]# xe vdi-list sr-uuid=77e5097a-c971-34e4-9506-7386a1e640b8
uuid ( RO)                : 23876ae4-27b3-4f2f-8c8b-eb623b2dc2e4
          name-label ( RW): base copy
    name-description ( RW):
             sr-uuid ( RO): 77e5097a-c971-34e4-9506-7386a1e640b8
        virtual-size ( RO): 53687091200
            sharable ( RO): false
           read-only ( RO): true

uuid ( RO)                : 3a2ab3da-5507-4c7e-aa07-497c65b18ec1
          name-label ( RW): ubuntu20-linstor 0
    name-description ( RW): Created by template provisioner
             sr-uuid ( RO): 77e5097a-c971-34e4-9506-7386a1e640b8
        virtual-size ( RO): 53687091200
            sharable ( RO): false
           read-only ( RO): false

uuid ( RO)                : 13a8fa52-9aa3-490b-86e0-eedb101128f9
          name-label ( RW): ubuntu20-linstor 0
    name-description ( RW): Created by template provisioner
             sr-uuid ( RO): 77e5097a-c971-34e4-9506-7386a1e640b8
        virtual-size ( RO): 53687091200
            sharable ( RO): false
           read-only ( RO): false
OK, the third VDI makes sense, because I used a storage-level fast disk clone to duplicate the VM. I guess this explains the allocated value, but not the used one.
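As a quick sanity check on the allocated figure, here is a minimal Python sketch that sums the virtual sizes from the listing. The sizes are copied by hand from the `xe vdi-list` output rather than queried from the host, and `allocated_gib` is just an illustrative helper name.

```python
# Virtual sizes in bytes, copied by hand from the xe vdi-list output
# (one base copy plus two VM disks).
virtual_sizes = [53687091200, 53687091200, 53687091200]

GIB = 1024 ** 3


def allocated_gib(sizes):
    """Sum of virtual sizes in GiB, i.e. the most a thin SR could grow to."""
    return sum(sizes) / GIB


print(allocated_gib(virtual_sizes))  # 150.0
```

Each VDI is exactly 50 GiB, so three of them account for the 150 GB shown as allocated. This says nothing about the "used" value, which depends on what was actually written.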
Did you see my other question? Are you aware of any NIC constraints regarding throughput?
-
@ronan-a Wait a sec, maybe I found the root cause. I created a snapshot of a VM and deleted it. It created another base copy VDI and the allocated space is now 200GB. Maybe I need to wait for the cleanup job to take care of this?
-
@Swen Yes, the 150GiB are related to the base copy VDI.
Of course, this value is just the maximum amount of data that can be used, because you use the thin LVM plugin. (It's not the real used data.)
Regarding the NIC, I didn't encounter any problems during my tests. The best way to measure DRBD performance is to use fio directly in a VM and also on the host with a DRBD volume.
The difference between local storage and DRBD is not a surprise:
- DRBD must sync the data between nodes
- DRBD is on top of LVM
-
@Swen
Writing zeros should result in nothing being written with thin allocation (or dedup and compression): that's why I am hesitant to use /dev/zero as a source.
Of course, /dev/random could impose too much overhead, depending on its quality and implementation, which is why I like to use fio: a bit of initial effort to know and understand the tool, but much better control, especially when it comes to dealing with an OS that tries to be smart.
-
@ronan-a did you use 10Gbit interfaces for linstor traffic? I am aware that there is a difference between local storage and DRBD, but if the difference is that large, linstor is not really interesting for high-performance workloads. I need to be sure that the root cause is not related to my setup.
@ronan-a @abufrejoval which exact fio params are you using to test your environment, and can you share some numbers so we can compare them?
-
We mostly use those displayed in this blog post: https://smcleod.net/tech/2016/04/29/benchmarking-io/
edit: depending on the storage, iodepth can be increased.
-
There are obviously tons of variations...
I've used this fio file a lot to quickly gain an understanding of how a bit of storage performs.
Basically it only uses a small 100MB file, but tells the OS to avoid buffering and then goes over that file with a mix of reads and writes, stepping through block sizes and essentially going from super random to almost sequential in a single run.
It's helped me find issues with Gluster, identify network bandwidth issues or even find deteriorated RAIDs with a bad BBU. It creates the test file in the working directory unless changed.
[global]
filename=fio.file
ioengine=libaio
rw=randrw
size=100m
norandommap
direct=1
iodepth=1
time_based
runtime=10

[B512]
bs=512
stonewall

[B1k]
bs=1k
stonewall

[B2k]
bs=2k
stonewall

[b4k]
bs=4k
stonewall

[b8k]
bs=8k
stonewall

[b16k]
bs=16k
stonewall

[b32k]
bs=32k

[b64k]
bs=64k
stonewall

[b512k]
bs=512k
stonewall

[b1m]
bs=1m
stonewall
Numbers: It should approach the network bandwidth towards the end (potentially divided by write amplification).
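To put a rough number on that ceiling, here is a small Python sketch. It assumes decimal units, ignores protocol overhead, and treats the write amplification factor as something you supply yourself (for example, the number of copies that have to cross the link); none of these values come from a measurement in this thread.

```python
def write_ceiling_mb_s(link_gbit_s, write_amplification=1.0):
    """Rough upper bound on write throughput in MB/s for a given link speed.

    write_amplification is an assumed factor: how many times each written
    byte has to cross the network link.
    """
    if write_amplification <= 0:
        raise ValueError("write_amplification must be positive")
    link_mb_s = link_gbit_s * 1000 / 8  # Gbit/s -> MB/s, decimal units
    return link_mb_s / write_amplification


print(write_ceiling_mb_s(10))     # 1250.0
print(write_ceiling_mb_s(10, 2))  # 625.0
```

If the large-block results at the end of the run sit far below such a figure, the bottleneck is probably somewhere other than the raw link speed.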
-
@ronan-a Hi,
I tested your branch, and hosts newly added to the pool are now attached to the XOSTOR. This is nice!
I have looked at the code, but I'm not sure whether, in the current state of your branch, we can add a disk on the new host and update the replication. I think not... but just to be sure.
-
@dumarjo
linstor resource-group modify --place-count=X
should be enough to update the replication. I'm not sure whether to add a command to the plugin now (but probably yes for XOA integration).
-
@ronan-a said in XOSTOR hyperconvergence preview:
For some VMs that have built-in software replication/HA, like DBs, it might be preferred to have replication=1 set for the VDI.
We can authorize this behavior without having other SRs. It would suffice to pass a replication parameter for this particular VDI when it is created. So thank you for this feedback. I think we must implement this use case for the future.
@ronan-a Has anything been done regarding this feature? I scanned the thread, but I couldn't really find anything related to a new VDI option.
-
It might be done in the future, but that's not the priority for a v1
-
@olivierlambert
I just checked the sm repository, and it looks like it wouldn't be that complicated to add a new sm-config and pass it down to the volume creation. Do you accept PR/Contributions on that repository? We're really interested in this feature and I think I can take the time to write the code to handle this.
-
The problem will be computing the available space if volumes have different replication counts.
But in any case, contributions are always welcome, we'll discuss details in the PR.
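To illustrate why mixed replication makes "available space" ambiguous, here is a hedged Python sketch. It is a simplified model, not how the SR driver or LINSTOR computes anything: it ignores per-node placement and thin provisioning, and the function names and numbers are made up for the example.

```python
def raw_usage_gib(volumes):
    """Raw space consumed by volumes given as (size_gib, replica_count) pairs."""
    return sum(size * replicas for size, replicas in volumes)


def remaining_usable_gib(raw_total_gib, volumes, replicas_for_new_volumes):
    """Space left for new volumes, which depends on their replica count."""
    free_raw = raw_total_gib - raw_usage_gib(volumes)
    return free_raw / replicas_for_new_volumes


# Hypothetical pool: 1000 GiB raw, one 50 GiB volume with 3 replicas
# and one 50 GiB volume with a single replica.
volumes = [(50, 3), (50, 1)]
print(raw_usage_gib(volumes))                  # 200
print(remaining_usable_gib(1000, volumes, 1))  # 800.0
print(remaining_usable_gib(1000, volumes, 2))  # 400.0
```

The same pool reports a different free figure depending on the replication of the next volume, so a single "available space" value has to pick one assumption.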
-
I am trying this on a pool with 5 hosts. Each host has a 4TB HDD installed that I am using for this.
Following the instructions here, I downloaded the installer and ran it with
./install --disks /dev/sda --force
This runs through and no errors are shown, but right at the end it displays:
Volume group "linstor_group" not found
Cannot process volume group linstor_group
Physical volume "/dev/sda" successfully created.
Volume group "linstor_group" successfully created
But then checking the disk I don't see the expected partitions:
[09:12 XCPNG01 ~]# lsblk
...
sda 8:0 0 3.7T 0 disk
<nothing here>
...
Versions
[09:34 XCPNG01 ~]# uname -a
Linux XCPNG01 4.19.0+1 #1 SMP Thu Jan 13 12:55:45 CET 2022 x86_64 x86_64 x86_64 GNU/Linux
[09:28 XCPNG01 ~]# rpm -qa | grep -E "^(sm|xha)-.*linstor.*"
sm-2.30.6-1.1.0.linstor.1.xcpng8.2.x86_64
xha-10.1.0-2.2.0.linstor.1.xcpng8.2.x86_64
I should mention that this disk was previously mounted as an SR but is no longer, and also was part of a glusterfs store but is no longer.
If I run
./install --disks /dev/sda --force
a second time, it obviously has not much to do as it has installed everything, but now I get a slightly different error:
Package python-linstor-1.12.0-1.noarch already installed and latest version
Nothing to do
VG            #PV #LV #SN Attr   VSize  VFree
linstor_group   1   0   0 wz--n- <3.64t <3.64t
Volume group "linstor_group" successfully removed
Volume group "sda" not found
Cannot process volume group sda
Failed to execute vgremove properly.
What should I do to work out what the problem is?
-
Just reading through the install script, if thin provisioning is not used (i.e. thick provisioning is), then the volume group linstor_group will get created but no logical volume is created:
if subprocess.call(['vgcreate', LINSTOR_GROUP] + disks):
    print('Failed to execute vgcreate properly.')
    return os.EX_SOFTWARE
if thin and subprocess.call(['lvcreate', '-l', '100%FREE', '-T', '{}/thin_device'.format(LINSTOR_GROUP)]):
    print('Failed to create thin device properly.')
    return os.EX_SOFTWARE
So are the installation instructions incorrect? Step 3, where it says to check the config before proceeding, says to use lsblk to check that the LVM logical volumes are created, but it looks to me like this does not occur unless thin provisioning is used?
I can see that the volume group has been created, as you would expect from the install script.
[08:41 XCPNG01 ~]# pvscan
...
PV /dev/sda VG linstor_group lvm2 [<3.64 TiB / <3.64 TiB free]
...
[08:35 XCPNG01 ~]# vgdisplay
...
--- Volume group ---
VG Name               linstor_group
System ID
Format                lvm2
Metadata Areas        1
Metadata Sequence No  1
VG Access             read/write
VG Status             resizable
MAX LV                0
Cur LV                0
Open LV               0
Max PV                0
Cur PV                1
Act PV                1
VG Size               <3.64 TiB
PE Size               4.00 MiB
Total PE              953861
Alloc PE / Size       0 / 0
Free PE / Size        953861 / <3.64 TiB
VG UUID               uidJ13-juc7-2cm0-NnkV-wdhA-4fNm-HAWrgh
Am I misunderstanding the instructions somehow or missing something?
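To make my reading of the script concrete, here is a dry-run sketch in Python that only lists the LVM commands the quoted excerpt would run for thick versus thin. It mirrors just those two calls and nothing else in the installer; `planned_lvm_commands` is a hypothetical helper, and I am assuming `LINSTOR_GROUP` is 'linstor_group' based on the output above.

```python
# Assumed value, inferred from the volume group name in the command output.
LINSTOR_GROUP = 'linstor_group'


def planned_lvm_commands(disks, thin):
    """Return the LVM commands from the quoted excerpt without executing them."""
    # The volume group is created in both modes.
    commands = [['vgcreate', LINSTOR_GROUP] + list(disks)]
    if thin:
        # Only thin mode adds a logical volume (the thin pool).
        commands.append(
            ['lvcreate', '-l', '100%FREE', '-T',
             '{}/thin_device'.format(LINSTOR_GROUP)]
        )
    return commands


for thin in (False, True):
    print('thin' if thin else 'thick', planned_lvm_commands(['/dev/sda'], thin))
```

If this reading is right, a thick install leaves a volume group with no logical volumes, which would match lsblk showing nothing under sda.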
-
Pinging @ronan-a