XCP-ng

    Testing ZFS with XCP-ng

    Development
    80 Posts 10 Posters 40.0k Views 3 Watching
    • borzel XCP-ng Center Team

      @nraynaud Is there any information on how to build blktap2 myself so I can test it in my homelab?
      Edit: I found it here: https://xcp-ng.org/forum/topic/122/how-to-build-blktap-from-sources/3

      • borzel XCP-ng Center Team

        I built the updated version of blktap as described in https://xcp-ng.org/forum/post/1677 and tested it, but copying a VM within the SR fails 😞

        Async.VM.copy R:f286a572f8aa|xapi] Error in safe_clone_disks: Server_error(VDI_COPY_FAILED, [ End_of_file ])
        
        • borzel XCP-ng Center Team

          The function safe_clone_disks is located in https://github.com/xapi-project/xen-api/blob/72a9a2d6826e9e39d30fab0d6420de6a0dcc0dc5/ocaml/xapi/xapi_vm_clone.ml#L139

          It calls clone_single_vdi in https://github.com/xapi-project/xen-api/blob/72a9a2d6826e9e39d30fab0d6420de6a0dcc0dc5/ocaml/xapi/xapi_vm_clone.ml#L116

          which in turn calls Client.Async.VDI.copy from a XAPI OCaml module named Client, which I cannot find 😞

          • borzel XCP-ng Center Team

            found something in /var/log/SMlog

            Jul 21 19:52:40 xen SM: [17060] result: {'o_direct_reason': 'SR_NOT_SUPPORTED', 'params': '/dev/sm/backend/4c8bc619-98bd-9342-85fe-1ea4782c0cf2/8e901c80-0ead-4764-b3d9-b04b4376e4c4', 'o_direct': True, 'xenstore_data': {'scsi/0x12/0x80': 'AIAAEjhlOTAxYzgwLTBlYWQtNDcgIA==', 'scsi/0x12/0x83': 'AIMAMQIBAC1YRU5TUkMgIDhlOTAxYzgwLTBlYWQtNDc2NC1iM2Q5LWIwNGI0Mzc2ZTRjNCA=', 'vdi-uuid': '8e901c80-0ead-4764-b3d9-b04b4376e4c4', 'mem-pool': '4c8bc619-98bd-9342-85fe-1ea4782c0cf2'}}
            

            Note the 'o_direct': True in this log entry, even though I created the SR with other-config:o_direct=false.
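
To check this over many attach operations instead of reading entries by eye, the `result:` payload can be parsed. This is only a minimal sketch: it assumes the text after `result: ` is a Python dict literal sitting on one line (wrapped entries would have to be re-joined first), the helper name `o_direct_from_smlog` is made up for illustration, and the sample line is a shortened copy of the entry above.

```python
import ast

def o_direct_from_smlog(line):
    """Return (o_direct, o_direct_reason) from one SMlog 'result:' line, or None."""
    marker = "result: "
    pos = line.find(marker)
    if pos == -1:
        return None
    try:
        payload = ast.literal_eval(line[pos + len(marker):].strip())
    except (ValueError, SyntaxError):
        return None  # wrapped or truncated entry, cannot be parsed as-is
    if not isinstance(payload, dict):
        return None
    return payload.get("o_direct"), payload.get("o_direct_reason")

# Shortened sample: only the two keys of interest are kept.
sample = ("Jul 21 19:52:40 xen SM: [17060] result: "
          "{'o_direct_reason': 'SR_NOT_SUPPORTED', 'o_direct': True}")
print(o_direct_from_smlog(sample))  # (True, 'SR_NOT_SUPPORTED')
```

A result of `(True, 'SR_NOT_SUPPORTED')` on a `type=file` SR would confirm that the SR's other-config setting is not what decides the flag here.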

            • stormi Vates 🪐 XCP-ng Team @borzel

              @borzel I'll let @nraynaud have a say, but first, to be sure, you built it with

              xcp-build --build-local . --define 'xcp_ng_section extras'
              

              and the resulting RPM contains .extras in its name?

              • borzel XCP-ng Center Team

                yes, I did 🙂 Checked it multiple times. Will check it again today to be 100% sure.

                I also saw the right log output from @nraynaud's new function, where he probes the open with O_DIRECT and retries with O_DSYNC.

                I have a feeling that xapi or blktap copies files in some situations without going through its normal path. Maybe a special "file copy" without an open command?

                Can I attach a debugger to blktap while it is running? This would also be needed for deep support of the whole server.
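
For readers who have not seen the patch, the "probe with O_DIRECT, retry with O_DSYNC" idea can be illustrated in a few lines. This is a sketch of the general technique in Python, not the actual blktap code (which is C). It assumes the filesystem rejects O_DIRECT at open time with EINVAL, and `open_probe_direct` is a hypothetical name.

```python
import errno
import os
import tempfile

# os.O_DIRECT only exists on Linux; 0 turns the probe into a plain open elsewhere.
O_DIRECT = getattr(os, "O_DIRECT", 0)

def open_probe_direct(path, base_flags=os.O_RDWR):
    """Try O_DIRECT first; on EINVAL retry with O_DSYNC. Returns (fd, mode)."""
    try:
        return os.open(path, base_flags | O_DIRECT), "O_DIRECT"
    except OSError as exc:
        if exc.errno != errno.EINVAL:
            raise  # a different failure: do not hide it behind the fallback
    return os.open(path, base_flags | os.O_DSYNC), "O_DSYNC"

with tempfile.NamedTemporaryFile() as tmp:
    fd, mode = open_probe_direct(tmp.name)
    print(mode)  # depends on the filesystem holding the temp file
    os.close(fd)
```

Any code path that opens the image without going through such a probe would still hit the EINVAL, which would fit the suspicion above.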

                • stormi Vates 🪐 XCP-ng Team

                  You can probably attach gdb to a running blktap, yes. Install the blktap-debuginfo package that was produced when you built blktap to make debug symbols available.

                  • nraynaud XCP-ng Team

                    @borzel There is something fishy around here: https://github.com/xapi-project/sm/blob/master/drivers/blktap2.py#L994 I am still unsure what to do.

                    I have not removed O_DIRECT everywhere in blktap; I expected the o_direct flag in the Python code to take care of the remainder, but I guess I was wrong. We might have to patch the Python.

                    • borzel XCP-ng Center Team

                      Maybe we can create a new SR type "zfs" and copy the file SR implementation from https://github.com/xapi-project/sm/blob/master/drivers/FileSR.py
                      This would later allow us to build a deeper integration of ZFS.

                      • borzel XCP-ng Center Team

                        Success! In /opt/xensource/sm/blktap2.py I changed

                         elif not ((self.target.vdi.sr.handles("nfs") or self.target.vdi.sr.handles("ext") or self.target.vdi.sr.handles("smb"))):
                        

                        to

                         elif not ((self.target.vdi.sr.handles("file")) or (self.target.vdi.sr.handles("nfs") or self.target.vdi.sr.handles("ext") or self.target.vdi.sr.handles("smb"))):
                        

                        Then I deleted /opt/xensource/sm/blktap2.pyc and /opt/xensource/sm/blktap2.pyo so that the *.py file is used.

                        Now I can copy from other SR types to my ZFS SR.

                        But copying within my ZFS SR does not work...
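
The effect of that one-line change can be modelled in isolation. The sketch below is a simplified stand-in, not the real SM code: `FakeSR` and `forced_o_direct` are hypothetical names, and it assumes that the patched `elif` branch is the one that produced 'o_direct': True with reason SR_NOT_SUPPORTED in the SMlog entry earlier in the thread.

```python
ORIGINAL_TYPES = ("nfs", "ext", "smb")
PATCHED_TYPES = ("file",) + ORIGINAL_TYPES

class FakeSR:
    """Hypothetical stand-in for an SM SR object; only handles() is modelled."""
    def __init__(self, sr_type):
        self.sr_type = sr_type

    def handles(self, sr_type):
        return self.sr_type == sr_type

def forced_o_direct(sr, allowed_types):
    """True when the SR type is outside the allow-list, i.e. the elif branch fires."""
    return not any(sr.handles(t) for t in allowed_types)

for types in (ORIGINAL_TYPES, PATCHED_TYPES):
    print(forced_o_direct(FakeSR("file"), types))  # True, then False
```

With the original list a `type=file` SR always lands in the forced branch, whatever the SR's other-config says. With "file" added it falls through to the later checks.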

                        • borzel XCP-ng Center Team

                          some info about the o_direct flag: https://xenserver.org/blog/entry/read-caching.html

                          • borzel XCP-ng Center Team

                            After some reading, I think we cannot get it fully working in the short term. Some of the tools might not work without O_DIRECT support:

                            If the underlying file-system does not support O_DIRECT, utilities (e.g, vhd-util) may fail with error code 22 (EINVAL). Similarly, Xen may fail with a message as follows:
                            TapdiskException: ('create', '-avhd:/home/cklein/vms/vm-rubis-0/root.vhd') failed (5632 )

                            https://github.com/xapi-project/blktap/blob/master/README


                            Today there is just one fully working local ZFS configuration:

                            • create your ZFS-Pool: zpool create ...
                            • create a ZVOL (aka block device): zfs create -V 50G pool/my_local_sr
                            • create an EXT3 based SR: xe sr-create host-uuid=<UUID_of_your_host> type=ext shared=false name-label=<Name_of_my_SR> device-config:device=/dev/zvol/<pool-name>/<zvol-name>

                            It's not optimal, but it works.
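
The last two steps above can be generated from a few variables so that the ZVOL name and the SR device path cannot drift apart. This is a sketch only: `zvol_sr_commands` is a hypothetical helper, the pool is assumed to exist already (the `zpool create` arguments are left out because the list above elides them), and the commands are printed, not executed.

```python
def zvol_sr_commands(pool, zvol, size, host_uuid, sr_name):
    """Return argv lists for creating a ZVOL and an ext SR on top of it."""
    dataset = "%s/%s" % (pool, zvol)
    return [
        ["zfs", "create", "-V", size, dataset],
        ["xe", "sr-create",
         "host-uuid=" + host_uuid,
         "type=ext",
         "shared=false",
         "name-label=" + sr_name,
         "device-config:device=/dev/zvol/" + dataset],
    ]

# Placeholders are kept as in the list above; substitute real values before use.
for argv in zvol_sr_commands("pool", "my_local_sr", "50G",
                             "<UUID_of_your_host>", "<Name_of_my_SR>"):
    print(" ".join(argv))
```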

                            • nraynaud XCP-ng Team

                              I am working on the issue right now and trying to pin down the problem exactly.
                              There are 2 cases:

                              • xe vdi-copy from another SR to a ZFS SR doesn't work
                              • xe vdi-copy on the same SR doesn't work.

                              Is that a complete assessment of the issues you found?

                              A cursory test seems to show that ssh ${XCP_HOST_UNDER_TEST} "sed -i.bak 's/# unbuffered = true/unbuffered = false/' /etc/sparse_dd.conf" solves the issue of intra-ZFS copies, but I am still confirming that I have not changed anything else on my test box.
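
The same edit can be done in Python if you want to know whether the line was actually found. This is a rough equivalent of the sed call, offered as a sketch: `set_unbuffered_false` is a hypothetical helper, and the demo file holds only the one line the sed pattern implies, since the stock contents of /etc/sparse_dd.conf are not shown in this thread.

```python
import os
import re
import shutil
import tempfile

def set_unbuffered_false(conf_path):
    """Swap the commented-out 'unbuffered = true' for 'unbuffered = false'.

    Keeps a .bak copy like sed -i.bak and returns the number of replacements,
    so 0 means the expected line was not present.
    """
    shutil.copy2(conf_path, conf_path + ".bak")
    with open(conf_path) as fh:
        text = fh.read()
    new_text, count = re.subn(r"# unbuffered = true", "unbuffered = false", text)
    with open(conf_path, "w") as fh:
        fh.write(new_text)
    return count

# Demo on a scratch file; the content is an assumption, not the real config.
demo = os.path.join(tempfile.mkdtemp(), "sparse_dd.conf")
with open(demo, "w") as fh:
    fh.write("# unbuffered = true\n")
print(set_unbuffered_false(demo))  # 1
```

A return value of 0 on a real host would mean the file does not contain the commented-out default, in which case the sed command above would also have changed nothing.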

                              Thanks,
                              Nicolas.

                              • borzel XCP-ng Center Team

                                @nraynaud said in Testing ZFS with XCP-ng:

                                xe vdi-copy from another SR to a ZFS SR doesn't work
                                xe vdi-copy on the same SR doesn't work.

                                Yes 🙂

                                In addition, copying from the ZFS SR to another SR does not work.

                                • nraynaud XCP-ng Team

                                  ok, I really think changing /etc/sparse_dd.conf is the right path.

                                  #!/usr/bin/env bash
                                  
                                  # HOW TO create the passthrough: xe sr-create name-label="sda passthrough" name-description="Block devices" type=udev content-type=disk device-config:location=/dev/sda host-uuid=77b3f6ad-020b-4e48-b090-74b2a26c4f69
                                  
                                  set -ex
                                  
                                  MASTER_HOST=root@192.168.100.1
                                  PASSTHROUGH_VDI=a74d267e-bb14-4732-bd80-b9c445199e8a
                                  
                                  SNAPSHOT_UUID=19d3758e-eb21-f237-b8f7-6e2f638cc8e0
                                  VM_HOST_UNDER_TEST_UUID=13ec74c2-9b57-a327-962f-1ebd9702eec4
                                  XCP_HOST_UNDER_TEST_UUID=05c61e28-11cf-4131-b645-a0be7637c044
                                  XCP_HOST_UNDER_TEST_IP=192.168.100.151
                                  XCP_HOST_UNDER_TEST=root@${XCP_HOST_UNDER_TEST_IP}
                                  
                                  INCEPTION_VM_UUID=a7e37541-fb9a-4392-6b54-60cf7ce3d08a
                                  INCEPTION_VM_IP=192.168.100.32
                                  INCEPTION_VM=root@${INCEPTION_VM_IP}
                                  
                                  ssh ${MASTER_HOST} xe snapshot-revert snapshot-uuid=${SNAPSHOT_UUID}
                                  NEW_VBD=`ssh ${MASTER_HOST} xe vbd-create device=1 type=Disk mode=RW vm-uuid=${VM_HOST_UNDER_TEST_UUID} vdi-uuid=${PASSTHROUGH_VDI}`
                                  ssh ${MASTER_HOST} xe vm-start vm=${VM_HOST_UNDER_TEST_UUID}
                                  until ping -c1 ${XCP_HOST_UNDER_TEST_IP} &>/dev/null; do :; done
                                  sleep 20
                                  
                                  # try EXT3
                                  ssh ${XCP_HOST_UNDER_TEST} 'mkfs.ext3 /dev/sdb2 && echo /dev/sdb2 /mnt/ext3 ext3 >>/etc/fstab && mkdir -p /mnt/ext3 && mount /dev/sdb2 && df'
                                  SR_EXT3_UUID=`ssh ${XCP_HOST_UNDER_TEST} "xe sr-create host-uuid=${XCP_HOST_UNDER_TEST_UUID} name-label=test-ext3-sr type=file other-config:o_direct=false device-config:location=/mnt/ext3/test-ext3-sr"`
                                  TEST_EXT3_VDI=`ssh ${XCP_HOST_UNDER_TEST} xe vdi-create sr-uuid=${SR_EXT3_UUID} name-label=test-ext3-vdi virtual-size=214748364800`
                                  TEST_VBD=`ssh ${XCP_HOST_UNDER_TEST} xe vbd-create device=1 type=Disk mode=RW vm-uuid=${INCEPTION_VM_UUID} vdi-uuid=${TEST_EXT3_VDI}`
                                  
                                  
                                  ssh ${XCP_HOST_UNDER_TEST} reboot || true
                                  sleep 20
                                  until ping -c1 ${XCP_HOST_UNDER_TEST_IP} &>/dev/null; do :; done
                                  sleep 20
                                  
                                  ssh ${XCP_HOST_UNDER_TEST} xe vm-start vm=${INCEPTION_VM_UUID} on=${XCP_HOST_UNDER_TEST_UUID}
                                  sleep 2
                                  until ping -c1 ${INCEPTION_VM_IP} &>/dev/null; do :; done
                                  sleep 20
                                  ssh ${INCEPTION_VM} echo FROM BENCH
                                  ssh ${INCEPTION_VM} 'apk add gcc zlib-dev libaio libaio-dev make linux-headers git binutils musl-dev; git clone https://github.com/axboe/fio fio; cd fio; ./configure && make&& make install'
                                  ssh ${INCEPTION_VM} 'mkfs.ext3 /dev/xvdb && mount /dev/xvdb /mnt;df'
                                  ssh ${INCEPTION_VM} 'cd /mnt;sync;/usr/local/bin/fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=write --bs=4k --direct=1 --size=512M --numjobs=2 --runtime=30 --group_reporting' > ext3_write_result
                                  ssh ${INCEPTION_VM} 'cd /mnt;sync;/usr/local/bin/fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=write --bs=4k --direct=1 --size=512M --numjobs=2 --runtime=30 --group_reporting' >> ext3_write_result
                                  ssh ${INCEPTION_VM} 'cd /mnt;sync;/usr/local/bin/fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=read --bs=4k --direct=1 --size=512M --numjobs=2 --runtime=30 --group_reporting' > ext3_read_result
                                  ssh ${INCEPTION_VM} 'cd /mnt;sync;/usr/local/bin/fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=read --bs=4k --direct=1 --size=512M --numjobs=2 --runtime=30 --group_reporting' >> ext3_read_result
                                  ssh ${XCP_HOST_UNDER_TEST} xe vm-shutdown uuid=${INCEPTION_VM_UUID}
                                  
                                  # try ZFS
                                  # install binaries that don't use O_DIRECT
                                  rsync -r zfs ${XCP_HOST_UNDER_TEST}:
                                  scp /Users/nraynaud/dev/xenserver-build-env/blktap-3.5.0-1.12test.x86_64.rpm ${XCP_HOST_UNDER_TEST}:
                                  ssh ${XCP_HOST_UNDER_TEST} yum remove -y blktap
                                  ssh ${XCP_HOST_UNDER_TEST} yum install -y blktap-3.5.0-1.12test.x86_64.rpm
                                  ssh ${XCP_HOST_UNDER_TEST} yum install -y zfs/*.rpm
                                  ssh ${XCP_HOST_UNDER_TEST} depmod -a
                                  ssh ${XCP_HOST_UNDER_TEST} modprobe zfs
                                  ssh ${XCP_HOST_UNDER_TEST} zpool create -f -m /mnt/zfs tank /dev/sdb1
                                  ssh ${XCP_HOST_UNDER_TEST} zfs set sync=disabled tank
                                  ssh ${XCP_HOST_UNDER_TEST} zfs set compression=lz4 tank
                                  ssh ${XCP_HOST_UNDER_TEST} zfs list
                                  
                                  
                                  SR_ZFS_UUID=`ssh ${XCP_HOST_UNDER_TEST} "xe sr-create host-uuid=${XCP_HOST_UNDER_TEST_UUID} name-label=test-zfs-sr type=file other-config:o_direct=false device-config:location=/mnt/zfs/test-zfs-sr"`
                                  TEST_ZFS_VDI=`ssh ${XCP_HOST_UNDER_TEST} xe vdi-create sr-uuid=${SR_ZFS_UUID} name-label=test-zfs-vdi virtual-size=214748364800`
                                  # this line avoids O_DIRECT in reads
                                  ssh ${XCP_HOST_UNDER_TEST} "sed -i.bak 's/# unbuffered = true/unbuffered = false/' /etc/sparse_dd.conf"
                                  # try various clone situations
                                  ssh ${XCP_HOST_UNDER_TEST} xe vdi-copy uuid=${TEST_ZFS_VDI} sr-uuid=${SR_ZFS_UUID}
                                  ssh ${XCP_HOST_UNDER_TEST} xe vdi-copy uuid=${TEST_ZFS_VDI} sr-uuid=${SR_EXT3_UUID}
                                  ssh ${XCP_HOST_UNDER_TEST} xe vdi-copy uuid=${TEST_EXT3_VDI} sr-uuid=${SR_ZFS_UUID}
                                  

                                  This script runs to completion without error.
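
One caveat for anyone reusing the script: the `until ping ...; do :; done` loops never give up, so a host that fails to boot hangs the run. A bounded wait might look like the sketch below. The helper names and the timeout values are assumptions for illustration and are not part of the script above.

```python
import subprocess
import time

def wait_until(check, timeout=300.0, interval=2.0):
    """Poll check() until it returns True or timeout seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

def host_answers_ping(ip):
    """Send one ICMP echo with the system ping binary; True on exit status 0."""
    return subprocess.call(
        ["ping", "-c1", ip],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) == 0

# Example call (not executed here):
#   wait_until(lambda: host_answers_ping("192.168.100.151"), timeout=600)
```

The caller can then abort the test run when `wait_until` returns False instead of waiting forever.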

                                  • nraynaud XCP-ng Team

                                    If other people can reproduce my results, I propose changing the parameter directly in the XCP-ng distribution RPM.

                                    • olivierlambert Vates 🪐 Co-Founder CEO

                                      Your package is experimental, so feel free to add the modification inside it 🙂

                                      • borzel XCP-ng Center Team @nraynaud

                                        @nraynaud said in Testing ZFS with XCP-ng:

                                        If other people can reproduce my results

                                        With the change in /etc/sparse_dd.conf I can copy my VMs from:

                                        • EXT3 -> ZFS
                                        • ZFS -> ZFS
                                        • ZFS -> EXT3

                                        Yeah! 🙂

                                        Thanks for your work!

                                        By the way, my XCP-ng replication host at work is working just fine with ZFS-SR. All stable like ZFS should be.

                                        • olivierlambert Vates 🪐 Co-Founder CEO

                                          Yay!! Thanks for testing 🙂

                                          • eexodus

                                            I have a clean install of 7.5 and I'm following the guide on the wiki but can't install zfs-test or enable the zfs module:

                                            [root@xcp-ng-endlqfgb ~]# yum install --enablerepo="xcp-ng-extras" zfs-test
                                            Loaded plugins: fastestmirror
                                            Loading mirror speeds from cached hostfile
                                            Resolving Dependencies
                                            --> Running transaction check
                                            ---> Package zfs-test.x86_64 0:0.7.9-1.el7.centos will be installed
                                            --> Processing Dependency: lsscsi for package: zfs-test-0.7.9-1.el7.centos.x86_64
                                            --> Processing Dependency: ksh for package: zfs-test-0.7.9-1.el7.centos.x86_64
                                            --> Processing Dependency: fio for package: zfs-test-0.7.9-1.el7.centos.x86_64
                                            --> Processing Dependency: rng-tools for package: zfs-test-0.7.9-1.el7.centos.x86_64
                                            --> Finished Dependency Resolution
                                            Error: Package: zfs-test-0.7.9-1.el7.centos.x86_64 (xcp-ng-extras)
                                                       Requires: fio
                                            Error: Package: zfs-test-0.7.9-1.el7.centos.x86_64 (xcp-ng-extras)
                                                       Requires: lsscsi
                                            Error: Package: zfs-test-0.7.9-1.el7.centos.x86_64 (xcp-ng-extras)
                                                       Requires: rng-tools
                                            Error: Package: zfs-test-0.7.9-1.el7.centos.x86_64 (xcp-ng-extras)
                                                       Requires: ksh
                                             You could try using --skip-broken to work around the problem
                                             You could try running: rpm -Va --nofiles --nodigest
                                            [root@xcp-ng-endlqfgb ~]# zpool create tank /dev/sdb
                                            The ZFS modules are not loaded.
                                            Try running '/sbin/modprobe zfs' as root to load them.
                                            [root@xcp-ng-endlqfgb ~]# /sbin/modprobe zfs
                                            modprobe: FATAL: Module zfs not found.
                                            [root@xcp-ng-endlqfgb ~]#
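
When comparing several hosts it can help to pull the unmet dependencies out of such output mechanically. This is a minimal sketch: `missing_requirements` is a hypothetical helper, it assumes the "Requires: name" line format shown above, and the sample is a two-package excerpt of that output.

```python
import re

def missing_requirements(yum_output):
    """Collect the names on 'Requires:' lines of a failed yum transaction."""
    found = re.findall(r"^[ \t]*Requires:[ \t]*(\S+)", yum_output,
                       flags=re.MULTILINE)
    return sorted(set(found))

sample = """\
Error: Package: zfs-test-0.7.9-1.el7.centos.x86_64 (xcp-ng-extras)
           Requires: fio
Error: Package: zfs-test-0.7.9-1.el7.centos.x86_64 (xcp-ng-extras)
           Requires: lsscsi
"""
print(missing_requirements(sample))  # ['fio', 'lsscsi']
```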
                                            