XCP-ng
    CEPH FS Storage Driver

    Development · 86 Posts · 10 Posters · 34.4k Views
    • r1 (XCP-ng Team)

      @jmccoy555 Can you try the latest patch?

      Before applying it, restore sm to its normal state:
      # yum reinstall sm

      # cd /
      # wget "https://gist.githubusercontent.com/rushikeshjadhav/ea8a6e15c3b5e7f6e61fe0cb873173d2/raw/dabe5c915b30a0efc932cab169ebe94c17d8c1ca/ceph-8.1.patch"
      # patch -p0 < ceph-8.1.patch
      
      # yum install centos-release-ceph-nautilus --enablerepo=extras
      # yum install ceph-common
      

      Note: Keep secret in /etc/ceph/admin.secret with permission 600
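
      The secret-file step above can be sketched as follows. The key shown is a placeholder, not a real secret; on a real host you would write the output of `ceph auth get-key client.admin` into the file. The path here uses /tmp for illustration; the driver expects /etc/ceph/admin.secret as noted above.

      ```shell
      # Create the secret file with mode 600 first, then write the key into it;
      # redirection truncates the file but preserves the 600 permissions.
      install -m 600 /dev/null /tmp/admin.secret
      printf '%s\n' 'AQD-placeholder-key==' > /tmp/admin.secret
      stat -c '%a' /tmp/admin.secret   # prints 600
      ```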

      To handle the NFS port conflict, specifying the port is mandatory, e.g. device-config:serverport=6789

      Ceph Example:
      # xe sr-create type=nfs device-config:server=10.10.10.10,10.10.10.26 device-config:serverpath=/ device-config:serverport=6789 device-config:options=name=admin,secretfile=/etc/ceph/admin.secret name-label=Ceph

      NFS Example:
      # xe sr-create type=nfs device-config:server=10.10.10.5 device-config:serverpath=/root/nfs name-label=NFS

      • jmccoy555

        @r1 Tried from my pool....

        [14:38 xcp-ng-bad-1 /]# xe sr-create type=nfs device-config:server=10.10.1.141,10.10.1.142,10.10.1.143 device-config:serverpath=/xcp device-config:serverport=6789 device-config:options=name=xcp,secretfile=/etc/ceph/xcp.secret name-label=CephFS host-uuid=c6977e4e-972f-4dcc-a71f-42120b51eacf
        Error code: SR_BACKEND_FAILURE_1200
        Error parameters: , not all arguments converted during string formatting,
        
        
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906] lock: opening lock file /var/lock/sm/a6e19bdc-0831-4d87-087d-86fca8cfb6fd/sr
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906] lock: acquired /var/lock/sm/a6e19bdc-0831-4d87-087d-86fca8cfb6fd/sr
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906] sr_create {'sr_uuid': 'a6e19bdc-0831-4d87-087d-86fca8cfb6fd', 'subtask_of': 'DummyRef:|572cd61e-b30c-48cb-934f-d597218facc0|SR.create', 'args': ['0'], 'host_ref': 'OpaqueRef:5527aabc-8bd0-416e-88bf-b6a0cb2b72b1', 'session_ref': 'OpaqueRef:e83f61b2-b546-4f22-b14f-31b5d5e7ae4f', 'device_config': {'server': '10.10.1.141,10.10.1.142,10.10.1.143', 'serverpath': '/xcp', 'SRmaster': 'true', 'serverport': '6789', 'options': 'name=xcp,secretfile=/etc/ceph/xcp.secret'}, 'command': 'sr_create', 'sr_ref': 'OpaqueRef:f77a27d8-d427-4c68-ab26-059c1c576c30'}
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906] _testHost: Testing host/port: 10.10.1.141,6789
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906] ['/usr/sbin/rpcinfo', '-p', '10.10.1.141']
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906] FAILED in util.pread: (rc 1) stdout: '', stderr: 'rpcinfo: can't contact portmapper: RPC: Remote system error - Connection refused
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906] '
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906] Unable to obtain list of valid nfs versions
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906] lock: released /var/lock/sm/a6e19bdc-0831-4d87-087d-86fca8cfb6fd/sr
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906] ***** generic exception: sr_create: EXCEPTION <type 'exceptions.TypeError'>, not all arguments converted during string formatting
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]     return self._run_locked(sr)
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]   File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]     rv = self._run(sr, target)
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]   File "/opt/xensource/sm/SRCommand.py", line 323, in _run
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]     return sr.create(self.params['sr_uuid'], long(self.params['args'][0]))
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]   File "/opt/xensource/sm/NFSSR", line 222, in create
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]     raise exn
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906] ***** NFS VHD: EXCEPTION <type 'exceptions.TypeError'>, not all arguments converted during string formatting
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]   File "/opt/xensource/sm/SRCommand.py", line 372, in run
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]     ret = cmd.run(sr)
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]     return self._run_locked(sr)
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]   File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]     rv = self._run(sr, target)
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]   File "/opt/xensource/sm/SRCommand.py", line 323, in _run
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]     return sr.create(self.params['sr_uuid'], long(self.params['args'][0]))
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]   File "/opt/xensource/sm/NFSSR", line 222, in create
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]     raise exn
        Apr  6 14:38:26 xcp-ng-bad-1 SM: [8906]
        

        Will try from my standalone host with a reboot later to see if the NFS reconnect issue has gone.
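
        The "not all arguments converted during string formatting" in that trace is Python's standard %-formatting TypeError: presumably an error path in the driver passes more arguments than the format string has placeholders. A minimal reproduction (the function and values here are illustrative, not the driver's actual code):

        ```python
        def format_error(host, port):
            # Bug: one %s placeholder, but a two-element tuple after % --
            # the same TypeError that surfaces as SR_BACKEND_FAILURE_1200.
            return "Testing host/port: %s" % (host, port)

        try:
            format_error("10.10.1.141", 6789)
        except TypeError as exc:
            print(exc)  # not all arguments converted during string formatting
        ```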

        • r1 (XCP-ng Team)

          @jmccoy555 I have rpcbind service running. Can you check on your ceph node?

        • jmccoy555

            @r1 yep, that's it, rpcbind is needed. I have a very minimal Debian 10 VM hosting my Ceph (dockers) as is now the way with Octopus.
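
            Since the failure above came from an unreachable portmapper, a quick TCP reachability check on both the rpcbind port (111) and the Ceph monitor port (6789) can save a failed sr-create. A small sketch; the host address in the comments is just the one from this thread:

            ```python
            import socket

            def port_reachable(host, port, timeout=2.0):
                """Return True if a TCP connection to host:port succeeds within timeout."""
                try:
                    with socket.create_connection((host, port), timeout=timeout):
                        return True
                except OSError:
                    return False

            # e.g. port_reachable("10.10.1.141", 111)   -> rpcbind/portmapper
            #      port_reachable("10.10.1.141", 6789)  -> Ceph monitor
            ```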

            Also had to swap host-uuid= for shared=true for it to connect to all hosts within the pool (might be useful for the notes).
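
            With that change, the earlier command would look like this (same addresses and secret path as the attempt above; shared=true replaces host-uuid=):

            ```shell
            xe sr-create type=nfs shared=true \
               device-config:server=10.10.1.141,10.10.1.142,10.10.1.143 \
               device-config:serverpath=/xcp \
               device-config:serverport=6789 \
               device-config:options=name=xcp,secretfile=/etc/ceph/xcp.secret \
               name-label=CephFS
            ```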

            Will test and also check that everything is good after a reboot and report back.

            • jmccoy555

              Just to report back..... So far so good. I've moved over a few VDIs and not had any problems.

              I've rebooted hosts and Ceph nodes and all is good.

              NFS is also all good now.

              Hope this gets merged soon so I don't have to worry about updates 😄

              On a side note, I've also set up two pools, one of SSDs and one of HDDs using File Layouts to assign different directories (VM SRs) to different pools.
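
              The two-pool setup above can be sketched with CephFS File Layouts. All names here are illustrative assumptions: a filesystem called cephfs, a data pool called ssd-pool, and a CephFS mount at /mnt/cephfs:

              ```shell
              # Register the extra data pool with the filesystem, then pin a
              # directory (used as a VM SR) to it via its layout xattr.
              ceph fs add_data_pool cephfs ssd-pool
              setfattr -n ceph.dir.layout.pool -v ssd-pool /mnt/cephfs/vm-ssd-sr
              # New files under vm-ssd-sr land in ssd-pool; existing files
              # keep their old layout.
              getfattr -n ceph.dir.layout /mnt/cephfs/vm-ssd-sr
              ```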

              • r1 (XCP-ng Team)

                @jmccoy555 glad to know. I don't have much knowledge on File Layouts but that looks good.

                The NFS edits won't be merged, as they were just a proof of concept. I'm working on a dedicated CephFS SR driver which hopefully won't be impacted by sm or other upgrades. Keep watching this space.

                • olivierlambert (Vates 🪐 Co-Founder & CEO)

                  We can write a simple "driver" like we did for Gluster 🙂

                  • usbalbin

                    @olivierlambert With the (experimental) CephFS driver added in 8.2.0, reading the documentation:

                    WARNING
                    
                    This way of using Ceph requires installing ceph-common inside dom0 from outside the official XCP-ng repositories. It is reported to be working by some users, but isn't recommended officially (see Additional packages). You will also need to be **careful about system updates and upgrades.**
                    

                    are there any plans to put ceph-common into the official XCP-ng repositories to make updates less scary? 🙂

                    I have been testing this for almost 8 months now. First with only one or two VMs, now with about 8-10 smaller VMs. The Ceph cluster itself runs as 3 VMs (themselves not stored on CephFS) with SATA controllers passed through on 3 different hosts.

                    This has been working great, except in situations where the XCP-ng hosts are unable to reach the Ceph cluster. At one point the Ceph nodes had crashed (my fault), but I was unable to restart them because all VM operations were blocked, taking forever without ever succeeding, even though the Ceph nodes themselves are not stored on the inaccessible SR. It seems the XCP-ng hosts retry the connection endlessly without ever timing out, which makes them unresponsive.

                    • olivierlambert (Vates 🪐 Co-Founder & CEO)

                      Hi,

                      Short term: no. Longer term when we have SMAPIv3: very likely, yes, at least as a community driver.

                      What about performance? Can you describe your setup and config in more detail?

                      • scboley

                        I'm about to deploy the latest Ceph on 45Drives hardware and will use 8.2, finally with a decent amount of network backbone, to start building a new virtual world. I've been using NFS over CephFS on single-gigabit public and private networks and it performs OK for what we do, but I cannot do any failover or live migration of VMs. This should alleviate those issues as well as give me lots more options for snapshots and recovery.

                        So, on the latest 8.2 patches and updates, what do I need to do other than install ceph-common? Will the Ceph repository show up in XCP-ng Center or Xen Orchestra?

                        I've had power outages and UPS failures and this stuff just self-heals; the only issue has been with mounting the CephFS after boot and then restarting NFS to recover the NFS repositories, and then it just comes up. It's scalable and way less trouble to deal with than fibre SANs or iSCSI.

                        • jmccoy555

                          @scboley https://xcp-ng.org/docs/storage.html#cephfs

                          Once you do the manual stuff it will show up like any other SR in Xen Orchestra etc.
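
                          For reference, the "manual stuff" is roughly the following (a sketch based on the docs linked above; the server address and path are placeholders for your own setup):

                          ```shell
                          # In dom0: install the Ceph client tools from outside
                          # the official XCP-ng repos (see the docs warning).
                          yum install centos-release-ceph-nautilus --enablerepo=extras
                          yum install ceph-common --enablerepo=centos-ceph-nautilus
                          # Then create the SR with the experimental CephFS driver:
                          xe sr-create type=cephfs name-label=CephFS shared=true \
                             device-config:server=10.10.10.10 \
                             device-config:serverpath=/xcp-sr
                          ```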

                          • scboley

                            @jmccoy555 Should I go ahead and update 8.2 to the latest patches before doing this? I have yet to run a single patch on XCP-ng over many years; is it straightforward?

                            • jmccoy555

                              @scboley I would assume so, but I can't say yes. I don't think it was available before 8.2 without following the above.

                              • scboley

                                @jmccoy555 I'm talking about 8.2.1, 8.2.2 and so forth. Is that a simple yum update on the system? I've just left it at the default version and never updated; I was on 7.6 for a long time and just took it all to 8.2, with one straggler XenServer 6.5 still in production. I've loved the stability I've had with XCP-ng without even messing with it at all.

                                • msgerbs

                                  @scboley Yes, it's mostly just a matter of doing yum update: https://xcp-ng.org/docs/updates.html

                                  • scboley

                                    OK, I see the package listed in the documentation is still Nautilus; has that been updated to any newer Ceph version yet? @olivierlambert

                                    • olivierlambert (Vates 🪐 Co-Founder & CEO)

                                      I don't think we updated anything on that aspect, since Ceph isn't a "main" supported SR 🙂

                                      • scboley

                                        @olivierlambert What are the plans to elevate it? I have a feeling it's really starting to gain traction in the storage world.

                                        • olivierlambert (Vates 🪐 Co-Founder & CEO)

                                          Very likely once the platform is more modern (upgraded kernel and platform, and using SMAPIv3).

                                          • scboley

                                            @olivierlambert OK, I see that even with 8.x you are still based on CentOS 7. When is it going up to 8? I'd assume Rocky would be the choice, given the Red Hat Stream snafu, cough cough.
