XCP-ng

    XOSTOR hyperconvergence preview

    446 Posts 47 Posters 479.1k Views

    • limezest

      So, controller failover works. I used the instructions here to test drbd-reactor failover: https://linbit.com/blog/drbd-reactor-promoter/
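
      The linked post exercises failover roughly along these lines (a sketch; drbd-reactorctl ships with drbd-reactor, and the promoter snippet name here is illustrative):

      # See which node currently holds the promoted (Primary) resource:
      drbd-reactorctl status
      # Disabling the promoter snippet on the active node forces a failover;
      # --now also stops the services the promoter started:
      drbd-reactorctl disable --now xcp-persistent-database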

      I'm seeing an error in linstor error-reports list that has to do with how LINSTOR queries free space on thin-provisioned LVM storage. It traces back to this ticket: https://github.com/LINBIT/linstor-server/issues/80
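
      For reference, the full text of a report like the one below can be dumped with the LINSTOR client, using the ID shown in the listing:

      linstor error-reports list
      linstor error-reports show 65558791-33400-000000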

      ERROR REPORT 65558791-33400-000000
      
      ============================================================
      
      Application:                        LINBIT® LINSTOR
      Module:                             Satellite
      Version:                            1.24.2
      Build ID:                           adb19ca96a07039401023410c1ea116f09929295
      Build time:                         2023-08-30T05:15:08+00:00
      Error time:                         2023-11-15 22:08:11
      Node:                               node0
      
      ============================================================
      
      Reported error:
      ===============
      
      Description:
          Expected 3 columns, but got 2
      Cause:
          Failed to parse line:   thin_device;23044370202624;
      Additional information:
          External command: vgs --config devices { filter=['a|/dev/sdn|','a|/dev/sdk|','a|/dev/sdj|','a|/dev/sdm|','a|/dev/sdl|','a|/dev/sdg|','a|/dev/sdf|','a|/dev/sdi|','a|/dev/sdh|','a|/dev/sdc|','a|/dev/sde|','a|/dev/sdd|','r|.*|'] } -o lv_name,lv_size,data_percent --units b --separator ; --noheadings --nosuffix linstor_group/thin_device
      
      Category:                           LinStorException
      Class name:                         StorageException
      Class canonical name:               com.linbit.linstor.storage.StorageException
      Generated at:                       Method 'getThinFreeSize', Source file 'LvmUtils.java', Line #399
      
      Error message:                      Unable to parse free thin sizes
      
      ErrorContext:   Description: Expected 3 columns, but got 2
        Cause:       Failed to parse line:   thin_device;23044370202624;
        Details:     External command: vgs --config devices { filter=['a|/dev/sdn|','a|/dev/sdk|','a|/dev/sdj|','a|/dev/sdm|','a|/dev/sdl|','a|/dev/sdg|','a|/dev/sdf|','a|/dev/sdi|','a|/dev/sdh|','a|/dev/sdc|','a|/dev/sde|','a|/dev/sdd|','r|.*|'] } -o lv_name,lv_size,data_percent --units b --separator ; --noheadings --nosuffix linstor_group/thin_device
      
      
      Call backtrace:
      
          Method                                   Native Class:Line number
          getThinFreeSize                          N      com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:399
          getSpaceInfo                             N      com.linbit.linstor.layer.storage.lvm.LvmThinProvider:406
          getStoragePoolSpaceInfo                  N      com.linbit.linstor.layer.storage.StorageLayer:441
          getSpaceInfo                             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:1116
          getSpaceInfo                             N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1816
          getStoragePoolSpaceInfo                  N      com.linbit.linstor.core.apicallhandler.StltApiCallHandlerUtils:325
          applyChanges                             N      com.linbit.linstor.core.apicallhandler.StltStorPoolApiCallHandler:274
          applyFullSync                            N      com.linbit.linstor.core.apicallhandler.StltApiCallHandler:330
          execute                                  N      com.linbit.linstor.api.protobuf.FullSync:113
          executeNonReactive                       N      com.linbit.linstor.proto.CommonMessageProcessor:534
          lambda$execute$14                        N      com.linbit.linstor.proto.CommonMessageProcessor:509
          doInScope                                N      com.linbit.linstor.core.apicallhandler.ScopeRunner:149
          lambda$fluxInScope$0                     N      com.linbit.linstor.core.apicallhandler.ScopeRunner:76
          call                                     N      reactor.core.publisher.MonoCallable:72
          trySubscribeScalarMap                    N      reactor.core.publisher.FluxFlatMap:127
          subscribeOrReturn                        N      reactor.core.publisher.MonoFlatMapMany:49
          subscribe                                N      reactor.core.publisher.Flux:8759
          onNext                                   N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:195
          request                                  N      reactor.core.publisher.Operators$ScalarSubscription:2545
          onSubscribe                              N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:141
          subscribe                                N      reactor.core.publisher.MonoJust:55
          subscribe                                N      reactor.core.publisher.MonoDeferContextual:55
          subscribe                                N      reactor.core.publisher.Flux:8773
          onNext                                   N      reactor.core.publisher.FluxFlatMap$FlatMapMain:427
          slowPath                                 N      reactor.core.publisher.FluxArray$ArraySubscription:127
          request                                  N      reactor.core.publisher.FluxArray$ArraySubscription:100
          onSubscribe                              N      reactor.core.publisher.FluxFlatMap$FlatMapMain:371
          subscribe                                N      reactor.core.publisher.FluxMerge:70
          subscribe                                N      reactor.core.publisher.Flux:8773
          onComplete                               N      reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber:258
          subscribe                                N      reactor.core.publisher.FluxConcatArray:78
          subscribe                                N      reactor.core.publisher.InternalFluxOperator:62
          subscribe                                N      reactor.core.publisher.FluxDefer:54
          subscribe                                N      reactor.core.publisher.Flux:8773
          onNext                                   N      reactor.core.publisher.FluxFlatMap$FlatMapMain:427
          drainAsync                               N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:453
          drain                                    N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:724
          onNext                                   N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:256
          drainFused                               N      reactor.core.publisher.SinkManyUnicast:319
          drain                                    N      reactor.core.publisher.SinkManyUnicast:362
          tryEmitNext                              N      reactor.core.publisher.SinkManyUnicast:237
          tryEmitNext                              N      reactor.core.publisher.SinkManySerialized:100
          processInOrder                           N      com.linbit.linstor.netcom.TcpConnectorPeer:392
          doProcessMessage                         N      com.linbit.linstor.proto.CommonMessageProcessor:227
          lambda$processMessage$2                  N      com.linbit.linstor.proto.CommonMessageProcessor:164
          onNext                                   N      reactor.core.publisher.FluxPeek$PeekSubscriber:185
          runAsync                                 N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:440
          run                                      N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:527
          call                                     N      reactor.core.scheduler.WorkerTask:84
          call                                     N      reactor.core.scheduler.WorkerTask:37
          run                                      N      java.util.concurrent.FutureTask:264
          run                                      N      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
          runWorker                                N      java.util.concurrent.ThreadPoolExecutor:1128
          run                                      N      java.util.concurrent.ThreadPoolExecutor$Worker:628
          run                                      N      java.lang.Thread:829
      
      
      END OF ERROR REPORT.
      
      

      I think the vgs query is improperly formatted for this version of device-mapper-persistent-data.

      [12:31 node0 ~]# vgs -o lv_name,lv_size,data_percent --units b --noheadings --separator ;
      vgs: option '--separator' requires an argument
        Error during parsing of command line.
      

      but it works if formatted like this:

      [12:32 node0 ~]# vgs -o lv_name,lv_size,data_percent --units b --noheadings --separator=";"
        MGT;4194304B;
        VHD-d959f7a9-2bd1-4ac5-83af-1724336a73d0;532676608B;
        thin_device;23044370202624B;6.96
        xcp-persistent-database_00000;1077936128B;13.85
        xcp-volume-fda3d913-47cc-4a8d-8a54-3364c8ae722a_00000;86083895296B;25.20
        xcp-volume-8ddb8f7e-a549-4c53-a9d5-9b2e40d3810e_00000;215197155328B;2.35
        xcp-volume-43467341-30c8-4fec-b807-81334d0dd309_00000;215197155328B;2.52
        xcp-volume-5283a6e0-4e95-4aca-b5e1-7eb3fea7fcd3_00000;2194921226240B;69.30
        xcp-volume-907e72d1-4389-4425-8e1e-e53a4718cb92_00000;86088089600B;0.60
        xcp-volume-4c368a33-d0af-4f1d-9f7d-486a1df1d028_00000;86088089600B;0.06
        xcp-volume-2bd88964-3feb-401a-afc1-c88c790cc206_00000;86092283904B;24.81
        xcp-volume-833eba2a-a70b-4787-b78a-afef8cc0e14d_00000;86092283904B;0.04
        xcp-volume-81809c66-5763-4558-919a-591b864d3f22_00000;215197155328B;4.66
        xcp-volume-9fa2ec95-9bea-45ae-a583-6f1941a614e7_00000;86096478208B;0.04
        xcp-volume-5dbfaef0-cc83-43a8-bba1-469d65bc3460_00000;215205543936B;6.12
        xcp-volume-1e2dd480-a505-46fc-a6e8-ac8d4341a213_00000;215209738240B;0.02
        xcp-volume-603ac344-edf1-43d7-8c27-eecfd7e6d627_00000;215209738240B;2.19
      
      

      In fact, vgs --separator accepts pretty much any character except semicolon. Maybe it's a problem with this version of LVM2?
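
      Worth noting: this interactive failure is shell parsing rather than vgs itself. An unquoted ; terminates the shell command, so vgs never receives its separator argument; quoting or escaping it behaves just like the --separator=";" form above. LINSTOR execs vgs without a shell, so the separator reaches it fine there; judging by the error text, the report's parse failure is the empty data_percent field, per the linked issue.

      # Fails: the shell consumes the bare semicolon, so vgs sees no argument
      vgs -o lv_name,lv_size,data_percent --units b --noheadings --separator ;
      # Works: the quoted (or escaped) semicolon reaches vgs intact
      vgs -o lv_name,lv_size,data_percent --units b --noheadings --separator ';'
      vgs -o lv_name,lv_size,data_percent --units b --noheadings --separator \;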

      [12:37 node0 ~]# yum info device-mapper-persistent-data.x86_64
      Loaded plugins: fastestmirror
      Loading mirror speeds from cached hostfile
      Excluding mirror: updates.xcp-ng.org
       * xcp-ng-base: mirrors.xcp-ng.org
      Excluding mirror: updates.xcp-ng.org
       * xcp-ng-updates: mirrors.xcp-ng.org
      Installed Packages
      Name        : device-mapper-persistent-data
      Arch        : x86_64
      Version     : 0.7.3
      Release     : 3.el7
      Size        : 1.2 M
      Repo        : installed
      From repo   : install
      
      
    • jmm

        Hi team,
        I'm currently testing XOSTOR on a three-node XCP-ng 8.2.1 pool.
        Before adding any new VM, I replaced a node (xcp-hc3).
        Since everything seems to be OK, I've added two VMs.
        But I think that a diskless resource is missing for "xcp-persistent-database".
        Is there a way to resolve this situation?

        [10:23 xcp-hc1 ~]# linstor resource list
        ╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
        ┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
        ╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
        ┊ xcp-persistent-database ┊ xcp-hc1 ┊ 7000 ┊ InUse ┊ Ok ┊ UpToDate ┊ 2023-12-18 15:47:37 ┊
        ┊ xcp-persistent-database ┊ xcp-hc2 ┊ 7000 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-12-18 15:47:37 ┊
        ┊ xcp-volume-17208381-56c0-4d8a-9c16-0a2000a45e56 ┊ xcp-hc1 ┊ 7004 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-12-18 17:41:41 ┊
        ┊ xcp-volume-17208381-56c0-4d8a-9c16-0a2000a45e56 ┊ xcp-hc2 ┊ 7004 ┊ InUse ┊ Ok ┊ Diskless ┊ 2023-12-18 17:41:41 ┊
        ┊ xcp-volume-17208381-56c0-4d8a-9c16-0a2000a45e56 ┊ xcp-hc3 ┊ 7004 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-12-18 17:41:42 ┊
        ┊ xcp-volume-94af3c03-91b4-46ea-bf51-d0c50a085e6b ┊ xcp-hc1 ┊ 7002 ┊ InUse ┊ Ok ┊ Diskless ┊ 2023-12-19 10:17:15 ┊
        ┊ xcp-volume-94af3c03-91b4-46ea-bf51-d0c50a085e6b ┊ xcp-hc2 ┊ 7002 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-12-19 09:49:35 ┊
        ┊ xcp-volume-94af3c03-91b4-46ea-bf51-d0c50a085e6b ┊ xcp-hc3 ┊ 7002 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-12-19 09:49:35 ┊
        ┊ xcp-volume-a395bb01-76a2-4e9a-a082-f18b3287afb2 ┊ xcp-hc1 ┊ 7005 ┊ Unused ┊ Ok ┊ Diskless ┊ 2023-12-19 10:17:16 ┊
        ┊ xcp-volume-a395bb01-76a2-4e9a-a082-f18b3287afb2 ┊ xcp-hc2 ┊ 7005 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-12-19 09:49:45 ┊
        ┊ xcp-volume-a395bb01-76a2-4e9a-a082-f18b3287afb2 ┊ xcp-hc3 ┊ 7005 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-12-19 09:49:45 ┊

      • jmm @jmm

          @jmm Self-answer:
          linstor resource create xcp-hc3 xcp-persistent-database --drbd-diskless

          🙂
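
          A quick way to confirm the diskless resource now exists (a plain grep over the resource list):

          linstor resource list | grep xcp-persistent-database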

      • gb.123

            I am getting:

              WARNING: Pool zeroing and 1.00 MiB large chunk size slows down thin provisioning.
              WARNING: Consider disabling zeroing (-Zn) or using smaller chunk size (<512.00 KiB).
            

            How do I change the chunk size and/or zeroing?

            Can this be done 'on the fly' (without losing data)?
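
            For what it's worth, under plain LVM (a sketch, not XOSTOR-specific; the pool name is taken from the vgs output earlier in this thread): zeroing can be toggled on an existing thin pool, but chunk size is fixed when the pool is created, so it cannot be changed on the fly.

            # Disable zeroing on an existing thin pool (a flag change; existing data is untouched):
            lvchange -Zn linstor_group/thin_device
            # Chunk size can only be chosen at creation time, e.g.:
            lvcreate -L 100G -T my_vg/my_thin_pool -c 256K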

        • BHellman (3rd party vendor)

                This thread has grown quite large and has a lot of information in it. Is there an official documentation chapter on XOSTOR available anywhere?

        • olivierlambert (Vates 🪐 Co-Founder & CEO)

                  For now it's within this thread 🙂 Feel free to tell us what's missing in the first post!

        • ronan-a (Vates 🪐 XCP-ng Team) @BHellman

                    @BHellman The first post has a FAQ that I update each time I meet users with a common/recurring problem. 😉

        • BHellman (3rd party vendor)

                      Thanks for the replies. My issues are all with the GUI, so please let me know if that's outside the scope of this thread and I can post elsewhere.

                      One issue: upon creating a new XOSTOR SR, the packages are installed, but the SR creation fails because one of the packages, sm-rawhba, needs updating. You have to apply patches through the GUI and then reboot the node, or execute "xe-toolstack-restart" on each node. You can then go back and create a new SR, but only after wiping the disks from the original attempt with vgremove and pvremove.
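
                      The cleanup step looks roughly like this (VG and device names are illustrative):

                      # Remove the volume group and physical volume left over from the failed SR creation:
                      vgremove linstor_group
                      pvremove /dev/sdX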

                      I'm planning on doing some more testing, please let me know if GUI issues are appropriate to post here.

          • ronan-a (Vates 🪐 XCP-ng Team) @BHellman

                        @BHellman It's fine to post simple issues in this thread. For complex problems a ticket is probably better. 🙂

                        One issue is upon creating a new XOSTOR SR, the packages are installed, however the SR creation fails due to one of the package, sm-rawhba, that needs updating.

                        Not quite: sm-rawhba is added to the list because the UI installs a modified version of sm with LINSTOR support.
                        The real issue is that xe-toolstack-restart is not called during the initial setup: a method is missing in our updater plugin to check whether a package is present. I will add this method for the XOA team. 😉

          • BHellman (3rd party vendor)

                          I'm not sure what the expected behavior is, but...

                          I have xcp1, xcp2, and xcp3 as hosts in my XOSTOR pool, using an XOSTOR repository. I had a VM running on xcp2, unplugged its power, and left it unplugged for about 5 minutes. The VM remained "running" according to XOA; however, it wasn't.

                          What is the expected behavior when this happens, and how do you recover from a temporarily failed/powered-off node?

                          My expectation was that my VM would move to xcp1 (where there is a replica) and start, then outdate xcp2. I have "auto start" enabled under Advanced on the VM.

          • limezest @BHellman

                            @BHellman
                            "Auto start" means that when you power up the host (or the whole pool), that VM will be started automatically.

                            I think you're describing high availability, which needs to be enabled at the pool level. Then you need to define an HA policy for the VM.

            • ronan-a (Vates 🪐 XCP-ng Team) @limezest

                              @limezest Exactly. The auto start feature is only checked during host boot.

                              @BHellman To automatically restart a VM in case of failure:

                              xe vm-param-set uuid=<VM_UUID> ha-restart-priority=restart order=1 
                              xe pool-ha-enable heartbeat-sr-uuids=<SR_UUID> 
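
                              To confirm the settings took effect (UUIDs are placeholders):

                              xe pool-param-get uuid=<POOL_UUID> param-name=ha-enabled
                              xe vm-param-get uuid=<VM_UUID> param-name=ha-restart-priority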
                              
            • BHellman (3rd party vendor) @ronan-a

                                @ronan-a @limezest

                                Thank you for the replies 🙂

                                Sorry for all the newb questions - I'm diving into this when time permits. Appreciate the help and understanding.

            • BHellman (3rd party vendor)

                                  I ran those commands on xcp1 (the pool master), with the XOSTOR (linstor) SR as the heartbeat SR, then powered off xcp2. At that point the pool disappeared.

                                  Now I'm getting the following on the XCP servers' consoles:

                                  Broadcast message from systemd-journald@xcp3 (Thu 2024-02-08 14:03:12 EST):
                                  
                                  xapi-nbd[5580]: main: Failed to log in via xapi's Unix domain socket in 300.000000 seconds
                                  
                                  
                                  Broadcast message from systemd-journald@xcp3 (Thu 2024-02-08 14:03:12 EST):
                                  
                                  xapi-nbd[5580]: main: Caught unexpected exception: (Failure
                                  
                                  
                                  Broadcast message from systemd-journald@xcp3 (Thu 2024-02-08 14:03:12 EST):
                                  
                                  xapi-nbd[5580]: main:   "Failed to log in via xapi's Unix domain socket in 300.000000 seconds")
                                  
                                  

                                  After powering up xcp2 the pool never comes back in the XOA interface.

                                  I'm seeing this on xcp1:

                                  [14:04 xcp1 ~]# drbdadm status
                                  xcp-persistent-database role:Secondary
                                    disk:Diskless quorum:no
                                    xcp2 connection:Connecting
                                    xcp3 connection:Connecting
                                  
                                  

                                  xcp2 and xcp3:

                                  [14:10 xcp2 ~]# drbdadm status
                                  # No currently configured DRBD found.
                                  

                                  Seems like I hosed this thing up really good. I assume this broke because XOSTOR isn't technically a shared disk.

                                  [14:15 xcp1 /]# xe sr-list
                                  The server could not join the liveset because the HA daemon could not access the heartbeat disk.
                                  

                                  Is HA + XOSTOR something that should work?

              • Jonathon

                                    Hello!

                                    I am attempting to update our hosts, starting with the pool controller. But I am getting a message that I wanted to ask about.

                                    The following happens when I attempt a yum update:

                                    --> Processing Dependency: sm-linstor for package: xcp-ng-linstor-1.1-3.xcpng8.2.noarch
                                    --> Finished Dependency Resolution
                                    Error: Package: xcp-ng-linstor-1.1-3.xcpng8.2.noarch (xcp-ng-updates)
                                               Requires: sm-linstor
                                    You could try using --skip-broken to work around the problem
                                               You could try running: rpm -Va --nofiles --nodigest
                                    

                                    The only reference I can find is here: https://koji.xcp-ng.org/buildinfo?buildID=3044
                                    My best guess is that I need to do two updates, the first one with --skip-broken. But I wanted to ask first, so as not to put things in a weird state.

                                    Thanks in advance!

              • Midget @BHellman

                                      @BHellman I have the EXACT same errors and scrolling logs now. I made a thread here...

              • olivierlambert (Vates 🪐 Co-Founder & CEO) @BHellman

                                        @BHellman Yes, it should. @ronan-a will take a look when he can 🙂

              • stormi (Vates 🪐 XCP-ng Team) @Jonathon

                                          @Jonathon Never use --skip-broken.

              • stormi (Vates 🪐 XCP-ng Team) @Jonathon

                                            @Jonathon What's the output of yum repolist?
