
    XOSTOR hyperconvergence preview

        geoffbland

        What is the URL for the GitHub repo for XOSTOR, in case I find some issues and need to report them? I looked under the xcp-ng project; only sm (Storage Manager) seemed relevant.

          ronan-a Vates 🪐 XCP-ng Team @geoffbland

          @geoffbland Yes, https://github.com/xcp-ng/sm is the right repository. 😉

            geoffbland @ronan-a

            @ronan-a Thanks.
            There's no Issues tab on this repo, so there is no way to open issues there. Are Issues turned on for it?

              ronan-a Vates 🪐 XCP-ng Team @geoffbland

              @geoffbland The entry point for issues is this repo: https://github.com/xcp-ng/xcp
              The sm repo is used for the pull requests.

                geoffbland

                First tests with XOSTOR with newly created VMs have been good.

                I'm now trying to migrate some existing VMs from NFS (TrueNAS) to XOSTOR to test "active" VMs.

                With the VM running, pressing the Migrate VDI button on the Disks tab pauses the VM as expected, but when the VM resumes the VDI is still on the original NFS storage. The VDI has not been migrated to XOSTOR.

                If I first stop the VM and then press the Migrate VDI button on the Disks tab, I do get an error:

                vdi.migrate
                {
                  "id": "8a3520ad-328f-4515-b547-2fb283edbd91",
                  "sr_id": "cf896912-cd71-d2b2-488a-5792b7147c87"
                }
                {
                  "code": "SR_BACKEND_FAILURE_46",
                  "params": [
                    "",
                    "The VDI is not available [opterr=Could not load f1ca0b16-ce23-408a-b80e-xxxxxxxxxxxx because: No such file or directory]",
                    ""
                  ],
                  "task": {
                    "uuid": "8b3b47ee-4135-fea7-5f30-xxxxxxxxxxxx",
                    "name_label": "Async.VDI.pool_migrate",
                    "name_description": "",
                    "allowed_operations": [],
                    "current_operations": {},
                    "created": "20220522T12:20:12Z",
                    "finished": "20220522T12:20:54Z",
                    "status": "failure",
                    "resident_on": "OpaqueRef:a1e9a8f3-0a79-4824-b29f-d81b3246d190",
                    "progress": 1,
                    "type": "<none/>",
                    "result": "",
                    "error_info": [
                      "SR_BACKEND_FAILURE_46",
                      "",
                      "The VDI is not available [opterr=Could not load f1ca0b16-ce23-408a-b80e-xxxxxxxxxxxx because: No such file or directory]",
                      ""
                    ],
                    "other_config": {},
                    "subtask_of": "OpaqueRef:NULL",
                    "subtasks": [],
                    "backtrace": "(((process xapi)(filename ocaml/xapi-client/client.ml)(line 7))((process xapi)(filename ocaml/xapi-client/client.ml)(line 19))((process xapi)(filename ocaml/xapi-client/client.ml)(line 12325))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 35))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 131))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 35))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/xapi/rbac.ml)(line 231))((process xapi)(filename ocaml/xapi/server_helpers.ml)(line 103)))"
                  },
                  "message": "SR_BACKEND_FAILURE_46(, The VDI is not available [opterr=Could not load f1ca0b16-ce23-408a-b80e-xxxxxxxxxxxx because: No such file or directory], )",
                  "name": "XapiError",
                  "stack": "XapiError: SR_BACKEND_FAILURE_46(, The VDI is not available [opterr=Could not load f1ca0b16-ce23-408a-b80e-xxxxxxxxxxxx because: No such file or directory], )
                    at Function.wrap (/opt/xo/xo-builds/xen-orchestra-202204291839/packages/xen-api/src/_XapiError.js:16:12)
                    at _default (/opt/xo/xo-builds/xen-orchestra-202204291839/packages/xen-api/src/_getTaskResult.js:11:29)
                    at Xapi._addRecordToCache (/opt/xo/xo-builds/xen-orchestra-202204291839/packages/xen-api/src/index.js:949:24)
                    at forEach (/opt/xo/xo-builds/xen-orchestra-202204291839/packages/xen-api/src/index.js:983:14)
                    at Array.forEach (<anonymous>)
                    at Xapi._processEvents (/opt/xo/xo-builds/xen-orchestra-202204291839/packages/xen-api/src/index.js:973:12)
                    at Xapi._watchEvents (/opt/xo/xo-builds/xen-orchestra-202204291839/packages/xen-api/src/index.js:1139:14)"
                }
                

                Exporting the VDI from NFS and (re)importing as a VM on XOSTOR does work.

                I'm guessing this is not a problem with XOSTOR specifically but with XO or NFS. Still, I would like to work out what is causing the migration to fail and how to fix it.

                Also, I have noticed that the underlying volume on XOSTOR/LINSTOR that was created and had started to be populated does get cleaned up when the migration fails.

                This is using XO from the sources - updated fairly recently (commit 8ed84) and XO server 5.92.0.
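
For anyone triaging a similar failure, the useful parts of the error above are the backend error code and the `opterr` text, which names a UUID and a reason. Below is a minimal Python sketch for pulling those out of the `message` string. The function name `parse_sr_backend_failure` and the regular expression are illustrative assumptions based only on the message shape shown in this post; they are not part of XO or XAPI.

```python
import re

# Assumed message shape (taken from the error in this post):
#   SR_BACKEND_FAILURE_46(, The VDI is not available
#   [opterr=Could not load <uuid> because: <reason>], )
OPTERR_RE = re.compile(
    r"(?P<code>SR_BACKEND_FAILURE_\d+)\(.*?\[opterr=Could not load "
    r"(?P<uuid>\S+) because: (?P<reason>.*)\]",
    re.DOTALL,
)


def parse_sr_backend_failure(message):
    """Return a dict with code, uuid and reason, or None if the shape differs."""
    match = OPTERR_RE.search(message)
    if match is None:
        return None
    return match.groupdict()


message = (
    "SR_BACKEND_FAILURE_46(, The VDI is not available "
    "[opterr=Could not load f1ca0b16-ce23-408a-b80e-xxxxxxxxxxxx "
    "because: No such file or directory], )"
)
info = parse_sr_backend_failure(message)
print(info["code"], info["uuid"], info["reason"])
```

The extracted UUID is worth comparing with the VDI being migrated, since the two are not necessarily the same object.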

                  olivierlambert Vates 🪐 Co-Founder CEO

                  It could be interesting to understand why the migration failed the first time. Is there absolutely no error during this first migration?

                    geoffbland @olivierlambert

                    @olivierlambert Thanks for the prompt response.

                    I am pretty sure there was no error reported, but as I cleared down the logs when I retried using export/import, I can't be 100% sure.

                    So I tested migration on another VM to try and replicate this and this time migration worked OK.

                    The only difference I can think of is that the failure occurred on a VM created quite a time ago - whilst the working VM had been created recently.

                    I will do a few more tests and see if I can replicate this again.

                      olivierlambert Vates 🪐 Co-Founder CEO

                      Okay great 🙂 If you can reproduce it, it would be even better to try the migration with the xe CLI; that way we remove more moving pieces in the middle 🙂

                        geoffbland @olivierlambert

                        @olivierlambert Sorry, it took some time to get around to this. Trying to migrate a VDI from an NFS store to XOSTOR is still failing most of the time. This is a VM that was created some time ago; if I do the same with the VDI of a recently created VM, the migration seems to work OK.

                        >xe vm-disk-list vm=lb01
                        Disk 0 VBD:
                        uuid ( RO)             : d9d06048-6f91-1913-714d-xxxxxxxxaece
                            vm-name-label ( RO): lb01
                               userdevice ( RW): 0
                        
                        
                        Disk 0 VDI:
                        uuid ( RO)             : a38f27e8-c6a0-49d3-9fd3-xxxxxxxx10e3
                               name-label ( RW): lb01_tnc01_hdd
                            sr-name-label ( RO): XCPNG_VMs_TrueNAS
                             virtual-size ( RO): 10737418240
                        
                        >xe sr-list name-label=XOSTOR01
                        uuid ( RO)                : cf896912-cd71-d2b2-488a-xxxxxxxx7c87
                                  name-label ( RW): XOSTOR01
                            name-description ( RW):
                                        host ( RO): <shared>
                                        type ( RO): linstor
                                content-type ( RO):
                        
                        >xe vdi-pool-migrate uuid=a38f27e8-c6a0-49d3-9fd3-xxxxxxxx10e3 sr-uuid=cf896912-cd71-d2b2-488a-xxxxxxxx7c87
                        
                        Error code: SR_BACKEND_FAILURE_46
                        Error parameters: , The VDI is not available [opterr=Could not load 735fc2d7-f1f0-4cc6-9d35-xxxxxxxxec6c because: ['XENAPI_PLUGIN_FAILURE', 'getVHDInfo', 'CommandException', 'No such file or directory']],
                        

                        Running this I see the VM pause as expected for a few minutes and then it just starts up again. VM is still running with no issues - it just did not move the VDI.

                        What is the resource with UUID 735fc2d7-f1f0-4cc6-9d35-xxxxxxxxec6c that it is trying to find? That UUID does not match the VDI.

                        The VDI must be OK as the VM is still up and running with no errors.

                        As this is probably not an XOSTOR issue - should I raise a new topic for this?
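
One way to approach the "which object is this UUID?" question is to collect `xe` listings (for example `xe vm-disk-list` as above, or `xe vdi-list uuid=<uuid>`) and search them for the UUID from the error. The sketch below parses the `key ( RO): value` layout shown in this post into dictionaries. `parse_xe_records` and `find_by_uuid` are hypothetical helper names, and the parsing rules are inferred only from the output pasted here.

```python
import re

# One "key ( RO): value" line of xe output. The access tag is assumed to be a
# short upper-case marker such as RO or RW, as in the listings above.
FIELD_RE = re.compile(
    r"^\s*(?P<key>[\w-]+)\s*\(\s*[A-Z]+\s*\)\s*:\s?(?P<value>.*)$"
)


def parse_xe_records(text):
    """Split xe output into one dict per blank-line-separated record."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        match = FIELD_RE.match(line)
        if match:  # header lines such as "Disk 0 VBD:" are skipped
            current[match.group("key")] = match.group("value").strip()
    if current:
        records.append(current)
    return records


def find_by_uuid(records, uuid):
    """Return every parsed record whose uuid field equals the given string."""
    return [record for record in records if record.get("uuid") == uuid]


SAMPLE = """\
Disk 0 VBD:
uuid ( RO)             : d9d06048-6f91-1913-714d-xxxxxxxxaece
    vm-name-label ( RO): lb01
       userdevice ( RW): 0


Disk 0 VDI:
uuid ( RO)             : a38f27e8-c6a0-49d3-9fd3-xxxxxxxx10e3
       name-label ( RW): lb01_tnc01_hdd
    sr-name-label ( RO): XCPNG_VMs_TrueNAS
     virtual-size ( RO): 10737418240
"""

records = parse_xe_records(SAMPLE)
# The UUID from the error is not among the disks of lb01.
print(find_by_uuid(records, "735fc2d7-f1f0-4cc6-9d35-xxxxxxxxec6c"))  # prints []
```

An empty result here only says the UUID is not one of this VM's disks; the same search over listings for other VMs or SRs would be needed to find its owner.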

                          olivierlambert Vates 🪐 Co-Founder CEO

                          It's hard to tell. If you can migrate between non-XOSTOR SRs and see if you reproduce, then it's another issue. If it's only happening when using XOSTOR in the loop, then it's relevant here 🙂

                            ronan-a Vates 🪐 XCP-ng Team @geoffbland

                            @geoffbland I can't reproduce your problem, can you send me the SMlog of your hosts please? 🙂

                              geoffbland @ronan-a

                              @ronan-a said in XOSTOR hyperconvergence preview:

                              @geoffbland I can't reproduce your problem, can you send me the SMlog of your hosts please? 🙂

                              As requested,

                              May 24 09:13:22 XCPNG01 SM: [18127] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-xxxxxxxx58c0/0']
                              May 24 09:13:23 XCPNG01 SM: [18127] FAILED in util.pread: (rc 2) stdout: '40960
                              May 24 09:13:23 XCPNG01 SM: [18127] 2048840192
                              May 24 09:13:23 XCPNG01 SM: [18127] query failed
                              May 24 09:13:23 XCPNG01 SM: [18127] hidden: 0
                              May 24 09:13:23 XCPNG01 SM: [18127] ', stderr: ''
                              May 24 09:13:23 XCPNG01 SM: [18127] linstor-manager:get_vhd_info error: No such file or directory
                              May 24 09:13:26 XCPNG01 SM: [18158] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-xxxxxxxx58c0/0']
                              May 24 09:13:26 XCPNG01 SM: [18158] FAILED in util.pread: (rc 2) stdout: '40960
                              May 24 09:13:26 XCPNG01 SM: [18158] 2048840192
                              May 24 09:13:26 XCPNG01 SM: [18158] query failed
                              May 24 09:13:26 XCPNG01 SM: [18158] hidden: 0
                              May 24 09:13:26 XCPNG01 SM: [18158] ', stderr: ''
                              May 24 09:13:26 XCPNG01 SM: [18158] linstor-manager:get_vhd_info error: No such file or directory
                              May 24 09:13:29 XCPNG01 SM: [18200] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-xxxxxxxx58c0/0']
                              May 24 09:13:29 XCPNG01 SM: [18200] FAILED in util.pread: (rc 2) stdout: '40960
                              May 24 09:13:29 XCPNG01 SM: [18200] 2048840192
                              May 24 09:13:29 XCPNG01 SM: [18200] query failed
                              May 24 09:13:29 XCPNG01 SM: [18200] hidden: 0
                              May 24 09:13:29 XCPNG01 SM: [18200] ', stderr: ''
                              May 24 09:13:29 XCPNG01 SM: [18200] linstor-manager:get_vhd_info error: No such file or directory
                              May 24 09:13:32 XCPNG01 SM: [18212] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-xxxxxxxx58c0/0']
                              May 24 09:13:32 XCPNG01 SM: [18212] FAILED in util.pread: (rc 2) stdout: '40960
                              May 24 09:13:32 XCPNG01 SM: [18212] 2048840192
                              May 24 09:13:32 XCPNG01 SM: [18212] query failed
                              May 24 09:13:32 XCPNG01 SM: [18212] hidden: 0
                              May 24 09:13:32 XCPNG01 SM: [18212] ', stderr: ''
                              May 24 09:13:32 XCPNG01 SM: [18212] linstor-manager:get_vhd_info error: No such file or directory
                              May 24 09:13:35 XCPNG01 SM: [18247] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-xxxxxxxx58c0/0']
                              May 24 09:13:35 XCPNG01 SM: [18247] FAILED in util.pread: (rc 2) stdout: '40960
                              May 24 09:13:35 XCPNG01 SM: [18247] 2048840192
                              May 24 09:13:35 XCPNG01 SM: [18247] query failed
                              May 24 09:13:35 XCPNG01 SM: [18247] hidden: 0
                              May 24 09:13:35 XCPNG01 SM: [18247] ', stderr: ''
                              May 24 09:13:35 XCPNG01 SM: [18247] linstor-manager:get_vhd_info error: No such file or directory
                              May 24 09:13:36 XCPNG01 SM: [18259] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-xxxxxxxx58c0/0']
                              May 24 09:13:36 XCPNG01 SM: [18259] FAILED in util.pread: (rc 2) stdout: '40960
                              May 24 09:13:36 XCPNG01 SM: [18259] 2048840192
                              May 24 09:13:36 XCPNG01 SM: [18259] query failed
                              May 24 09:13:36 XCPNG01 SM: [18259] hidden: 0
                              May 24 09:13:36 XCPNG01 SM: [18259] ', stderr: ''
                              May 24 09:13:36 XCPNG01 SM: [18259] linstor-manager:get_vhd_info error: No such file or directory
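
The excerpt above repeats the same pattern every few seconds: a `vhd-util query` command line, then a `FAILED in util.pread` line with `rc 2`. A rough Python sketch for condensing such an SMlog excerpt into one entry per failure is shown below. The line format is inferred only from this excerpt, and `summarize_pread_failures` is a hypothetical helper. Exit status 2 has the same number as errno `ENOENT`; whether the SM driver derives the "No such file or directory" text from that status is an assumption that this thread does not confirm.

```python
import ast
import errno
import re

# Assumed SMlog line layout, based on the excerpt above:
#   "May 24 09:13:22 XCPNG01 SM: [18127] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"SM: \[(?P<pid>\d+)\] (?P<msg>.*)$"
)
RC_RE = re.compile(r"FAILED in util\.pread: \(rc (?P<rc>\d+)\)")


def summarize_pread_failures(log_text):
    """Pair each 'FAILED in util.pread' line with the last command of that PID."""
    last_cmd = {}  # pid -> most recent command list logged by that process
    failures = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue
        pid, msg = match.group("pid"), match.group("msg")
        if msg.startswith("['"):
            try:
                last_cmd[pid] = ast.literal_eval(msg)
            except (ValueError, SyntaxError):
                pass  # not a complete list literal; ignore it
            continue
        rc_match = RC_RE.search(msg)
        if rc_match:
            rc = int(rc_match.group("rc"))
            failures.append({
                "time": match.group("ts"),
                "host": match.group("host"),
                "pid": pid,
                "rc": rc,
                "errno_name": errno.errorcode.get(rc),
                "command": last_cmd.get(pid),
            })
    return failures
```

Feeding the excerpt above through this function would give six entries, all for the same DRBD device path, which makes it easier to see that a single volume is involved.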
                              
                                geoffbland

                                I found one of the VMs I had been using to test XOSTOR was locked up this morning. I restarted it but it will not start up and gives an error about the VDI being missing.

                                >xe vm-list name-label=test04
                                uuid ( RO)           : 8ec952b4-7229-7a30-81b6-1564a58f6343
                                     name-label ( RW): test04
                                    power-state ( RO): halted
                                
                                >xe vm-disk-list vm=test04
                                Disk 0 VBD:
                                uuid ( RO)             : e3c465b8-17a4-d147-6383-527bd9341a16
                                    vm-name-label ( RO): test04
                                       userdevice ( RW): 0
                                
                                
                                Disk 0 VDI:
                                uuid ( RO)             : 735fc2d7-f1f0-4cc6-9d35-42a049d8ec6c
                                       name-label ( RW): test04_xostor01_vdi
                                    sr-name-label ( RO): XOSTOR01
                                     virtual-size ( RO): 42949672960
                                
                                >xe vm-start vm=test04
                                Error code: SR_BACKEND_FAILURE_46
                                Error parameters: , The VDI is not available [opterr=Could not load 735fc2d7-f1f0-4cc6-9d35-42a049d8ec6c because: ['XENAPI_PLUGIN_FAILURE', 'getVHDInfo', 'CommandException', 'No such file or directory']],
                                

                                The logs for this are attached as file xostor issue 1.txt
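
The VDI UUID in this error looks like the unmasked form of the UUID reported by the earlier failed `xe vdi-pool-migrate` (`735fc2d7-f1f0-4cc6-9d35-xxxxxxxxec6c`). Below is a small Python sketch for comparing a masked UUID against a full one. Treating every `x` as a wildcard is an assumption about how the UUIDs in this thread were redacted, and `masked_uuid_matches` is a hypothetical helper.

```python
def masked_uuid_matches(masked, full):
    """Return True if `full` agrees with `masked` wherever `masked` is not 'x'."""
    masked, full = masked.lower(), full.lower()
    if len(masked) != len(full):
        return False
    # 'x' is never a valid hexadecimal digit, so it can act as a wildcard.
    return all(m == "x" or m == f for m, f in zip(masked, full))


# UUID from the failed migration vs. the VDI of test04 on XOSTOR01.
print(masked_uuid_matches(
    "735fc2d7-f1f0-4cc6-9d35-xxxxxxxxec6c",
    "735fc2d7-f1f0-4cc6-9d35-42a049d8ec6c",
))  # prints True
```

A match on the visible characters does not prove the two UUIDs are identical, but it suggests the failed migration and this failed start may involve the same XOSTOR volume.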

                                  ronan-a Vates 🪐 XCP-ng Team @geoffbland

                                  @geoffbland Can you execute this command on the other hosts please?

                                  ls -l /dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-a0665bb758c0/0
                                  

                                  Also I don't have all the info in your previous log, can you send me the previous SMlog files? (Using private message if you want. 🙂 )

                                    geoffbland @ronan-a

                                    @ronan-a said in XOSTOR hyperconvergence preview:

                                    Can you execute this command on the other hosts please?

                                    As requested

                                    XCPNG01 - Current linstor master

                                    [10:59 XCPNG01 ~]# ls -l /dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-a0665bb758c0/0
                                    lrwxrwxrwx 1 root root 17 May 22 19:24 /dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-a0665bb758c0/0 -> ../../../drbd1004
                                    

                                    XCPNG02

                                    [11:00 XCPNG02 ~]# ls -l /dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-a0665bb758c0/0
                                    lrwxrwxrwx 1 root root 17 May 22 19:25 /dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-a0665bb758c0/0 -> ../../../drbd1004
                                    

                                    XCPNG03

                                    [07:31 XCPNG03 ~]# ls -l /dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-a0665bb758c0/0
                                    lrwxrwxrwx 1 root root 17 May 22 19:24 /dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-a0665bb758c0/0 -> ../../../drbd1004
                                    

                                    XCPNG04

                                    [07:35 XCPNG04 ~]# ls -l /dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-a0665bb758c0/0
                                    lrwxrwxrwx 1 root root 17 May 22 19:24 /dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-a0665bb758c0/0 -> ../../../drbd1004
                                    

                                    XCPNG05

                                    [10:49 XCPNG05 ~]# ls -l /dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-a0665bb758c0/0
                                    lrwxrwxrwx 1 root root 17 May 22 19:24 /dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-a0665bb758c0/0 -> ../../../drbd1004
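
Comparing five listings by eye is error-prone, so here is a hedged Python sketch that checks whether every host resolves the volume to the same DRBD device. It works on pasted `ls -l` lines only; `link_targets` and `hosts_disagreeing` are hypothetical names, and the check says nothing about the DRBD state behind the symlink.

```python
import re
from collections import Counter

# Matches the "<link> -> <target>" tail of an `ls -l` line for a symlink.
LINK_RE = re.compile(r"(?P<link>/\S+) -> (?P<target>\S+)\s*$")


def link_targets(ls_output_by_host):
    """Map host name -> symlink target, or None if no symlink was listed."""
    targets = {}
    for host, line in ls_output_by_host.items():
        match = LINK_RE.search(line)
        targets[host] = match.group("target") if match else None
    return targets


def hosts_disagreeing(targets):
    """Return hosts whose target is missing or differs from the most common one."""
    present = [target for target in targets.values() if target is not None]
    if not present:
        return sorted(targets)
    most_common, _count = Counter(present).most_common(1)[0]
    return sorted(host for host, target in targets.items() if target != most_common)


path = "/dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-a0665bb758c0/0"
listing = {
    host: f"lrwxrwxrwx 1 root root 17 May 22 19:24 {path} -> ../../../drbd1004"
    for host in ("XCPNG01", "XCPNG02", "XCPNG03", "XCPNG04", "XCPNG05")
}
print(hosts_disagreeing(link_targets(listing)))  # prints []
```

For the output above, all five hosts point at `drbd1004`, so the by-res symlink itself is present everywhere.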
                                    
                                      geoffbland

                                      It seems like xe may be getting mixed up between where a VM is running and where the XOSTOR storage is held.
                                      Apologies if I have misunderstood and done something wrong here, but I think this migration should have worked.
                                      I created a new VM on one of my hosts, XCPNG05, using XOSTOR as the VDI SR. I can see that the LINSTOR volumes are on hosts XCPNG01, XCPNG03 and XCPNG05.
                                      XCPNG05 is an Intel server; XCPNG01 and XCPNG03 are AMD. The VM is running on XCPNG05.
                                      Now when I try to migrate the VM's VDI from XOSTOR onto a local SR on the same host the VM is currently running on, I get an error about incompatible CPUs.

                                      To replicate the issue:

                                      Create new VM test05 on XOSTOR.
                                      VM is created on host XCPNG05.

                                      >xe vm-list name-label=test05
                                      uuid ( RO)           : d3f8c52d-be3c-3712-0ccc-a526dcc241a5
                                           name-label ( RW): test05
                                          power-state ( RO): running
                                      
                                      >xe vm-disk-list vm=test05
                                      Disk 0 VBD:
                                      uuid ( RO)             : a337cd1f-04cc-ce46-fbfb-d5d8e290dc03
                                          vm-name-label ( RO): test05
                                             userdevice ( RW): 0
                                      
                                      Disk 0 VDI:
                                      uuid ( RO)             : f856680c-c00d-44af-ba3f-16d9952ccb2f
                                             name-label ( RW): test05_vdi
                                          sr-name-label ( RO): XOSTOR01
                                           virtual-size ( RO): 34359738368
                                      
                                      >xe sr-list name-label=XOSTOR01
                                      uuid ( RO)                : cf896912-cd71-d2b2-488a-xxxxxxxx7c87
                                                name-label ( RW): XOSTOR01
                                          name-description ( RW):
                                                      host ( RO): <shared>
                                                      type ( RO): linstor
                                              content-type ( RO):
                                      

                                      Migrate to a local disk (SSD1) on the same host (XCPNG05): this fails with an incompatible-CPU error, even though the target is the host the VM is already running on.

                                      >xe sr-list name-label=XCPNG05SSD1
                                      uuid ( RO)                : c0851501-3a1b-c661-70b9-54373e0d9847
                                                name-label ( RW): XCPNG05SSD1
                                          name-description ( RW):
                                                      host ( RO): XCPNG05
                                                      type ( RO): lvm
                                              content-type ( RO): user
                                      
                                      >xe vdi-pool-migrate uuid=f856680c-c00d-44af-ba3f-16d9952ccb2f sr-uuid=c0851501-3a1b-c661-70b9-54373e0d9847
                                      The VM is incompatible with the CPU features of this host.
                                      vm: d3f8c52d-be3c-3712-0ccc-a526dcc241a5 (test05)
                                      host: 7bd62a77-71d6-4b51-9a86-850dd4ff4b60 (XCPNG05)
                                      reason: VM last booted on a host which had a CPU from a different vendor.
                                      
                                      
                                        ronan-a Vates 🪐 XCP-ng Team @geoffbland

                                        @geoffbland Thank you. OK, so the VDI is still present on all hosts.

                                        You can try to check the status of the VHD the same way the SMAPI does, using:

                                        /usr/bin/vhd-util query --debug -vsfp -n /dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-a0665bb758c0/0
                                        

                                        If you have this problem on many hosts, I suspect a problem with DRBD, so maybe there is some useful info in daemon.log and/or kern.log.

                                          geoffbland @ronan-a

                                          @ronan-a said in XOSTOR hyperconvergence preview:

                                          You can try to check the status of the VHD like the smapi using:
                                          /usr/bin/vhd-util query --debug -vsfp -n /dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-a0665bb758c0/0
                                          useful info in daemon.log and/or kern.log.

                                          This gives the following:

                                          [10:59 XCPNG01 ~]# /usr/bin/vhd-util query --debug -vsfp -n /dev/drbd/by-res/xcp-volume-75c11231-0fb8-4b40-9e2e-a0665bb758c0/0
                                          40960
                                          2061447680
                                          query failed
                                          hidden: 0
                                          

                                          I will send logs by direct mail.
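
One possible reading of the four output lines, sketched in Python. It assumes the usual blktap `vhd-util query` behaviour, where `-v` prints the virtual size in MiB, `-s` the physical size in bytes, `-p` the parent (or `query failed` if the parent lookup fails) and `-f` the hidden flag, in that order. That mapping is an assumption and is not confirmed in this thread; `interpret_vhd_query` is a hypothetical helper.

```python
def interpret_vhd_query(stdout):
    """Label the four lines printed by `vhd-util query -vsfp` (assumed order)."""
    lines = [line.strip() for line in stdout.strip().splitlines()]
    virtual_mib = int(lines[0])        # assumed: -v, virtual size in MiB
    physical_bytes = int(lines[1])     # assumed: -s, physical size in bytes
    parent_line = lines[2]             # assumed: -p, parent name or an error
    hidden = int(lines[3].split(":", 1)[1])  # "hidden: 0" from -f
    return {
        "virtual_size_bytes": virtual_mib * 1024 * 1024,
        "physical_bytes": physical_bytes,
        "parent_query_failed": parent_line == "query failed",
        "hidden": hidden,
    }


result = interpret_vhd_query("40960\n2061447680\nquery failed\nhidden: 0\n")
print(result["virtual_size_bytes"])  # prints 42949672960
```

Under that reading, 40960 MiB equals 42949672960 bytes, which matches the `virtual-size` that `xe vm-disk-list` reported for the test04 VDI, and the only failing part of the query would be the parent lookup.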

                                            ronan-a Vates 🪐 XCP-ng Team @geoffbland

                                            @geoffbland Okay, so it's probably not related to the driver itself. I will take a look at the logs once I receive them. 🙂
