XCP-ng

    CBT: the thread to centralize your feedback

    Backup · 439 Posts · 37 Posters · 386.1k Views · 29 Watching
    • Tristis OrisT Offline
      Tristis Oris Top contributor
      last edited by

      I still have one VM stuck without a backup. I already restarted its host and halted the VM itself, but SMlog has no records during the 5 minutes of that task.

            "result": {
              "code": "VDI_IN_USE",
              "params": [
                "OpaqueRef:1f96d4e7-5ca6-4070-b686-b34dd83e5442",
                "destroy"
              ],
              "task": {
                "uuid": "81a60e3a-c887-13f3-fedc-36eae232a6df",
                "name_label": "Async.VDI.destroy",
                "name_description": "",
                "allowed_operations": [],
                "current_operations": {},
                "created": "20240918T18:03:28Z",
                "finished": "20240918T18:03:28Z",
                "status": "failure",
                "resident_on": "OpaqueRef:223881b6-1309-40e6-9e42-5ad74a274d2d",
                "progress": 1,
                "type": "<none/>",
                "result": "",
                "error_info": [
                  "VDI_IN_USE",
                  "OpaqueRef:1f96d4e7-5ca6-4070-b686-b34dd83e5442",
                  "destroy"
                ],
                "other_config": {},
                "subtask_of": "OpaqueRef:NULL",
                "subtasks": [],
                "backtrace": "(((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 4711))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/xapi/rbac.ml)(line 205))((process xapi)(filename ocaml/xapi/server_helpers.ml)(line 95)))"
              },
              "message": "VDI_IN_USE(OpaqueRef:1f96d4e7-5ca6-4070-b686-b34dd83e5442, destroy)",
              "name": "XapiError",
              "stack": "XapiError: VDI_IN_USE(OpaqueRef:1f96d4e7-5ca6-4070-b686-b34dd83e5442, destroy)\n    at XapiError.wrap (file:///opt/xo/xo-builds/xen-orchestra-202409111040/packages/xen-api/_XapiError.mjs:16:12)\n    at default (file:///opt/xo/xo-builds/xen-orchestra-202409111040/packages/xen-api/_getTaskResult.mjs:13:29)\n    at Xapi._addRecordToCache (file:///opt/xo/xo-builds/xen-orchestra-202409111040/packages/xen-api/index.mjs:1041:24)\n    at file:///opt/xo/xo-builds/xen-orchestra-202409111040/packages/xen-api/index.mjs:1075:14\n    at Array.forEach (<anonymous>)\n    at Xapi._processEvents (file:///opt/xo/xo-builds/xen-orchestra-202409111040/packages/xen-api/index.mjs:1065:12)\n    at Xapi._watchEvents (file:///opt/xo/xo-builds/xen-orchestra-202409111040/packages/xen-api/index.mjs:1238:14)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)"
            }
          },
      

      I also often get "Job canceled to protect the VDI chain" errors for other VMs. This has continued since the bad CBT commit.
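
      For reference, a minimal way to see what still holds the VDI that Async.VDI.destroy complains about (a sketch, assuming CLI access on the pool master; <vdi-uuid> is a placeholder, since the OpaqueRef above does not show the UUID directly):

            # List VBDs still referencing the VDI; a stuck backup export usually shows up as currently-attached=true on dom0
            xe vbd-list vdi-uuid=<vdi-uuid> params=uuid,vm-name-label,currently-attached
            # List pending XAPI tasks that might still hold a reference to it
            xe task-list params=uuid,name-label,status,progress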

      • D Offline
        Delgado
        last edited by Delgado

        It looks like all of my backups have started erroring with "can't create a stream from a metadata VDI, fall back to a base". I am using 1 NBD connection and I am not on commit 530c3. I have attached the logs of a delta backup and a replication.

        2024-09-19T16_00_00.002Z - backup NG.json.txt
        2024-09-19T04_00_00.001Z - backup NG.json.txt

        I am seeing this in the journal logs.

        Sep 19 12:01:39 hostname xo-server[11597]:   error: XapiError: SR_BACKEND_FAILURE_460(, Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated], )
        Sep 19 12:01:39 hostname xo-server[11597]:       at XapiError.wrap (file:///opt/xo/xo-builds/xen-orchestra-202409180806/packages/xen-api/_XapiError.mjs:16:12)
        Sep 19 12:01:39 hostname xo-server[11597]:       at file:///opt/xo/xo-builds/xen-orchestra-202409180806/packages/xen-api/transports/json-rpc.mjs:38:21
        Sep 19 12:01:39 hostname xo-server[11597]:       at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
        Sep 19 12:01:39 hostname xo-server[11597]:     code: 'SR_BACKEND_FAILURE_460',
        Sep 19 12:01:39 hostname xo-server[11597]:     params: [
        Sep 19 12:01:39 hostname xo-server[11597]:       '',
        Sep 19 12:01:39 hostname xo-server[11597]:       'Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated]',
        Sep 19 12:01:39 hostname xo-server[11597]:       ''
        Sep 19 12:01:39 hostname xo-server[11597]:     ],
        Sep 19 12:01:39 hostname xo-server[11597]:     call: { method: 'VDI.list_changed_blocks', params: [Array] },
        Sep 19 12:01:39 hostname xo-server[11597]:     url: undefined,
        Sep 19 12:01:39 hostname xo-server[11597]:     task: undefined
        Sep 19 12:01:39 hostname xo-server[11597]:   },
        Sep 19 12:01:39 hostname xo-server[11597]:   ref: 'OpaqueRef:0438087b-5cbc-a458-a8a0-4eaa6ce74d19',
        Sep 19 12:01:39 hostname xo-server[11597]:   baseRef: 'OpaqueRef:ae1330a2-0f95-6c16-6878-f6c05373a2f2'
        Sep 19 12:01:39 hostname xo-server[11597]: }
        Sep 19 12:01:43 hostname xo-server[11597]: 2024-09-19T16:01:43.015Z xo:xapi:vdi INFO  OpaqueRef:b6f65ae4-bee8-b179-a06c-2bb4956214ba has been disconnected from dom0 {
        Sep 19 12:01:43 hostname xo-server[11597]:   vdiRef: 'OpaqueRef:0438087b-5cbc-a458-a8a0-4eaa6ce74d19',
        Sep 19 12:01:43 hostname xo-server[11597]:   vbdRef: 'OpaqueRef:b6f65ae4-bee8-b179-a06c-2bb4956214ba'
        Sep 19 12:01:43 hostname xo-server[11597]: }
        Sep 19 12:02:29 hostname xo-server[11597]: 2024-09-19T16:02:29.855Z xo:xapi:vdi INFO can't get changed block {
        Sep 19 12:02:29 hostname xo-server[11597]:   error: XapiError: SR_BACKEND_FAILURE_460(, Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated], )
        Sep 19 12:02:29 hostname xo-server[11597]:       at XapiError.wrap (file:///opt/xo/xo-builds/xen-orchestra-202409180806/packages/xen-api/_XapiError.mjs:16:12)
        Sep 19 12:02:29 hostname xo-server[11597]:       at file:///opt/xo/xo-builds/xen-orchestra-202409180806/packages/xen-api/transports/json-rpc.mjs:38:21
        Sep 19 12:02:29 hostname xo-server[11597]:       at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
        Sep 19 12:02:29 hostname xo-server[11597]:     code: 'SR_BACKEND_FAILURE_460',
        Sep 19 12:02:29 hostname xo-server[11597]:     params: [
        Sep 19 12:02:29 hostname xo-server[11597]:       '',
        Sep 19 12:02:29 hostname xo-server[11597]:       'Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated]',
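
        For anyone wanting to check the CBT state behind this error, a rough sketch (assuming CLI access on the pool master; the CBT sub-commands exist since XenServer 7.3 / XCP-ng 7.4, and <vdi-uuid> is a placeholder for the affected disk):

        # Show which VDIs currently have CBT enabled
        xe vdi-list cbt-enabled=true params=uuid,name-label,cbt-enabled
        # If the tracking data looks invalid, disabling then re-enabling CBT drops it,
        # so the next delta run should fall back to a clean full
        xe vdi-disable-cbt uuid=<vdi-uuid>
        xe vdi-enable-cbt uuid=<vdi-uuid>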
        
        • R Offline
          rtjdamen @Delgado
          last edited by

          @Delgado This error sounds like the CBT data became invalid. Could it be that you had a host crash or a storage issue? Does a retry create a working full?

          • D Offline
            Delgado @rtjdamen
            last edited by

            @rtjdamen I haven't had any hosts crash recently or any storage issues that I can tell. The "type" in the log says delta, but the backup sizes definitely look like full backups. They're also labelled as "key" when I look at the restore points for the delta backups.

            • R Offline
              rtjdamen @Delgado
              last edited by

              @Delgado I believe this error message is incorrect; it should be something like "CBT invalid, fall back to base". I have seen it at random once in a while on a VM, and also when there were issues on a host or a specific storage pool.

              • Tristis OrisT Offline
                Tristis Oris Top contributor
                last edited by

                Not sure if this is CBT related; I've never seen it before. The VM backup failed in 1 min, as always, but the task still looks active.

                5268e5c6-2136-434d-9417-37abc9b4be6e-image.png
                2318239b-370b-4e1a-8b2d-413204b2eac5-image.png

                • C Offline
                  CJ
                  last edited by

                  @olivierlambert Any progress on the attached disks and multiple NBD connections issue?

                  Relatedly, should we see any performance difference based on the number of NBD connections? I went from 4 to 1 and my backups still take the same amount of time.

                  • olivierlambertO Offline
                    olivierlambert Vates 🪐 Co-Founder CEO
                    last edited by

                    I'm not the right person to ask; I'm not tracking this in detail. In our own production we use more concurrency with 1x NBD connection, and that's the best combo I have found so far.

                    • C Offline
                      CJ @olivierlambert
                      last edited by

                      @olivierlambert Is there a specific person we should ping, or a link to watch, to get updates on the status?

                      • olivierlambertO Offline
                        olivierlambert Vates 🪐 Co-Founder CEO
                        last edited by

                        @florent is the main backup guy, but he's ultra busy. No guarantees, so it's best not to ping anyone in particular and just see if you get some feedback. If it's a priority, go directly to pro support. We'll do our best to answer here, but it can't take priority over support tickets.

                        • Tristis OrisT Offline
                          Tristis Oris Top contributor
                          last edited by Tristis Oris

                          As of today's commit https://github.com/vatesfr/xen-orchestra/commit/ad8cd3791b9459b06d754defa657c97b66261eb3, migration is still failing.

                          fbeauchamp committed to vatesfr/xen-orchestra: fix(xo-server): migration of vm with cbt enabled disk (#8017)
                          • olivierlambertO Offline
                            olivierlambert Vates 🪐 Co-Founder CEO @Tristis Oris
                            last edited by

                            @Tristis-Oris Can you be more specific? What output do you exactly have?

                            • Tristis OrisT Offline
                              Tristis Oris Top contributor @olivierlambert
                              last edited by

                              @olivierlambert

                              vdi.migrate
                              {
                                "id": "1d536c76-1ee7-41aa-93ff-7c7a297e2e80",
                                "sr_id": "9a80cc74-a807-0475-1cc9-b0e42ffc7bf9"
                              }
                              {
                                "code": "SR_BACKEND_FAILURE_46",
                                "params": [
                                  "",
                                  "The VDI is not available [opterr=Error scanning VDI b3e09a17-9b08-48e5-8b47-93f16979b045]",
                                  ""
                                ],
                                "task": {
                                  "uuid": "a6db64c5-b9d2-946c-3cfd-59cd8c4c4586",
                                  "name_label": "Async.VDI.pool_migrate",
                                  "name_description": "",
                                  "allowed_operations": [],
                                  "current_operations": {},
                                  "created": "20240930T07:54:12Z",
                                  "finished": "20240930T07:54:30Z",
                                  "status": "failure",
                                  "resident_on": "OpaqueRef:223881b6-1309-40e6-9e42-5ad74a274d2d",
                                  "progress": 1,
                                  "type": "<none/>",
                                  "result": "",
                                  "error_info": [
                                    "SR_BACKEND_FAILURE_46",
                                    "",
                                    "The VDI is not available [opterr=Error scanning VDI b3e09a17-9b08-48e5-8b47-93f16979b045]",
                                    ""
                                  ],
                                  "other_config": {},
                                  "subtask_of": "OpaqueRef:NULL",
                                  "subtasks": [],
                                  "backtrace": "(((process xapi)(filename ocaml/xapi-client/client.ml)(line 7))((process xapi)(filename ocaml/xapi-client/client.ml)(line 19))((process xapi)(filename ocaml/xapi-client/client.ml)(line 12359))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 35))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 134))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 35))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/xapi/rbac.ml)(line 205))((process xapi)(filename ocaml/xapi/server_helpers.ml)(line 95)))"
                                },
                                "message": "SR_BACKEND_FAILURE_46(, The VDI is not available [opterr=Error scanning VDI b3e09a17-9b08-48e5-8b47-93f16979b045], )",
                                "name": "XapiError",
                                "stack": "XapiError: SR_BACKEND_FAILURE_46(, The VDI is not available [opterr=Error scanning VDI b3e09a17-9b08-48e5-8b47-93f16979b045], )
                                  at Function.wrap (file:///opt/xo/xo-builds/xen-orchestra-202409301043/packages/xen-api/_XapiError.mjs:16:12)
                                  at default (file:///opt/xo/xo-builds/xen-orchestra-202409301043/packages/xen-api/_getTaskResult.mjs:13:29)
                                  at Xapi._addRecordToCache (file:///opt/xo/xo-builds/xen-orchestra-202409301043/packages/xen-api/index.mjs:1041:24)
                                  at file:///opt/xo/xo-builds/xen-orchestra-202409301043/packages/xen-api/index.mjs:1075:14
                                  at Array.forEach (<anonymous>)
                                  at Xapi._processEvents (file:///opt/xo/xo-builds/xen-orchestra-202409301043/packages/xen-api/index.mjs:1065:12)
                                  at Xapi._watchEvents (file:///opt/xo/xo-builds/xen-orchestra-202409301043/packages/xen-api/index.mjs:1238:14)"
                              }
                              

                              SMlog

                              Sep 30 10:54:11 srv SM: [20535] lock: opening lock file /var/lock/sm/1d536c76-1ee7-41aa-93ff-7c7a297e2e80/vdi
                              Sep 30 10:54:11 srv SM: [20535] lock: acquired /var/lock/sm/1d536c76-1ee7-41aa-93ff-7c7a297e2e80/vdi
                              Sep 30 10:54:11 srv SM: [20535] Pause for 1d536c76-1ee7-41aa-93ff-7c7a297e2e80
                              Sep 30 10:54:11 srv SM: [20535] Calling tap pause with minor 2
                              Sep 30 10:54:11 srv SM: [20535] ['/usr/sbin/tap-ctl', 'pause', '-p', '12281', '-m', '2']
                              Sep 30 10:54:11 srv SM: [20535]  = 0
                              Sep 30 10:54:11 srv SM: [20535] lock: released /var/lock/sm/1d536c76-1ee7-41aa-93ff-7c7a297e2e80/vdi
                              Sep 30 10:54:12 srv SM: [20545] lock: opening lock file /var/lock/sm/1d536c76-1ee7-41aa-93ff-7c7a297e2e80/vdi
                              Sep 30 10:54:12 srv SM: [20545] lock: acquired /var/lock/sm/1d536c76-1ee7-41aa-93ff-7c7a297e2e80/vdi
                              Sep 30 10:54:12 srv SM: [20545] Unpause for 1d536c76-1ee7-41aa-93ff-7c7a297e2e80
                              Sep 30 10:54:12 srv SM: [20545] Realpath: /dev/VG_XenStorage-d8c3a5f0-6446-6bc0-79d0-749a3a138662/VHD-1d536c76-1ee7-41aa-93ff-7c7a297e2e80
                              Sep 30 10:54:12 srv SM: [20545] Setting LVM_DEVICE to /dev/disk/by-scsid/3600c0ff000524e513777c56301000000
                              Sep 30 10:54:12 srv SM: [20545] lock: opening lock file /var/lock/sm/d8c3a5f0-6446-6bc0-79d0-749a3a138662/sr
                              Sep 30 10:54:12 srv SM: [20545] LVMCache created for VG_XenStorage-d8c3a5f0-6446-6bc0-79d0-749a3a138662
                              Sep 30 10:54:12 srv SM: [20545] lock: opening lock file /var/lock/sm/.nil/lvm
                              Sep 30 10:54:12 srv SM: [20545] ['/sbin/vgs', '--readonly', 'VG_XenStorage-d8c3a5f0-6446-6bc0-79d0-749a3a138662']
                              Sep 30 10:54:12 srv SM: [20545]   pread SUCCESS
                              Sep 30 10:54:12 srv SM: [20545] Entering _checkMetadataVolume
                              Sep 30 10:54:12 srv SM: [20545] LVMCache: will initialize now
                              Sep 30 10:54:12 srv SM: [20545] LVMCache: refreshing
                              Sep 30 10:54:12 srv SM: [20545] lock: acquired /var/lock/sm/.nil/lvm
                              Sep 30 10:54:12 srv SM: [20545] ['/sbin/lvs', '--noheadings', '--units', 'b', '-o', '+lv_tags', '/dev/VG_XenStorage-d8c3a5f0-6446-6bc0-79d0-749a3a138662']
                              Sep 30 10:54:12 srv SM: [20545]   pread SUCCESS
                              Sep 30 10:54:12 srv SM: [20545] lock: released /var/lock/sm/.nil/lvm
                              Sep 30 10:54:12 srv SM: [20545] lock: acquired /var/lock/sm/.nil/lvm
                              Sep 30 10:54:12 srv SM: [20545] lock: released /var/lock/sm/.nil/lvm
                              Sep 30 10:54:12 srv SM: [20545] Calling tap unpause with minor 2
                              Sep 30 10:54:12 srv SM: [20545] ['/usr/sbin/tap-ctl', 'unpause', '-p', '12281', '-m', '2', '-a', 'vhd:/dev/VG_XenStorage-d8c3a5f0-6446-6bc0-79d0-749a3a138662/VHD-1d536c76-1ee7-41aa-93ff-7c7a297e2e80']
                              Sep 30 10:54:12 srv SM: [20545]  = 0
                              Sep 30 10:54:12 srv SM: [20545] lock: released /var/lock/sm/1d536c76-1ee7-41aa-93ff-7c7a297e2e80/vdi
                              
                              • olivierlambertO Offline
                                olivierlambert Vates 🪐 Co-Founder CEO
                                last edited by olivierlambert

                                Your issue seems to be related to a storage problem with the VDI b3e09a17-9b08-48e5-8b47-93f16979b045. If your SR cannot be scanned due to whatever issue is in it, you won't be able to do any operation, snapshot or migration. I have the impression this problem isn't related to CBT at all.
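
                                A rough way to confirm it is the SR/VDI rather than CBT (a sketch, assuming an LVM-based iSCSI SR as the SMlog above suggests; UUIDs taken from the error, <sr-uuid> is a placeholder):

                                # Re-scan the SR and watch SMlog for the scan error
                                xe sr-scan uuid=<sr-uuid>
                                # Inspect the VDI the scan complains about
                                xe vdi-list uuid=b3e09a17-9b08-48e5-8b47-93f16979b045 params=all
                                # With the corresponding LV activated, check the VHD itself
                                vhd-util check -n /dev/VG_XenStorage-<sr-uuid>/VHD-b3e09a17-9b08-48e5-8b47-93f16979b045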

                                • Tristis OrisT Offline
                                  Tristis Oris Top contributor @olivierlambert
                                  last edited by

                                  @olivierlambert You're probably right. I got that error on both of this pool's physical SRs, but disk migration works fine on other pools. So iSCSI problems again?

                                  • C Offline
                                    CJ @olivierlambert
                                    last edited by

                                    @olivierlambert There's a current workaround (NBD connections set to 1), so it's not a priority. I was just looking for a way to keep an eye on the status of any work on it so I can help test, etc.

                                    • ForzaF Offline
                                      Forza
                                      last edited by

                                      Hi!

                                      Great work with the CBT feature. I noticed it is now included in the "stable" branch, which is good news. Is there a summary of the different settings, how they relate, and the considerations to take into account when choosing these options? The XOA documentation doesn't seem to be updated yet (I searched for CBT with no results).

                                      Backup view

                                      ec535cab-eee2-4de6-a14e-2f1b80e5c2f7-image.png

                                      VM disk view

                                      038a31fc-ee78-42a4-b27f-37c668aefa27-image.png

                                      Pool network view

                                      2b9d7c3a-61b1-4465-bfe4-4f9091130065-image.png

                                      • olivierlambertO Offline
                                        olivierlambert Vates 🪐 Co-Founder CEO
                                        last edited by

                                        1. If you tick the box, CBT will be used on any NBD-enabled network (otherwise it falls back to a regular VHD export). If the box is not ticked, NBD will never be used.
                                        2. If you tick the "purge snapshot" box, the snapshot data will be removed and the delta will rely on CBT metadata. Otherwise, the snapshot will be kept.
                                        3. NBD connections per disk: we can try to download multiple NBD blocks in parallel to speed things up, but in the end it seems to cause more issues than it improves transfer speed.
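
                                        A quick way to see the effect of these settings on the hosts (a sketch, assuming CLI access on the pool master): tracked disks show cbt-enabled=true, and with "purge snapshot" the backup snapshot should be reduced to a metadata-only VDI.

                                        # Disks with CBT active
                                        xe vdi-list cbt-enabled=true params=uuid,name-label,cbt-enabled
                                        # Snapshots whose data has been purged, keeping only CBT metadata
                                        xe vdi-list type=cbt_metadata params=uuid,name-label,is-a-snapshot
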
                                        • S Offline
                                          StormMaster
                                          last edited by

                                          To follow up on the backups that break, I have been able to create a reproducible scenario for causing backups to fail with the error message... "can't create a stream from a metadata VDI, fall back to a base"

                                          Hardware:

                                          • 4 XCP-NG hosts (Error is reproducible regardless of which host is hosting the VM.)
                                          • 1 TrueNAS file server providing the NFS share the XCP-NG hosts use for running the VMs.
                                          • 1 TrueNAS file server providing the NFS share for backing up VMs in Continuous Replication mode.
                                          • 1 TrueNAS file server providing the NFS share for backing up VMs in Delta Backups.

                                          Backup Order:

                                          • Full backup in Continuous Replication mode
                                          • Full backup in Delta Backup mode
                                          • Multiple incremental backups in Continuous Replication mode (Not sure of the exact minimum but my servers ran the backup 7 times yesterday.)
                                          • At this point, running an incremental backup of a Delta backup will fail with "can't create a stream from a metadata VDI, fall back to a base"

                                          After this backup has failed, attempting to run any incremental backup will fail. Even the Continuous Replication backups that were working correctly prior to the Delta Backup attempt.

                                          • NBD connections can be any amount.
                                          • With NBD + CBT = True
                                          • Purge snapshots when using CBT = True

                                          Not purging snapshots generates the same error but also causes problems with the VDI chain when it happens.

                                          All retention policies are set to 1.

                                          This was tested on the current latest version of XCP-ng on all hosts, with commit 0a28a of the XO Community Edition and multiple commits prior to it.
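
                                          If it helps with diagnosing, a hedged check to run right after the failing delta (a sketch, assuming CLI access on the pool master): the error text suggests the base the delta wants to stream from has already been reduced to CBT metadata, which should be visible with:

                                          # Snapshots that are metadata-only (data destroyed by a previous purge-snapshot run)
                                          xe vdi-list is-a-snapshot=true type=cbt_metadata params=uuid,name-label,snapshot-of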

                                          • olivierlambertO Offline
                                            olivierlambert Vates 🪐 Co-Founder CEO
                                            last edited by

                                            @florent feedback for testing this 🙂
