CBT: the thread to centralize your feedback
-
@CJ Do you have "Number of NBD connections per disk" set to 1 (one), or is it set higher? If it's higher than 1, try setting it back to 1. I have the same problem when I use more than 1.
-
@Andrew I have it set to 4. But it's only these 3 VMs, not all of the VMs that are part of the backup job.
-
Try setting it to 1 first and see if you still have the same problem (after cleaning up the previous leftovers).
-
@olivierlambert That's the really weird part. Other than the notice on the dashboard, I'm not having any problems. The backups are completing successfully. Which is a definite change from before I updated everything. Then the backups would fail for any VM with an attached disk.
-
@CJ we have the number of nbd connections also set to 1, did some testing with more but had issues with it and it gave no performance improvement. Maybe this is causing your issue?
-
This is odd. It seems to need to get to a certain point before backups start failing. I have the one VM with 3 disks attached as the control, the other two with only one disk each attached, and now a fourth VM with only one disk attached. However, the backup only failed for the original three VMs, with "VDI_IN_USE(OpaqueRef:UUID, destroy)".
I've changed the number of NBD connections to 1 so we'll see if that stops the attachment issue.
There appears to be a problem with the backup report email, however. It states "Success: 0/N" while the actual job report shows that only the three VMs failed and the others succeeded.
-
@olivierlambert @florent @CJ Backups have been much more stable since the latest XO update 10-Sep-2024 (XOA 5.98.1, master commit 4c7acc1).
Running CR and CBT/NBD of 2 connections does not leave stranded VDIs any more (at least I have not seen any yet).
-
@Andrew we see the same behavior here, no strange backup issues so far!
-
No attached disks so far, but I'll wait until next week to bump up the NBD connections to make sure.
-
Unfortunately, as soon as I bumped the NBD connections up to 2 I got an attached disk. It doesn't seem like the latest changes have fixed the issue.
Xen Orchestra, commit 74e6f
-
@florent deployed a fix last week that resolved the vdi_in_use errors. However, after updating to the latest XOA release the problem came back and is no longer resolved. Not sure if this is a new issue or if the release is having issues with the fix.
-
@CJ seems like an issue with the NBD connections then… hope this is something that can be fixed easily.
-
In relation to the issues I have been seeing with "can't create a stream from a metadata VDI, fall back to a base" after performing a VM migration from one host to another, I notice I also see the following in the SMLog.
Note: I also see this in the SMLog on the pool master after a VM migration even if I don't have snapshot delete enabled but simply have NBD + CBT enabled. However, the regular delta backup will proceed anyway and works fine in that case (with snapshot delete disabled). With snapshot delete enabled I will see "can't create a stream from a metadata VDI, fall back to a base". Running the job again after this will produce no error in SMLog; it only appears after a VM migration between hosts.
Log snippet:
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578] lock: opening lock file /var/lock/sm/afd3edac-3659-4253-8d6e-76062399579c/cbtlog
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578] lock: acquired /var/lock/sm/afd3edac-3659-4253-8d6e-76062399579c/cbtlog
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578] ['/usr/sbin/cbt-util', 'get', '-n', '/var/run/sr-mount/16e4ecd2-583e-e2a0-5d3d-8e53ae9c1429/afd3edac-3659-4253-8d6e-76062399579c.cbtlog', '-c']
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578] pread SUCCESS
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578] lock: released /var/lock/sm/afd3edac-3659-4253-8d6e-76062399579c/cbtlog
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578] Raising exception [460, Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated]]
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578] ***** generic exception: vdi_list_changed_blocks: EXCEPTION <class 'xs_errors.SROSError'>, Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated]
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]   File "/opt/xensource/sm/SRCommand.py", line 111, in run
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]     return self._run_locked(sr)
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]   File "/opt/xensource/sm/SRCommand.py", line 161, in _run_locked
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]     rv = self._run(sr, target)
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]   File "/opt/xensource/sm/SRCommand.py", line 326, in _run
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]     return target.list_changed_blocks()
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]   File "/opt/xensource/sm/VDI.py", line 757, in list_changed_blocks
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]     "Source and target VDI are unrelated")
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578] ***** NFS VHD: EXCEPTION <class 'xs_errors.SROSError'>, Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated]
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]   File "/opt/xensource/sm/SRCommand.py", line 385, in run
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]     ret = cmd.run(sr)
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]   File "/opt/xensource/sm/SRCommand.py", line 111, in run
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]     return self._run_locked(sr)
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]   File "/opt/xensource/sm/SRCommand.py", line 161, in _run_locked
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]     rv = self._run(sr, target)
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]   File "/opt/xensource/sm/SRCommand.py", line 326, in _run
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]     return target.list_changed_blocks()
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]   File "/opt/xensource/sm/VDI.py", line 757, in list_changed_blocks
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]     "Source and target VDI are unrelated")
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578]
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578] lock: closed /var/lock/sm/afd3edac-3659-4253-8d6e-76062399579c/cbtlog
Sep 17 21:07:40 xcpng-prd-03 SM: [1126578] lock: closed /var/lock/sm/16e4ecd2-583e-e2a0-5d3d-8e53ae9c1429/sr
Sep 17 21:07:40 xcpng-prd-03 SM: [1126556] FileVDI._snapshot for c56e5d87-1486-41da-86d4-92ede62de75a (type 2)
Sep 17 21:07:40 xcpng-prd-03 SM: [1126556] ['uuidgen', '-r']
Sep 17 21:07:40 xcpng-prd-03 SM: [1126556] pread SUCCESS
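A note on what the log above is reporting: the "Source and target VDI are unrelated" error means SM could not connect the two VDIs through the CBT log chain (the .cbtlog files that cbt-util is reading). The following is a conceptual sketch only, not the actual SM source, illustrating the relatedness check under the assumption that each CBT log records a parent link and SM walks that chain:

```python
def vdis_related(parent_of, source_uuid, target_uuid):
    """Conceptual sketch: walk child -> parent links from the target VDI
    back toward the source. If the walk never reaches the source (e.g. a
    link was lost after a migration or snapshot deletion), the changed-block
    calculation is refused, which surfaces as SM error 460.
    `parent_of` maps a VDI uuid to its parent uuid (hypothetical structure
    standing in for the .cbtlog metadata)."""
    uuid = target_uuid
    seen = set()
    while uuid is not None and uuid not in seen:
        if uuid == source_uuid:
            return True
        seen.add(uuid)  # guard against accidental cycles
        uuid = parent_of.get(uuid)
    return False

# Intact chain: base <- snap1 <- snap2
chain = {"snap2": "snap1", "snap1": "base"}
print(vdis_related(chain, "base", "snap2"))   # True
# Broken chain (snap1's link missing): walk dead-ends, SM would raise 460
broken = {"snap2": "snap1"}
print(vdis_related(broken, "base", "snap2"))  # False
```

This would fit the symptom in the thread: a host migration that loses or resets CBT metadata leaves the new snapshot with no path back to the previous one.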
-
I wonder if the migration is not also migrating the VDI with it, which shouldn't be the case. What are you doing exactly to migrate the VM?
-
@olivierlambert I'm leaving the VM itself on the shared NFS SR. The above error was triggered by putting a host in maintenance mode via XOA to install the RC2 update via ISO yesterday. Other times it's just by checking off a number of VMs in XOA, clicking the migrate button, selecting another host within the same pool, and clicking OK. Everything should be staying on the same shared SR.
It's like the hosts can't read each other's metadata for some reason?
-
Can you show me the exact UI steps you do in XO to do the migration? Then, in the task, are you seeing anything outside VM migrate?
-
I saw these errors in my log today after starting a replication job on commit 530c3. I have not migrated these VMs to a new host or an SR.
Sep 18 08:17:12 xo-server[6199]: 2024-09-18T12:17:12.861Z xo:xapi:vdi INFO can't get changed block {
Sep 18 08:17:12 xo-server[6199]:   error: XapiError: SR_BACKEND_FAILURE_460(, Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated], )
Sep 18 08:17:12 xo-server[6199]:       at XapiError.wrap (file:///opt/xo/xo-builds/xen-orchestra-202409180806/packages/xen-api/_XapiError.mjs:16:12)
Sep 18 08:17:12 xo-server[6199]:       at file:///opt/xo/xo-builds/xen-orchestra-202409180806/packages/xen-api/transports/json-rpc.mjs:38:21
Sep 18 08:17:12 xo-server[6199]:       at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
Sep 18 08:17:12 xo-server[6199]:     code: 'SR_BACKEND_FAILURE_460',
Sep 18 08:17:12 xo-server[6199]:     params: [
Sep 18 08:17:12 xo-server[6199]:       '',
Sep 18 08:17:12 xo-server[6199]:       'Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated]',
Sep 18 08:17:12 xo-server[6199]:       ''
Sep 18 08:17:12 xo-server[6199]:     ],
Sep 18 08:17:12 xo-server[6199]:     call: { method: 'VDI.list_changed_blocks', params: [Array] },
Sep 18 08:17:12 xo-server[6199]:     url: undefined,
Sep 18 08:17:12 xo-server[6199]:     task: undefined
Sep 18 08:17:12 xo-server[6199]:   },
Sep 18 08:17:12 xo-server[6199]:   ref: 'OpaqueRef:a7c534ef-d1d5-0578-a564-05b2c36de7be',
Sep 18 08:17:12 xo-server[6199]:   baseRef: 'OpaqueRef:5d4109f0-5278-64d8-233d-6cd73c8c6d6a'
Sep 18 08:17:12 xo-server[6199]: }
Sep 18 08:17:14 xo-server[6199]: 2024-09-18T12:17:14.459Z xo:xapi:vdi INFO can't get changed block {
Sep 18 08:17:14 xo-server[6199]:   error: XapiError: SR_BACKEND_FAILURE_460(, Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated], )
Sep 18 08:17:14 xo-server[6199]:       at XapiError.wrap (file:///opt/xo/xo-builds/xen-orchestra-202409180806/packages/xen-api/_XapiError.mjs:16:12)
Sep 18 08:17:14 xo-server[6199]:       at file:///opt/xo/xo-builds/xen-orchestra-202409180806/packages/xen-api/transports/json-rpc.mjs:38:21
Sep 18 08:17:14 xo-server[6199]:       at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
Sep 18 08:17:14 xo-server[6199]:     code: 'SR_BACKEND_FAILURE_460',
Sep 18 08:17:14 xo-server[6199]:     params: [
Sep 18 08:17:14 xo-server[6199]:       '',
Sep 18 08:17:14 xo-server[6199]:       'Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated]',
Sep 18 08:17:14 xo-server[6199]:       ''
Sep 18 08:17:14 xo-server[6199]:     ],
Sep 18 08:17:14 xo-server[6199]:     call: { method: 'VDI.list_changed_blocks', params: [Array] },
Sep 18 08:17:14 xo-server[6199]:     url: undefined,
Sep 18 08:17:14 xo-server[6199]:     task: undefined
Sep 18 08:17:14 xo-server[6199]:   },
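For context on the failing call: when VDI.list_changed_blocks succeeds, it returns a base64-encoded bitmap in which each bit marks a 64 KiB block that differs between the two VDIs (per the Citrix/XenServer changed block tracking documentation). A minimal decoder sketch, assuming most-significant-bit-first ordering within each byte:

```python
import base64

BLOCK_SIZE = 64 * 1024  # each bit in the CBT bitmap covers 64 KiB of the VDI

def changed_block_offsets(bitmap_b64):
    """Decode the base64 bitmap from VDI.list_changed_blocks into the byte
    offsets of changed 64 KiB blocks (MSB-first bit order assumed here)."""
    raw = base64.b64decode(bitmap_b64)
    offsets = []
    for byte_index, byte in enumerate(raw):
        for bit in range(8):
            if byte & (0x80 >> bit):
                offsets.append((byte_index * 8 + bit) * BLOCK_SIZE)
    return offsets

# Example: bits for block 0 and block 9 set
bitmap = base64.b64encode(bytes([0b10000000, 0b01000000])).decode()
print(changed_block_offsets(bitmap))  # [0, 589824]
```

When the SR backend raises error 460 as in the log above, XO never gets this bitmap at all, which is why it falls back to a full/base transfer.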
-
After changing NBD back to 1, I haven't seen any additional attached disks. However, the backup that originally succeeded with an attached disk has now failed. Odd that it initially works with an attached disk but then fails later.
-
I still have one VM stuck without a backup. I've already restarted its host and halted the VM itself. SMLog has no records during the 5 minutes of that task.
"result": { "code": "VDI_IN_USE", "params": [ "OpaqueRef:1f96d4e7-5ca6-4070-b686-b34dd83e5442", "destroy" ], "task": { "uuid": "81a60e3a-c887-13f3-fedc-36eae232a6df", "name_label": "Async.VDI.destroy", "name_description": "", "allowed_operations": [], "current_operations": {}, "created": "20240918T18:03:28Z", "finished": "20240918T18:03:28Z", "status": "failure", "resident_on": "OpaqueRef:223881b6-1309-40e6-9e42-5ad74a274d2d", "progress": 1, "type": "<none/>", "result": "", "error_info": [ "VDI_IN_USE", "OpaqueRef:1f96d4e7-5ca6-4070-b686-b34dd83e5442", "destroy" ], "other_config": {}, "subtask_of": "OpaqueRef:NULL", "subtasks": [], "backtrace": "(((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 4711))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/xapi/rbac.ml)(line 205))((process xapi)(filename ocaml/xapi/server_helpers.ml)(line 95)))" }, "message": "VDI_IN_USE(OpaqueRef:1f96d4e7-5ca6-4070-b686-b34dd83e5442, destroy)", "name": "XapiError", "stack": "XapiError: VDI_IN_USE(OpaqueRef:1f96d4e7-5ca6-4070-b686-b34dd83e5442, destroy)\n at XapiError.wrap (file:///opt/xo/xo-builds/xen-orchestra-202409111040/packages/xen-api/_XapiError.mjs:16:12)\n at default (file:///opt/xo/xo-builds/xen-orchestra-202409111040/packages/xen-api/_getTaskResult.mjs:13:29)\n at Xapi._addRecordToCache (file:///opt/xo/xo-builds/xen-orchestra-202409111040/packages/xen-api/index.mjs:1041:24)\n at file:///opt/xo/xo-builds/xen-orchestra-202409111040/packages/xen-api/index.mjs:1075:14\n at Array.forEach (<anonymous>)\n at Xapi._processEvents (file:///opt/xo/xo-builds/xen-orchestra-202409111040/packages/xen-api/index.mjs:1065:12)\n at Xapi._watchEvents (file:///opt/xo/xo-builds/xen-orchestra-202409111040/packages/xen-api/index.mjs:1238:14)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)" } },
and I often get
Job canceled to protect the VDI chain
errors for other VMs. That has continued since the bad CBT commit.