XCP-ng

    CBT: the thread to centralize your feedback

      rtjdamen @olivierlambert
      last edited by

      @olivierlambert I updated to the latest version yesterday. Our backups ran during the night, but still with some errors.

      I did not see the stream error this time, but the same behavior seems to be occurring, now with the error "can't create a stream from a metadata VDI".
      Some of these also have a hanging export job in XOA.
      4eb8368a-57f8-4ba3-9060-9327b5a5ffa6-image.png

      fe0a22bf-c898-4f7e-90c0-dec47e934c07-image.png

        DG @olivierlambert
        last edited by

        @olivierlambert I updated to cb6cf today to confirm, and also got this error, but only once across multiple backups.

        Both servers run version 8.2.1 with the latest updates.

        6be093a9-14ef-4344-98a3-fc2dcb3fad3d-image.png

          Vinylrider
          last edited by

          We upgraded from commit f2188 to cb6cf and have 3 hosts. After the upgrade, backups no longer worked on any of them. After reverting to f2188, everything works again.
          We also deleted orphaned VDIs and let the garbage collector do its job, but it did not help.

          Hosts/Errors :
          1.) XCP-ng 8.2.1 : "VDI must be free or attached to exactly one VM"
          2.) XCP-ng 8.2.1 (with latest updates): "VDI must be free or attached to exactly one VM"
          3.) XenServer 7.1.0 : "MESSAGE_METHOD_UNKNOWN(VDI.get_cbt_enabled)"

          On the XenServer 7.1.0 host, CBT is not enabled (and cannot be enabled).
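For what it's worth, the third error looks like what you get when a client calls a CBT method on a host that predates CBT. Here is a minimal sketch of how a caller could tolerate that, treating MESSAGE_METHOD_UNKNOWN as "no CBT on this host" instead of failing the backup. The `XapiError` class and the `call` callable are stand-ins invented for this example (assumptions, not XO's actual code); only the method name and error code come from the post above.

```python
class XapiError(Exception):
    """Stand-in for an XAPI failure (assumption, not the real xen-api class)."""

    def __init__(self, code, *params):
        super().__init__(f"{code}({', '.join(params)})")
        self.code = code
        self.params = list(params)


def is_cbt_enabled(call, vdi_ref):
    """Return the CBT state of a VDI, or False if the host has no CBT support.

    `call` is a hypothetical callable (method_name, *args) -> result that
    raises XapiError when the host reports a failure.
    """
    try:
        return bool(call("VDI.get_cbt_enabled", vdi_ref))
    except XapiError as err:
        if err.code == "MESSAGE_METHOD_UNKNOWN":
            # Old host (e.g. XenServer 7.1.0): the method does not exist,
            # so CBT cannot be enabled there.
            return False
        # Any other failure is a real error and is passed on unchanged.
        raise
```

With something like this, an old host would simply be handled as a non-CBT host, while any other XAPI error would still surface.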

            DG @Vinylrider
            last edited by

            @Vinylrider @olivierlambert I found that only backups on the host with the latest updates eventually run into problems.

            These package updates, at the versions listed, were not applied to the other 2 hosts:

            xapi-core 1.249.36
            xapi-tests 1.249.36
            xapi-xe 1.249.36
            xen-dom0-libs 4.13.5
            xen-dom0-tools 4.13.5
            xen-hypervisor 4.13.5
            xen-libs 4.13.5
            xen-tools 4.13.5
            xsconsole 10.1.13

              Andrew Top contributor @olivierlambert
              last edited by

              @olivierlambert Running XO source master (commit d0bd6) and Delta backup to S3 is looking for an off-line host in the pool, so the backup fails. This host was evacuated and in maintenance mode and being rebooted by XO. The VM being backed up was running on a different host and the master was not on the off-line host. There are several other running hosts in the pool.

              There's no reason XO/XCP should be doing anything with this host...

              "error": "HOST_OFFLINE(OpaqueRef:65b7a047-094b-4c7a-a503-2823e92b9fe4)"

                flakpyro @Andrew
                last edited by

                So with the latest XO update released this week, I'm seeing a new behavior when running a backup after a VM has moved from Host A to Host B (while staying on the same shared NFS SR).

                The new error is "Error: can't create a stream from a metadata VDI, fall back to a base". It then retries and runs a full backup.

                  manilx @flakpyro
                  last edited by

                  On my CR job I got this error again:
                  IMG_1594.jpeg on all VMs.
                  Next run was OK.
                  Running commit cb6cf.

                    rtjdamen @manilx
                    last edited by

                    @manilx We see this error on some backups as well, though not as often as before this version, so it seems to have improved a bit.
                    As a workaround I tried setting retries on the backups. This resolves it in most situations, but sometimes I get this error:

                    5cf0d961-3cdc-446a-9c06-887c919fe987-image.png

                    Also, we still get the "VDI in use" errors now and then. In that situation vdi-data-destroy is not run, which leaves a normal snapshot with CBT. It's not a big deal, since it only affects a very small number of VMs. However, they show up as orphan VDIs on the XOA Health page, which is a bit weird. I think they should not be visible there.
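To illustrate why the retry setting hides most of these failures: the error is transient, so a second attempt usually gets through. Below is a rough sketch of the idea only, not how XO implements its retry option. The function name, parameters and return shape are all made up for this example.

```python
import time


def run_with_retries(job, retries=3, delay_seconds=0.0, sleep=time.sleep):
    """Run `job()` and retry up to `retries` more times if it raises.

    Returns (result, errors), where `errors` lists the failures seen before
    the successful attempt. If every attempt fails, the last error is raised.
    """
    errors = []
    for attempt in range(retries + 1):
        try:
            return job(), errors
        except Exception as err:
            errors.append(err)
            if attempt < retries:
                # Give the host a moment before the next attempt.
                sleep(delay_seconds)
    raise errors[-1]
```

A job that fails once with "can't create a stream from a metadata VDI" and then succeeds would come back with a result plus one recorded error, which matches what the backup summary shows: a failed task followed by a successful run.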

                      jimmymiller
                      last edited by jimmymiller

                      Has anyone seen issues migrating VDIs once CBT is enabled? We're seeing VDI_CBT_ENABLED errors when we try to live migrate disks between SRs. Disabling CBT on the disk allows the migration to go ahead. 'Users' who have limited access don't seem to see specifics on the error, but we as admins get a VDI_CBT_ENABLED error. Ideally, I think we'd want to still be able to migrate VDIs with CBT enabled, or maybe, as part of the VDI migration process, CBT could be disabled temporarily, the VDI migrated, and CBT then re-enabled?

                      User errors:
                      Screenshot 2024-08-07 at 17.42.07.png

                      Admins see:

                      {
                        "id": "7847a7c3-24a3-4338-ab3a-0c1cdbb3a12a",
                        "resourceSet": "q0iE-x7MpAg",
                        "sr_id": "5d671185-66f6-a292-e344-78e5106c3987"
                      }
                      {
                        "code": "VDI_CBT_ENABLED",
                        "params": [
                          "OpaqueRef:aeaa21fc-344d-45f1-9409-8e1e1cf3f515"
                        ],
                        "task": {
                          "uuid": "9860d266-d91a-9d0e-ec2a-a7752fa01a6d",
                          "name_label": "Async.VDI.pool_migrate",
                          "name_description": "",
                          "allowed_operations": [],
                          "current_operations": {},
                          "created": "20240807T21:33:29Z",
                          "finished": "20240807T21:33:29Z",
                          "status": "failure",
                          "resident_on": "OpaqueRef:8d372a96-f37c-4596-9610-1beaf26af9db",
                          "progress": 1,
                          "type": "<none/>",
                          "result": "",
                          "error_info": [
                            "VDI_CBT_ENABLED",
                            "OpaqueRef:aeaa21fc-344d-45f1-9409-8e1e1cf3f515"
                          ],
                          "other_config": {},
                          "subtask_of": "OpaqueRef:NULL",
                          "subtasks": [],
                          "backtrace": "(((process xapi)(filename ocaml/xapi/xapi_vdi.ml)(line 470))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 4696))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 199))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 203))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 42))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 51))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 4708))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 4711))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 35))((process xapi)(filename ocaml/xapi/helpers.ml)(line 1503))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 4705))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 35))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/xapi/rbac.ml)(line 205))((process xapi)(filename ocaml/xapi/server_helpers.ml)(line 95)))"
                        },
                        "message": "VDI_CBT_ENABLED(OpaqueRef:aeaa21fc-344d-45f1-9409-8e1e1cf3f515)",
                        "name": "XapiError",
                        "stack": "XapiError: VDI_CBT_ENABLED(OpaqueRef:aeaa21fc-344d-45f1-9409-8e1e1cf3f515)
                          at Function.wrap (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/_XapiError.mjs:16:12)
                          at default (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/_getTaskResult.mjs:13:29)
                          at Xapi._addRecordToCache (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/index.mjs:1033:24)
                          at file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/index.mjs:1067:14
                          at Array.forEach (<anonymous>)
                          at Xapi._processEvents (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/index.mjs:1057:12)
                          at Xapi._watchEvents (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/index.mjs:1230:14)"
                      }
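The "disable CBT, migrate, re-enable" idea from this post could look roughly like the sketch below. `vdi_api` is a hypothetical wrapper whose method names mirror the XAPI VDI calls (get_cbt_enabled, disable_cbt, pool_migrate, enable_cbt); the exact signatures are an assumption here, and this is not what XO does internally. It also ignores leftover metadata-only snapshots, which may block the migration on their own.

```python
def migrate_vdi_with_cbt(vdi_api, vdi_ref, target_sr_ref):
    """Disable CBT if needed, migrate the VDI, then re-enable CBT on the result.

    `vdi_api` is a hypothetical object exposing get_cbt_enabled, disable_cbt,
    pool_migrate and enable_cbt (names borrowed from the XAPI VDI class).
    Returns the reference of the migrated VDI.
    """
    had_cbt = vdi_api.get_cbt_enabled(vdi_ref)
    if had_cbt:
        vdi_api.disable_cbt(vdi_ref)
    try:
        # Assumed to return the reference of the VDI on the target SR.
        new_ref = vdi_api.pool_migrate(vdi_ref, target_sr_ref, {})
    except Exception:
        if had_cbt:
            # Migration failed: restore CBT on the untouched source VDI.
            vdi_api.enable_cbt(vdi_ref)
        raise
    if had_cbt:
        vdi_api.enable_cbt(new_ref)
    return new_ref
```

Even with CBT re-enabled afterwards, the change tracking starts from scratch on the new SR, so the next backup of that disk would presumably be a full.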
                        rtjdamen @jimmymiller
                        last edited by

                        @jimmymiller CBT should already be disabled as part of the migration. This feature was part of the first release. However, there seems to be a bug where CBT is disabled but the metadata-only snapshots are left on the VM. I believe this is causing the issue.

                        I think this pull request was created to solve this in the next release:
                        https://github.com/vatesfr/xen-orchestra/pull/7903

                        MelissaFrncJrg opened this pull request in vatesfr/xen-orchestra

                        open feat(xo-web/disks): allow user to delete snapshots before migrating VDI #7903

                          flakpyro @rtjdamen
                          last edited by

                          So, testing CBT in our test environment with migrations, this is what I have observed:

                          Host 1 and Host 2 are in a pool together with a shared NFS SR. If TestVM-01 is on Host 1, using the NFS SR with CBT backups enabled, all is fine. Clicking on the VM and then Disks shows that CBT is enabled on the drives. If I migrate the VM over to Host 2, CBT is disabled and the VM is migrated successfully. On the next backup job run, however, the job initially fails with the error "can't create a stream from a metadata VDI, fall back to a base". After a retry, the job runs.

                          If multiple jobs exist for a VM, say a backup job and a replication job, will that result in 2 CBT snapshots? That is a ton of space savings vs. keeping 2 regular snapshots with the old backup method, and it cuts down on GC time and storage IO by quite a bit!

                            rtjdamen @flakpyro
                            last edited by

                            @flakpyro That’s exactly the reason we were asking for the CBT option to become available. It makes a huge difference in storage usage and in the amount of writes done to the storage. Huge improvement!

                              olivierlambert Vates 🪐 Co-Founder CEO
                              last edited by

                              In theory, migrating a VM to another host (but keeping the same shared SR) shouldn't re-trigger a full. It only happens when the VDI is migrated to another SR (the VDI UUID will change and the metadata will be lost).
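A toy sketch of that rule, just to make it concrete: the function below only compares VDI UUIDs and is not XO's actual decision logic. The function and parameter names are made up for illustration.

```python
def choose_backup_mode(last_backed_up_vdi_uuid, current_vdi_uuid):
    """Return "delta" if the previous CBT metadata can be reused, else "full".

    Simplified rule: a host-to-host migration on a shared SR keeps the VDI
    UUID, so a delta is possible. Moving the VDI to another SR gives it a
    new UUID and the metadata is lost, so a full is needed.
    """
    if last_backed_up_vdi_uuid is None:
        # First run: nothing to compute a delta against.
        return "full"
    if last_backed_up_vdi_uuid != current_vdi_uuid:
        # The VDI was migrated to another SR since the last backup.
        return "full"
    return "delta"
```

Under this rule, a host-to-host move on shared storage stays a delta, which is why a full after that kind of migration is unexpected.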

                              At least, we can try to reproduce this internally (I couldn't on my prod).

                                flakpyro @olivierlambert
                                last edited by

                                @olivierlambert Looks like since the last XOA update it no longer triggers a full, which is great news. Instead, after a migration I see "can't create a stream from a metadata VDI, fall back to a base" when the job next runs. After that it retries and runs as usual.

                                  olivierlambert Vates 🪐 Co-Founder CEO
                                  last edited by

                                  Interesting 🤔 Where are you seeing this message exactly?

                                    flakpyro @olivierlambert
                                    last edited by

                                    @olivierlambert
                                    For me it appears in the backup summary on the failed task

                                    823e2f0a-0ac9-4cbc-b872-761164f8752a-image.png

                                    And after the job retries. (I have retries set to 3)

                                    5ca6e731-1f7e-4135-b0fb-d2d958e85c18-image.png

                                    I believe others a few posts up are also seeing this same error message. @rtjdamen @manilx

                                      olivierlambert Vates 🪐 Co-Founder CEO
                                      last edited by

                                      And only after migrating the VM from a host to another, without changing its SR?

                                        flakpyro @olivierlambert
                                        last edited by

                                        @olivierlambert Correct. The VM in question is on an NFS 4 SR. The following backups run without the error for as long as the VM stays on that host. However, if I move it again, the process repeats itself.

                                        We are running paid XOA, so if it helps I can enable a support tunnel for you guys to take a look at the logs. However, we have quite a few nightly backups now, so I'm not sure whether the logs will have rotated. Either way, it's in our test environment, so we can run the job anytime to generate fresh data.

                                          olivierlambert Vates 🪐 Co-Founder CEO
                                          last edited by

                                          That's weird, I'm using RPU in our prod so the VMs are moving, but the VDI doesn't change. Anyway, another test to try internally 🙂

                                            rtjdamen @olivierlambert
                                            last edited by

                                            @olivierlambert We do not see this error in relation to migration; it just happens at random. I have seen issues with CBT and multi-disk VMs. Could it be that these are multi-disk VMs?
