XCP-ng

    CBT: the thread to centralize your feedback

    Backup · 439 Posts · 37 Posters · 386.5k Views
    • Anonabhar

      I think I may have a bit of a similar problem here. About a week ago, I updated to the broken version of XO and it threw the same error as in the subject line here. I reverted and everything was OK, but then I started to get unhealthy-VDI warnings on my backups.

      I tried to rescan the SR, and I would see in SMLog that it believed another GC was running, so it would abort. Rebooting the host was the only way to force the coalesce to complete; however, as soon as the next incremental backup ran, it would hit the same problem (the GC thinking another one was running and doing no work).

      I then did a full power-off of the host, rebooted, left all the VMs in a powered-off state, rescanned the SR, and let it coalesce. Once everything was idle, I deleted all snapshots and waited for the coalesce to finish. Only then did I restart the VMs. Now a few VMs have immediately come up as "unhealthy", and once again the GC will not run, thinking there is another GC working.

      I'm kind of running out of ideas 8-) Does anyone know what might be stuck, or what I need to look for to find out?
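
      (For reference, the check described above amounts to rescanning the SR and watching SMLog; the SR UUID is a placeholder here:)

         xe sr-scan uuid=<sr-uuid>    # triggers a rescan, which normally kicks the GC
         tail -f /var/log/SMLog       # watch for the GC abort described above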


      Just a side note here: I noticed that all the VMs I am having problems with have CBT enabled.

      I have a VM that is snapshot-only, and even when the coalesce is stuck, I can delete snapshots off this non-CBT VM and the coalesce process runs (then throws an exception when it gets to the VMs that have CBT enabled).

      Is there a way to disable CBT?

    • florent Vates 🪐 XO Team

      Hello everybody,
      Thanks for your feedback. Here is a work branch with the CBT fixes: https://github.com/vatesfr/xen-orchestra/pull/7792 . The branch name is fix_cbt.

      It fixes:

      • snapshot retention with full backups

      • an off-by-one error in the retention length

      • a parent locator error

      • a "can't destructure undefined" error

      • VDIs leaking attached to the dom0 (verified in our lab)

      • progress is back on the export task

      Please test it if you can, and don't hesitate to provide feedback.

      Regards,

      Florent

      fbeauchamp opened this pull request in vatesfr/xen-orchestra:
      draft fix(backups): CBT omnibus #7792

    • Anonabhar

      For those who may be stuck like I was: I have finally undone the coalesce nightmare the previous CBT code caused.

      For note: I am using XCP-ng 8.3 Beta, fully patched.

      1. Shut down every VM and delete every snapshot.
      2. Find every VDI that has CBT enabled and disable it. I did this with a simple bash loop (not the best, I know):

         for i in $(xe vdi-list cbt-enabled=true --minimal | tr ',' ' ')
         do
              echo "$i"
              xe vdi-disable-cbt uuid="$i"
         done

      3. Reboot the server.
      4. Create a snapshot on any VM and immediately delete it. (If you just do a rescan, it claims the GC is running when it is not, but for whatever reason deleting a snapshot seems to kick the GC regardless.)
      5. Keep an eye on SMLog and look for exceptions. I tend to do something like the following. (The GC can sleep for 5 minutes between passes, so don't get anxious.)

         tail -f /var/log/SMLog | grep SMGC

      6. When it finishes, check XO to see whether any uncoalesced disks remain, and repeat from step 4.

      It took about 5 iterations of the above to finally clean up all the stuck uncoalesced leaves, but it eventually did it. The key, for me, was making sure the VMs were not running and turning CBT off.
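
      As a quick check after step 2, you can list any VDIs that still have CBT enabled (empty output is what you want):

         xe vdi-list cbt-enabled=true --minimal    # comma-separated UUIDs of VDIs with CBT still on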

    • rtjdamen @florent

      @florent Hi Florent, I would love to help you test this in our lab. I have XO from sources running there, but I have no CBT options; do I need to download it in a specific way?

    • Delgado @florent

      @florent I'll be more than happy to help. I will get my homelab instance upgraded to that branch and report back with any issues.

    • olivierlambert Vates 🪐 Co-Founder CEO

      @rtjdamen You need to switch to the fix_cbt branch (git checkout fix_cbt) and rebuild.
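
      For example, from an existing XO-from-sources checkout (a sketch; paths and restart method depend on your setup):

         cd xen-orchestra      # your XO-from-sources working copy
         git fetch origin
         git checkout fix_cbt
         yarn && yarn build    # rebuild, then restart xo-server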

    • rtjdamen @olivierlambert

      @olivierlambert Thank you, found it. I will run some backups with one or two VMs to start with and will report the results.

    • rtjdamen @rtjdamen

      This seems to be working fine. Once the backup is complete, we'll execute the vdi.data_destroy command, right? Currently, it isn't obvious that this is a CBT metadata-only snapshot. Is there a way to make this more visible?

    • olivierlambert Vates 🪐 Co-Founder CEO

      You mean in the VM view / snapshot tab? You are seeing the VM snapshot, not the VDI snapshot, so I wonder whether this VM snapshot can be reverted while being CBT metadata-only; if not, we must make that clear in the UI, yes!
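
      (For reference: per the XAPI CBT documentation, a data-destroyed snapshot VDI changes type, so it can be spotted from a host shell; the UUID is a placeholder:)

         xe vdi-param-get uuid=<snapshot-vdi-uuid> param-name=type    # prints "cbt_metadata" for a metadata-only snapshot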

    • Delgado

      I enabled CBT on the disks, and NBD + CBT in my delta backup, and so far so good. I plan on letting another backup run overnight. I also ran a full backup, and it removed the snapshot like it's supposed to.

    • rtjdamen @olivierlambert

      @olivierlambert Yes indeed, it is currently shown like a normal snapshot; I think it should be shown as a metadata-only snapshot.

    • rtjdamen

      @florent I have been watching the backup process, and at the end I only see vdi.destroy happening, not vdi.data_destroy. Is this correct? Are we handling this last step correctly, or does data remain on the snapshot at this point?
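
      (For context, a sketch of the difference between the two XAPI calls at the CLI level; UUIDs are placeholders:)

         xe vdi-destroy uuid=<vdi-uuid>         # removes the VDI entirely, data and metadata
         xe vdi-data-destroy uuid=<vdi-uuid>    # removes only the data, keeping the CBT metadata needed for deltas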

    • florent Vates 🪐 XO Team

      dataDestroy will be enable-able (not sure if that's really a word) today. In the meantime, please note that the metadata snapshot won't be visible in the UI, since it's not a VM snapshot, only the metadata of the VDI snapshots.

      The latest commits in the fix_cbt branch add an additional check on dom0 connect and more error handling.

    • rtjdamen @florent

      @florent OK, so currently the data remains? When do you think this addition will be ready for testing? I am interested, as we saw some issues with this on NFS, and I am curious whether it will make a difference with this code.

      @olivierlambert I now understand there is in general no difference on coalesce as long as the data destroy is not done. So you were right on that part, and it's safe pushing it this way!

    • olivierlambert Vates 🪐 Co-Founder CEO

      Yes, that's why we'll be able to offer a safe route for people not using the data destroy, while leaving those who want to explore it to opt in 🙂

    • florent Vates 🪐 XO Team @rtjdamen

      @rtjdamen It's still fresh, but on the other hand, the worst that can happen is falling back to a full backup. So for now I would not use it on the bigger VMs (multi-terabyte).
      We are sure it will be a game changer on thick provisioning (because a snapshot costs the full virtual size) and on fast-changing VMs, where coalescing an older snapshot is a major hurdle.

      If everything goes well it will be in stable by the end of July, and we'll probably enable it by default on new backups in the near future.

    • Tristis Oris Top contributor

      Can't commit it myself; too small for a ticket.

      Typo:

         preferNbdInformation:
             'A network accessible by XO or the proxy must have NBD enabled,. Storage must support Change Block Tracking (CBT) to ue it in a backup',

      "enabled,." and "to ue".
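
      (The corrected string would read:)

         preferNbdInformation:
             'A network accessible by XO or the proxy must have NBD enabled. Storage must support Change Block Tracking (CBT) to use it in a backup',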

    • Tristis Oris Top contributor

      Updated to the fix_cbt branch.

      CR NBD backup works.
      Delta NBD backup works.
      Just once each, so we can't be sure yet.

      No broken tasks are generated.

      Still confused about why the CBT toggle is enabled on some VMs: two similar VMs on the same pool, same storage, same Ubuntu version, and one gets it enabled automatically while the other does not.

    • rtjdamen @florent

      @florent I did some testing with the data_destroy branch in my lab. It seems to work as required; indeed, the snapshot is hidden when it is CBT-only.

      What I am not sure is correct: when the data destroy action is done, I would expect a snapshot to show up for coalesce, but it does not. Is it so small and so quickly removed that it is not visible in XOA? On larger VMs in our production I can see these snapshots show up for coalesce. Or, when you do vdi.data_destroy, will it coalesce directly without garbage collection afterwards?
