XCP-ng

    CBT: the thread to centralize your feedback

Backup · 455 Posts · 37 Posters · 416.9k Views
• rtjdamen @flakpyro

@flakpyro I understand they did, but I'm not sure whether it is already fixed or not. It had to do with the dedicated migration network being selected. Maybe @olivierlambert is aware of the current status?

• flakpyro @rtjdamen

@rtjdamen I do know the dedicated migration network was an issue: CBT data would be reset if you performed a migration using a dedicated migration network, and removing the dedicated network was a workaround there. The next issue was that taking a manual snapshot and then removing it would sometimes also reset CBT data. Perhaps I need to spin up a test VM and try again, since I know there have been a lot of updates.
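If it helps anyone testing this, here is a minimal sketch (using the official XenAPI Python bindings; the host URL and credentials are placeholders) that records the cbt_enabled flag of every non-snapshot VDI before and after an operation, so a migration or snapshot removal can be checked for CBT resets:

```python
#!/usr/bin/env python3
# Sketch: diff the cbt_enabled flag of all non-snapshot VDIs before and
# after an operation (e.g. a migration or a snapshot removal).
# Host URL and credentials below are placeholders.
import XenAPI

def cbt_state(session):
    """Map VDI uuid -> cbt_enabled for all non-snapshot VDIs."""
    return {
        rec["uuid"]: rec["cbt_enabled"]
        for rec in session.xenapi.VDI.get_all_records().values()
        if not rec["is_a_snapshot"]
    }

session = XenAPI.Session("https://pool-master.example")  # placeholder
session.xenapi.login_with_password("root", "password")   # placeholder
try:
    before = cbt_state(session)
    input("Run the migration / snapshot removal now, then press Enter...")
    after = cbt_state(session)
    for uuid, enabled in before.items():
        if enabled and not after.get(uuid, False):
            print(f"CBT was reset on VDI {uuid}")
finally:
    session.xenapi.session.logout()
```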

• rtjdamen

          Hi all,

          We recently upgraded our production pools to the latest XCP-ng 8.3 release. After some struggles during the upgrade (mostly around the pool master), everything seems to be running fine now in general.

However, since upgrading, we're seeing much longer durations for certain XAPI-related tasks, especially:

• VDI.enable_cbt
• VDI.destroy
• VDI.list_changed_blocks (during backups)

In some cases, these tasks take up to 25 minutes to complete on specific VMs. Meanwhile, similar operations on other VMs are done in just a few minutes. The behavior is inconsistent but reproducible.

We've checked:

• Storage performance is normal (LVM over local SSD)
• No I/O bottlenecks on the hosts
• No VM performance impact during these tasks

It seems to affect CBT-enabled VMs more strongly, and we've only been seeing this behavior since the upgrade to 8.3, especially after upgrading the pool master.

          Has anyone else seen this since upgrading?
          Is there a known issue with CBT or coalesce interaction in 8.3?
          Would love to hear if others experience this or have suggestions for tuning.
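To put numbers on this, here is a minimal sketch (XenAPI Python bindings; the two snapshot VDI UUIDs, host URL and credentials are placeholders) that times a single VDI.list_changed_blocks call, so slow VMs can be compared against fast ones:

```python
#!/usr/bin/env python3
# Sketch: time VDI.list_changed_blocks between two CBT-enabled snapshot
# VDIs of the same disk. UUIDs, host URL and credentials are placeholders.
import time
import XenAPI

VDI_FROM_UUID = "<older-snapshot-vdi-uuid>"  # placeholder
VDI_TO_UUID = "<newer-snapshot-vdi-uuid>"    # placeholder

session = XenAPI.Session("https://pool-master.example")  # placeholder
session.xenapi.login_with_password("root", "password")   # placeholder
try:
    vdi_from = session.xenapi.VDI.get_by_uuid(VDI_FROM_UUID)
    vdi_to = session.xenapi.VDI.get_by_uuid(VDI_TO_UUID)

    start = time.monotonic()
    bitmap = session.xenapi.VDI.list_changed_blocks(vdi_from, vdi_to)
    elapsed = time.monotonic() - start
    # The result is a base64-encoded bitmap of changed 64 KiB blocks.
    print(f"VDI.list_changed_blocks took {elapsed:.1f}s, "
          f"bitmap is {len(bitmap)} base64 chars")
finally:
    session.xenapi.session.logout()
```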

• olivierlambert (Vates 🪐 Co-Founder & CEO)

I'm not aware of such issues (at least for now) 🤔 (that doesn't mean they don't exist more broadly, but this is the first report). We'll upgrade our own production to 8.3 relatively soon, so that will also be an opportunity to check internally.

• rtjdamen @olivierlambert

@olivierlambert OK, maybe you'll notice some differences as well. We have been on 8.3 since February and patched last Friday; since the patching I see a decrease in performance.

What I believe is happening is that GC processes are blocking other storage operations: only once a coalesce on an SR is done do I see multiple actions like destroy, enable CBT and changed-block calculations being processed. As far as I know this was not the case before; they could also take somewhat longer, but it was never tied to coalescing (or I never noticed it).

Maybe we can confirm whether this behavior is by design or not?
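One way to check that theory: a minimal sketch (XenAPI Python bindings; host URL and credentials are placeholders) that polls the XAPI task list, so you can watch whether VDI.destroy / VDI.enable_cbt tasks only make progress once coalesce activity on the SR has finished:

```python
#!/usr/bin/env python3
# Sketch: poll pending XAPI tasks to see whether VDI operations sit idle
# while coalesce/GC work is running. Host URL and credentials are placeholders.
import time
import XenAPI

session = XenAPI.Session("https://pool-master.example")  # placeholder
session.xenapi.login_with_password("root", "password")   # placeholder
try:
    while True:
        for rec in session.xenapi.task.get_all_records().values():
            if rec["status"] == "pending":
                print(f'{rec["name_label"]}: progress={float(rec["progress"]):.2f}')
        print("---")
        time.sleep(10)
finally:
    session.xenapi.session.logout()
```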

• olivierlambert (Vates 🪐 Co-Founder & CEO)

I'm not sure there's a change related to that, but I can ask @Team-Storage.

• rtjdamen

                  Anyone else experiencing this issue?
                  https://github.com/vatesfr/xen-orchestra/issues/8713

It's a long-standing bug that I believe is pretty easy to fix, and fixing it would make CBT backups more robust. It would be great if this could be implemented!

Issue #8713 (open): CBT backup fails repeatedly due to leftover invalid CBT snapshots
• olivierlambert (Vates 🪐 Co-Founder & CEO)

                    Pinging @florent about this

• florent (Vates 🪐 XO Team) @olivierlambert

@olivierlambert yes, the fix is in the pipeline (on the XAPI side): it won't migrate a snapshot with CBT enabled, and it won't allow disabling CBT on a snapshot.

• olivierlambert (Vates 🪐 Co-Founder & CEO)

                        Thanks!

• rtjdamen @florent

@florent so solving this inside XOA is not an option? That could be useful for the short term.

• rtjdamen @rtjdamen

Support found that an automatic refresh of our SRs every 30 seconds was delaying this. It seems we have had this for a long time, but now it's more aggressive. We disabled it, since it is not required, and that resolves our issue here.

• olivierlambert (Vates 🪐 Co-Founder & CEO)

Wow, a rescan every 30 seconds? I thought the default value was 10 minutes or something 🤔 Do you remember setting it manually at some point?

• rtjdamen @olivierlambert

@olivierlambert No, we never changed it. From what I understand from Jon, it's the default; these were his words: "and all these SRs have auto-scan enabled. This means every 30 seconds, the pool master will scan the entirety of every SR". We changed this and the problem is gone.
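For anyone else who hits this, a minimal sketch of the change we made, assuming (as I understand it; worth confirming with support) that the periodic rescan is driven by the "auto-scan" key in each SR's other-config map; host URL and credentials are placeholders:

```python
#!/usr/bin/env python3
# Sketch: list SRs with auto-scan enabled and switch it off.
# Assumption (confirm with support): the periodic rescan is controlled by
# the "auto-scan" key in each SR's other-config map.
# Host URL and credentials are placeholders.
import XenAPI

session = XenAPI.Session("https://pool-master.example")  # placeholder
session.xenapi.login_with_password("root", "password")   # placeholder
try:
    for ref, rec in session.xenapi.SR.get_all_records().items():
        if rec["other_config"].get("auto-scan", "false").lower() == "true":
            print(f'disabling auto-scan on SR {rec["name_label"]} ({rec["uuid"]})')
            session.xenapi.SR.remove_from_other_config(ref, "auto-scan")
finally:
    session.xenapi.session.logout()
```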

• olivierlambert (Vates 🪐 Co-Founder & CEO)

Okay, I thought the auto-scan interval was more like 10 minutes or so, but hey, I'm not that deep in the stack anymore 😄
