XCP-ng

    Huge number of API calls "sr.getAllUnhealthyVdiChainsLength" in tasks

    • robyt

      Hi, I have this situation in Tasks:
      [screenshot of the Tasks view]
      What is this task?
      I now have ~50 of these calls.

      • manilx @robyt

        @robyt I have the same here. It started a while ago.
        In addition, I have a VM that doesn't coalesce (the dashboard health check shows 'VDIs with invalid parent VHD'). I already made a clone, but after a while (backups) I get the same thing again. I also tried the solution in https://docs.xenserver.com/en-us/xenserver/8/storage/manage.html#reclaim-space-by-using-the-offline-coalesce-tool, but that resulted in 'VM has no leaf-coalesceable VDIs'.

        Might all be related.
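
        For anyone stuck in the same state, a minimal sketch of one way to nudge the coalesce process and watch it from the host console; the SR UUID is a placeholder:

            # Rescan the SR; this typically kicks off the garbage collector / coalesce job
            xe sr-scan uuid=<SR_UUID>

            # Follow coalesce activity in the storage manager log
            grep -i coalesce /var/log/SMlog | tail -n 20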

        • flakpyro @manilx

          My home install of XO from the sources is also doing this.

          • olivierlambert (Vates 🪐 Co-Founder & CEO)

            Ping @julien-f

            • flakpyro @olivierlambert

              I updated to the latest commit (93833) and it's still happening. Backups are also no longer working for me, as seen below:

              [screenshot: failed backup job]

              Editing the backup job and disabling NBD/CBT in the advanced settings gets backups running again, but the health check then shows all my VDIs as unhealthy and needing to coalesce.

              As a test I created an entirely new job with CBT enabled; the job finishes now, but the task lingers under Tasks and never goes away until you restart the toolstack of the host.

              Edit: Rolling back to commit 3c13e and removing all .cbtlog files from my SR got backups running again and made the sr.getAllUnhealthyVdiChainsLength loop end. I'm guessing there has been a lot of change to enable CBT and something is misbehaving somewhere.
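
              For XO built from the sources, a rough sketch of that rollback, assuming a git checkout built with yarn; the checkout path and service name are assumptions, and the short hash is the one quoted above:

                  # Pin the xen-orchestra checkout to the known-good commit and rebuild
                  cd /opt/xen-orchestra        # adjust to wherever your checkout lives
                  git checkout 3c13e           # short hash quoted above
                  yarn && yarn build

                  # Restart xo-server afterwards (service name depends on your install)
                  systemctl restart xo-server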

              • manilx @flakpyro

                @flakpyro Same here!

                All related:

                https://xcp-ng.org/forum/topic/9242/retry-the-vm-backup-due-to-an-error-error-handle_invalid-vdi/9?_=1719430401473

                https://xcp-ng.org/forum/topic/9215/backups-started-failing-error-vdi-must-be-free-or-attached-to-exactly-one-vm

                • olivierlambert (Vates 🪐 Co-Founder & CEO)

                  Hi,

                  We added two fixes in the latest commit on master; please restart your XAPI (to clear all tasks) and try again with your updated XO 🙂
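
                  For reference, a small sketch of that restart step on each host (it restarts the toolstack only, not the running VMs), plus a quick check that the stale tasks are gone:

                      # Restart the XAPI toolstack on the host
                      xe-toolstack-restart

                      # List the remaining XAPI tasks to confirm the queue is clean
                      xe task-list params=uuid,name-label,status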

                  • manilx @olivierlambert

                    @olivierlambert Backups are still broken.
                    [screenshot from 2024-06-27: failing backups]

                      • manilx @flakpyro

                      @flakpyro said in Huge number of API calls "sr.getAllUnhealthyVdiChainsLength" in tasks:

                      removing all .cbtlog files from my SR got backups back up and running

                      What did you do exactly to remove the .cbtlog files? Where are they located?

                      I find that after backups with the "broken" versions (including the latest one with the two fixes) my coalescing stops. Checking the logs (cat /var/log/SMlog | grep -i coalesce) I have entries like

                      ['/usr/sbin/cbt-util', 'coalesce', '-p', '/var/run/sr-mount/ea3c92b7-0e82-5726-502b-482b40a8b097/0cc54304-4961-482f-a1b7-a8222dd143a1.cbtlog', '-c', '/var/run/sr-mount/ea3c92b7-0e82-5726-502b-482b40a8b097/fe11c7f1-7331-427e-8fa9-76412a2cbb75.cbtlog
                      

                      And those correspond to the hung coalesces...

                      When I reboot the host they disappear, but on the next delta backup I have the issue again.
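
                      The UUIDs in those .cbtlog paths look like VDI UUIDs, so one way to see which disk a given log file belongs to (a sketch, assuming the filename minus the extension is the VDI UUID) is:

                          # Look up the VDI matching a .cbtlog filename
                          xe vdi-list uuid=0cc54304-4961-482f-a1b7-a8222dd143a1 params=name-label,sr-uuid,cbt-enabled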

                        • flakpyro @manilx

                        @manilx the .cbtlog files are located on your SR (/run/sr-mount/UUID). I don't know for sure that removing them is what fixed things for me, but once backups started failing my goal was to "revert" everything back to a known good working state. Doing that, reverting XOA, and restarting the toolstack on the host got things back to how they were.
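
                        A hedged sketch of how one might inspect those files on the SR mount point described above; if the goal is to clear CBT state, disabling it per VDI through the API is probably safer than deleting files by hand:

                            # List CBT metadata files on the SR (the SR UUID is a placeholder)
                            ls -lh /run/sr-mount/<SR_UUID>/*.cbtlog

                            # Disable changed block tracking on a specific VDI
                            xe vdi-disable-cbt uuid=<VDI_UUID>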

                        I'm very much looking forward to CBT backups though, as I've been spoiled by Veeam / ESX CBT backups for a lot of years!

                          • ryan.gravel @olivierlambert

                          @olivierlambert I'm not sure that I am doing it correctly. XOA is up to date (master, commit 6ee7a), and I went through and re-scanned the storage repositories to get snapshots to coalesce. Backups work properly now.

                          I have 50 or so instances of the following in XO tasks

                          • API call: sr.getAllUnhealthyVdiChainsLength
                          • API call: sr.getVdiChainsInfo
                          • API call: sr.reclaimSpace
                          • API call: proxy.getAll
                          • API call: session.getUser
                          • API call: pool.listMissingPatches

                          I've rebooted the host and run 'xe-toolstack-restart', but I'm getting the same results. I must be missing something.

                          Thanks for your help.

                            • olivierlambert (Vates 🪐 Co-Founder & CEO)

                            Hi,

                            XO tasks aren't related to XAPI tasks; it's normal to see those. They were hidden before, and now it's helpful to see them when we have so many calls 🙂
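
                            In other words, the list in XO is XO's own task log; the XAPI task queue on the pool is separate and can be inspected from the host, for example:

                                # XAPI-side tasks, as seen by the pool itself
                                xe task-list params=uuid,name-label,status,created

                                # Cancel a stuck XAPI task by UUID if needed
                                xe task-cancel uuid=<TASK_UUID>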

                              • olivierlambert (Vates 🪐 Co-Founder & CEO)

                              FYI, CBT was reverted; we'll continue to fix bugs on it in a dedicated branch for people to test (a joint community and Vates test campaign).

                                • manilx @olivierlambert

                                @olivierlambert I applaud this decision. Running stable for many, many months and then suddenly having all these issues, without really knowing what caused them or how to resolve them, has been a nightmare for me 😞

                                I'm all in for these new features (!), but if updating to a new build to get fixes results in issues like these, that's not good.

                                I've spent many hours lately and rebooted the hosts more times than in the last 5 years to get backups/coalesce working again, and I'm still not sure everything is back to normal...

                                So, yes, this is a good decision.

                                  • olivierlambert (Vates 🪐 Co-Founder & CEO)

                                  This thread will be used to centralize all backup issues related to CBT: https://xcp-ng.org/forum/topic/9268/cbt-the-thread-to-centralize-your-feedback

                                    • flakpyro @olivierlambert

                                    @olivierlambert

                                    After I saw that the CBT changes were reverted on GitHub, I updated to the latest commit on my home server (253aa). I can report that my backups are now working as they should and coalesce runs without issues, leaving a clean health check dashboard again. :) Glad to see this has been held back on XOA as well, as I was planning to stay on 5.95.1 otherwise! Looking forward to CBT eventually all the same!

                                      • manilx @flakpyro

                                      @flakpyro Just updated, ran my delta backups, and all is fine 🎂

                                        • manilx @manilx

                                        @manilx It was fine for only a short while: https://xcp-ng.org/forum/topic/9275/starting-getting-again-retry-the-vm-backup-due-to-an-error-error-vdi-must-be-free-or-attached-to-exactly-one-vm

                                          • rvreugde

                                          Same problem here: 300+ sessions and counting.
                                          [screenshot: task list with 300+ entries]
                                          An XAPI restart did not solve the issue...

                                            • ryan.gravel

                                            It doesn't seem to be respecting [NOBAK] anymore for storage repos. I tried '[NOSNAP][NOBAK] StorageName' and it still grabs it.
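
                                            For context, a sketch of how that tag would be applied, assuming (as this post suggests) it goes in the SR's name-label; the UUID and name are placeholders, and per this report the current build apparently ignores it:

                                                # Prefix the SR's name-label with the exclusion tags
                                                xe sr-param-set uuid=<SR_UUID> name-label='[NOSNAP][NOBAK] StorageName'

                                                # Confirm the new label
                                                xe sr-list uuid=<SR_UUID> params=name-label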
