
    SR Garbage Collection running permanently

    • Tristis Oris (Top contributor)

      It normally takes a few minutes, but I see the task running almost non-stop, and only for one storage.
      In the VDI list I see something weird:

      [screenshot: 4687a0b4-5448-44c4-880f-f87f4bda8428-image.png]

      Nothing similar on other pools/SRs.
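
      (One way to pull the same VDI list from the CLI, substituting the SR's UUID for the placeholder:)

      # <sr-uuid> is a placeholder for the SR's UUID
      xe vdi-list sr-uuid=<sr-uuid> params=uuid,name-label,is-a-snapshot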

      • Danp (Pro Support Team)

        Hi Tristis,

        On the GC issue, have you checked the SR's Advanced tab to see if it lists VDIs to be coalesced? You may also want to check SMlog for exceptions that are preventing the GC from completing.
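
        For example, from dom0 (assuming the default log location):

        # SMlog lives at /var/log/SMlog on XCP-ng dom0
        grep -B2 -A8 "E X C E P T I O N" /var/log/SMlog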

        For the image, check your backup jobs for the following settings --
        [screenshot: ea0fcdfa-b720-42a4-824c-d34da7875931-image.png]

        This behavior should be present on all SRs where these options are enabled on the associated backups.

        Regards, Dan

        • Tristis Oris (Top contributor) @Danp

          @Danp [screenshot: 0de7e119-8b8e-478f-9cfe-ff1092c9da2e-image.png]

          Something like this?

          Jan 17 00:00:40 host SMGC: [2648626] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
          Jan 17 00:00:40 host SMGC: [2648626]          ***********************
          Jan 17 00:00:40 host SMGC: [2648626]          *  E X C E P T I O N  *
          Jan 17 00:00:40 host SMGC: [2648626]          ***********************
          Jan 17 00:00:40 host SMGC: [2648626] _doCoalesceLeaf: EXCEPTION <class 'util.SMException'>, Timed out
          Jan 17 00:00:40 host SMGC: [2648626]   File "/opt/xensource/sm/cleanup.py", line 2449, in _liveLeafCoalesce
          Jan 17 00:00:40 host SMGC: [2648626]     self._doCoalesceLeaf(vdi)
          Jan 17 00:00:40 host SMGC: [2648626]   File "/opt/xensource/sm/cleanup.py", line 2483, in _doCoalesceLeaf
          Jan 17 00:00:40 host SMGC: [2648626]     vdi._coalesceVHD(timeout)
          Jan 17 00:00:40 host SMGC: [2648626]   File "/opt/xensource/sm/cleanup.py", line 933, in _coalesceVHD
          Jan 17 00:00:40 host SMGC: [2648626]     self.sr.uuid, abortTest, VDI.POLL_INTERVAL, timeOut)
          Jan 17 00:00:40 host SMGC: [2648626]   File "/opt/xensource/sm/cleanup.py", line 188, in runAbortable
          Jan 17 00:00:40 host SMGC: [2648626]     raise util.SMException("Timed out")
          Jan 17 00:00:40 host SMGC: [2648626]
          Jan 17 00:00:40 host SMGC: [2648626] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
          Jan 17 00:00:40 host SMGC: [2648626] *** UNDO LEAF-COALESCE
          
          
          Jan 17 00:00:40 host SMGC: [2648626] Got sm-config for 3b0ee0f0(2000.000G/616.178G?): {'paused': 'true', 'vhd-blocks': 'eJzsvQ1gHFd1MHrv7Ghn5CjWyHGskeN4R45jCwpB
          ***
          
          Jan 17 00:00:40 host SMGC: [2648626] Unpausing VDI 3b0ee0f0(2000.000G/616.178G?)
          
          • Danp (Pro Support Team) @Tristis Oris

            @Tristis-Oris said in SR Garbage Collection running permanently:

            something like this?

            Yes. Are you running XCP-ng 8.2.1 or 8.3?

            On the snapshot issue, do you have any idea how they are being created?

            • Tristis Oris (Top contributor) @Danp

              @Danp
              The VDIs were migrated from 8.2 to 8.3, without snapshots, because of CBT.
              Other pools were migrated the same way and have no such issue.

              I can't remember doing anything special that could have caused this.

              • Danp (Pro Support Team) @Tristis Oris

                @Tristis-Oris There's a fix for the timeout during leaf coalesce on XCP-ng 8.3 that should be released soon. You can install it now with the following command --

                yum install https://updates.xcp-ng.org/8/8.3/testing/x86_64/Packages/sm-3.2.3-1.15.xcpng8.3.x86_64.rpm https://updates.xcp-ng.org/8/8.3/testing/x86_64/Packages/sm-fairlock-3.2.3-1.15.xcpng8.3.x86_64.rpm
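
                To confirm the updated packages are in place on each host afterwards, a generic RPM query works (nothing specific to this fix):

                # query both packages installed by the command above
                rpm -q sm sm-fairlock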
                
                • Tristis Oris (Top contributor) @Danp

                  @Danp Is a reboot required?

                  • flakpyro @Tristis Oris

                    @Tristis-Oris

                    I have run into this numerous times. It's one of the reasons I haven't switched to "Purge Snapshot data when using CBT" on all my jobs yet.

                    I hope the fixes in testing solve the issue. What has been fixing it for me in the meantime is editing /opt/xensource/sm/cleanup.py and setting LIVE_LEAF_COALESCE_MAX_SIZE and LIVE_LEAF_COALESCE_TIMEOUT to the following values:
                    
                    LIVE_LEAF_COALESCE_MAX_SIZE = 1024 * 1024 * 1024 # bytes
                    LIVE_LEAF_COALESCE_TIMEOUT = 300 # seconds
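
                    A minimal sketch of applying that change non-interactively, assuming the stock single-line assignments in cleanup.py (back up the file first):

                    # back up first, then rewrite the two assignments in place
                    # (assumes they are single-line "NAME = value" definitions)
                    cp /opt/xensource/sm/cleanup.py /opt/xensource/sm/cleanup.py.bak
                    sed -i 's/^LIVE_LEAF_COALESCE_MAX_SIZE = .*/LIVE_LEAF_COALESCE_MAX_SIZE = 1024 * 1024 * 1024 # bytes/' /opt/xensource/sm/cleanup.py
                    sed -i 's/^LIVE_LEAF_COALESCE_TIMEOUT = .*/LIVE_LEAF_COALESCE_TIMEOUT = 300 # seconds/' /opt/xensource/sm/cleanup.py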
                    
                    • Danp (Pro Support Team) @Tristis Oris

                      @Tristis-Oris Not AFAIK, but it wouldn't hurt to make sure the latest fixes are running.

                      • Tristis Oris (Top contributor) @Danp

                        @Danp

                        • Installed the patch and rebooted the pool.
                        • A GC job started during the restart and got stuck at 0%, so I restarted the toolstack again.
                        • Now nothing is running, and the bad snapshots have not disappeared.

                        Should I wait longer, or?

                        • Danp (Pro Support Team) @Tristis Oris

                          @Tristis-Oris Did you install the patch on all of the hosts in the pool? Have you tried rescanning an SR to kick off the GC process?
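
                          For reference, a rescan can be triggered from dom0 (substitute the SR's UUID for the placeholder):

                          # <sr-uuid> is a placeholder for the SR's UUID
                          xe sr-scan uuid=<sr-uuid>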

                          • Tristis Oris (Top contributor) @Danp

                            @Danp After some time the GC task started automatically, and it has been running for an hour already. Still at about 50%.

                            • Tristis Oris (Top contributor) @Tristis Oris

                              @Tristis-Oris GC done, ~5 items removed, ~20 left.

                              • olivierlambert (Vates 🪐 Co-Founder & CEO)

                                Good, so it is working 🙂

                                • Tristis Oris (Top contributor) @olivierlambert

                                  @olivierlambert Is there some limit on how many items are removed per run?

                                  • olivierlambert (Vates 🪐 Co-Founder & CEO)

                                    The GC processes one chain after another. We told the XenServer team back in 2016 that it could probably merge multiple chains at once, but they told us it was too risky, so we didn't focus on that. Patience is key here. Clearly, we'll do better in the future.

                                    • Tristis Oris (Top contributor) @olivierlambert

                                      @olivierlambert Got it. I'll see what happens over the next few days.

                                      • olivierlambert (Vates 🪐 Co-Founder & CEO)

                                        It will accelerate. The first merges are the slowest, but after that it goes faster and faster.

                                        • Tristis Oris (Top contributor)

                                          Two days and a few backup cycles later, the snapshot count still hasn't decreased.

                                          • Danp (Pro Support Team) @Tristis Oris

                                            @Tristis-Oris Check SMlog for further exceptions.
