
    SR Garbage Collection running permanently

      Tristis Oris Top contributor @Danp

      @Danp

      • Installed the patch and rebooted the pool.
      • A GC job started during the restart and got stuck at 0%, so I restarted the toolstack again.
      • Now nothing is running, and the bad snapshots have not disappeared.

      Should I wait longer, or is there something else to try?

        Danp Pro Support Team @Tristis Oris

        @Tristis-Oris Did you install the patch on all of the hosts in the pool? Have you tried rescanning an SR to kick off the GC process?

          Tristis Oris Top contributor @Danp

          @Danp After some time the GC task started automatically and has been running for 1 hour already. It is still at about 50%.

            Tristis Oris Top contributor @Tristis Oris

            @Tristis-Oris GC done, ~5 items removed, ~20 left.

              olivierlambert Vates 🪐 Co-Founder CEO

              Good, so it is working 🙂

                Tristis Oris Top contributor @olivierlambert

                @olivierlambert Is there a limit on the number of items removed per run?

                  olivierlambert Vates 🪐 Co-Founder CEO

                  The GC is doing one chain after another. We told XenServer team back in 2016 that it could probably merge multiple chains at once, but they told us it was too risky. So we did not focus on that. Patience is key there. Clearly, we'll do better in the future.

                    Tristis Oris Top contributor @olivierlambert

                    @olivierlambert Got it. I'll see what happens in a few days.

                      olivierlambert Vates 🪐 Co-Founder CEO

                      It will accelerate. The first merges are the slowest ones, but after that it gets faster and faster.

                        Tristis Oris Top contributor

                        After 2 days and a few backup cycles, the number of snapshots has not decreased.

                          Danp Pro Support Team @Tristis Oris

                          @Tristis-Oris Check SMlog for further exceptions.
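
One way to act on this suggestion is to filter the log for lines that look like failures. The following is only a minimal sketch: it assumes the log lives at /var/log/SMlog (the path mentioned later in this thread) and that failures are marked with the strings EXCEPTION or Traceback. The function name smlog_exceptions and the search pattern are illustrative assumptions, not an official tool.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that look like exceptions.
# The pattern is an assumption; adjust it to what your SMlog actually contains.
smlog_exceptions() {
    log_file="${1:-/var/log/SMlog}"
    # -n prefixes each match with its line number; -E enables alternation.
    grep -nE 'EXCEPTION|Traceback' "$log_file"
}

# Usage (on the host):
#   smlog_exceptions                      # scan the default log
#   smlog_exceptions /path/to/SMlog.copy  # scan a saved copy
```

grep exits with a non-zero status when nothing matches, so an empty result with exit status 1 simply means no matching lines were found.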

                            tjkreidl Ambassador @Tristis Oris

                            @Tristis-Oris Note that if the SR storage device is around 90% or more full, a coalesce may not work. You have to either delete or move enough data so that there is adequate free space.
                            How full is the SR? That said, a coalesce process can take up to 24 hours to complete. I wonder if this shows up, and with what progress, when you run "xe task-list"?
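
To answer the "how full" question from the CLI, the used and total sizes of the SR can be turned into a percentage and compared with the roughly 90% figure above. This is a sketch under assumptions: the helper name sr_percent_full is made up for illustration, and the usage comment assumes the SR fields physical-utilisation and physical-size are read with xe sr-param-get.

```shell
#!/bin/sh
# Hypothetical helper: integer percentage of an SR that is in use.
# Arguments: bytes used, total bytes.
sr_percent_full() {
    used=$1
    total=$2
    if [ "$total" -le 0 ]; then
        echo "total size must be positive" >&2
        return 1
    fi
    # Integer division, so the result is rounded down.
    echo $(( used * 100 / total ))
}

# Usage (on the host; replace <sr-uuid> with the real UUID):
#   used=$(xe sr-param-get uuid=<sr-uuid> param-name=physical-utilisation)
#   total=$(xe sr-param-get uuid=<sr-uuid> param-name=physical-size)
#   pct=$(sr_percent_full "$used" "$total")
#   [ "$pct" -ge 90 ] && echo "SR is ${pct}% full; coalesce may not work"
```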

                              Tristis Oris Top contributor @Danp

                              @Danp No, I can't find any exceptions.
                              This is the typical log for now; it repeats many times:

                              Jan 20 00:02:00 host SM: [2736362] Kicking GC
                              Jan 20 00:02:00 host SM: [2736362] Kicking SMGC@93d53646-e895-52cf-7c8e-df1d5e84f5e4...
                              Jan 20 00:02:00 host SM: [2736362] utilisation 40394752 <> 34451456
                              *
                              Jan 20 00:02:00 host SM: [2736362] VDIs changed on disk: ['34fa9f2d-95fa-468e-986c-ade22b92b1f3', '56b94e20-01ae-4da1-99f8-03aa901da64f', 'a75c6e7b-7f8d-4a4b-99$
                              Jan 20 00:02:00 host SM: [2736362] Updating VDI with location=34fa9f2d-95fa-468e-986c-ade22b92b1f3 uuid=34fa9f2d-95fa-468e-986c-ade22b92b1f3
                              *
                              Jan 20 00:02:00 host SM: [2736362] lock: released /var/lock/sm/93d53646-e895-52cf-7c8e-df1d5e84f5e4/sr
                              Jan 20 00:02:00 host SMGC: [2736466] === SR 93d53646-e895-52cf-7c8e-df1d5e84f5e4: gc ===
                              Jan 20 00:02:00 host SM: [2736466] lock: opening lock file /var/lock/sm/93d53646-e895-52cf-7c8e-df1d5e84f5e4/running
                              *
                              Jan 20 00:02:00 host SMGC: [2736466] Found 0 cache files
                              Jan 20 00:02:00 host SM: [2736466] lock: tried lock /var/lock/sm/93d53646-e895-52cf-7c8e-df1d5e84f5e4/sr, acquired: True (exists: True)
                              Jan 20 00:02:00 host SM: [2736466] ['/usr/bin/vhd-util', 'scan', '-f', '-m', '/var/run/sr-mount/93d53646-e895-52cf-7c8e-df1d5e84f5e4/*.vhd']
                              Jan 20 00:02:00 host SM: [2736614] lock: opening lock file /var/lock/sm/93d53646-e895-52cf-7c8e-df1d5e84f5e4/sr
                              Jan 20 00:02:00 host SM: [2736614] sr_update {'host_ref': 'OpaqueRef:3570b538-189d-6a16-fe61-f6d73cc545dc', 'command': 'sr_update', 'args': [], 'device_config':$
                              Jan 20 00:02:00 host SM: [2736614] lock: closed /var/lock/sm/93d53646-e895-52cf-7c8e-df1d5e84f5e4/sr
                              Jan 20 00:02:01 host SM: [2736466]   pread SUCCESS
                              *
                              Jan 20 00:02:01 host SMGC: [2736466] SR 93d5 ('VM Sol flash') (73 VDIs in 39 VHD trees):
                              Jan 20 00:02:01 host SMGC: [2736466]         *70119dcb(50.000G/45.497G?)
                              Jan 20 00:02:01 host SMGC: [2736466]             e6a3e53a(50.000G/107.500K?)
                              *
                              Jan 20 00:02:01 host SM: [2736466] lock: released /var/lock/sm/93d53646-e895-52cf-7c8e-df1d5e84f5e4/sr
                              Jan 20 00:02:01 host SMGC: [2736466] Got sm-config for *70119dcb(50.000G/45.497G?): {'vhd-blocks': 'eJzFlrFuwjAQhk/Kg5R36Ngq5EGQyJSsHTtUPj8WA4M3GDrwBvEEDB2yEaQQ$
                              *
                              Jan 20 00:02:01 host SMGC: [2736466] No work, exiting
                              Jan 20 00:02:01 host SMGC: [2736466] GC process exiting, no work left
                              Jan 20 00:02:01 host SM: [2736466] lock: released /var/lock/sm/93d53646-e895-52cf-7c8e-df1d5e84f5e4/gc_active
                              Jan 20 00:02:01 host SMGC: [2736466] In cleanup
                              Jan 20 00:02:01 host SMGC: [2736466] SR 93d5 ('VM Sol flash') (73 VDIs in 39 VHD trees): no changes
                              Jan 20 00:02:01 host SM: [2736466] lock: closed /var/lock/sm/93d53646-e895-52cf-7c8e-df1d5e84f5e4/running
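
Since this pattern repeats many times, it can help to count how many GC runs there were and how many ended without finding work. The sketch below relies only on the message texts visible in the excerpt above ("=== SR <uuid>: gc ===" and "No work, exiting"). The function name gc_summary and the default path /var/log/SMlog are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count GC runs and how many of them found nothing to do.
gc_summary() {
    log_file="${1:-/var/log/SMlog}"
    awk '
        # A line like "=== SR <uuid>: gc ===" marks the start of a GC run.
        / SMGC: / && /=== SR .*: gc ===/ { runs++ }
        # "No work, exiting" marks a run that had nothing to coalesce.
        / SMGC: / && /No work, exiting/  { idle++ }
        END { printf "gc_runs=%d no_work=%d\n", runs, idle }
    ' "$log_file"
}

# Usage (on the host):
#   gc_summary
```

If both counters are equal, every logged GC run ended with "No work, exiting", which matches the excerpt.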
                              

                              @tjkreidl There is enough free space. I have never seen a GC job run for long, neither on other pools nor after the fix. There are no coalesces in the queue.
                              xe task-list shows nothing.

                              Maybe I need to clean up the bad VDIs manually the first time?

                                tjkreidl Ambassador @Tristis Oris

                                @Tristis-Oris It wouldn't hurt to do a manual cleanup. Not sure if a reboot might help, but strange that no task is showing as active. Do you have other SRs on which you can try a scan/coalesce?
                                Are there any VMs in a weird power state?

                                  Tristis Oris Top contributor @tjkreidl

                                  @tjkreidl Nothing unusual.
                                  I found the same issue on another 8.3 pool, on another SR, but I have never seen related GC tasks there. There are no exceptions in the log.
                                  A third 8.3 pool has no problems, though.

                                  An SR scan doesn't trigger GC. Can I run it manually?

                                    tjkreidl Ambassador @Tristis Oris

                                    @Tristis-Oris By manually, do you mean from the CLI vs. from the GUI? If so, then:
                                    xe sr-scan uuid=sr_uuid
                                    Check your logs (probably /var/log/SMlog) and run "xe task-list" to see what, if anything, is active.
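
The three steps above (rescan, look at the task list, look at the log) can be chained in one small function. This is a sketch, not an official procedure: the function name scan_and_check and the XE and SMLOG override variables are assumptions added for illustration. Note that xe sr-scan takes the SR identifier as uuid=.

```shell
#!/bin/sh
# Hypothetical helper: rescan one SR, then show active tasks and recent GC log lines.
# XE and SMLOG can be overridden for a dry run; by default the real xe CLI and
# /var/log/SMlog are used.
scan_and_check() {
    sr_uuid=$1
    if [ -z "$sr_uuid" ]; then
        echo "usage: scan_and_check <sr-uuid>" >&2
        return 2
    fi
    # Rescan the SR; per the thread this is expected to kick off the GC.
    "${XE:-xe}" sr-scan uuid="$sr_uuid" || return 1
    # An active coalesce should be listed here.
    "${XE:-xe}" task-list
    # Show the last GC-related log lines.
    grep 'SMGC' "${SMLOG:-/var/log/SMlog}" | tail -n 20
}

# Usage (on the host; replace <sr-uuid> with the real UUID):
#   scan_and_check <sr-uuid>
```

Setting XE=echo prints the xe commands instead of running them, which is a cheap way to check what the function would do.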

                                      Tristis Oris Top contributor @tjkreidl

                                      @tjkreidl But is that the same scan as this one? [image: ff71c9ec-d825-4f3b-8256-43299e950545-image.png]

                                        Tristis Oris Top contributor

                                        I guess I'm a little confused.
                                        Probably, after the fix, all the bad snapshots were removed, and the remaining ones exist only for halted (archive) VMs. Those VMs were backed up only once, with a job like this:
                                        [image: 46bcdd5e-9eed-4fda-8a3c-ff5fe1be5842-image.png]

                                        So without backup tasks, GC for the VDI chains is not running. Is it safe to remove them manually, or is it better to run the backup task again (which takes very long and is not otherwise required)?

                                          tjkreidl Ambassador @Tristis Oris

                                          @Tristis-Oris First verify that you have good backups before considering deleting snapshots. You could also just export the snapshots associated with the VMs.
                                          As to the GUI vs. CLI, it should do the same thing, but if it runs, it should show up in the task list.
