XCP-ng

    Too many snapshots

Category: Backup · 31 Posts · 5 Posters · 284 Views
• Pilow @tjkreidl

@tjkreidl @mchenry haaaa, I remember how and when I was able to provoke this error:
I was trying to purge 12 "replica VMs" with the new CR method by forcing CR manually, to get one VM with 12 replicas.

So I ended up clicking START on the CR job as soon as the previous CR finished, and got this same error.
This was because GC hadn't finished the previous job. I just had to wait two minutes for GC to reduce the chain length, and then I could run the CR manually again.

So I guess @tjkreidl is right, and the error message is misleading:
your CR probably finishes within the one-hour interval, BUT the Garbage Collector does not.

You have two options:

• space out your CR jobs to give GC some time to finish
• find out why GC is taking so long (it could be SR performance, or a never-ending GC because of high I/O on the VM, ...)
• Pilow @Pilow

@florent @bastien-nollet could it be possible to monitor the GC job and pause the backup job, instead of failing with a misleading error message?

Instead of TOO MANY SNAPSHOTS, just pause with WAITING FOR THE PREVIOUS GARBAGE COLLECTION TO FINISH and resume as soon as possible?

This would force the backup admin to rethink their CR RPO/RTO strategy, but it wouldn't fail jobs.

• tjkreidl Ambassador @Pilow

@Pilow I agree, the error message is misleading, and indeed garbage collection can take some time to complete, likely more than one hour in some cases.
Is there an option to monitor garbage collection with task-list or some other utility? If so, one could write a script to kick off backups instead of using the cron pattern in the backup settings. Just a suggestion ...
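For what it's worth, a check along those lines could grep the task list for anything GC-like. A sketch only: whether coalescing surfaces in `xe task-list` at all, and under which name-label, depends on the SMAPI version, so the patterns matched below are assumptions to verify against your pool.

```shell
#!/bin/sh
# gc_task_running TASKLIST: succeed if the given `xe task-list` output
# mentions something that looks like GC work. The name-labels matched
# here ("coalesce", "SR.scan") are assumptions -- check them against
# what your pool actually reports.
gc_task_running() {
  printf '%s\n' "$1" | grep -Eqi 'coalesce|SR\.scan'
}

# On a pool master, roughly:
#   tasks=$(xe task-list params=name-label)
#   gc_task_running "$tasks" && echo "possible GC task, holding off backup"
```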

• Pilow @tjkreidl

@tjkreidl in DASHBOARD/HEALTH/UNHEALTHY VDIs
you can see GC doing its magic, with the VDI Chain Length progressively going down to zero when deleting a snapshot.

My two cents: he has multiple VMs in the same CR job, and GC is sequential. Within the one-hour timeframe, the next CR is launched and stumbles upon VMs that are not yet sanitized.

Reducing the number of VMs per job could do the trick, as could chaining/sequencing two CR jobs with the VMs split between them.

• Pilow @tjkreidl

@tjkreidl said:

    Is there the option to monitor garbage collection with task-list or some other utility?

# tail -f /var/log/SMlog | grep coalesce
              

With this you can monitor the coalescing of VDI chains live.
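The same signal can be consumed by a script rather than watched by eye. A sketch, assuming (as the command above does) that GC activity shows up as lines containing "coalesce" in /var/log/SMlog:

```shell
#!/bin/sh
# coalesce_active LOGFILE SECONDS: watch LOGFILE for SECONDS and
# succeed (exit 0) if a new line containing "coalesce" appears.
# tail -n 0 -F means only lines written after the call count.
coalesce_active() {
  timeout "$2" tail -n 0 -F "$1" 2>/dev/null | grep -m1 -q coalesce
}

# On an XCP-ng host, something like:
#   coalesce_active /var/log/SMlog 30 && echo "GC is still coalescing"
```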

• tjkreidl Ambassador @Pilow

@Pilow Ah, right. You'd have to check the timestamps if you worked on automating this.
So maybe @McHenry could write a script to do the backups and, that way, ensure there is no ongoing task in progress before kicking off the next backup instance.
It could be run periodically from a cron job, and if there's still ongoing activity, just exit and try again the next time.
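That "exit and retry next time" loop can be sketched in a few lines of shell. A sketch only: `run_backup` is a placeholder for whatever actually triggers the job (e.g. xo-cli; check `xo-cli --help` for the exact invocation), and the SMlog heuristic assumes GC logs lines containing "coalesce".

```shell
#!/bin/sh
# Cron-driven wrapper: exit quietly if GC is still coalescing,
# otherwise kick off the backup; cron retries at the next slot.

# gc_idle LOGFILE: succeed if no new "coalesce" line appears in
# LOGFILE within a short watch window.
gc_idle() {
  ! timeout 5 tail -n 0 -F "$1" 2>/dev/null | grep -m1 -q coalesce
}

run_backup() {
  # Placeholder: replace with your real trigger (xo-cli, API call, ...).
  echo "backup triggered"
}

main() {
  if gc_idle "${1:-/var/log/SMlog}"; then
    run_backup
  else
    echo "coalesce in progress, exiting; will retry next cron slot"
  fi
}

# Run from cron, e.g.:  */30 * * * *  /usr/local/bin/cr-wrapper.sh
main "$@"
```

Installed under cron, this gives exactly the behaviour described above: if there is still ongoing activity, the run is a no-op and the next scheduled slot tries again.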

• Pilow @tjkreidl

@tjkreidl yes, that would be a good way to deal with the original problem.

I hope the backup devs @florent and/or @bastien-nollet can implement this; it would benefit everyone.

• tjkreidl Ambassador @Pilow

@Pilow Right, just skip the currently planned backup if a coalesce is still in progress, and check again at the next scheduled backup. This could potentially be implemented in the existing backup code.

• Pilow @tjkreidl

@tjkreidl either skip, or wait until it becomes possible.
I'm used to Veeam Backup & Replication, which is very resilient to these corner cases: on VMware, if it sees that a datastore has too many snapshots, or that some backup resource is not ready yet (you can throttle the number of active workers per repository or per proxy), Veeam will just wait for availability and keep going.

The problem with this approach is that it can shift the schedule in time, past where you expect the CR or backup to be happening.

But skipping altogether can also be a problem, if @mchenry needs compliance with a certain number of replicas.

Waiting vs. skipping: in a perfect world, the devs would give us a switch to choose our destiny 😃

PS: I know XO Backup is not meant to map 100% onto Veeam's functionality, but some of these features would really improve the XO Backup experience. They just have to take the Xen environment into account (there is no GC in a VMware infrastructure).

• tjkreidl Ambassador @Pilow

@Pilow The other thing to consider is being cognizant of how long your backups typically take (or even planning for a worst-case condition) and defining the backup intervals accordingly.
In other words, if you know you cannot consistently finish your incremental backups in less than an hour, schedule them 90 minutes or two hours apart. It's better, IMO, to have a solid backup less frequently than to have backups fail on a regular basis.
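Concretely, since XO backup schedules accept a cron pattern (as mentioned earlier in the thread), widening an hourly CR job to every two hours is a one-line change. A generic five-field cron example:

```shell
# Five fields: minute hour day-of-month month day-of-week.
# Fires at minute 0 of every second hour, leaving GC up to two
# hours of headroom per cycle instead of one.
0 */2 * * *
```

Adapt the pattern to wherever your schedule is actually defined.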

• Pilow @tjkreidl

@tjkreidl said:

    It's better IMO to have a solid backup less frequently than have them fail on a regular basis.

Totally agree.

