XCP-ng

    Too many snapshots

      Pilow @tjkreidl

      @tjkreidl Yeah, but he has 16 snapshots.
      The documentation also talks about VDI chain length,

      but it seems impossible to me to have only 16 snapshots and a VDI chain length >30.

      That's why I wondered whether there is a cap on snapshots per SR, but I didn't find any relevant info about that possibility.

      You know a lot about Xen; have you ever heard of this type of per-SR snapshot limit?
      The only info I found is that more than 100/150 VDIs in production per SR can degrade performance.

        tjkreidl Ambassador @Pilow

        @Pilow Yes, you are correct that the chain length is also limited. You might try manually deleting some of the snapshots. The limit is supposed to be 30, but perhaps other factors are involved? Does that VM have a particularly large amount of storage and a lot of changes between snapshots? Are any of your other VMs experiencing similar issues? Your SR appears to be mostly empty, correct? Are there any related errors showing up in /var/log/SMlog?

          McHenry @tjkreidl

          @tjkreidl

          I wish to maintain 16 restore points using CR: one hourly restore point over the last two days (8 per day).
          I perform a full backup nightly to reset the chain.

          It appears that each CR run creates a new snapshot, and the old snapshot is removed when the new one is created.
          The documentation states this error is shown when there are more than 3 snapshots on a VM:
          https://docs.xen-orchestra.com/manage_infrastructure#too-many-snapshots

          Is this a problematic backup strategy?

            tjkreidl Ambassador @McHenry

            @McHenry With backups running that frequently, are you sure each backup completes successfully before the next one starts? How long does the full backup typically take (less than 7 hours?), and the incrementals (under 1 hour?)? Again, I'd suggest looking in /var/log/SMlog for any error conditions that might help identify the issue. Also, how fragmented is your storage? That can slow things down quite a bit, as can a lack of adequate CPU power or memory (run the top or xentop utility to view the load during backups).

              Pilow @tjkreidl

              @tjkreidl @mchenry Ah, I remember how and when I was able to provoke this error.
              I was trying to purge 12 "replica VMs" with the new CR method, forcing CR manually to end up with 1 VM with 12 replicas.

              So I ended up clicking START on the CR job as soon as the previous CR finished, and got this same error.
              This was because GC hadn't finished cleaning up after the previous job. I just had to wait 2 minutes for GC to reduce the chain length, and then I could run the CR manually again.

              So I guess @tjkreidl is right, and the error message is misleading:
              your CR probably finishes within the one-hour interval, BUT the Garbage Collector does not.

              You have two options:

              • space out your CR jobs to give GC some time to finish
              • find out why GC is taking too long (could be SR performance, never-ending GC because of high I/O on the VM, ...)

                Pilow @Pilow

                @florent @bastien-nollet Would it be possible to monitor the GC so that the job pauses instead of failing with a misleading error message?

                Instead of TOO MANY SNAPSHOTS, just pause with WAITING FOR PREVIOUS GARBAGE COLLECTOR TO FINISH and resume ASAP?

                This would force the backup admin to rethink his CR RPO/RTO strategy, but would not fail jobs.

                  tjkreidl Ambassador @Pilow

                  @Pilow I agree, the error message is misleading, and indeed garbage collection can take some time to complete, in some cases likely more than one hour.
                  Is there an option to monitor garbage collection with task-list or some other utility? If so, one could write a script to kick off backups instead of using the cron pattern in the backup settings. Just a suggestion ...

                    Pilow @tjkreidl

                    @tjkreidl In DASHBOARD/HEALTH/UNHEALTHY VDIs
                    you can see GC doing its magic, with the VDI chain length progressively going down to zero when a snapshot is deleted.

                    My 2 cents: he has multiple VMs in the same CR job, and GC is sequential. Within the one-hour timeframe, the next CR is launched and stumbles upon VMs that have not yet been cleaned up.

                    Reducing the number of VMs per job could do the trick, for example by splitting the VMs across 2 CR jobs and running those jobs in sequence.

                      Pilow @tjkreidl

                      @tjkreidl said:

                      Is there the option to monitor garbage collection with task-list or some other utility?

                      # tail -f /var/log/SMlog |grep coalesce
                      

                      With this you can monitor the coalescing of VDI chains live.

                        tjkreidl Ambassador @Pilow

                        @Pilow Ah, right. You'd have to check the time stamp if you worked on automating this.
                        So maybe @McHenry could write a script to do the backups and that way, ensure there was no on-going task in progress before kicking off the next backup instance.
                        It could be run periodically from a cron job and if there's still on-going activity, just exit and try again the next time.
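
A minimal sketch of that check, in case it helps. It only looks at how long ago the last "coalesce" line was written to SMlog and treats a quiet log as "GC is probably idle". The helper name `gc_is_quiet` is made up for this example, and the script assumes syslog-style timestamps in the first 15 characters of each SMlog line and GNU `date`; neither has been verified against a particular XCP-ng release. Since the log lines carry no year, the check can misjudge entries around New Year.

```shell
#!/bin/sh
# Sketch only: decide whether GC looks idle by checking how long ago the
# last "coalesce" line was written to SMlog.
# Assumptions: syslog-style timestamps ("Nov  2 14:03:22 ...") in the first
# 15 characters of each line, and GNU date (for "date -d").

# gc_is_quiet LOGFILE QUIET_SECONDS   (hypothetical helper name)
# Returns 0 if no "coalesce" line was logged within the last QUIET_SECONDS,
# non-zero if GC looks busy or the log cannot be read or parsed.
gc_is_quiet() {
    log_file="$1"
    quiet_secs="$2"

    # An unreadable log tells us nothing, so treat it as "not safe to start".
    [ -r "$log_file" ] || return 2

    # Most recent line mentioning coalesce, if any.
    last_line=$(grep coalesce "$log_file" | tail -n 1) || true

    # No coalesce entries at all (for example right after log rotation).
    if [ -z "$last_line" ]; then
        return 0
    fi

    # Convert the leading timestamp to epoch seconds and compare with now.
    stamp=$(printf '%s\n' "$last_line" | cut -c1-15)
    last_ts=$(date -d "$stamp" +%s 2>/dev/null) || return 2
    now_ts=$(date +%s)

    [ $((now_ts - last_ts)) -ge "$quiet_secs" ]
}
```

A cron-driven wrapper could then run something like `gc_is_quiet /var/log/SMlog 600 || exit 0` before starting the backup job, so a busy GC simply means "try again on the next run". The 600-second window is an arbitrary value for illustration, and how the job itself is started is left out here.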
