XCP-ng

    Issue with SR and coalesce

      Byte_Smarter @nikade

      @nikade

      In our case there are 0 Snapshots as these are backups restored on a new SR, so I am not sure what there is to coalesce.

      We get this when running: grep -i "coalescale" /var/log/SMlog

      272fd752-73f8-4ccb-89fb-74541128184d-image.png

      We get this when running: grep -A 5 -B 5 -i exception /var/log/SMlog (we were advised on Discord to use this to find coalescing issues)

      ac64d18c-f122-4c9a-b9f4-1bc4ad3d8b0e-image.png

      Of all the hosts, ONLY the master, 'ops-xen2', is showing the log errors posted earlier.

      In theory, the only snapshots these VMs should have are the ones taken during the backup process, but the backup never creates them and skips the very VMs we want backed up.

      The SR has several TB of free space, uses iSCSI, and runs on its own storage network, and no VMs running on that SR are showing any issues with SR IOPS. I don't think this is an SR-related issue (I could be wrong here).

      2a6867fc-2050-4036-9aad-0947352f3b21-image.png

      As for what was posted about the Multi plugin: while informative, what should I be looking for in the SMlog as a whole?

      These hosts are running version 8.1.0 and have been running for a decent time, so that could be an issue here. We are looking at possibly moving all the affected VMs to a totally new host/pool to see if the hosts are the issue.

      There are 10 VMs in this pool. 5 of them are on an older NAS that we want to replace; we have not moved any more over, as this issue makes us not want to finish moving them all until we are secure. The other 5 are in the same pool on the new SR I mentioned, and in both speed and IOPS they seem to be fine, with no errors to be seen on the SR.

      If we shut down the 5 VMs that are currently an issue, their backups run just fine, but the backups cannot run while the VMs are running.

      I don't mind doing the work to fix the issues; I am just still in the process of finding out what it is that I need to fix. I would really appreciate it if someone had 15 minutes to go over the XCP logs with us and maybe point us in some direction to resolve what we are seeing, as it does not seem to match anything in the documentation or on Discord.

        ph7 @Byte_Smarter

        @Byte_Smarter said in Issue with SR and coalesce:

        We get this when running: grep -i "coalescale" /var/log/SMlog

        Maybe try with coalesce instead
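
A minimal sketch of that corrected search, for anyone following along. It assumes the log lives at /var/log/SMlog as in the commands above; the function name `coalesce_lines` is made up for illustration. Searching case-insensitively for the stem `coalesc` picks up "coalesce", "coalesced" and "coalescing" alike, whereas the misspelled "coalescale" is unlikely to appear in the log at all.

```shell
# Print every coalesce-related line from an SM log.
# The default path is the one used earlier in this thread; pass another
# file as the first argument to search a rotated or copied log instead.
coalesce_lines() {
    log_file="${1:-/var/log/SMlog}"
    # -i: case-insensitive, so "Coalesce", "coalesced", "COALESCING" all match.
    grep -i "coalesc" "$log_file"
}
```

On the pool master this could be run as `coalesce_lines | tail -n 50` to see only the most recent entries. An empty result would suggest no coalesce activity was logged in that file, though older activity may sit in rotated logs.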

          Byte_Smarter @ph7

          @ph7 a99e4f73-4ee2-4425-bd5a-f5747a9693d4-image.png

          Same result

          was copied from:

          f6fe5001-9d1a-4c2e-8b44-5ecd729796ed-image.png

            Byte_Smarter @ph7

            @ph7

            Also ran on the other hosts in pool

            dde454c1-8297-4e19-9511-4e5333d99a22-image.png

            7d4bc42e-2e7a-481a-9c9c-258ab0b90b26-image.png

            I am assuming that this means it's not even attempting to coalesce?

              ph7 @Byte_Smarter

              @Byte_Smarter
              Sorry, can't help you 😞

                Byte_Smarter @ph7

                @ph7

                I am sorry too LOL

                  Byte_Smarter @lucasljorge

                  @lucasljorge I don't want to feel like I hijacked your post. Are you still having the issues you posted?

                    lucasljorge @Byte_Smarter

                    @Byte_Smarter

                    It's OK buddy, please don't feel that way. I'm following the discussion and trying to figure out what's happening too.
                    I added another SR and I'm monitoring the status. Still having performance issues (1 SR is SAS), but the coalesce number seems to be finally decreasing.
                    I'm watching the backup jobs run and will keep you all posted.

                      lucasljorge @lucasljorge

                      @lucasljorge

                      Thanks to @nikade and @dthenot for the quick response. I wiped the "old SR" with the coalesce issues after migrating. It seems to be running OK.

                        Byte_Smarter @lucasljorge

                        @lucasljorge

                        Yeah, I am at a loss. We have restored systems from full backups onto a new SR multiple times, and it just will not let us back them up while they run; turn them off and it works fine.

                        It's like XCP is failing to make a snapshot, but you can make snapshots manually and it's fine.

                        Maybe the new SR is broken? If that's the case, how does that happen when it was just created and added? Is there an SR repair tool?

                        I cannot find the error we are getting anywhere, and the fact that there are no logs of coalescing makes me think it's just not doing its job.

                          tjkreidl Ambassador @lucasljorge

                          @lucasljorge How full is the storage, percentage-wise? If over around 90%, a coalesce operation sometimes will not work. You may have to shuffle some of your VM storage to a different SR.
                          If your host is not responding, you may have to do a reboot to clear out stuck tasks if the "xe task-cancel" command isn't working.
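
A rough way to put a number on "how full", sketched under some assumptions: it relies on the SR's `physical-utilisation` and `physical-size` parameters, which `xe sr-param-get` reports in bytes, and on those values reflecting the usage you care about for your SR type. The function name `sr_usage_percent` and the `SR_UUID` variable are invented here for illustration.

```shell
# Integer percentage of an SR that is in use, given two byte counts.
sr_usage_percent() {
    used_bytes="$1"
    total_bytes="$2"
    # Guard against a zero or negative size to avoid dividing by zero.
    if [ "$total_bytes" -le 0 ]; then
        echo "total size must be positive" >&2
        return 1
    fi
    # Integer division: the result is rounded down.
    echo $(( used_bytes * 100 / total_bytes ))
}

# Possible wiring on a host (SR_UUID is a placeholder you must fill in):
# used=$(xe sr-param-get uuid="$SR_UUID" param-name=physical-utilisation)
# total=$(xe sr-param-get uuid="$SR_UUID" param-name=physical-size)
# sr_usage_percent "$used" "$total"
```

A result around 90 or higher would line up with the range mentioned above where a coalesce may struggle.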

                            lucasljorge @Byte_Smarter

                            @Byte_Smarter Hard to tell what's happening. The Xen version here is 8.2.

                            I read somewhere in other XCP documentation that the SR has to be at least 30% free for backup and coalesce jobs to run properly.

                            I think that was the problem here: the SR went full and we were unable to run any job. But I had to recreate the old SR to get a success response from Orchestra.

                            Did you try to migrate the VDIs to another SR?

                              lucasljorge @tjkreidl

                              @tjkreidl it was around 85% full.

                                Byte_Smarter @lucasljorge

                                @lucasljorge It's a brand-new SR that we moved them to, with TBs of free space.

                                  Byte_Smarter @Byte_Smarter

                                  @Byte_Smarter said in Issue with SR and coalesce:

                                  @lucasljorge It's a brand-new SR that we moved them to, with TBs of free space.

                                  By "moved" I mean we used a backup restore instead of a migration.

                                    lucasljorge @Byte_Smarter

                                    @Byte_Smarter I would try updating XCP first, I'm just not sure of the impact 😞

                                    I also don't know whether there are differences between migrating and restoring a backup to the master/orchestra.

                                      Byte_Smarter @lucasljorge

                                      @lucasljorge

                                      Yeah, we are thinking that too. It would involve downtime on all 10 of the systems in the pool, so it would have to be a scheduled task.

                                        tjkreidl Ambassador @lucasljorge

                                        @lucasljorge That may be the issue. That's pretty full for a coalesce to work!

                                          Byte_Smarter @tjkreidl

                                          @tjkreidl If a disk that is not full still has the issue, would that be a completely different issue?

                                            tjkreidl Ambassador @Byte_Smarter

                                            @Byte_Smarter Sure, that's of course possible. Does "xe task-list" show any currently running tasks? Anything else of possible value in the logs?
