unhealthy VDI chain
Xen Orchestra
• olivierlambert (Vates 🪐 Co-Founder & CEO)

  Hi,

  It's hard to assist without enough information. Are you using XO from the sources? If yes, have you checked this first? https://xen-orchestra.com/docs/community.html

  Or are you using XOA? If yes, on which release channel and edition?

• Tristis Oris (Top contributor) @olivierlambert

  @olivierlambert From the sources.
  Xen Orchestra, commit 25183
  xo-server 5.93.0
  xo-web 5.96.0

• olivierlambert (Vates 🪐 Co-Founder & CEO)

  Please update to the latest commit available on master, rebuild and try again 🙂
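
  The usual from-sources update cycle looks roughly like this (a sketch only; the clone location and the way xo-server is restarted depend on how your installation was set up):

    # inside your local clone of the xen-orchestra repository
    git checkout master
    git pull --ff-only      # fetch the latest commits on master
    yarn                    # refresh dependencies
    yarn build              # rebuild xo-server and xo-web
    # then restart xo-server, e.g. if it runs as a systemd service:
    systemctl restart xo-server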

• Tristis Oris (Top contributor) @olivierlambert

  @olivierlambert Updated to commit 30874.
  After that, all the broken backups were fixed without moving any VDI.
  But after 1-2 iterations the same error appeared again, on the same VMs.

  Manual backups seem to always work (as far as I can see); scheduled ones fail from time to time.

• Danp (Pro Support Team) @Tristis Oris

  @Tristis-Oris What type of backup job and how often does it run?

• Tristis Oris (Top contributor) @Danp

  @Danp Delta backups. Some run every 24 hours, others every 8 hours.

• Tristis Oris (Top contributor)

  Xen Orchestra log: 14 VMs, 2 failed. Attached: 2022-05-20T11_00_00.005Z - backup NG.txt

• Danp (Pro Support Team)

  Have you checked your log files to see why the VDIs aren't coalescing? In XO, you can look under Dashboard > Health to see the VDIs pending coalesce.
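
  On an LVM-based (LVHD) SR you can also inspect the VHD chains directly from dom0. A sketch, run on the pool master; substitute your SR's UUID, and note that exact flags can vary with your SM version:

    # print the VHD parent/child tree for the SR's volume group
    vhd-util scan -f -m "VHD-*" -l VG_XenStorage-<sr-uuid> -p

  Chains that keep growing between backup runs and never shrink point to a coalesce problem on the SR.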

• olivierlambert (Vates 🪐 Co-Founder & CEO)

  You might simply have an SR that doesn't coalesce fast enough for your backups 🤔

• Tristis Oris (Top contributor) @Danp

  @Danp There was something listed there before, but now it's empty.
  I'll wait a few more days to be sure whether it's still broken or not.

• Tristis Oris (Top contributor) @olivierlambert

  @olivierlambert It looks like XO has some bad cached state for that storage.

  On the same physical storage I have 2 shares connected to the pool; all the VMs are usually on the 1st.
  So I moved the broken VDI to the 2nd, made a few backups, and everything worked fine. Then I moved the VDI back to the 1st and got this error again.
  (screenshot attached)

  Any options to fix that without migrating all the VMs and removing this storage?

• Tristis Oris (Top contributor)

  The problem still exists, even for new VMs.

• olivierlambert (Vates 🪐 Co-Founder & CEO)

  You need to understand why it doesn't coalesce 🙂 /var/log/SMlog is your friend.
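
  A quick way to see what the storage garbage collector is doing (a sketch; SMGC is the tag the coalesce/GC process uses in SMlog, but exact messages vary by SM version):

    # recent garbage-collector / coalesce activity
    grep -i "SMGC" /var/log/SMlog | tail -n 50
    # coalesce-related errors or exceptions around the backup window
    grep -iE "coalesce|exception" /var/log/SMlog | tail -n 50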

• Tristis Oris (Top contributor) @olivierlambert

  @olivierlambert There is nothing about the failed VMs in SMlog for this period.
  The backup task started at 01:00; between 01:03 and 01:06 a few backups failed. The log looks like this:

    Jun 14 01:00:35 name SM: [17853]   pread SUCCESS
    Jun 14 01:00:35 name SM: [17853] lock: released /var/lock/sm/.nil/lvm
    Jun 14 01:00:35 name SM: [17853] lock: acquired /var/lock/sm/.nil/lvm
    Jun 14 01:00:36 name SM: [17853] lock: released /var/lock/sm/.nil/lvm
    Jun 14 01:00:36 name SM: [17853] Calling tap unpause with minor 8
    Jun 14 01:00:36 name SM: [17853] ['/usr/sbin/tap-ctl', 'unpause', '-p', '28995', '-m', '8', '-a', 'vhd:/dev/VG_XenStorage-f1a514f3-2ef9-5705-7a7e-c8c23483122c/VHD-236b3cc3-80f7-40ff-9862-f8d4c2f69225']
    Jun 14 01:00:36 name SM: [17853]  = 0
    Jun 14 01:00:36 name SM: [17853] lock: released /var/lock/sm/236b3cc3-80f7-40ff-9862-f8d4c2f69225/vdi
    Jun 14 01:06:02 name SM: [19600] on-slave.multi: {'vgName': 'VG_XenStorage-f1a514f3-2ef9-5705-7a7e-c8c23483122c', 'lvName1': 'VHD-1fa9eb66-49fe-4e93-b64d-fbb2c3bb8745', 'action1': 'deactivateNoRefcount', 'action2': 'cleanupLockAndRefcount', 'uuid2': '1fa9eb66-49fe-4e93-b64d-fbb2c3bb8745', 'ns2': 'lvm-f1a514f3-2ef9-5705-7a7e-c8c23483122c'}
    Jun 14 01:06:02 name SM: [19600] LVMCache created for VG_XenStorage-f1a514f3-2ef9-5705-7a7e-c8c23483122c
    Jun 14 01:06:02 name SM: [19600] on-slave.action 1: deactivateNoRefcount
    Jun 14 01:06:02 name SM: [19600] LVMCache: will initialize now
    Jun 14 01:06:02 name SM: [19600] LVMCache: refreshing
    Jun 14 01:06:02 name SM: [19600] lock: opening lock file /var/lock/sm/.nil/lvm
    Jun 14 01:06:02 name SM: [19600] lock: acquired /var/lock/sm/.nil/lvm
    Jun 14 01:06:02 name SM: [19600] ['/sbin/lvs', '--noheadings', '--units', 'b', '-o', '+lv_tags', '/dev/VG_XenStorage-f1a514f3-2ef9-5705-7a7e-c8c23483122c']
    Jun 14 01:06:03 name SM: [19600]   pread SUCCESS

  No information at all for 6 minutes. (screenshot attached)

• olivierlambert (Vates 🪐 Co-Founder & CEO)

  This XO message is not a failure. It's a protection.

  Your problem isn't in XO but in your storage, which either doesn't coalesce (at all) or coalesces slower than you create new snapshots (i.e. when a new backup starts).

  So you have to check on the storage side whether you have coalesce issues.

• Tristis Oris (Top contributor) @olivierlambert

  @olivierlambert But backups work fine on another share on the same storage. I think it's easier to recreate this one.

• olivierlambert (Vates 🪐 Co-Founder & CEO)

  In your XO, in the SR detailed view under Advanced, you should see the disks to coalesce.

  If you back up a VM that has a disk not yet coalesced, XO will prevent the creation of a new snapshot.

  You can disable all jobs and see if you end up with zero VDIs to coalesce. If that works, then it means some disks are backed up faster than your SR can coalesce.
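
  With the jobs paused, you can also nudge the SR's garbage collector by rescanning the SR; a sketch, run on the pool master with a hypothetical SR name, so substitute your own:

    # find the SR UUID
    xe sr-list name-label="my-nfs-share" params=uuid
    # rescan the SR; on LVHD/file SRs this typically also kicks off the garbage collector
    xe sr-scan uuid=<sr-uuid>

  Then watch the "disks to coalesce" list in the SR's Advanced tab (or Dashboard > Health) drain to zero before re-enabling the backup jobs.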

• Tristis Oris (Top contributor) @olivierlambert

  @olivierlambert Thanks, I will try.

• Tristis Oris (Top contributor)

  Problem solved.
  The storage was in a weird state: I couldn't detach it from the pool, only getting a bunch of unknown errors. A pool reboot was needed to detach it. After reattaching, it works normally.

• olivierlambert (Vates 🪐 Co-Founder & CEO)

  Good news then 🙂
