XCP-ng

    CBT: the thread to centralize your feedback

      olivierlambert (Vates đŸȘ Co-Founder & CEO):

      I have no idea why you are the only one to have this issue, which is why it's weird 😄

        flakpyro @olivierlambert:

        @olivierlambert So today I installed the latest round of updates on the test pool, which moved all the VMs back and forth during a rolling pool update. I then let everything sit for a couple of hours and ran the backup job, and this time it did not throw any errors. So that's even more confusing.

        Perhaps it's because I am kicking off a backup job immediately after migrating the VMs? As a test I am going to move them around again now, wait an hour, then attempt to run the job.

        Edit: Waiting did not seem to help. Running the job manually again resulted in another full being run, with the same messages:
        Can't do delta with this vdi, transfer will be a full
        Can't do delta, will try to get a full stream

          flakpyro @flakpyro:

          SMLog output on the test pool looks the same as on the production pool after a manual VM migration. I also double-checked that the VM UUID does not change after the migration (a short xe sketch for checking the VDI UUIDs follows the log below):

          Nov 15 13:59:40 xcpng-test-01 SM: [277865] lock: opening lock file /var/lock/sm/8b0ee29e-7cbe-4e15-bd13-330a974fde2a/cbtlog
          Nov 15 13:59:40 xcpng-test-01 SM: [277865] lock: acquired /var/lock/sm/8b0ee29e-7cbe-4e15-bd13-330a974fde2a/cbtlog
          Nov 15 13:59:40 xcpng-test-01 SM: [277865] ['/usr/sbin/cbt-util', 'get', '-n', '/var/run/sr-mount/45e457aa-16f8-41e0-d03d-8201e69638be/8b0ee29e-7cbe-4e15-bd13-330a974fde2a.cbtlog', '-c']
          Nov 15 13:59:40 xcpng-test-01 SM: [277865]   pread SUCCESS
          Nov 15 13:59:40 xcpng-test-01 SM: [277865] lock: released /var/lock/sm/8b0ee29e-7cbe-4e15-bd13-330a974fde2a/cbtlog
          Nov 15 13:59:40 xcpng-test-01 SM: [277865] Raising exception [460, Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated]]
          Nov 15 13:59:40 xcpng-test-01 SM: [277865] ***** generic exception: vdi_list_changed_blocks: EXCEPTION <class 'xs_errors.SROSError'>, Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated]
          Nov 15 13:59:40 xcpng-test-01 SM: [277865]   File "/opt/xensource/sm/SRCommand.py", line 111, in run
          Nov 15 13:59:40 xcpng-test-01 SM: [277865]     return self._run_locked(sr)
          Nov 15 13:59:40 xcpng-test-01 SM: [277865]   File "/opt/xensource/sm/SRCommand.py", line 161, in _run_locked
          Nov 15 13:59:40 xcpng-test-01 SM: [277865]     rv = self._run(sr, target)
          Nov 15 13:59:40 xcpng-test-01 SM: [277865]   File "/opt/xensource/sm/SRCommand.py", line 326, in _run
          Nov 15 13:59:40 xcpng-test-01 SM: [277865]     return target.list_changed_blocks()
          Nov 15 13:59:40 xcpng-test-01 SM: [277865]   File "/opt/xensource/sm/VDI.py", line 757, in list_changed_blocks
          Nov 15 13:59:40 xcpng-test-01 SM: [277865]     "Source and target VDI are unrelated")
          Nov 15 13:59:40 xcpng-test-01 SM: [277865]
          Nov 15 13:59:40 xcpng-test-01 SM: [277865] ***** NFS VHD: EXCEPTION <class 'xs_errors.SROSError'>, Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated]
          Nov 15 13:59:40 xcpng-test-01 SM: [277865]   File "/opt/xensource/sm/SRCommand.py", line 385, in run
          Nov 15 13:59:40 xcpng-test-01 SM: [277865]     ret = cmd.run(sr)
          Nov 15 13:59:40 xcpng-test-01 SM: [277865]   File "/opt/xensource/sm/SRCommand.py", line 111, in run
          Nov 15 13:59:40 xcpng-test-01 SM: [277865]     return self._run_locked(sr)
          Nov 15 13:59:40 xcpng-test-01 SM: [277865]   File "/opt/xensource/sm/SRCommand.py", line 161, in _run_locked
          --
          Nov 15 13:59:45 xcpng-test-01 SM: [278274] lock: opening lock file /var/lock/sm/fa7929aa-a39c-437d-9787-5218e9bcbc1a/cbtlog
          Nov 15 13:59:45 xcpng-test-01 SM: [278274] lock: acquired /var/lock/sm/fa7929aa-a39c-437d-9787-5218e9bcbc1a/cbtlog
          Nov 15 13:59:45 xcpng-test-01 SM: [278274] ['/usr/sbin/cbt-util', 'get', '-n', '/var/run/sr-mount/45e457aa-16f8-41e0-d03d-8201e69638be/fa7929aa-a39c-437d-9787-5218e9bcbc1a.cbtlog', '-c']
          Nov 15 13:59:45 xcpng-test-01 SM: [278274]   pread SUCCESS
          Nov 15 13:59:45 xcpng-test-01 SM: [278274] lock: released /var/lock/sm/fa7929aa-a39c-437d-9787-5218e9bcbc1a/cbtlog
          Nov 15 13:59:45 xcpng-test-01 SM: [278274] Raising exception [460, Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated]]
          Nov 15 13:59:45 xcpng-test-01 SM: [278274] ***** generic exception: vdi_list_changed_blocks: EXCEPTION <class 'xs_errors.SROSError'>, Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated]
          Nov 15 13:59:45 xcpng-test-01 SM: [278274]   File "/opt/xensource/sm/SRCommand.py", line 111, in run
          Nov 15 13:59:45 xcpng-test-01 SM: [278274]     return self._run_locked(sr)
          Nov 15 13:59:45 xcpng-test-01 SM: [278274]   File "/opt/xensource/sm/SRCommand.py", line 161, in _run_locked
          Nov 15 13:59:45 xcpng-test-01 SM: [278274]     rv = self._run(sr, target)
          Nov 15 13:59:45 xcpng-test-01 SM: [278274]   File "/opt/xensource/sm/SRCommand.py", line 326, in _run
          Nov 15 13:59:45 xcpng-test-01 SM: [278274]     return target.list_changed_blocks()
          Nov 15 13:59:45 xcpng-test-01 SM: [278274]   File "/opt/xensource/sm/VDI.py", line 757, in list_changed_blocks
          Nov 15 13:59:45 xcpng-test-01 SM: [278274]     "Source and target VDI are unrelated")
          Nov 15 13:59:45 xcpng-test-01 SM: [278274]
          Nov 15 13:59:45 xcpng-test-01 SM: [278274] ***** NFS VHD: EXCEPTION <class 'xs_errors.SROSError'>, Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated]
          Nov 15 13:59:45 xcpng-test-01 SM: [278274]   File "/opt/xensource/sm/SRCommand.py", line 385, in run
          Nov 15 13:59:45 xcpng-test-01 SM: [278274]     ret = cmd.run(sr)
          Nov 15 13:59:45 xcpng-test-01 SM: [278274]   File "/opt/xensource/sm/SRCommand.py", line 111, in run
          Nov 15 13:59:45 xcpng-test-01 SM: [278274]     return self._run_locked(sr)
          Nov 15 13:59:45 xcpng-test-01 SM: [278274]   File "/opt/xensource/sm/SRCommand.py", line 161, in _run_locked
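
          As a side note on the UUID check above: one way to confirm that the VM's disks keep the same identities across the migration is to list the VDI UUIDs with xe before and after the move and compare. This is only a minimal sketch; "test-vm" is a placeholder name label and it assumes the standard xe CLI on a pool member with a unique VM name:

              # Print the VDI UUIDs of all disks attached to the VM; run before and after
              # the migration and diff the two outputs ("test-vm" is a placeholder and
              # must match exactly one VM).
              VM_UUID=$(xe vm-list name-label=test-vm --minimal)
              xe vbd-list vm-uuid="$VM_UUID" type=Disk params=vdi-uuid --minimal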
          
          
            flakpyro @flakpyro:

            @olivierlambert I'm making progress getting to the bottom of this, thanks to some documentation from XenServer about using cbt-util.

            You can use the cbt-util utility, which helps establish the chain relationship. If the VDI snapshots are not linked by changed block metadata, you get errors like “SR_BACKEND_FAILURE_460”, “Failed to calculate changed blocks for given VDIs”, and “Source and target VDI are unrelated”.

            Example usage of cbt-util:

                cbt-util get -c -n <name of cbt log file>

            The -c option prints the child log file UUID.
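
            To walk all the CBT chains on one SR at once, a rough sketch along these lines can help. It assumes cbt-util also accepts -p to print the parent log UUID (not shown in the quoted documentation) and that the cbtlog files sit in the SR mount point:

                # Rough sketch: dump parent/child UUIDs for every cbtlog on one SR.
                # Assumes -p (parent) exists alongside the documented -c (child) option;
                # <SR-UUID> is a placeholder.
                cd /var/run/sr-mount/<SR-UUID>
                for log in *.cbtlog; do
                    echo "== $log"
                    echo "  parent: $(cbt-util get -p -n "$log")"
                    echo "  child:  $(cbt-util get -c -n "$log")"
                done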
            

            I cleared all CBT snapshots from my test VMs and ran a full backup on each VM, then made sure the CBT chain was consistent using cbt-util. The output was:

            [14:22 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 867063fc-4d86-420a-9ad2-dfe1749ecbc1.cbtlog 
            1950d6a3-c6a9-4b0c-b79f-068dd44479cc
            
            

            After the backup was complete I migrated the VM to the second host in the pool and ran the same command from both hosts:

            [14:26 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 867063fc-4d86-420a-9ad2-dfe1749ecbc1.cbtlog 
            00000000-0000-0000-0000-000000000000
            
            

            And from the second host:

            [14:26 xcpng-test-02 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 867063fc-4d86-420a-9ad2-dfe1749ecbc1.cbtlog 
            00000000-0000-0000-0000-000000000000
            
            

            That is clearly the problem right there; the question is what is causing it to happen.

            After running another full backup, the zeroed-out cbtlog file is removed and a new one is created, which works fine until the VM is migrated again:

            [14:39 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 1eefb7bf-9dc3-4830-8352-441a77412576.cbtlog 
            1950d6a3-c6a9-4b0c-b79f-068dd44479cc
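
            For anyone wanting to check an SR in bulk, a quick sketch along these lines would flag any cbtlog whose child UUID has been zeroed out as above (the SR path is the one from this pool; adjust as needed):

                # Rough sketch: list cbtlog files whose child UUID has been zeroed out.
                SR=/var/run/sr-mount/45e457aa-16f8-41e0-d03d-8201e69638be
                for log in "$SR"/*.cbtlog; do
                    child=$(cbt-util get -c -n "$log")
                    [ "$child" = "00000000-0000-0000-0000-000000000000" ] && echo "zeroed child: $log"
                done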
            
            
              rtjdamen @flakpyro:

              @flakpyro I can't reproduce this on our end: after migration within the pool on the same storage, the CBT is preserved. When I migrate to a different storage pool, the CBT is reset.

                flakpyro @rtjdamen:

                @rtjdamen Interesting. Is this with an iSCSI (block) SR or with an NFS SR?

                  rtjdamen @flakpyro:

                  @flakpyro Both scenarios.

                    flakpyro @rtjdamen:

                    @rtjdamen Hmm very strange.

                    The only thing I can think of is that this may be due to the fact that these VMs were imported from VMware.

                    Next week I can try creating a brand new NFSv3 SR (since NFSv4 has caused issues in the past) as well as a cleanly installed VM that was not imported from VMware, and see if the issue persists.
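
                    For reference, creating such an SR could look roughly like the sketch below; the server address and export path are placeholders, and it assumes the usual xe NFS SR parameters:

                        # Rough sketch of creating an NFSv3 SR; server and path are placeholders.
                        xe sr-create type=nfs shared=true content-type=user \
                            name-label="nfs3-test-sr" \
                            device-config:server=nas.example.com \
                            device-config:serverpath=/export/xcp-test \
                            device-config:nfsversion=3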

                      flakpyro @flakpyro:

                      This is a completely different five-host pool backed by a Pure Storage array with SRs mounted via NFSv3; migrating a VM between hosts results in the same issue.

                      Before migration:
                      [01:41 xcpng-prd-03 b04d9910-8671-750f-050e-8b55c64fbede]# cbt-util get -c -n 83035854-b5a9-4f7e-869f-abe43ddc658d.cbtlog 
                      e28065ff-342f-4eae-a910-b91842dd39ca
                      
                      After migration:
                      [01:41 xcpng-prd-03 b04d9910-8671-750f-050e-8b55c64fbede]# cbt-util get -c -n 83035854-b5a9-4f7e-869f-abe43ddc658d.cbtlog 
                      00000000-0000-0000-0000-000000000000
                      

                      I don't think I have anything "custom" running that would be causing this, so I have no idea why this is happening, but it's happening on multiple pools for us.

                        rtjdamen @flakpyro:

                        @flakpyro Is there any difference between migrating with the VM powered on and powered off?

                          rtjdamen @flakpyro:

                          @flakpyro I have just tested both live and offline migration on our end; both kept CBT alive. Tested on both iSCSI and NFS.

                            flakpyro @rtjdamen:

                            Looks like it does this if the VM is powered off as well. I'm really not sure what else to try, since this is happening on two different pools for us.

                            I may end up submitting a ticket with Vates so they can get to the bottom of it.

                              rtjdamen @flakpyro:

                              @flakpyro Are you running the latest XCP-ng version, 8.2 or 8.3?

                                flakpyro @rtjdamen:

                                @rtjdamen Both pools are on 8.3 with all the latest updates.
                                I did find this PR on GitHub and wonder if it may be related: https://github.com/vatesfr/xen-orchestra/pull/8127 but I'm not sure why it would only happen after a migration...

                                fbeauchamp opened this pull request in vatesfr/xen-orchestra: fix(backups): handle slow enable cbt #8127 (open)

                                  rtjdamen @flakpyro:

                                  @flakpyro We are still on 8.2, so maybe there is some difference there.

                                    olivierlambert (Vates đŸȘ Co-Founder & CEO):

                                    Thanks for the feedback @flakpyro, and it shows it's not an XO issue. Something is resetting CBT in your case when it shouldn't be, and I don't know why. But clearly, you have a way to test it easily, which is progress 🙂

                                      flakpyro @olivierlambert:

                                      @olivierlambert So I guess the next thing we need is to have someone else running 8.3 test this using an NFS SR?

                                        florent (Vates đŸȘ XO Team) @flakpyro:

                                        @flakpyro said in CBT: the thread to centralize your feedback:

                                        This is a completely different five-host pool backed by a Pure Storage array with SRs mounted via NFSv3; migrating a VM between hosts results in the same issue.

                                        Before migration:
                                        [01:41 xcpng-prd-03 b04d9910-8671-750f-050e-8b55c64fbede]# cbt-util get -c -n 83035854-b5a9-4f7e-869f-abe43ddc658d.cbtlog 
                                        e28065ff-342f-4eae-a910-b91842dd39ca
                                        
                                        After migration:
                                        [01:41 xcpng-prd-03 b04d9910-8671-750f-050e-8b55c64fbede]# cbt-util get -c -n 83035854-b5a9-4f7e-869f-abe43ddc658d.cbtlog 
                                        00000000-0000-0000-0000-000000000000
                                        

                                        I don't think I have anything "custom" running that would be causing this, so I have no idea why this is happening, but it's happening on multiple pools for us.

                                        This is a very interesting clue, and we will investigate it with Damien.

                                        There are a lot of edge cases that can happen (a lying network/drive/...), and most of the time XCP-ng/XAPI are self-healing, but sometimes XO has to do a little work to clean up. The CBT should be reset correctly after a storage migration.
                                        We'll make the enable/disable CBT call asynchronous, since the current behaviour can lead to a bogus state, and maybe add a more in-depth cleanup of CBT after a "VDI not related" error.
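
                                        In the meantime, one way to reset CBT manually on an affected VDI is to toggle it with the xe CLI (xe vdi-disable-cbt / vdi-enable-cbt are standard XAPI commands); the next backup of that VDI will then have to be a full, since the old chain is gone. A minimal sketch, with a placeholder VDI UUID:

                                            # Minimal sketch: toggle CBT on one VDI; <vdi-uuid> is a placeholder.
                                            # The next delta backup of this VDI will be promoted to a full,
                                            # because the previous CBT chain no longer exists.
                                            xe vdi-disable-cbt uuid=<vdi-uuid>
                                            xe vdi-enable-cbt uuid=<vdi-uuid>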

                                          flakpyro @florent:

                                          @florent Thanks for checking into this, as we'd love to be able to use this feature. If you need me to test anything or provide any additional logs/info about our environment, let me know!

                                            flakpyro @florent:

                                            @florent Testing a storage migration, I do see CBT get disabled and reset during the process, which is expected! I do notice it leaves the .cbtlog file on the old SR after the storage migration is complete, but that's easy enough to clean up manually (a rough sketch of spotting the leftovers is at the end of this post).

                                            The issue I posted above, however, is with a plain VM migration from host to host on a shared NFS SR; the SR the VM is on does not change.
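
                                            As for the leftover .cbtlog files, a cautious sketch for spotting them could look like this. It assumes the old SR is file-based and mounted under /var/run/sr-mount/<SR-UUID>, and that a cbtlog with no matching VHD on that SR is a leftover; it only prints candidates, so nothing is deleted without a manual check:

                                                # Rough sketch: list cbtlog files on an SR that have no matching VHD.
                                                # <old-SR-UUID> is a placeholder; review the output before removing anything.
                                                SR=/var/run/sr-mount/<old-SR-UUID>
                                                for log in "$SR"/*.cbtlog; do
                                                    vdi=$(basename "$log" .cbtlog)
                                                    [ -e "$SR/$vdi.vhd" ] || echo "possible leftover: $log"
                                                done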
