@olivierlambert
MASSIVE EDIT AFTER FURTHER TESTING
So i have another one in my testing with CBT.
If i have VM running with CBT backups with Snapshot deletion enabled, and i remove the pool setting to specify a migration network everything appears fine and CBT data will not reset due to a migration.
However if it take a manual snapshot on a VM, and remove the snapshot after i find CBT data sometimes resets itself:
SM log shows:
[15:53 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# grep -A 5 -B 5 -i exception /var/log/SMlog
Jan 28 11:55:00 xcpng-test-01 SMGC: [2041921] In cleanup
Jan 28 11:55:00 xcpng-test-01 SMGC: [2041921] SR 9330 ('Syn-TestLab-DS1') (0 VDIs in 0 VHD trees): no changes
Jan 28 11:55:00 xcpng-test-01 SM: [2041921] lock: closed /var/lock/sm/93308f90-1fcd-873b-292f-4a34dde2bfea/running
Jan 28 11:55:00 xcpng-test-01 SM: [2041921] lock: closed /var/lock/sm/93308f90-1fcd-873b-292f-4a34dde2bfea/gc_active
Jan 28 11:55:00 xcpng-test-01 SM: [2041921] lock: closed /var/lock/sm/93308f90-1fcd-873b-292f-4a34dde2bfea/sr
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] ***** sr_scan: EXCEPTION <class 'util.CommandException'>, Input/output error
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] File "/opt/xensource/sm/SRCommand.py", line 111, in run
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] return self._run_locked(sr)
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] File "/opt/xensource/sm/SRCommand.py", line 161, in _run_locked
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] rv = self._run(sr, target)
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] File "/opt/xensource/sm/SRCommand.py", line 370, in _run
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] return sr.scan(self.params['sr_uuid'])
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] File "/opt/xensource/sm/ISOSR", line 594, in scan
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] if not util.isdir(self.path):
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] File "/opt/xensource/sm/util.py", line 542, in isdir
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] raise CommandException(errno.EIO, "os.stat(%s)" % path, "failed")
Jan 28 11:55:09 xcpng-test-01 SM: [2041073]
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] Raising exception [40, The SR scan failed [opterr=Command os.stat(/var/run/sr-mount/d00054f9-e6a2-162f-f734-1c6c02541722) failed (failed): Input/output error]]
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] ***** ISO: EXCEPTION <class 'xs_errors.SROSError'>, The SR scan failed [opterr=Command os.stat(/var/run/sr-mount/d00054f9-e6a2-162f-f734-1c6c02541722) failed (failed): Input/output error]
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] File "/opt/xensource/sm/SRCommand.py", line 385, in run
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] ret = cmd.run(sr)
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] File "/opt/xensource/sm/SRCommand.py", line 121, in run
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] raise xs_errors.XenError(excType, opterr=msg)
Jan 28 11:55:09 xcpng-test-01 SM: [2041073]
--
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] lock: opening lock file /var/lock/sm/58242a5a-0a6f-4e4e-bada-8331ed32eae4/cbtlog
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] lock: acquired /var/lock/sm/58242a5a-0a6f-4e4e-bada-8331ed32eae4/cbtlog
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] ['/usr/sbin/cbt-util', 'get', '-n', '/var/run/sr-mount/45e457aa-16f8-41e0-d03d-8201e69638be/58242a5a-0a6f-4e4e-bada-8331ed32eae4.cbtlog', '-c']
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] pread SUCCESS
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] lock: released /var/lock/sm/58242a5a-0a6f-4e4e-bada-8331ed32eae4/cbtlog
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] Raising exception [460, Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated]]
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] ***** generic exception: vdi_list_changed_blocks: EXCEPTION <class 'xs_errors.SROSError'>, Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated]
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] File "/opt/xensource/sm/SRCommand.py", line 111, in run
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] return self._run_locked(sr)
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] File "/opt/xensource/sm/SRCommand.py", line 161, in _run_locked
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] rv = self._run(sr, target)
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] File "/opt/xensource/sm/SRCommand.py", line 326, in _run
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] return target.list_changed_blocks()
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] File "/opt/xensource/sm/VDI.py", line 759, in list_changed_blocks
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] "Source and target VDI are unrelated")
Jan 28 14:41:58 xcpng-test-01 SM: [2181235]
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] ***** NFS VHD: EXCEPTION <class 'xs_errors.SROSError'>, Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated]
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] File "/opt/xensource/sm/SRCommand.py", line 385, in run
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] ret = cmd.run(sr)
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] File "/opt/xensource/sm/SRCommand.py", line 111, in run
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] return self._run_locked(sr)
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] File "/opt/xensource/sm/SRCommand.py", line 161, in _run_locked
--
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] Removed leaf-coalesce from fe6e3edd(100.000G/7.483M?)
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] ***********************
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] * E X C E P T I O N *
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] ***********************
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] leaf-coalesce: EXCEPTION <class 'util.SMException'>, VDI fe6e3edd-4d63-4005-b0f3-932f5f34e036 could not be coalesced
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] File "/opt/xensource/sm/cleanup.py", line 2098, in coalesceLeaf
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] self._coalesceLeaf(vdi)
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] File "/opt/xensource/sm/cleanup.py", line 2380, in _coalesceLeaf
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] .format(uuid=vdi.uuid))
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532]
I have been able to once again reproduce this multiple times.
Steps to reproduce:
-
Setup a backup job, Enable CBT with snapshot removal and run your first initial full backup.
-
Take a manual snapshot of the VM. After a few mins remove the snapshot and let GC run and complete.
-
Run the same backup job again. For me anyways this usually results in a full backup with the above being dumped to the SM log.
-
Afterwards all backups after this will go back to being delta and CBT will work fine again, unless i take another manual snapshot.
Is anyone else able to reproduce this?
Edit 2: Here is an example of what i am running into.
After running the initial backup job runs:
[23:27 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 2be6b6ec-9308-4e63-9975-19259108eba2.cbtlog
adde7aaf-6b13-498a-b0e3-f756a57b2e78
After taking a manual snapshot, the CBT log reference changes as expected:
[23:27 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 2be6b6ec-9308-4e63-9975-19259108eba2.cbtlog
b6e33794-120a-4a95-b035-af64c6605ee2
After removing the manual snapshot:
[23:29 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 2be6b6ec-9308-4e63-9975-19259108eba2.cbtlog
00000000-0000-0000-0000-000000000000