Well, I have not had any issues since the 9th, which is when I disabled Vinchin Backups. However, my count has not gone down since then either.
Posts
-
RE: Internal error: Not_found after Vinchin backup
-
RE: Internal error: Not_found after Vinchin backup
@olivierlambert Should I not worry about deleting any more of the extra 0 B disks listed on the SR page's Disks tab and just watch the number on this page?
-
RE: Internal error: Not_found after Vinchin backup
@olivierlambert Thank you. It's just that the host some of these VMs are on keeps disconnecting from the SR. So I have been shutting them down, moving them to another host, and powering them back on. I was just hoping I could finish the coalesce for them manually to prevent unplanned downtime.
-
RE: Internal error: Not_found after Vinchin backup
@olivierlambert What entry would I look for to see a successful and/or failed Coalesce? I'm looking at the SMlog.
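For anyone who lands here later, this is the kind of grep I have been trying (just a sketch; the exact SMGC message wording can differ between SM versions, so treat the patterns as a starting point):

# Pull garbage-collector / coalesce activity out of the storage manager log
grep -i "SMGC" /var/log/SMlog | grep -Ei "coalesce|exception|error"

# Or watch it live while a scan / GC is running
tail -f /var/log/SMlog | grep -i "coalesce"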
-
RE: Internal error: Not_found after Vinchin backup
@olivierlambert So the count has gone down to 25. The host that all of these servers were on, of course, disconnected from the SR again. Is there a way to run the Garbage Collection and/or Coalesce on one host only? I was thinking that if I move the VMs one at a time over to a host that has nothing else on it, I could run that against a powered-off VM to clean it up, then move it to another host and power it back on, and then on to the next and the next until it's all cleaned up. Does that make sense?
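Something like this is what I had in mind, just triggering the scan/GC by hand against the SR (a sketch with a placeholder UUID; as far as I understand, for a shared SR the GC/coalesce runs from the pool master rather than a host of my choosing):

# Find the SR UUID, then trigger a scan, which should also kick off GC/coalesce
xe sr-list params=uuid,name-label
xe sr-scan uuid=<SR_UUID>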
-
RE: Internal error: Not_found after Vinchin backup
I don't know if this helps; this is from the SMlog:
[15:18 iahost-xcpng-server2 ~]# grep -i "coalesce" /var/log/SMlog
Jul 8 14:32:53 iahost-xcpng-server2 SM: [31275] Aborting GC/coalesce
Jul 8 14:33:00 iahost-xcpng-server2 SM: [31789] Entering doesFileHaveOpenHandles with file: /dev/mapper/VG_XenStorage--88d7607c--f807--3b06--6f70--2dcb319d97ea-coalesce_8241ba22--3125--4f45--b3b1--254792a525c7_1
Jul 8 14:33:00 iahost-xcpng-server2 SM: [31789] Entering findRunningProcessOrOpenFile with params: ['/dev/mapper/VG_XenStorage--88d7607c--f807--3b06--6f70--2dcb319d97ea-coalesce_8241ba22--3125--4f45--b3b1--254792a525c7_1', False]
Jul 8 14:33:00 iahost-xcpng-server2 SM: [31789] ['/sbin/dmsetup', 'remove', '/dev/mapper/VG_XenStorage--88d7607c--f807--3b06--6f70--2dcb319d97ea-coalesce_8241ba22--3125--4f45--b3b1--254792a525c7_1']
Jul 8 14:59:34 iahost-xcpng-server2 SMGC: [20458] Coalesced size = 316.035G
Jul 8 14:59:34 iahost-xcpng-server2 SMGC: [20458] Coalesce candidate: *8241ba22[VHD](600.000G//88.477G|n) (tree height 5)
Jul 8 14:59:35 iahost-xcpng-server2 SMGC: [20458] Coalesced size = 316.035G
Jul 8 14:59:35 iahost-xcpng-server2 SMGC: [20458] Coalesce candidate: *8241ba22[VHD](600.000G//88.477G|a) (tree height 5)
Jul 8 14:59:35 iahost-xcpng-server2 SM: [20458] ['/sbin/lvremove', '-f', '/dev/VG_XenStorage-88d7607c-f807-3b06-6f70-2dcb319d97ea/coalesce_8241ba22-3125-4f45-b3b1-254792a525c7_1']
Jul 8 14:59:35 iahost-xcpng-server2 SM: [20458] ['/sbin/dmsetup', 'status', 'VG_XenStorage--88d7607c--f807--3b06--6f70--2dcb319d97ea-coalesce_8241ba22--3125--4f45--b3b1--254792a525c7_1']
Jul 8 14:59:36 iahost-xcpng-server2 SMGC: [20458] Coalesced size = 316.035G
Jul 8 14:59:36 iahost-xcpng-server2 SMGC: [20458] Coalesce candidate: *8241ba22[VHD](600.000G//88.477G|a) (tree height 5)
Jul 8 14:59:36 iahost-xcpng-server2 SM: [20458] ['/sbin/lvcreate', '-n', 'coalesce_8241ba22-3125-4f45-b3b1-254792a525c7_1', '-L', '4', 'VG_XenStorage-88d7607c-f807-3b06-6f70-2dcb319d97ea', '--addtag', 'journaler', '-W', 'n']
Jul 8 15:01:41 iahost-xcpng-server2 SMGC: [20458] Coalesced size = 316.035G
Jul 8 15:02:11 iahost-xcpng-server2 SMGC: [20458] Running VHD coalesce on *8241ba22[VHD](600.000G//88.477G|a)
Jul 8 15:02:11 iahost-xcpng-server2 SM: [22617] ['/usr/bin/vhd-util', 'coalesce', '--debug', '-n', '/dev/VG_XenStorage-88d7607c-f807-3b06-6f70-2dcb319d97ea/VHD-8241ba22-3125-4f45-b3b1-254792a525c7']
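Since those "tree height 5" lines look like the chain depth, I also wanted to look at the VHD tree for this SR directly. I believe the usual way on an LVM SR is something like this (hedged; the option set may differ by version, and the VG name is taken from the log above):

# Print the VHD parent/child tree for this LVM-based SR
vhd-util scan -f -m "VHD-*" -l VG_XenStorage-88d7607c-f807-3b06-6f70-2dcb319d97ea -p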
-
RE: Internal error: Not_found after Vinchin backup
What does this mean exactly? When will this happen? I'm sorry I am very nervous.
-
RE: Internal error: Not_found after Vinchin backup
I deleted the snapshot from this one VM.
However, that one still remains in the list here:
-
RE: Pool Master
@olivierlambert Dang, ok. I waited a few minutes, then clicked Connect in XOA for that host and it connected. Not sure what to do, really.
-
RE: Pool Master
I rebooted the pool master. Even though it looks like it came up ok, it is not connected to storage. In xcp-ng center it is showing a red X for the SR, and these are the errors in dmesg:
[Tue Jul 8 14:43:19 2025] Loading iSCSI transport class v2.0-870.
[Tue Jul 8 14:43:19 2025] iscsi: registered transport (tcp)
[Tue Jul 8 14:43:20 2025] scsi host7: iSCSI Initiator over TCP/IP
[Tue Jul 8 14:43:20 2025] scsi 7:0:0:1: Direct-Access SYNOLOGY Storage 4.0 PQ: 0 ANSI: 5
[Tue Jul 8 14:43:20 2025] scsi 7:0:0:1: alua: supports implicit TPGS
[Tue Jul 8 14:43:20 2025] scsi 7:0:0:1: alua: device naa.60014059c706a83d1e77d47a8da5c5d1 port group 0 rel port 1
[Tue Jul 8 14:43:20 2025] sd 7:0:0:1: Attached scsi generic sg1 type 0
[Tue Jul 8 14:43:20 2025] sd 7:0:0:1: alua: transition timeout set to 60 seconds
[Tue Jul 8 14:43:20 2025] sd 7:0:0:1: alua: port group 00 state A non-preferred supports TOlUSNA
[Tue Jul 8 14:43:25 2025] sd 7:0:0:1: [sdb] 26109542400 512-byte logical blocks: (13.4 TB/12.2 TiB)
[Tue Jul 8 14:43:25 2025] sd 7:0:0:1: [sdb] Write Protect is off
[Tue Jul 8 14:43:25 2025] sd 7:0:0:1: [sdb] Mode Sense: 43 00 10 08
[Tue Jul 8 14:43:25 2025] sd 7:0:0:1: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
[Tue Jul 8 14:43:25 2025] sd 7:0:0:1: [sdb] Attached SCSI disk
[Tue Jul 8 14:43:26 2025] sd 7:0:0:1: [sdb] Synchronizing SCSI cache
[Tue Jul 8 14:43:32 2025] Buffer I/O error on dev sdb, logical block 3263692798, async page read
[Tue Jul 8 14:43:32 2025] scsi 7:0:0:1: alua: Detached
[Tue Jul 8 14:43:40 2025] scsi host7: iSCSI Initiator over TCP/IP
[Tue Jul 8 14:43:40 2025] scsi 7:0:0:1: Direct-Access SYNOLOGY Storage 4.0 PQ: 0 ANSI: 5
[Tue Jul 8 14:43:40 2025] scsi 7:0:0:1: alua: supports implicit TPGS
[Tue Jul 8 14:43:40 2025] scsi 7:0:0:1: alua: device naa.60014059c706a83d1e77d47a8da5c5d1 port group 0 rel port 1
[Tue Jul 8 14:43:40 2025] sd 7:0:0:1: Attached scsi generic sg1 type 0
[Tue Jul 8 14:43:40 2025] sd 7:0:0:1: alua: transition timeout set to 60 seconds
[Tue Jul 8 14:43:40 2025] sd 7:0:0:1: alua: port group 00 state A non-preferred supports TOlUSNA
[Tue Jul 8 14:43:46 2025] sd 7:0:0:1: [sdb] 26109542400 512-byte logical blocks: (13.4 TB/12.2 TiB)
[Tue Jul 8 14:43:46 2025] sd 7:0:0:1: [sdb] Write Protect is off
[Tue Jul 8 14:43:46 2025] sd 7:0:0:1: [sdb] Mode Sense: 43 00 10 08
[Tue Jul 8 14:43:46 2025] sd 7:0:0:1: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
[Tue Jul 8 14:43:49 2025] ldm_validate_partition_table(): Disk read failed.
[Tue Jul 8 14:43:49 2025] sdb: unable to read partition table
[Tue Jul 8 14:43:49 2025] sd 7:0:0:1: [sdb] Attached SCSI disk
[Tue Jul 8 14:43:49 2025] sd 7:0:0:1: [sdb] Synchronizing SCSI cache
[Tue Jul 8 14:43:49 2025] scsi 7:0:0:1: alua: Detached
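In case it matters, this is roughly what I was planning to try next to get the SR reattached (a sketch with placeholder UUIDs; I'm not sure it's the right fix given the I/O errors above):

# Check that the iSCSI session to the Synology is actually up
iscsiadm -m session

# Find this host's PBD for the SR and try to re-plug it
xe pbd-list sr-uuid=<SR_UUID> host-uuid=<HOST_UUID> params=uuid,currently-attached
xe pbd-plug uuid=<PBD_UUID>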
-
RE: Internal error: Not_found after Vinchin backup
So even taking Vinchin out of the picture for now, I am still worried that xcp-ng will run the Garbage Collection and disconnect the SR, bringing the VM down again.
-
RE: Internal error: Not_found after Vinchin backup
It just keeps bouncing over and over.
-
RE: Internal error: Not_found after Vinchin backup
Is that due to the Master being disconnected from the SR? That is the one I was referring to in my previous topic post, "Pool Master".
-
RE: Internal error: Not_found after Vinchin backup
@olivierlambert I did that on one and noticed a task API call: sr.scan. I opened the raw log and am seeing errors. Attached: SR Scan Errors.txt
-
RE: Internal error: Not_found after Vinchin backup
Sorry, I'm just a bit confused because it is showing no snapshots in the VM view.
-
RE: Internal error: Not_found after Vinchin backup
So I just click "Destroy VDI" on the snapshot?
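For my own notes, I assume the CLI equivalent of that button would be something along these lines (placeholders are mine, and vdi-destroy is destructive, so I'd double-check the UUID first):

# Confirm which VDI the snapshot is, then destroy it
xe vdi-list name-label="<snapshot VDI name>" params=uuid,name-label,is-a-snapshot
xe vdi-destroy uuid=<VDI_UUID>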
-
RE: Internal error: Not_found after Vinchin backup
@olivierlambert Ok, will do. Synology says that xcp-ng is sending disconnects to the LUN, based on the dmesg on the Synology:
iSCSI:target_core_tmr.c:629:core_tmr_lun_reset LUN_RESET: TMR starting for [fileio/Synology-VM-LUN/9c706a83-1e77-47a8-a5c5-18cfe815459d], tas: 1
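If those resets are a reaction to commands timing out or being aborted on the xcp-ng side, I figured the initiator's iSCSI timeout settings are at least worth a look, with something like:

# Show the iSCSI node settings (including timeouts) on the xcp-ng host
iscsiadm -m node -o show | grep -i timeo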
-
RE: Internal error: Not_found after Vinchin backup
@olivierlambert Ok. Why does XOA show some with CBT and some without? How does that get set? I apologize, I am just trying to narrow this down. I appreciate all the help.
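In case it helps narrow it down, I assume the CLI way to check which VDIs have it is something like this (with cbt-enabled being the parameter name, if I have that right; placeholder UUIDs are mine):

# List VDIs on the SR with their CBT state
xe vdi-list sr-uuid=<SR_UUID> params=uuid,name-label,cbt-enabled

# CBT can apparently be toggled per VDI, e.g. to turn it off:
xe vdi-disable-cbt uuid=<VDI_UUID>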
-
RE: Internal error: Not_found after Vinchin backup
@olivierlambert Is there anything scheduled by default in xcp-ng on Sundays? Or how often does it coalesce? I'm just trying to figure out why these issues happen every Sunday.
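In case it's relevant, I figured I could at least check for scheduled jobs on the host itself with something like this (just the standard cron locations; I don't know what xcp-ng ships there by default):

# Look for scheduled jobs in dom0
crontab -l
ls /etc/cron.d/ /etc/cron.daily/ /etc/cron.weekly/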