VDI_IO_ERROR Continuous Replication on clean install.
-
@yomono nope, still investigating. I got SR_NOT_SUPPORTED in the log. -
-
Can you double check you are using a recent commit on master? -
@olivierlambert In my case, I'm indeed using the latest commit. I played around with old commits yesterday (as old as two or three months) but got the same result. Right now, I'm using the latest (committed an hour ago).
I can share my SMlogs if you want, but I'm also getting the "SR_NOT_SUPPORTED" error. I tried to back up different VMs on different source servers, and to different destination servers. My next try will be reinstalling XO -
Indeed, try to wipe it entirely, and rebuild.
-
@olivierlambert that's an old problem) but yes, I'm usually on about the latest. Just repeated all tests on 3c7d3.
- CR stopped working right after a clean 8.2.1 installation, on the same day.
- replaced FC with iSCSI (only because I need iSCSI here), created a new LUN > same story.
- 8.2.0 clean install, no updates - CR works.
- haven't tried regular backups to this SR, don't need them here.
- VM migration works with any setup.
The problem is only with one storage, a Dell EMC PowerVault ME4012. All the others (Huawei, iSCSI) work fine. But I'm not sure I have any other clean 8.2.1 pools, maybe only some nodes.
Logs now. I'm doing 2 CR backups.
- Xen 8.2.1, clean host, no VMs. 60TB iSCSI LUN. Xen shows only 50TB
Jan 16 15:15:30 test SMGC: [10010] SR f3fd ('LUN') (2 VDIs in 2 VHD trees):
Jan 16 15:15:30 test SMGC: [10010] a459d14f[VHD](50.000G//50.105G|ao)
Jan 16 15:15:30 test SMGC: [10010] 1789f7a7[VHD](50.000G//50.105G|ao)
Jan 16 15:15:47 test SMGC: [10338] SR f3fd ('LUN') (1 VDIs in 1 VHD trees):
Jan 16 15:15:47 test SMGC: [10338] a459d14f[VHD](50.000G//50.105G|ao)
Jan 16 15:15:47 test SMGC: [10338]
Here it is 60TB.
2 VMs, 2 errors: SR_NOT_SUPPORTED
Jan 16 15:10:43 test SM: [6596] result: {'params_nbd': 'nbd:unix:/run/blktap-control/nbd/f3fd46f7-5ce4-e5e0-53e9-059ce4775a7b/1789f7a7-05a0-411c-aa80-dcc659f8b45f', 'o_direct_reason': 'SR_NOT_SUPPORTED', 'params': '/dev/sm/backend/f3fd46f7-5ce4-e5e0-53e9-059ce4775a7b/1789f7a7-05a0-411c-aa80-dcc659f8b45f', 'o_direct': True, 'xenstore_data': {'scsi/0x12/0x80': 'AIAAEjE3ODlmN2E3LTA1YTAtNDEgIA==', 'scsi/0x12/0x83': 'AIMAMQIBAC1YRU5TUkMgIDE3ODlmN2E3LTA1YTAtNDExYy1hYTgwLWRjYzY1OWY4YjQ1ZiA=', 'vdi-uuid': '1789f7a7-05a0-411c-aa80-dcc659f8b45f', 'mem-pool': 'f3fd46f7-5ce4-e5e0-53e9-059ce4775a7b'}}
Jan 16 15:10:49 test SM: [6834] result: {'params_nbd': 'nbd:unix:/run/blktap-control/nbd/f3fd46f7-5ce4-e5e0-53e9-059ce4775a7b/a459d14f-ae92-4a77-8574-30442126624b', 'o_direct_reason': 'SR_NOT_SUPPORTED', 'params': '/dev/sm/backend/f3fd46f7-5ce4-e5e0-53e9-059ce4775a7b/a459d14f-ae92-4a77-8574-30442126624b', 'o_direct': True, 'xenstore_data': {'scsi/0x12/0x80': 'AIAAEmE0NTlkMTRmLWFlOTItNGEgIA==', 'scsi/0x12/0x83': 'AIMAMQIBAC1YRU5TUkMgIGE0NTlkMTRmLWFlOTItNGE3Ny04NTc0LTMwNDQyMTI2NjI0YiA=', 'vdi-uuid': 'a459d14f-ae92-4a77-8574-30442126624b', 'mem-pool': 'f3fd46f7-5ce4-e5e0-53e9-059ce4775a7b'}}
and some minor ones like
Jan 16 15:14:30 test SM: [9387] Failed to lock /var/lock/sm/.nil/lvm on first attempt, blocked by PID 9357
Line 184: Jan 16 15:10:29 test SM: [6376] Failed to lock /var/lock/sm/.nil/lvm on first attempt, blocked by PID 6348
Line 571: Jan 16 15:10:59 test SM: [7141] Failed to lock /var/lock/sm/.nil/lvm on first attempt, blocked by PID 7115
Line 717: Jan 16 15:11:30 test SM: [7457] Failed to lock /var/lock/sm/.nil/lvm on first attempt, blocked by PID 7428
Line 1927: Jan 16 15:15:43 test SM: [10146] unlink of attach_info failed
Nothing else with an error status in the log.
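For anyone who wants to pull the same two things out of their own SMlog, here is a minimal Python sketch. It only assumes the line shapes shown in the excerpts above (a `result:` dict containing `o_direct_reason` followed by `vdi-uuid`, and the "Failed to lock ... on first attempt" message). The function name `summarize` and the `SAMPLE` text (trimmed copies of the lines above) are made up for illustration, not part of any XCP-ng tool.

```python
import re
from collections import Counter

# One 'result:' dict: the o_direct_reason value, then the vdi-uuid that follows it.
# Assumption: both keys appear in every result dict, in this order, as in the excerpt.
REASON = re.compile(
    r"'o_direct_reason': '([^']*)'.*?'vdi-uuid': '([0-9a-f-]{36})'",
    re.DOTALL,
)
# Lock contention message: captures the lock path and the blocking PID.
LOCK = re.compile(r"Failed to lock (\S+) on first attempt, blocked by PID (\d+)")


def summarize(smlog_text):
    """Return ({vdi_uuid: o_direct_reason}, Counter of contended lock paths)."""
    reasons = {uuid: reason for reason, uuid in REASON.findall(smlog_text)}
    locks = Counter(path for path, _pid in LOCK.findall(smlog_text))
    return reasons, locks


# Trimmed copies of the log lines quoted above (most dict fields removed).
SAMPLE = (
    "Jan 16 15:10:43 test SM: [6596] result: {'o_direct_reason': 'SR_NOT_SUPPORTED', "
    "'o_direct': True, 'xenstore_data': {'vdi-uuid': '1789f7a7-05a0-411c-aa80-dcc659f8b45f'}}\n"
    "Jan 16 15:10:49 test SM: [6834] result: {'o_direct_reason': 'SR_NOT_SUPPORTED', "
    "'o_direct': True, 'xenstore_data': {'vdi-uuid': 'a459d14f-ae92-4a77-8574-30442126624b'}}\n"
    "Jan 16 15:10:29 test SM: [6376] Failed to lock /var/lock/sm/.nil/lvm on first attempt, blocked by PID 6348\n"
    "Jan 16 15:10:59 test SM: [7141] Failed to lock /var/lock/sm/.nil/lvm on first attempt, blocked by PID 7115\n"
)

if __name__ == "__main__":
    reasons, locks = summarize(SAMPLE)
    for uuid, reason in reasons.items():
        print(uuid, reason)
    for path, count in locks.items():
        print(path, count)
```

To run it against a real log, read the file and pass its text to `summarize` instead of `SAMPLE`.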
-
Because of the weird size, I tried 8.2.1 with a 20TB iSCSI LUN.
Same result: SR_NOT_SUPPORTED. -
Now 8.2.0 clean install, no updates.
It works.
Here both hosts are connected to the same SR.
- 8.2.0 with full updates > 8.2.1 release/yangtze/master/58
CR still working.
4.1. Unmounted the LUN, mounted it again.
Still working. -
I have no idea, sorry. So to recap:
- doesn't work on an 8.2.1 fresh install with updates
- works on older 8.2.0
- works on 8.2.0 updated to 8.2.1
It doesn't sound like an XO bug in your case.
-
@olivierlambert yes. so what to do next?)
-
Trying to figure out the setup so we can try to reproduce, and also switching various things until there's a clear pattern.
E.g.: can you try with an NFS share to see if you have the same issue? If it's iSCSI related, that would help us investigate.
-
I can't, this is SAN-only storage.
Any point in testing on 8.3 alpha? -
You can, it's still another test that might help us to pinpoint something
-
@olivierlambert I would like to add that after this recap I realized... I also had to reinstall XCP, so in my case it's also a fresh 8.2.1 install! At least, knowing that, I can do an 8.2.0 + upgrade installation (that's what I used to have). I can also try 8.3 alpha, it's not like I have anything to lose at this point (that server only hosts XO, there is nothing else there)
Anyways.. the fresh 8.2.1 install is definitely the common point here -
Also with iSCSI storage, right?
-
@olivierlambert not really. This time it's just local ext storage, SATA drives.
-
In LVM or thin? It might be 2 different problems, so I'm trying to sort this out.
-
@olivierlambert both! I have both mixed in my servers and I tried in both when I did the tests
-
Just remembered I have one server with a fresh 8.2.1 and NFS backups to TrueNAS. It's working.
Will do other tests tomorrow. -
@olivierlambert sr_not_supported is not an error and not the cause. It's caused by the default Dell multipath config, which is for the 3xxx series. It also persists on 8.2.0, where CR works, so it's just a warning.
As we had no problems before, we never looked into this setting. My bad again, yay. Replaced it with the official config for 4xxx and the warning is gone. I see that in 8.3 it's already more universal, for any generation.
device {
    vendor "DellEMC"
    product "ME4"
    path_grouping_policy "group_by_prio"
    path_checker "tur"
    hardware_handler "1 alua"
    prio "alua"
    failback immediate
    path_selector "service-time 0"
}
Since there is no default config for Huawei, we always used the official one.
device {
    vendor "HUAWEI"
    product "XSG1"
    path_grouping_policy multibus
    path_checker tur
    prio const
    path_selector "round-robin 0"
    failback immediate
    fast_io_fail_tmo 5
    dev_loss_tmo 30
}
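If it helps to see where the two stanzas actually differ, here is a rough Python sketch that parses each `device { ... }` block into a dict and prints only the settings that differ. It assumes the simple shape used above (one value per keyword, no comments, no nested sections), so it is not a general multipath.conf parser. The names `parse_device` and `diff_settings` are made up for this example.

```python
import shlex

DELL_ME4 = '''
device {
    vendor "DellEMC"
    product "ME4"
    path_grouping_policy "group_by_prio"
    path_checker "tur"
    hardware_handler "1 alua"
    prio "alua"
    failback immediate
    path_selector "service-time 0"
}
'''

HUAWEI_XSG1 = '''
device {
    vendor "HUAWEI"
    product "XSG1"
    path_grouping_policy multibus
    path_checker tur
    prio const
    path_selector "round-robin 0"
    failback immediate
    fast_io_fail_tmo 5
    dev_loss_tmo 30
}
'''


def parse_device(stanza):
    """Turn one 'device { keyword value ... }' stanza into a dict.

    Assumption: every keyword is followed by exactly one value.
    """
    tokens = shlex.split(stanza)  # splits on whitespace, keeps quoted values whole
    body = tokens[tokens.index("{") + 1 : tokens.index("}")]
    return dict(zip(body[0::2], body[1::2]))


def diff_settings(a, b):
    """Return {keyword: (value_in_a, value_in_b)} for keywords whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    dell = parse_device(DELL_ME4)
    huawei = parse_device(HUAWEI_XSG1)
    for key, (d, h) in diff_settings(dell, huawei).items():
        print(f"{key}: dell={d!r} huawei={h!r}")
```

Settings present in only one stanza show up with `None` on the other side.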
-
8.2.1:
-
CR not working:
both huawei, dell iscsi - multipath enabled
both huawei, dell iscsi - multipath disabled -
working:
nfs vm disk
local thin\ext
local thick\lvm -
8.3
-
working:
both huawei, dell iscsi - multipath enabled
local thick\lvm
And now the interesting part. After I got rid of this false warning, detached the extra hosts from the pool, and detached all additional links (trunk, backup) to reduce communication and the log itself, no SMlog entries are generated at all during the backup task.
MP enabled - with 2nd link for backup https://pastebin.com/URcnDckR
MP enabled - only Mng link, no SMlog generated https://pastebin.com/RHw40uzg -
-
I have the impression it's good news, but I'm not 100% sure I get it. Can you rephrase your conclusion a bit?
-
If I get no SMlog output, then Xen/dom0 is not related to the backup task, right?
The SMlog I usually got during these 5 minutes has no errors anyway, only some locking operations.
And it always takes 5 minutes, so is some timing hardcoded?
Don't forget that the problem also happens with an FC connection, so it may concern any block-based storage type.