XCP-ng

    VDI_IO_ERROR Continuous Replication on clean install.

    Solved | Xen Orchestra
    • Tristis Oris (Top contributor):

      Hi.
      We upgraded a few servers from an old pool: reinstalled a fresh XCP-ng 8.2.1, joined them as a new pool, and connected it to the same physical storage via Fibre Channel.
      I switched the CR backup job to the new pool and now I get errors. The other pools (iSCSI connections) still work.
      What could that be?
      Log: https://pastebin.com/fA6KYniK
      [screenshot: backup job error]
      old vs new:
      [screenshot: old pool vs new pool]
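
      As a first sanity check in a setup like this, it's worth confirming that every host in the new pool actually sees the FC LUN and has the SR plugged. A minimal sketch from the host CLI (the SR uuid is a placeholder):

          # on each host: does multipath see the FC LUN?
          multipath -ll

          # are the SR's PBDs plugged on all pool members?
          xe pbd-list sr-uuid=<sr-uuid> params=host-uuid,currently-attached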

    • Tristis Oris (Top contributor):

      A bit of necroposting: the issue got fixed for me at some point, I'm not sure when. CR now works with that storage, same config.

    • olivierlambert (Vates 🪐 Co-Founder & CEO):

      I would take a look at the SMlog; it's probably not an XO issue.
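
      For reference, SMlog lives on the XCP-ng host itself; a minimal way to inspect it while re-running the failing job (default log paths, the uuid is a placeholder):

          # watch storage-manager activity live during the CR run
          tail -f /var/log/SMlog

          # afterwards, pull every line mentioning the failing VDI or SR
          # (SMlog.1 is the previous rotation, if present)
          grep '<vdi-or-sr-uuid>' /var/log/SMlog /var/log/SMlog.1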

    • EddieCh08666741:

      Did you manage to solve this, Tristis?

      I'm starting to get these VDI errors too, on a fresh XCP-ng 8.2, on continuous replications.

      I'm using XO from sources.

    • olivierlambert (Vates 🪐 Co-Founder & CEO):

      Same answer: take a look at your logs 🙂

    • EddieCh08666741:

      Thanks for your reply, @olivierlambert.

    • EddieCh08666741:

      Are the following logs normal? I tried CR to this server onto an SSD and it fails in under 5 minutes. CR from SATA disk to SATA disk works fine. When I use other software like xackup to copy, it manages to migrate over.

                  Dec 16 16:03:09 megaman SM: [19074] sr_update {'sr_uuid': '0c89eabe-f70c-a8df-d86e-8ebfa93acbd2', 'subtask_of': 'DummyRef:|1bd73082-34e9-4f14-bcef-fd6af84372d7|SR.stat', 'args': [], 'host_ref': 'OpaqueRe$
                  Dec 16 16:03:39 megaman SM: [19252] sr_update {'sr_uuid': '0c89eabe-f70c-a8df-d86e-8ebfa93acbd2', 'subtask_of': 'DummyRef:|7120995b-0b5f-4ffb-8615-0aee36bce30f|SR.stat', 'args': [], 'host_ref': 'OpaqueRe$
                  Dec 16 16:04:09 megaman SM: [19427] sr_update {'sr_uuid': '0c89eabe-f70c-a8df-d86e-8ebfa93acbd2', 'subtask_of': 'DummyRef:|e5077e90-da43-4aba-a3ab-4cb37a4bfa5c|SR.stat', 'args': [], 'host_ref': 'OpaqueRe$
                  Dec 16 16:04:39 megaman SM: [19603] sr_update {'sr_uuid': '0c89eabe-f70c-a8df-d86e-8ebfa93acbd2', 'subtask_of': 'DummyRef:|f325d88f-bd39-4759-b070-645cbbb9d225|SR.stat', 'args': [], 'host_ref': 'OpaqueRe$
                  Dec 16 16:05:09 megaman SM: [19915] sr_update {'sr_uuid': '0c89eabe-f70c-a8df-d86e-8ebfa93acbd2', 'subtask_of': 'DummyRef:|fd6270d5-e090-432c-bd1c-7c5837b7fd24|SR.stat', 'args': [], 'host_ref': 'OpaqueRe$
                  Dec 16 16:05:39 megaman SM: [20108] sr_update {'sr_uuid': '0c89eabe-f70c-a8df-d86e-8ebfa93acbd2', 'subtask_of': 'DummyRef:|ee983899-4c6f-49cf-a960-4bf7e4d6cd25|SR.stat', 'args': [], 'host_ref': 'OpaqueRe$
                  Dec 16 16:06:09 megaman SM: [20290] sr_update {'sr_uuid': '0c89eabe-f70c-a8df-d86e-8ebfa93acbd2', 'subtask_of': 'DummyRef:|d275ee38-1bac-4917-b929-e91553274e11|SR.stat', 'args': [], 'host_ref': 'OpaqueRe$
                  Dec 16 16:06:39 megaman SM: [20478] sr_update {'sr_uuid': '0c89eabe-f70c-a8df-d86e-8ebfa93acbd2', 'subtask_of': 'DummyRef:|cbf822ed-f23a-444e-b34e-9c4192c6a09a|SR.stat', 'args': [], 'host_ref': 'OpaqueRe$
                  Dec 16 16:07:09 megaman SM: [20658] sr_update {'sr_uuid': '0c89eabe-f70c-a8df-d86e-8ebfa93acbd2', 'subtask_of': 'DummyRef:|9f48a062-b351-4d50-bf2b-d8de31ee6d53|SR.stat', 'args': [], 'host_ref': 'OpaqueRe$
                  Dec 16 16:07:39 megaman SM: [20842] sr_update {'sr_uuid': '0c89eabe-f70c-a8df-d86e-8ebfa93acbd2', 'subtask_of': 'DummyRef:|e0ae9a2b-2f19-46ec-90e0-89e017cfc3ae|SR.stat', 'args': [], 'host_ref': 'OpaqueRe$
                  Dec 16 16:08:09 megaman SM: [21062] sr_update {'sr_uuid': '0c89eabe-f70c-a8df-d86e-8ebfa93acbd2', 'subtask_of': 'DummyRef:|4123612d-f2d7-4a8d-b500-6bc8e1aadb7d|SR.stat', 'args': [], 'host_ref': 'OpaqueRe$
                  Dec 16 16:08:27 megaman SM: [21308] lock: opening lock file /var/lock/sm/14e167a3-8cff-a180-4108-9d72795811d1/sr
                  Dec 16 16:08:27 megaman SM: [21308] lock: acquired /var/lock/sm/14e167a3-8cff-a180-4108-9d72795811d1/sr
                  Dec 16 16:08:27 megaman SM: [21308] ['/usr/sbin/td-util', 'query', 'vhd', '-vpfb', '/var/run/sr-mount/14e167a3-8cff-a180-4108-9d72795811d1/f3449a8c-a6fb-4fac-83b0-e7770bce5480.vhd']
                  Dec 16 16:08:27 megaman SM: [21308]   pread SUCCESS
                  Dec 16 16:08:27 megaman SM: [21308] vdi_snapshot {'sr_uuid': '14e167a3-8cff-a180-4108-9d72795811d1', 'subtask_of': 'DummyRef:|c89e8410-e7de-4495-ab2b-fee217285d01|VDI.snapshot', 'vdi_ref': 'OpaqueRef:a55$
                  Dec 16 16:08:27 megaman SM: [21308] Pause request for f3449a8c-a6fb-4fac-83b0-e7770bce5480
                  Dec 16 16:08:27 megaman SM: [21308] Calling tap-pause on host OpaqueRef:4f79dd6c-0bc5-42b8-bfc6-092fe7e291c5
                  Dec 16 16:08:27 megaman SM: [21341] lock: opening lock file /var/lock/sm/f3449a8c-a6fb-4fac-83b0-e7770bce5480/vdi
                  Dec 16 16:08:27 megaman SM: [21341] lock: acquired /var/lock/sm/f3449a8c-a6fb-4fac-83b0-e7770bce5480/vdi
                  Dec 16 16:08:27 megaman SM: [21341] Pause for f3449a8c-a6fb-4fac-83b0-e7770bce5480
                  Dec 16 16:08:27 megaman SM: [21341] Calling tap pause with minor 2
                  Dec 16 16:08:27 megaman SM: [21341] ['/usr/sbin/tap-ctl', 'pause', '-p', '26926', '-m', '2']
                  Dec 16 16:08:27 megaman SM: [21341]  = 0
                  Dec 16 16:08:27 megaman SM: [21341] lock: released /var/lock/sm/f3449a8c-a6fb-4fac-83b0-e7770bce5480/vdi
                  Dec 16 16:08:27 megaman SM: [21308] FileVDI._snapshot for f3449a8c-a6fb-4fac-83b0-e7770bce5480 (type 2)
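
      A way to dig further into a trace like this is to follow the VDI uuid through SMlog and sanity-check the VHD itself. A sketch using the uuid and SR path from the log above (the vhd-util location may vary):

          # follow the VDI through the snapshot / pause / unpause cycle
          grep 'f3449a8c-a6fb-4fac-83b0-e7770bce5480' /var/log/SMlog | less

          # check the VHD on the file-based SR for corruption
          vhd-util check -n /var/run/sr-mount/14e167a3-8cff-a180-4108-9d72795811d1/f3449a8c-a6fb-4fac-83b0-e7770bce5480.vhd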
                  
    • Tristis Oris (Top contributor):

      Yes, I think I'm dumb: waiting for you to look at logs I never provided 😃

      I see nothing interesting except one repeating line: "Failed to lock /var/lock/sm/.nil/lvm on first attempt, blocked by PID 25021"

      And the task duration before the failure is always 5 minutes. Could that be some hardcoded timer?

      Start: Dec 16, 2022, 12:44:50 PM
      End: Dec 16, 2022, 12:50:15 PM
      part 1: https://pastebin.com/xZAXEiq1
      part 2: https://pastebin.com/Lmhermgx
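
      One way to test the hardcoded-timer suspicion is to cut exactly that window out of SMlog and look at the last entries before the failure. A sketch, with the timestamps adjusted to the run above:

          # extract the failing task's 5-minute window
          sed -n '/Dec 16 12:44:50/,/Dec 16 12:50:15/p' /var/log/SMlog > /tmp/cr-window.log

          # the lvm lock message is usually transient; count how often it repeats
          grep -c 'Failed to lock /var/lock/sm/.nil/lvm' /tmp/cr-window.log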

    • yomono @Tristis Oris:

      @Tristis-Oris Hey man, were you able to solve this? I'm facing the same issue after a reinstall. Continuous Replication fails at exactly 5 minutes.
      Thanks in advance

    • Tristis Oris (Top contributor) @yomono:

      @yomono Nope, still investigating. I get SR_NOT_SUPPORTED in the log.

    • yomono @Tristis Oris:

      @Tristis-Oris 😞

    • olivierlambert (Vates 🪐 Co-Founder & CEO):

      Can you double-check you are using a recent commit on master?
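
      For XO built from sources, the running commit can be checked straight from the git checkout. A sketch; the path is wherever xen-orchestra was cloned:

          # which commit is currently checked out?
          git -C /path/to/xen-orchestra log -1 --oneline

          # is the working tree on master and clean?
          git -C /path/to/xen-orchestra status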

    • yomono @olivierlambert:

      @olivierlambert In my case, I'm indeed using the latest commit. I played around with older commits yesterday (as old as two or three months) but got the same result. Right now I'm on the latest one (committed an hour ago).
      I can share my SMlogs if you want, but I'm also getting the SR_NOT_SUPPORTED error. I tried backing up different VMs from different source servers, and to different destination servers. My next try will be reinstalling XO.

    • olivierlambert (Vates 🪐 Co-Founder & CEO):

      Indeed, try to wipe it entirely and rebuild.
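
      A minimal sketch of such a rebuild for XO from sources, assuming the usual yarn-based workflow (paths and service name are placeholders; the xo-server configuration normally lives outside the repository, so it survives the clean):

          cd /path/to/xen-orchestra
          git checkout master && git pull --ff-only

          # drop every build artifact and untracked file, then rebuild
          git clean -xdf
          yarn && yarn build

          # restart xo-server however it is run, e.g. with systemd:
          systemctl restart xo-server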

    • Tristis Oris (Top contributor) @olivierlambert:

      @olivierlambert That's an old problem) But yes, I usually stay close to the latest; I just repeated all the tests on 3c7d3.

      1. CR stopped working right after the clean 8.2.1 installation, on the same day.
      2. Replaced FC with iSCSI (only because I need iSCSI here 😃) and created a new LUN > same story.
      3. Clean 8.2.0 install, no updates: CR works.
      4. Haven't tried regular backups to this SR; I don't need them here.
      5. VM migration works in any setup.

      The problem is only with one storage unit, a Dell EMC PowerVault ME4012; all the others (Huawei, iSCSI) work fine. But I'm not sure I have any other clean 8.2.1 pools; maybe only some nodes.

      Logs now. I'm running 2 CR backups.

      1. XCP-ng 8.2.1, clean host, no VMs at all. A 60TB iSCSI LUN, but XCP-ng shows only 50TB:
                                  Jan 16 15:15:30 test SMGC: [10010] SR f3fd ('LUN') (2 VDIs in 2 VHD trees):
                                  Jan 16 15:15:30 test SMGC: [10010]         a459d14f[VHD](50.000G//50.105G|ao)
                                  Jan 16 15:15:30 test SMGC: [10010]         1789f7a7[VHD](50.000G//50.105G|ao)
                                  
                                  Jan 16 15:15:47 test SMGC: [10338] SR f3fd ('LUN') (1 VDIs in 1 VHD trees):
                                  Jan 16 15:15:47 test SMGC: [10338]         a459d14f[VHD](50.000G//50.105G|ao)
                                  Jan 16 15:15:47 test SMGC: [10338]
                                  

      Here it's 60TB:
      [screenshot: SR size shown as 60TB]
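
      The size mismatch can be cross-checked from the CLI by comparing what the SR reports against what multipath sees for the LUN. A sketch; 'LUN' is the SR name-label from the log above:

          # the SR's own view of its size and usage
          xe sr-list name-label=LUN params=uuid,physical-size,physical-utilisation

          # what the host actually sees for the underlying device
          multipath -ll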

      2 VMs, 2 errors, both SR_NOT_SUPPORTED:

                                  Jan 16 15:10:43 test SM: [6596] result: {'params_nbd': 'nbd:unix:/run/blktap-control/nbd/f3fd46f7-5ce4-e5e0-53e9-059ce4775a7b/1789f7a7-05a0-411c-aa80-dcc659f8b45f', 'o_direct_reason': 'SR_NOT_SUPPORTED', 'params': '/dev/sm/backend/f3fd46f7-5ce4-e5e0-53e9-059ce4775a7b/1789f7a7-05a0-411c-aa80-dcc659f8b45f', 'o_direct': True, 'xenstore_data': {'scsi/0x12/0x80': 'AIAAEjE3ODlmN2E3LTA1YTAtNDEgIA==', 'scsi/0x12/0x83': 'AIMAMQIBAC1YRU5TUkMgIDE3ODlmN2E3LTA1YTAtNDExYy1hYTgwLWRjYzY1OWY4YjQ1ZiA=', 'vdi-uuid': '1789f7a7-05a0-411c-aa80-dcc659f8b45f', 'mem-pool': 'f3fd46f7-5ce4-e5e0-53e9-059ce4775a7b'}}
                                  Jan 16 15:10:49 test SM: [6834] result: {'params_nbd': 'nbd:unix:/run/blktap-control/nbd/f3fd46f7-5ce4-e5e0-53e9-059ce4775a7b/a459d14f-ae92-4a77-8574-30442126624b', 'o_direct_reason': 'SR_NOT_SUPPORTED', 'params': '/dev/sm/backend/f3fd46f7-5ce4-e5e0-53e9-059ce4775a7b/a459d14f-ae92-4a77-8574-30442126624b', 'o_direct': True, 'xenstore_data': {'scsi/0x12/0x80': 'AIAAEmE0NTlkMTRmLWFlOTItNGEgIA==', 'scsi/0x12/0x83': 'AIMAMQIBAC1YRU5TUkMgIGE0NTlkMTRmLWFlOTItNGE3Ny04NTc0LTMwNDQyMTI2NjI0YiA=', 'vdi-uuid': 'a459d14f-ae92-4a77-8574-30442126624b', 'mem-pool': 'f3fd46f7-5ce4-e5e0-53e9-059ce4775a7b'}}
                                  

      And some smaller ones, like:

                                   Jan 16 15:14:30 test SM: [9387] Failed to lock /var/lock/sm/.nil/lvm on first attempt, blocked by PID 9357
                                   Jan 16 15:10:29 test SM: [6376] Failed to lock /var/lock/sm/.nil/lvm on first attempt, blocked by PID 6348
                                   Jan 16 15:10:59 test SM: [7141] Failed to lock /var/lock/sm/.nil/lvm on first attempt, blocked by PID 7115
                                   Jan 16 15:11:30 test SM: [7457] Failed to lock /var/lock/sm/.nil/lvm on first attempt, blocked by PID 7428
                                   Jan 16 15:15:43 test SM: [10146] unlink of attach_info failed


      Nothing else with an error status in the log.

      2. Because of the weird size, tried 8.2.1 with a 20TB iSCSI LUN.
         Same result: SR_NOT_SUPPORTED.

      3. Now a clean 8.2.0 install, no updates at all.
         It works.

      Here both hosts are connected to the same SR:
      [screenshot: both hosts attached to the same SR]

      4. 8.2.0 fully updated > 8.2.1 release/yangtze/master/58.
         CR still works.

      4.1. Unmounted the LUN, mounted it again (CLI sketch of this below).
           Working.
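
      For the record, the unmount/remount step maps to unplugging and re-plugging the SR's PBDs from the host CLI. A sketch; the uuids are placeholders, repeat per host:

          # list the PBDs connecting hosts to this SR
          xe pbd-list sr-uuid=<sr-uuid> params=uuid,host-uuid,currently-attached

          # detach and re-attach on one host
          xe pbd-unplug uuid=<pbd-uuid>
          xe pbd-plug uuid=<pbd-uuid>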

    • olivierlambert (Vates 🪐 Co-Founder & CEO):

      I have no idea, sorry. So, to recap:

      • doesn't work on a fresh 8.2.1 install with updates
      • works on an older 8.2.0
      • works on 8.2.0 updated to 8.2.1

      It doesn't sound like an XO bug in your case.

    • Tristis Oris (Top contributor) @olivierlambert:

      @olivierlambert Yes. So what do we do next?)

    • olivierlambert (Vates 🪐 Co-Founder & CEO):

      Trying to figure out the setup so we can try to reproduce it, and switching various things until there's a clear pattern.

      E.g., can you try with an NFS share to see if you get the same issue? If it's iSCSI-related, that would help us investigate.
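
      If an NFS export is available somewhere, creating a throwaway NFS SR for this kind of test is a single command. A sketch; the server and export path are placeholders:

          xe sr-create type=nfs shared=true content-type=user \
              name-label="CR-test-NFS" \
              device-config:server=<nfs-server> \
              device-config:serverpath=/export/cr-test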

    • Tristis Oris (Top contributor) @olivierlambert:

      I can't; this is SAN-only storage.
      Is there any point in testing on the 8.3 alpha?

    • olivierlambert (Vates 🪐 Co-Founder & CEO):

      You can; it's still another test that might help us pinpoint something.

    • yomono @olivierlambert:

      @olivierlambert I'd like to add that after this recap I realized... I also had to reinstall XCP-ng, so in my case it's also a fresh 8.2.1 install! Knowing that, I can do an 8.2.0 + upgrade installation (that's what I used to have). I can also try the 8.3 alpha; it's not like I have anything to lose at this point (that server only hosts XO, there's nothing else on it).
      Anyway, the fresh 8.2.1 install is definitely the common point here.
