XCP-ng forum

    VDI_IO_ERROR(Device I/O errors) when you run scheduled backup

    Xen Orchestra
    66 Posts 11 Posters 17.8k Views
    • shorian @olivierlambert
      @olivierlambert Nothing jumps out from dmesg and nothing useful in SMlog when running the CR; however, I do get the following exceptions in SMlog on boot (the same error for each disk array):

       SMGC: [24044] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
       SMGC: [24044]          ***********************
       SMGC: [24044]          *  E X C E P T I O N  *
       SMGC: [24044]          ***********************
       SMGC: [24044] gc: EXCEPTION <class 'util.SMException'>, SR 7e772759-f44e-af14-eb45-e839cc67689c not attached on this host
       SMGC: [24044]   File "/opt/xensource/sm/cleanup.py", line 3354, in gc
       SMGC: [24044]     _gc(None, srUuid, dryRun)
       SMGC: [24044]   File "/opt/xensource/sm/cleanup.py", line 3233, in _gc
       SMGC: [24044]     sr = SR.getInstance(srUuid, session)
       SMGC: [24044]   File "/opt/xensource/sm/cleanup.py", line 1552, in getInstance
       SMGC: [24044]     return FileSR(uuid, xapi, createLock, force)
       SMGC: [24044]   File "/opt/xensource/sm/cleanup.py", line 2330, in __init__
       SMGC: [24044]     SR.__init__(self, uuid, xapi, createLock, force)
       SMGC: [24044]   File "/opt/xensource/sm/cleanup.py", line 1582, in __init__
       SMGC: [24044]     raise util.SMException("SR %s not attached on this host" % uuid)
       SMGC: [24044]
       SMGC: [24044] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
       SMGC: [24044] * * * * * SR 7e772759-f44e-af14-eb45-e839cc67689c: ERROR
       SMGC: [24044]
       SM: [24066] lock: opening lock file /var/lock/sm/7e772759-f44e-af14-eb45-e839cc67689c/sr
       SM: [24066] sr_update {'sr_uuid': '7e772759-f44e-af14-eb45-e839cc67689c', 'subtask_of': 'DummyRef:|ed453e0f-248d-403f-9499-ee7255fdf429|SR.stat', 'args': [], 'host_ref': 'OpaqueRef:c1b3f3e8-579b-4b35-9bb6-dcad830583c3', 'session_ref': 'OpaqueRef:dafc19b5-65af-4501-a151-1999d8e7f550', 'device_config': {'device': '/dev/disk/by-id/scsi-2ad3a730200d00000-part1', 'SRmaster': 'true'}, 'command': 'sr_update', 'sr_ref': 'OpaqueRef:bfd47d28-1be1-4dd6-8a7b-2c0a986b8d47', 'local_cache_sr': '34bd3a97-3562-75af-a24e-a266caf368e3'}
       SM: [24088] lock: opening lock file /var/lock/sm/7e772759-f44e-af14-eb45-e839cc67689c/sr
       SM: [24088] lock: acquired /var/lock/sm/7e772759-f44e-af14-eb45-e839cc67689c/sr
       SM: [24088] sr_scan {'sr_uuid': '7e772759-f44e-af14-eb45-e839cc67689c', 'subtask_of': 'DummyRef:|017c9f00-7289-4983-a942-f91c830fed33|SR.scan', 'args': [], 'host_ref': 'OpaqueRef:c1b3f3e8-579b-4b35-9bb6-dcad830583c3', 'session_ref': 'OpaqueRef:a907696c-b7cf-4ac8-b772-b22833073585', 'device_config': {'device': '/dev/disk/by-id/scsi-2ad3a730200d00000-part1', 'SRmaster': 'true'}, 'command': 'sr_scan', 'sr_ref': 'OpaqueRef:bfd47d28-1be1-4dd6-8a7b-2c0a986b8d47', 'local_cache_sr': '34bd3a97-3562-75af-a24e-a266caf368e3'}
       SM: [24088] ['/usr/bin/vhd-util', 'scan', '-f', '-m', '/var/run/sr-mount/7e772759-f44e-af14-eb45-e839cc67689c/*.vhd']
       SM: [24088]   pread SUCCESS
       SM: [24088] ['ls', '/var/run/sr-mount/7e772759-f44e-af14-eb45-e839cc67689c', '-1', '--color=never']
       SM: [24088]   pread SUCCESS
       SM: [24088] lock: opening lock file /var/lock/sm/7e772759-f44e-af14-eb45-e839cc67689c/running
       SM: [24088] lock: tried lock /var/lock/sm/7e772759-f44e-af14-eb45-e839cc67689c/running, acquired: True (exists: True)
       SM: [24088] lock: released /var/lock/sm/7e772759-f44e-af14-eb45-e839cc67689c/running
       SM: [24088] Kicking GC
       SMGC: [24088] === SR 7e772759-f44e-af14-eb45-e839cc67689c: gc ===
       SMGC: [24104] Will finish as PID [24105]
       SM: [24105] lock: opening lock file /var/lock/sm/7e772759-f44e-af14-eb45-e839cc67689c/running
       SMGC: [24088] New PID [24104]
       SM: [24105] lock: opening lock file /var/lock/sm/7e772759-f44e-af14-eb45-e839cc67689c/gc_active
       SM: [24088] lock: released /var/lock/sm/7e772759-f44e-af14-eb45-e839cc67689c/sr
       SM: [24105] lock: opening lock file /var/lock/sm/7e772759-f44e-af14-eb45-e839cc67689c/sr
       SMGC: [24105] Found 0 cache files
       SM: [24105] lock: tried lock /var/lock/sm/7e772759-f44e-af14-eb45-e839cc67689c/gc_active, acquired: True (exists: True)
       SM: [24105] lock: tried lock /var/lock/sm/7e772759-f44e-af14-eb45-e839cc67689c/sr, acquired: True (exists: True)
       SM: [24105] ['/usr/bin/vhd-util', 'scan', '-f', '-m', '/var/run/sr-mount/7e772759-f44e-af14-eb45-e839cc67689c/*.vhd']
       SM: [24105]   pread SUCCESS
      

      Subsequent to the above, I found that I get an identical error when reattaching the SR (after detaching it).

      xapi-explore-sr does not show anything interesting.

      Thanks chap; very kind of you.

      • olivierlambert (Vates 🪐 Co-Founder & CEO)

        @shorian said in VDI_IO_ERROR(Device I/O errors) when you run scheduled backup:

        SR 7e772759-f44e-af14-eb45-e839cc67689c not attached on this host

        Weird. Is it a shared SR? If that's the case, the master node needs to have access to the SR (because the coalesce mechanism runs on the master).

        • shorian @olivierlambert

          @olivierlambert Nope, not shared. Standalone box with two local disk arrays, both reporting the same error on boot / reattach, yet both seem performant in normal operation. We only get the exception in SMlog when booting, plus frequent (but not universal) VDI_IO_ERRORs when taking backups. The VMs themselves seem to be fine. Quite strange.

          Both SRs are ext4 in RAID1, one array of 2xSSD and one of 2xSATA.

          Should I try reinstalling the host?

          • olivierlambert (Vates 🪐 Co-Founder & CEO)

             That's weird. Can you double-check there's no "ghost" host that may have been there before? E.g. with xe host-list.

             Also run xe sr-param-list uuid=7e772759-f44e-af14-eb45-e839cc67689c to see if we can spot anything.

            • shorian @olivierlambert

              @olivierlambert Afraid nothing interesting:

              # xe host-list
              uuid ( RO)                : 3d60f1d4-595e-4428-9c24-6409db9593bc
                        name-label ( RW): xen11
                  name-description ( RW): ABC
              
              # xe sr-param-list uuid=7e772759-f44e-af14-eb45-e839cc67689c
              
              uuid ( RO)                    : 7e772759-f44e-af14-eb45-e839cc67689c
                            name-label ( RW): Xen11 SSD
                      name-description ( RW): Xen11 SSD
                                  host ( RO): xen11
                    allowed-operations (SRO): VDI.enable_cbt; VDI.list_changed_blocks; unplug; plug; PBD.create; VDI.disable_cbt; update; PBD.destroy; VDI.resize; VDI.clone; VDI.data_destroy; scan; VDI.snapshot; VDI.mirror; VDI.create; VDI.destroy; VDI.set_on_boot
                    current-operations (SRO):
                                  VDIs (SRO):
                                  PBDs (SRO): 9e967e89-7263-40b5-5706-cd2d449b7192
                    virtual-allocation ( RO): 0
                  physical-utilisation ( RO): 75665408
                         physical-size ( RO): 491294261248
                                  type ( RO): ext
                          content-type ( RO): user
                                shared ( RW): false
                         introduced-by ( RO): <not in database>
                           is-tools-sr ( RO): false
                          other-config (MRW):
                             sm-config (MRO): devserial: scsi-2ad3a730200d00000
                                 blobs ( RO):
                   local-cache-enabled ( RO): false
                                  tags (SRW):
                             clustered ( RO): false
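
               (Aside: output in this xe param-list format can be parsed mechanically if you want to diff SR parameters across hosts. A minimal sketch - the regex is a best guess at the format, and the sample is abbreviated from the output above:)

```python
import re

# Abbreviated sample of the xe sr-param-list output shown above.
SAMPLE = """\
uuid ( RO)                    : 7e772759-f44e-af14-eb45-e839cc67689c
              name-label ( RW): Xen11 SSD
                  shared ( RW): false
               sm-config (MRO): devserial: scsi-2ad3a730200d00000
"""

def parse_param_list(text):
    """Parse 'name ( FLAGS): value' lines into a dict (format is assumed)."""
    params = {}
    for line in text.splitlines():
        m = re.match(r"\s*([\w.-]+) \(\s*[A-Z]+\s*\)\s*:\s?(.*)$", line)
        if m:
            params[m.group(1)] = m.group(2).strip()
    return params

params = parse_param_list(SAMPLE)
print(params["shared"])  # false
```

               On a real host you could feed it the stdout of xe sr-param-list and diff the resulting dicts between two machines.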
              
              • olivierlambert (Vates 🪐 Co-Founder & CEO)

                What about xe pbd-param-list uuid=9e967e89-7263-40b5-5706-cd2d449b7192?

                • shorian @olivierlambert

                  😞

                  # xe pbd-param-list uuid=9e967e89-7263-40b5-5706-cd2d449b7192
                  uuid ( RO)                  : 9e967e89-7263-40b5-5706-cd2d449b7192
                       host ( RO) [DEPRECATED]: 3d60f1d4-595e-4428-9c24-6409db9593bc
                               host-uuid ( RO): 3d60f1d4-595e-4428-9c24-6409db9593bc
                         host-name-label ( RO): xen11
                                 sr-uuid ( RO): 7e772759-f44e-af14-eb45-e839cc67689c
                           sr-name-label ( RO): Xen11 SSD
                           device-config (MRO): device: /dev/disk/by-id/scsi-2ad3a730200d00000-part1
                      currently-attached ( RO): true
                            other-config (MRW): storage_driver_domain: OpaqueRef:ce759085-6e0d-4484-9f8a-abf75b822f75
                  
                  • olivierlambert (Vates 🪐 Co-Founder & CEO)

                    Indeed, nothing special here.

                    • shorian @olivierlambert

                      I'll try a fresh install over the next couple of days and see if it reoccurs. Looking at the other boxes, I have the same error on one of the other hosts, but it's not across all of them despite identical installs and hardware.

                      Thanks for your efforts and help; shame there wasn't an easy answer, but let's see if it reoccurs after a completely fresh install.

                      • olivierlambert (Vates 🪐 Co-Founder & CEO)

                         Please keep us posted 🙂

                        • shorian @olivierlambert

                           Have just run a clean install - identical error in SMlog 😞

                           The following was the last message in dmesg, which seems to contradict the SMlog error:

                          EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
                          

                           I can send the full SMlog and anything else you can think of - but it's either a hardware error (in which case, why an identical error across two hosts?) or an incorrect setting in the install process. I'm at a loss.

                          • shorian @shorian

                             Have tried reformatting the disks after changing the boot record - previously GPT with ext4, now MBR instead - but no difference, even after another fresh install.

                             I'm going through a comparison with another machine that is identical kit but doesn't have these errors, and I can't see any differences whatsoever 😞

                            • olivierlambert (Vates 🪐 Co-Founder & CEO)

                              Do you have a 4K sector disk?

                              • shorian @olivierlambert

                                 The (large) SATA drives are, but the SSDs are not; the problem is reported for both.

                                # parted /dev/sdb print
                                
                                Model: AVAGO MR9363-4i (scsi)
                                Disk /dev/sdb: 6001GB
                                Sector size (logical/physical): 512B/4096B
                                Partition Table: gpt
                                Disk Flags:
                                
                                Number  Start   End     Size    File system     Name  Flags
                                 5      1049kB  4296MB  4295MB  ext3
                                 2      4296MB  23.6GB  19.3GB
                                 1      23.6GB  43.0GB  19.3GB  ext3
                                 4      43.0GB  43.5GB  537MB   fat16                 boot
                                 6      43.5GB  44.6GB  1074MB  linux-swap(v1)
                                 3      44.6GB  6001GB  5956GB                        lvm
                                
                                # parted /dev/sda print
                                
                                Model: AVAGO MR9363-4i (scsi)
                                Disk /dev/sda: 500GB
                                Sector size (logical/physical): 512B/512B
                                Partition Table: gpt
                                Disk Flags:
                                
                                Number  Start   End    Size   File system  Name              Flags
                                 1      1049kB  500GB  500GB               Linux filesystem
                                

                                 Also, contrary to my thoughts above - the other machines do have the same errors in SMlog; they just haven't been rebooted for many moons, so the errors were in the archived logs rather than the current one. So the problem is consistent across the entire estate.
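
                                 (For anyone else digging: rotated SMlog copies are typically gzipped, so a plain grep of /var/log/SMlog misses them - zgrep /var/log/SMlog* works, or a small script along these lines. The file contents below are fabricated for illustration:)

```python
import gzip
import os
import tempfile

def grep_logs(pattern, paths):
    """Search plain and gzip-rotated log files for a substring (like zgrep)."""
    hits = []
    for path in sorted(paths):
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", errors="replace") as fh:
            for line in fh:
                if pattern in line:
                    hits.append(line.rstrip())
    return hits

# Demo with fabricated files; on a real host pass glob.glob("/var/log/SMlog*").
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "SMlog"), "w") as f:
    f.write("SM: [24088] Kicking GC\n")
with gzip.open(os.path.join(tmp, "SMlog.1.gz"), "wt") as f:
    f.write("SMGC: [24044] gc: EXCEPTION ... not attached on this host\n")

paths = [os.path.join(tmp, name) for name in os.listdir(tmp)]
matches = grep_logs("not attached on this host", paths)
print(len(matches))  # 1
```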

                                • olivierlambert (Vates 🪐 Co-Founder & CEO)

                                   Before refocusing on the original issue: the SMlog error you get on boot/reboot - is it preventing you from using the SR?

                                  • shorian @olivierlambert

                                     No, everything seems to work just fine, except for the failures when running CR backups and the messages in the logs when booting / attaching. The VMs themselves are performant, can be migrated on and off the host, and are handling production loads. If it weren't for the CR errors, I would never have noticed them.

                                     Meanwhile, I've dug through the old Xen bug lists and found some similar SR issues, but they all seem to result in a failure to mount - nothing matches our specific situation.

                                    • olivierlambert (Vates 🪐 Co-Founder & CEO)

                                      Hmm that's weird. I don't know what's causing this. Can you try with a local LVM SR and see if you have the exact same issue? (trying to see if it's related to the storage driver)

                                      • shorian @olivierlambert

                                        Fresh install, LVM on boot device, same error, created second disk as LVM through XO, same error, rebooted, same pair of errors.

                                        Error appears to be agnostic of storage type.

                                        Found a host with same drives, same CPU, same config, different RAID card (Adaptec instead of LSI), same error.

                                        Error appears to be agnostic of storage manufacturer.

                                         It seems to be the GC that causes it - elsewhere in the logs, the GC takes a lock, checks, decides there's nothing to clean up, and releases the lock. But at boot, or when the SR is attached, it's as though there is a slight delay during which the storage reports as not attached: the GC fires the exception into SMlog, then tries again, connects just fine and gets the lock. Everything continues as before - no other issues, no other errors - until either a reboot or a storage reattach via the CLI or XO. The EXCEPTION does not occur during normal usage, nor during a backup.
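
                                         To make the suspected window concrete, here's a purely illustrative model - not the actual cleanup.py code; the function names and timings are made up - where the first attach check fails and a later retry succeeds:

```python
import time

class SRNotAttached(Exception):
    """Stands in for util.SMException('SR ... not attached on this host')."""

def get_sr_instance(attach_time, now):
    # Hypothetical stand-in for cleanup.py's SR.getInstance(): the SR only
    # reports as attached once `now` has passed `attach_time`.
    if now < attach_time:
        raise SRNotAttached("SR not attached on this host")
    return "sr-instance"

def gc_with_retry(attach_time, retries=3, delay=0.01):
    """Retry the attach check instead of failing on the first attempt."""
    for attempt in range(retries):
        try:
            return get_sr_instance(attach_time, now=attempt * delay)
        except SRNotAttached:
            time.sleep(delay)  # storage not yet registered; back off and retry
    raise SRNotAttached("giving up after %d attempts" % retries)

result = gc_with_retry(attach_time=0.005)
print(result)  # sr-instance (first attempt fails, the retry succeeds)
```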

                                        Looking into xensource.log, we do get some errors as per below:

                                        xapi: [ warn||1568 INET :::80|event.from D:87552104bbaf|xapi_message] get_since_for_events: no in_memory_cache!
                                        xapi: [debug||1569 INET :::80|host.call_plugin R:91601df4a1d6|audit] Host.call_plugin host = 'c7634b1b-79b1-4eb2-840b-ecc401b7743d (xcp-ng-xen11)'; plugin = 'updater.py'; fn = 'check_update' args = [ 'hidden' ]
                                        xapi: [ warn||1568 INET :::80|event.from D:87552104bbaf|xapi_message] get_since_for_events: no in_memory_cache!
                                        xapi: [ warn||1570 INET :::80|event.from D:b4ea13ab245c|xapi_message] get_since_for_events: no in_memory_cache!
                                        xapi: [ warn||1570 INET :::80|event.from D:b4ea13ab245c|xapi_message] get_since_for_events: no in_memory_cache!
                                        xapi: [ warn||1571 INET :::80|event.from D:8b9b61c7b0b2|xapi_message] get_since_for_events: no in_memory_cache!
                                        xapi: [error||1572 INET :::80||backtrace] host.get_by_uuid D:e4eb5b632522 failed with exception Db_exn.Read_missing_uuid("host", "", "df8fe71d-f748-4df4-bd53-97d8232249e4")
                                        xapi: [error||1572 INET :::80||backtrace] Raised Db_exn.Read_missing_uuid("host", "", "df8fe71d-f748-4df4-bd53-97d8232249e4")
                                        xapi: [error||1572 INET :::80||backtrace] 1/9 xapi Raised at file ocaml/database/db_cache_impl.ml, line 232
                                        xapi: [error||1572 INET :::80||backtrace] 2/9 xapi Called from file ocaml/xapi/db_actions.ml, line 8179
                                        xapi: [error||1572 INET :::80||backtrace] 3/9 xapi Called from file ocaml/xapi/rbac.ml, line 223
                                        xapi: [error||1572 INET :::80||backtrace] 4/9 xapi Called from file ocaml/xapi/rbac.ml, line 231
                                        xapi: [error||1572 INET :::80||backtrace] 5/9 xapi Called from file ocaml/xapi/server_helpers.ml, line 100
                                        xapi: [error||1572 INET :::80||backtrace] 6/9 xapi Called from file ocaml/xapi/server_helpers.ml, line 121
                                        xapi: [error||1572 INET :::80||backtrace] 7/9 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
                                        xapi: [error||1572 INET :::80||backtrace] 8/9 xapi Called from file map.ml, line 135
                                        xapi: [error||1572 INET :::80||backtrace] 9/9 xapi Called from file sexp_conv.ml, line 147
                                        xapi: [error||1572 INET :::80||backtrace]
                                        xapi: [ warn||1571 INET :::80|event.from D:8b9b61c7b0b2|xapi_message] get_since_for_events: no in_memory_cache!
                                        

                                         However, given this only seems to occur at the point of boot or attaching the SR, I'm not convinced that this error is related to the VDI_IO_ERROR we're seeing when running CR backups, for which xensource.log has the following:

                                        xapi: [debug||257548 INET :::80|[XO] VDI Export R:2a08e12cc000|taskhelper] the status of R:2a08e12cc000 is failure; cannot set it to failure
                                        xapi: [error||257548 INET :::80||backtrace] [XO] VDI Export R:2a08e12cc000 failed with exception Server_error(VDI_IO_ERROR, [ Device I/O errors ])
                                        xapi: [error||257548 INET :::80||backtrace] Raised Server_error(VDI_IO_ERROR, [ Device I/O errors ])
                                        xapi: [error||257548 INET :::80||backtrace] 1/13 xapi Raised at file ocaml/xapi/vhd_tool_wrapper.ml, line 75
                                        xapi: [error||257548 INET :::80||backtrace] 2/13 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
                                        xapi: [error||257548 INET :::80||backtrace] 3/13 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
                                        xapi: [error||257548 INET :::80||backtrace] 4/13 xapi Called from file ocaml/xapi/export_raw_vdi.ml, line 49
                                        xapi: [error||257548 INET :::80||backtrace] 5/13 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
                                        xapi: [error||257548 INET :::80||backtrace] 6/13 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
                                        xapi: [error||257548 INET :::80||backtrace] 7/13 xapi Called from file ocaml/xapi/export_raw_vdi.ml, line 86
                                        xapi: [error||257548 INET :::80||backtrace] 8/13 xapi Called from file ocaml/xapi/export_raw_vdi.ml, line 110
                                        xapi: [error||257548 INET :::80||backtrace] 9/13 xapi Called from file ocaml/xapi/server_helpers.ml, line 100
                                        xapi: [error||257548 INET :::80||backtrace] 10/13 xapi Called from file ocaml/xapi/server_helpers.ml, line 121
                                        xapi: [error||257548 INET :::80||backtrace] 11/13 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
                                        xapi: [error||257548 INET :::80||backtrace] 12/13 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
                                        xapi: [error||257548 INET :::80||backtrace] 13/13 xapi Called from file lib/backtrace.ml, line 177
                                        xapi: [error||257548 INET :::80||backtrace]
                                        xapi: [debug||258603 UNIX /var/lib/xcp/xapi||dummytaskhelper] task dispatch:session.logout D:f14c1d127f16 created by task D:4493412bc527
                                        xapi: [ info||258603 UNIX /var/lib/xcp/xapi|session.logout D:4a1ab726bfb5|xapi_session] Session.destroy trackid=fc26bb0306ef13c29fa128ddb1338ca7
                                        xapi: [error||257548 INET :::80||backtrace] VDI.export_raw_vdi D:4493412bc527 failed with exception Server_error(VDI_IO_ERROR, [ Device I/O errors ])
                                        xapi: [error||257548 INET :::80||backtrace] Raised Server_error(VDI_IO_ERROR, [ Device I/O errors ])
                                        xapi: [error||257548 INET :::80||backtrace] 1/13 xapi Raised at file lib/debug.ml, line 218
                                        xapi: [error||257548 INET :::80||backtrace] 2/13 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
                                        xapi: [error||257548 INET :::80||backtrace] 3/13 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
                                        xapi: [error||257548 INET :::80||backtrace] 4/13 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
                                        xapi: [error||257548 INET :::80||backtrace] 5/13 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
                                        xapi: [error||257548 INET :::80||backtrace] 6/13 xapi Called from file ocaml/xapi/xapi_http.ml, line 193
                                        xapi: [error||257548 INET :::80||backtrace] 7/13 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
                                        xapi: [error||257548 INET :::80||backtrace] 8/13 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
                                        xapi: [error||257548 INET :::80||backtrace] 9/13 xapi Called from file ocaml/xapi/server_helpers.ml, line 100
                                        xapi: [error||257548 INET :::80||backtrace] 10/13 xapi Called from file ocaml/xapi/server_helpers.ml, line 121
                                        xapi: [error||257548 INET :::80||backtrace] 11/13 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
                                        xapi: [error||257548 INET :::80||backtrace] 12/13 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
                                        xapi: [error||257548 INET :::80||backtrace] 13/13 xapi Called from file lib/backtrace.ml, line 177
                                        xapi: [error||257548 INET :::80||backtrace]
                                        xapi: [error||257548 INET :::80||backtrace] VDI.export_raw_vdi D:2f552cd49c3a failed with exception Server_error(VDI_IO_ERROR, [ Device I/O errors ])
                                        xapi: [error||257548 INET :::80||backtrace] Raised Server_error(VDI_IO_ERROR, [ Device I/O errors ])
                                        xapi: [error||257548 INET :::80||backtrace] 1/1 xapi Raised at file (Thread 257548 has no backtrace table. Was with_backtraces called?, line 0
                                        xapi: [error||257548 INET :::80||backtrace]
                                        xapi: [error||257548 INET :::80||xapi_http] Unhandled Api_errors.Server_error(VDI_IO_ERROR, [ Device I/O errors ])
                                        

                                         Two separate issues? The VDI_IO_ERROR is one that people would see through the GUI, so it's likely to be spotted, whereas the EXCEPTION in SMlog is not going to be noticed unless you go digging - and it seems to have no impact.
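
                                         One quick sanity check for a timing correlation is to compare the syslog timestamps of the two errors. A rough sketch - the sample lines are fabricated, and the 5-minute window is an arbitrary choice; real entries come from /var/log/SMlog and /var/log/xensource.log:

```python
from datetime import datetime, timedelta

# Fabricated sample lines for illustration only.
smlog_lines = ["Jan 10 03:00:12 xen11 SMGC: [24044] gc: EXCEPTION ..."]
xensource_lines = ["Jan 10 22:15:43 xen11 xapi: ... VDI_IO_ERROR ..."]

def parse_ts(line, year=2022):
    # Syslog timestamps carry no year, so supply one.
    return datetime.strptime("%d %s" % (year, line[:15]), "%Y %b %d %H:%M:%S")

def correlated(a_lines, b_lines, window=timedelta(minutes=5)):
    """True if any entry in a_lines falls within `window` of one in b_lines."""
    stamps_a = [parse_ts(l) for l in a_lines]
    stamps_b = [parse_ts(l) for l in b_lines]
    return any(abs(a - b) <= window for a in stamps_a for b in stamps_b)

print(correlated(smlog_lines, xensource_lines))  # False - hours apart
```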

                                        • olivierlambert (Vates 🪐 Co-Founder & CEO)

                                           I suppose there's no timing correlation between the two? Indeed, the xensource error seems more meaningful, because it happens during the backup process.

                                          • shorian @olivierlambert

                                             Nothing of interest in SMlog at the point of backup. We only started looking there when hunting for any SR-related issues, but I'm inclined to view it as ancillary. The SR exception doesn't seem to be impacting day-to-day operations, whereas the CR item is a big concern for us, and now that we've dug into it further there doesn't seem to be a correlation between the two.

                                             So... I agree with your synopsis and propose putting the SMlog exception to one side so we can concentrate on the original issue - the xensource.log error from my previous post.

                                             Tomorrow I'm going to see if I can find a way of simulating a brief access delay and see if that recreates the problem - I've seen something vaguely similar in some old Xen 6 bug reports, and I've run out of anything sensible to try. The strangest thing is that it's consistent across all of the boxes and survives a fresh installation - which reminds me, I'll also try it from and to an LVM target.
