    Host disconnected midway during backup, now unable to start/restart/cancel

      justjosh

      My pool master kept disconnecting intermittently throughout the backup process.

      Job status is still "started". Some VMs have already failed with errors:

      Error: HANDLE_INVALID(VBD, OpaqueRef:8fe3c750-5277-454c-9cad-23481645cd1e)
      Error: task has been destroyed before completion
      

      Some are stuck as "started"

      Nothing is left under "Tasks"

      Cannot restart/force restart them

      the job (e8c6772b-0bab-489d-a24e-b41e07f9298b) is already running
      

      Pressing cancel on the entire job does nothing

      Out of ideas!

      Edit: I just want a way to stop the jobs that are stuck in purgatory. I don't mind restarting the backup from the beginning.

        olivierlambert Vates 🪐 Co-Founder CEO

        You can restart xo-server on XOA side, and restart the toolstack on XCP-ng side to be entirely sure there's nothing left.

        However, the root cause should be investigated.

          justjosh @olivierlambert

          @olivierlambert

          Restarting xo-server allowed me to restart all the jobs except for one particular VM. How do I unblock this?

          Start: Dec 5, 2020, 06:53:28 PM
          End: Dec 5, 2020, 06:53:41 PM
          Duration: a few seconds
          Error: SR_BACKEND_FAILURE_82(, Failed to snapshot VDI [opterr=['MAP_DUPLICATE_KEY', 'VDI', 'sm_config', 'OpaqueRef:92d82ecf-f03c-4f6d-9f2f-6d4f8beced23', 'paused']], )
          
            Danp Pro Support Team @justjosh

            @justjosh Seem to recall running into this issue once before. IIRC, I used this method to resolve the issue --

            https://discussions.citrix.com/topic/399028-paused-vdi/#comment-2025338

              justjosh @Danp

              @danp Did you just remove the VDI, or did you take other steps?

                Danp Pro Support Team @justjosh

                @justjosh I ran the commands from the link I posted to remove the "paused" flag from sm-config

                  justjosh @Danp

                  @danp Which UUID did you use as the reference? I've used the UUID provided under OpaqueRef but it's saying that UUID is invalid.

                  >>> vdi_ref = session.xenapi.VDI.get_by_uuid('92d82ecf-f03c-4f6d-9f2f-6d4f8beced23')
                  Traceback (most recent call last):
                    File "<stdin>", line 1, in <module>
                    File "/usr/lib/python2.7/site-packages/XenAPI.py", line 264, in __call__
                      return self.__send(self.__name, args)
                    File "/usr/lib/python2.7/site-packages/XenAPI.py", line 160, in xenapi_request
                      result = _parse_result(getattr(self, methodname)(*full_params))
                    File "/usr/lib/python2.7/site-packages/XenAPI.py", line 238, in _parse_result
                      raise Failure(result['ErrorDescription'])
                  XenAPI.Failure: ['UUID_INVALID', 'VDI', '92d82ecf-f03c-4f6d-9f2f-6d4f8beced23']
                  
                    Danp Pro Support Team @justjosh

                    @justjosh You should be able to get the correct UUID by going to the VM's Disks tab in XO and clicking the copy icon for the desired disk.
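
A side note on the UUID_INVALID failure above: the string after "OpaqueRef:" in a XAPI error is an opaque object reference, not the object's UUID, which is why VDI.get_by_uuid rejects it. As a minimal sketch, assuming an already logged-in XenAPI session and assuming the reference in the error really is the VDI's own reference (the 'VDI' field in the error suggests so), the reference can be pulled out of the error text and passed to the standard VDI.get_uuid getter. The helper names extract_opaque_refs and vdi_uuid_from_ref are made up for illustration.

```python
import re

# An OpaqueRef is printed as "OpaqueRef:" followed by a 36-character UUID-shaped token.
OPAQUE_REF_RE = re.compile(r"OpaqueRef:[0-9a-fA-F-]{36}")


def extract_opaque_refs(error_text):
    """Return every OpaqueRef found in an error message, in order, without duplicates."""
    seen = []
    for ref in OPAQUE_REF_RE.findall(error_text):
        if ref not in seen:
            seen.append(ref)
    return seen


def vdi_uuid_from_ref(session, vdi_ref):
    """Ask XAPI for the UUID behind a VDI reference.

    `session` is assumed to be a logged-in XenAPI session object.
    """
    return session.xenapi.VDI.get_uuid(vdi_ref)
```

With the error from earlier in the thread, extract_opaque_refs returns a single reference, and vdi_uuid_from_ref(session, ref) should then give the UUID that VDI.get_by_uuid accepts. Copying the UUID from the Disks tab in XO remains the simpler route.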

                      justjosh @Danp

                      @danp I was under the impression I would need to remove the VDI for the snapshot and not the VDI for the VM's disk. Is the remove_from_sm_config command supposed to be run on the VM disk?

                        Danp Pro Support Team @justjosh

                        @justjosh Yes, the goal is to 'unpause' the VDI for the VM's disk.

                        If you look back at the thread I posted, they showed the output from xe vdi-list uuid=d50a85ca-eda2-4cbd-a348-80c7d6808ac1 params=all, where the UUID was from the VM's disk VDI. You could run the same command on your VDI to confirm that the issue is present in its sm-config entry.

                          justjosh @Danp

                          @danp I've cleared the pause but I'm still encountering errors regarding pause. Any ideas?

                          I still have a snapshot from 4th Dec when the error first started. Is it safe to delete that? Will the delta backup be able to merge a full snapshot with the existing chain?

                           Snapshot 
                          Start: Dec 8, 2020, 09:15:07 AM
                          End: Dec 8, 2020, 09:15:08 AM
                          Duration: a few seconds
                          Error: SR_BACKEND_FAILURE_82(, Failed to snapshot VDI [opterr=failed to pause VDI d11dc884-b91d-4ea0-87ef-6b96ce5b0ad4], )
                          
                            Danp Pro Support Team @justjosh

                            @justjosh Yes, it should be fine to delete the snapshot and then rerun the backup job.

                              nicolas @Danp

                              @Danp Do you remember which command you used to remove the "paused" flag? The Citrix link no longer works. Thanks!

                                nicolas @nicolas

                                As I needed it urgently, I wrote a Python script that removes the flag from the database with these calls:

                                vdi_ref = session.xenapi.VDI.get_by_uuid(vdi_uuid)
                                session.xenapi.VDI.remove_from_sm_config(vdi_ref, "paused")
                                

                                It worked and my VMs are back in business!
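
For anyone wanting a slightly more defensive version of the two calls above, here is a minimal sketch. It assumes a logged-in XenAPI session object is passed in, and it checks the VDI's sm_config with the standard VDI.get_sm_config getter before touching anything. The function name clear_paused_flag and the dry_run parameter are hypothetical, added only for illustration. Keep in mind that a VDI may be flagged as paused for a reason, so this is a last resort rather than a routine fix.

```python
def clear_paused_flag(session, vdi_uuid, dry_run=True):
    """Remove the 'paused' key from a VDI's sm_config if it is present.

    `session` is assumed to be a logged-in XenAPI session.
    Returns True if the key was found, False if there was nothing to clear.
    With dry_run=True (the default) the key is only reported, not removed.
    """
    vdi_ref = session.xenapi.VDI.get_by_uuid(vdi_uuid)
    sm_config = session.xenapi.VDI.get_sm_config(vdi_ref)

    if "paused" not in sm_config:
        return False

    if not dry_run:
        # Same call as in the post above.
        session.xenapi.VDI.remove_from_sm_config(vdi_ref, "paused")
    return True


# Possible usage on the pool master (not executed here; login details are an assumption):
#   import XenAPI
#   session = XenAPI.xapi_local()
#   session.xenapi.login_with_password("root", "")
#   try:
#       print(clear_paused_flag(session, "<VDI UUID>", dry_run=False))
#   finally:
#       session.xenapi.session.logout()
```

The UUID to pass in is the one of the VM's disk VDI, not the OpaqueRef from the error message. Running once with dry_run=True first shows whether the flag is actually set before anything is changed.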

                                  Danp Pro Support Team @nicolas

                                  @nicolas Yes, that is essentially what was shown in the link. Note: the developers have warned to be careful using this technique because the disk is in a paused state for a reason, so simply clearing the flag could lead to unintended consequences.

                                  Alternatively, you can try running this command --

                                  /opt/xensource/sm/resetvdis.py single <VDI UUID>

                                    madrianr

                                    Hello, I have the same problem - see also here:
                                    https://community.citrix.com/topic/253636-disable-cbt-sr_backend_failure_202-map_duplicate_key/#comment-86996

                                    If I use the following script as posted here I have the following result:

                                    [root@wkkctxhy01 ~]# /opt/xensource/sm/resetvdis.py single 1e1c1ed3-9eb9-4b95-aabd-91b95111cc70
                                    VDI 1e1c1ed3-9eb9-4b95-aabd-91b95111cc70 is not marked as attached anywhere, nothing to do

                                    Any help is welcome
                                    robert

                                      madrianr @madrianr

                                      @madrianr said in Host disconnected midway during backup, now unable to start/restart/cancel:

                                      [root@wkkctxhy01 ~]# /opt/xensource/sm/resetvdis.py single 1e1c1ed3-9eb9-4b95-aabd-91b95111cc70
                                      VDI 1e1c1ed3-9eb9-4b95-aabd-91b95111cc70 is not marked as attached anywhere, nothing to do

                                      After running "Forget" on the disks and then Rescan/Reattach, it works now:
                                      xe vdi-forget uuid=7a3b69fb-08d3-4e10-8b07-c05ee876eabe
                                      xe vdi-forget uuid=433d1e56-80af-4691-93f6-84af4c411565
                                      xe vdi-forget uuid=78a38bc3-bba3-4be6-8bdb-cfd20eaf8b44
                                      xe vdi-forget uuid=1e1c1ed3-9eb9-4b95-aabd-91b95111cc70
                                      xe vdi-forget uuid=671fc875-cb61-462d-98f2-62912b217570

                                        jaayb @madrianr

                                        @madrianr I had similar issues. I tried xe vdi-forget uuid and rescanned after that, but now I cannot find the VDI.

                                        Any ideas?

                                          jaayb @jaayb

                                          @jaayb said in Host disconnected midway during backup, now unable to start/restart/cancel:

                                          @madrianr I had similar issues. I tried xe vdi-forget uuid and rescanned after that, but now I cannot find the VDI.

                                          Any ideas?

                                          Never mind, I was able to recover...
