XCP-ng

    jshiells

    @jshiells


    Best posts made by jshiells

    • RE: MAP_DUPLICATE_KEY error in XOA backup - VM's wont START now!

      I have what is hopefully a final update on this issue.

      We upgraded to XOA version .99 a few weeks ago and the problem has now gone away. We suspect that timeout-related changes in XOA resolved this and a few other related problems.

      posted in Backup
      jshiells

    Latest posts made by jshiells

    • maybe a bug when restarting mirroring delta backups that have failed

      Hi,

      Last night a couple of VMs failed in a delta backup MIRRORING task. 3 VMs did not mirror, for reasons that do not matter for this bug.

      When we corrected the issue that caused the mirroring to fail, we went into XOA and pressed the button to restart the task for the failed VMs only... however, XOA decided to redo the entire mirror task and sync EVERY VM on the source backup location.

      a0130ea1-1993-47f2-bafc-887fc7a0dc42-image.png
      Clicking this to restart just those 3 VMs

      caused this to happen... it re-synced ALL of them, not just the 3 that failed
      c7e709b3-cf77-4e7e-93bb-5d155f4c00d3-image.png

      XOA version: 5.103.1

      I am assuming this is not working as intended?

      posted in Backup
      jshiells
    • RE: Question on backup sequence

      I would like to ask a follow-up question to confirm.

      So IF using sequences, should we disable the backup tasks on the overview tab?

      posted in Backup
      jshiells
    • RE: MAP_DUPLICATE_KEY error in XOA backup - VM's wont START now!

      @olivierlambert

      Just an update on this:

      We made sure all our server times were synced. The issue happened again on the next run.

      Just for shits and giggles we restarted the toolstack on all the hosts yesterday and the problem went away. No issues with the backup last night. Maybe it's just a coincidence; we are continuing to monitor.

      We also noticed that even though this CHOP error is coming up, snapshots are getting created.

      posted in Backup
      jshiells
    • RE: MAP_DUPLICATE_KEY error in XOA backup - VM's wont START now!

      @olivierlambert

      After much digging I traced the formal exceptions below from
      Xen. For most of them, the "chop" error is buried in the bowels of XAPI.

      i.e.

      Oct 18 21:46:29 fen-xcp-01 SMGC: [28249]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 238, in _parse_result
      Oct 18 21:46:29 fen-xcp-01 SMGC: [28249]     raise Failure(result['ErrorDescription'])
      

      It's worth noting that, due to the time involved in this, the only things
      considered in play were nfs-01/02 and, to a lesser degree, 03; the one XCP
      pool (fen-xcp); and, to an even lesser degree, XOA itself.

      I could not find any obvious issues with storage, neither hardware nor
      data. Scrubs were fine, no cpu/hardware errors.

      I could not find any obvious issues with xcp hosts, neither hardware nor
      data. No cpu/hardware errors.

      The only real change made was to correct the clock on nfs-01. I don't
      see how that could affect this since most if not all locking is done
      with flock.
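
To illustrate the reasoning above, here is a minimal Python sketch of flock-style locking on a local file. It is not taken from the storage manager code; `try_lock` and `release` are hypothetical helper names. It only shows that whether the lock is granted depends on who currently holds it, with no timestamp involved in the call. How such locks behave over NFS depends on the client and server and is not shown here.

```python
import fcntl


def try_lock(path):
    """Try to take an exclusive, non-blocking flock on path.

    Returns the open file object on success (keep it open to hold the lock),
    or None if another open file description already holds the lock.
    """
    handle = open(path, "a")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else holds the lock; no clock or timestamp is consulted.
        handle.close()
        return None
    return handle


def release(handle):
    """Drop the lock and close the file."""
    fcntl.flock(handle, fcntl.LOCK_UN)
    handle.close()
```

A second `try_lock` on the same path returns None until the first holder calls `release`, regardless of what time either side thinks it is.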

      There is a valid argument to be made that Xen was technically responding
      to an issue, though it is not entirely clear what or how. Most of the other
      errors / wtfbbq states are either directly related (in call path), or
      indirectly (xen wanted a thing, didn't get it). Those are some deep
      rabbit holes.

      There is more pre/post context to these; I tried to include what I thought
      made them a bit easier to understand.


      ./SMlog.4.gz:Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]          *  E X C E P T I O N  *
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] leaf-coalesce: EXCEPTION <class 'XenAPI.Failure'>, ['INTERNAL_ERROR', 'Invalid argument: chop']
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 1774, in coalesceLeaf
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]     self._coalesceLeaf(vdi)
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 2048, in _coalesceLeaf
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]     if not self._snapshotCoalesce(vdi):
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 2153, in _snapshotCoalesce
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]     self._coalesce(tempSnap)
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 1962, in _coalesce
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]     self.deleteVDI(vdi)
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 2469, in deleteVDI
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]     self._checkSlaves(vdi)
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 2482, in _checkSlaves
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]     self._checkSlave(hostRef, vdi)
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 2491, in _checkSlave
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]     text  = _host.call_plugin(*call)
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 264, in __call__
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]     return self.__send(self.__name, args)
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 160, in xenapi_request
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]     result = _parse_result(getattr(self, methodname)(*full_params))
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 238, in _parse_result
      Oct 19 00:19:36 fen-xcp-01 SMGC: [3729]     raise Failure(result['ErrorDescription'])
      
      ./SMlog.4.gz:Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]          *  E X C E P T I O N  *
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] leaf-coalesce: EXCEPTION <class 'XenAPI.Failure'>, ['INTERNAL_ERROR', 'Invalid argument: chop']
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 1774, in coalesceLeaf
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]     self._coalesceLeaf(vdi)
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 2048, in _coalesceLeaf
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]     if not self._snapshotCoalesce(vdi):
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 2153, in _snapshotCoalesce
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]     self._coalesce(tempSnap)
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 1962, in _coalesce
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]     self.deleteVDI(vdi)
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 2469, in deleteVDI
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]     self._checkSlaves(vdi)
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 2482, in _checkSlaves
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]     self._checkSlave(hostRef, vdi)
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 2491, in _checkSlave
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]     text  = _host.call_plugin(*call)
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 264, in __call__
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]     return self.__send(self.__name, args)
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 160, in xenapi_request
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]     result = _parse_result(getattr(self, methodname)(*full_params))
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 238, in _parse_result
      Oct 19 00:22:04 fen-xcp-01 SMGC: [3729]     raise Failure(result['ErrorDescription'])
      
      ./SMlog.4.gz:Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]          *  E X C E P T I O N  *
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] gc: EXCEPTION <class 'XenAPI.Failure'>, ['INTERNAL_ERROR', 'Invalid argument: chop']
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 3388, in gc
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]     _gc(None, srUuid, dryRun)
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 3273, in _gc
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]     _gcLoop(sr, dryRun)
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 3214, in _gcLoop
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]     sr.garbageCollect(dryRun)
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 1794, in garbageCollect
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]     self.deleteVDIs(vdiList)
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 2374, in deleteVDIs
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]     SR.deleteVDIs(self, vdiList)
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 1808, in deleteVDIs
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]     self.deleteVDI(vdi)
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 2469, in deleteVDI
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]     self._checkSlaves(vdi)
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 2482, in _checkSlaves
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]     self._checkSlave(hostRef, vdi)
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]   File "/opt/xensource/sm/cleanup.py", line 2491, in _checkSlave
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]     text  = _host.call_plugin(*call)
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 264, in __call__
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]     return self.__send(self.__name, args)
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 160, in xenapi_request
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]     result = _parse_result(getattr(self, methodname)(*full_params))
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 238, in _parse_result
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]     raise Failure(result['ErrorDescription'])
      Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]
      
      
      
      ./SMlog.5.gz:Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]          *  E X C E P T I O N  *
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] GC process exiting, no work left
      Oct 18 21:51:23 fen-xcp-01 SM: [30714] lock: released /var/lock/sm/0cff5362-5c89-2241-2207-a1d736d9ef5e/gc_active
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] In cleanup
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] SR 0cff ('fen-nfs-03 - DR (Diaster Recovery Storage ZFS/NFS)') (608 VDIs in 524 VHD trees): no changes
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]          ***********************
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]          *  E X C E P T I O N  *
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]          ***********************
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] gc: EXCEPTION <class 'XenAPI.Failure'>, ['INTERNAL_ERROR', 'Invalid argument: chop']
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]   File "/opt/xensource/sm/cleanup.py", line 3388, in gc
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]     _gc(None, srUuid, dryRun)
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]   File "/opt/xensource/sm/cleanup.py", line 3273, in _gc
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]     _gcLoop(sr, dryRun)
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]   File "/opt/xensource/sm/cleanup.py", line 3214, in _gcLoop
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]     sr.garbageCollect(dryRun)
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]   File "/opt/xensource/sm/cleanup.py", line 1794, in garbageCollect
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]     self.deleteVDIs(vdiList)
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]   File "/opt/xensource/sm/cleanup.py", line 2374, in deleteVDIs
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]     SR.deleteVDIs(self, vdiList)
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]   File "/opt/xensource/sm/cleanup.py", line 1808, in deleteVDIs
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]     self.deleteVDI(vdi)
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]   File "/opt/xensource/sm/cleanup.py", line 2469, in deleteVDI
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]     self._checkSlaves(vdi)
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]   File "/opt/xensource/sm/cleanup.py", line 2482, in _checkSlaves
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]     self._checkSlave(hostRef, vdi)
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]   File "/opt/xensource/sm/cleanup.py", line 2491, in _checkSlave
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]     text  = _host.call_plugin(*call)
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 264, in __call__
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]     return self.__send(self.__name, args)
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 160, in xenapi_request
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]     result = _parse_result(getattr(self, methodname)(*full_params))
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 238, in _parse_result
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]     raise Failure(result['ErrorDescription'])
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] * * * * * SR 0cff5362-5c89-2241-2207-a1d736d9ef5e: ERROR
      Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]
      Oct 18 21:51:28 fen-xcp-01 SM: [26746] lock: opening lock file /var/lock/sm/894e5d0d-c100-be00-4fc4-b0c6db478a26/sr
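
For anyone wanting to pull the same exceptions out of their own rotated logs, here is a rough Python sketch. It assumes the SMGC line layout seen in the excerpts above (one log entry per line, not wrapped). The function names and the regular expression are illustrative only and are not part of any XCP-ng tooling.

```python
import gzip
import re
from pathlib import Path

# Assumed line shape, taken from the excerpts above:
#   Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] gc: EXCEPTION <class ...>, [..., 'Invalid argument: chop']
EXCEPTION_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\sSMGC:\s\[(?P<pid>\d+)\]\s"
    r"(?P<phase>[\w-]+): EXCEPTION .*Invalid argument: chop"
)


def open_log(path):
    """Open a plain or gzip-compressed log file as text."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", errors="replace")
    return open(path, "rt", errors="replace")


def find_chop_exceptions(log_dir):
    """Yield (file name, timestamp, host, pid, phase) for each 'chop' exception."""
    for path in sorted(Path(log_dir).glob("SMlog*")):
        with open_log(path) as handle:
            for line in handle:
                match = EXCEPTION_RE.match(line.strip())
                if match:
                    yield (
                        path.name,
                        match["ts"],
                        match["host"],
                        int(match["pid"]),
                        match["phase"],
                    )
```

Calling `list(find_chop_exceptions("/var/log"))` on a host would give one tuple per exception header; the phase field (`gc` or `leaf-coalesce` in the excerpts above) shows which part of the garbage collector hit the error.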
      
      posted in Backup
      jshiells
    • RE: MAP_DUPLICATE_KEY error in XOA backup - VM's wont START now!

      @tuxen No, sorry. Great idea, but we are not seeing any errors like that in kern.log. When this problem happens, it is across several Xen hosts all at the same time. It would be wild if all of the Xen hosts were having hardware issues during the small window of time this problem happened in. If it was one Xen server then I would look at hardware, but it's all of them, leading me to believe it's XOA, a BUG in xcp-ng, or a storage problem (even though we have seen no errors or monitoring blips at all on the TrueNAS server).

      posted in Backup
      jshiells
    • RE: MAP_DUPLICATE_KEY error in XOA backup - VM's wont START now!

      @olivierlambert Digging more into this today, we did find these errors in xensource.log related to that "CHOP" message:

      xensource.log.11.gz:Oct 20 10:22:08 xxx-xxx-01 xapi: [error||115 pool_db_backup_thread|Pool DB sync D:d79f115776bd|pool_db_sync] Failed to synchronise DB with host OpaqueRef:a87f2682-dd77-4a2d-aa1a-b831b1d5107f: Server_error(INTERNAL_ERROR, [ Invalid argument: chop ])

      xensource.log.22.gz:Oct 19 06:06:03 fen-xcp-01 xapi: [error||27967996 :::80||backtrace] host.get_servertime D:61ad83a0cd72 failed with exception (Invalid_argument chop)
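
To check whether these xapi errors line up across hosts, something like the following Python sketch could group them by host. The line layout is inferred only from the two samples above, and `parse_xapi_line` and `chop_errors_by_host` are hypothetical names, so treat the regular expression as an assumption rather than a description of the real log format.

```python
import re
from collections import defaultdict

# Assumed shape, based only on the two sample lines above:
#   <Mon DD HH:MM:SS> <host> xapi: [<level>|<thread/task context>] <message>
XAPI_LINE_RE = re.compile(
    r"(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\sxapi:\s"
    r"\[(?P<level>[^|]*)\|(?P<context>[^\]]*)\]\s(?P<message>.*)"
)


def parse_xapi_line(line):
    """Split one xapi log line into fields, or return None if it does not match."""
    match = XAPI_LINE_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["is_chop"] = "chop" in record["message"]
    return record


def chop_errors_by_host(lines):
    """Map each host name to the timestamps of its 'chop' errors."""
    by_host = defaultdict(list)
    for line in lines:
        record = parse_xapi_line(line)
        if record and record["is_chop"]:
            by_host[record["host"]].append(record["ts"])
    return dict(by_host)
```

Feeding it the grep output from every host in the pool gives a quick view of which hosts logged the error and when.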

      posted in Backup
      jshiells
    • RE: MAP_DUPLICATE_KEY error in XOA backup - VM's wont START now!

      And it's happening again tonight.

      I have 55 failed backups tonight with a mix of the following errors.

      Error: INTERNAL_ERROR(Invalid argument: chop)
      Error: SR_BACKEND_FAILURE_82(, Failed to snapshot VDI [opterr=failed to pause VDI 346298a8-cfad-4e9b-84fe-6185fd5e7fbb], )

      Zero TX/RX errors on SFPs on the Xen hosts and storage
      Zero TX/RX errors on switch ports
      No errors on the ZFS/NFS storage devices
      No traps in monitoring for any networking or storage issues.
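
With dozens of failures in one night, a quick tally by error code helps show how the mix breaks down. The Python sketch below assumes the failure messages have already been collected as plain strings in the `Error: CODE(details)` shape shown above; how they are exported from XOA is not covered, and `tally_error_codes` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches messages shaped like "Error: INTERNAL_ERROR(Invalid argument: chop)".
ERROR_RE = re.compile(r"Error:\s*(?P<code>[A-Z0-9_]+)\((?P<detail>.*)\)\s*$")


def tally_error_codes(messages):
    """Count failure messages by error code; unmatched lines go under 'UNPARSED'."""
    counts = Counter()
    for message in messages:
        match = ERROR_RE.search(message)
        counts[match["code"] if match else "UNPARSED"] += 1
    return counts
```

The result is a `Counter`, so `tally_error_codes(messages).most_common()` lists the codes from most to least frequent.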

      posted in Backup
      jshiells
    • RE: MAP_DUPLICATE_KEY error in XOA backup - VM's wont START now!

      @olivierlambert I can provide information on our infrastructure if it would help. The issue has not happened again though, and I cannot recreate it in my lab. Even when backups are running we are REALLY underutilizing our NFS/ZFS NVMe/SSD-based storage with 40gb/s links. I'm not saying it's not possible, but I would be shocked. We have no recorded TX/RX errors on the TrueNAS/switch/hosts and no record of any SNMP traps coming in for any issues with any of the equipment involved.

      784b66c6-6f81-4b46-a110-b863ea1eb73b-image.png
      42c3e698-7c23-4e92-9fea-cd42212be1ea-image.png

      posted in Backup
      jshiells
    • RE: MAP_DUPLICATE_KEY error in XOA backup - VM's wont START now!

      @SeanMiller No, sorry, we ended up having to restore the VDIs from a known good backup.

      posted in Backup
      jshiells