@olivierlambert
After much digging, I traced the formal exceptions below back to Xen. For most of them, the "chop" error is buried in the bowels of XAPI, e.g.:
Oct 18 21:46:29 fen-xcp-01 SMGC: [28249] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 238, in _parse_result
Oct 18 21:46:29 fen-xcp-01 SMGC: [28249] raise Failure(result['ErrorDescription'])
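The bottom frames of every trace are the Python XenAPI binding unwrapping XAPI's reply, which is why the "chop" text points back into XAPI rather than into cleanup.py. Below is a minimal sketch of that unwrapping. It assumes the XAPI XML-RPC response convention: a struct with `Status` plus either `Value` or `ErrorDescription`. `parse_result` and this `Failure` class are simplified stand-ins, not the real `XenAPI.py` code, and the real binding also has session-retry handling that is omitted here.

```python
# Simplified sketch (assumption): mirrors the shape of XenAPI.py's
# _parse_result, not the real module. XAPI answers every XML-RPC call with
# a struct carrying 'Status' plus either 'Value' or 'ErrorDescription'.


class Failure(Exception):
    """Stand-in for XenAPI.Failure: wraps XAPI's ErrorDescription list."""

    def __init__(self, details):
        super().__init__(details)
        self.details = details


def parse_result(result):
    """Return the call's value, or re-raise the server-side error verbatim."""
    if not isinstance(result, dict) or "Status" not in result:
        raise ValueError("malformed response: missing Status")
    if result["Status"] == "Success":
        return result.get("Value")
    # The client adds nothing of its own here: the error code and message
    # were produced by XAPI on the host that handled the call.
    raise Failure(result["ErrorDescription"])


# A response shaped like the one behind the SMlog traces in this post.
response = {
    "Status": "Failure",
    "ErrorDescription": ["INTERNAL_ERROR", "Invalid argument: chop"],
}

try:
    parse_result(response)
except Failure as exc:
    print(exc.details)  # prints ['INTERNAL_ERROR', 'Invalid argument: chop']
```

In other words, SMGC only reports what XAPI sent back from `host.call_plugin`. The actual "chop" failure has to be chased on the XAPI side.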
It's worth noting that, due to the time involved in this, the only things considered in play were nfs-01/02 and, to a lesser degree, nfs-03; the one XCP pool (fen-xcp); and, to an even lesser degree, XOA itself.
I could not find any obvious issues with the storage, either hardware or data. Scrubs were clean, and there were no CPU/hardware errors.
I could not find any obvious issues with the XCP hosts, either hardware or data. No CPU/hardware errors.
The only real change made was correcting the clock on nfs-01. I don't see how that could affect this, since most, if not all, locking is done with flock.
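The clock argument rests on flock semantics. An flock(2) lock belongs to an open file description and is granted or refused based only on who currently holds it. No timestamp is consulted. The small sketch below demonstrates this with Python 3's `fcntl.flock` on a local temporary file. It assumes, as the post does, that the relevant SM locks (the files under `/var/lock/sm` in the logs) behave like flock locks. It does not exercise NFS or the actual SM lock code.

```python
import fcntl
import os
import tempfile

# Sketch only: demonstrates flock(2) semantics via Python's fcntl module on a
# local temp file. It is not the SM lock implementation and does not touch NFS.


def try_lock(path):
    """Open path and try a non-blocking exclusive flock; return the fd or None."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd
    except BlockingIOError:
        # Another open file description already holds the lock.
        os.close(fd)
        return None


with tempfile.TemporaryDirectory() as tmp:
    # "gc_active" is borrowed from the lock path in the SMlog excerpt.
    path = os.path.join(tmp, "gc_active")
    first_fd = try_lock(path)
    second_fd = try_lock(path)  # new open() -> new open file description
    print("second acquired:", second_fd is not None)  # expected: False
    # Ownership changes only on an explicit unlock or close; nothing here
    # depends on the system clock.
    fcntl.flock(first_fd, fcntl.LOCK_UN)
    os.close(first_fd)
    third_fd = try_lock(path)
    print("after release:", third_fd is not None)  # expected: True
    os.close(third_fd)
```

Because holding and releasing the lock never involves the time of day, correcting a clock alone should not release or steal an flock lock on the host that holds it.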
There is a valid argument to be made that Xen was technically responding to an issue, though it's not clear what the issue was or how. Most of the other errors and wtfbbq states are either directly related (in the call path) or indirectly related (Xen wanted a thing and didn't get it). Those are some deep rabbit holes.
There is more pre/post context to these; I tried to include what I thought made them a bit easier to understand.

./SMlog.4.gz:Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] * E X C E P T I O N *
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] leaf-coalesce: EXCEPTION <class 'XenAPI.Failure'>, ['INTERNAL_ERROR', 'Invalid argument: chop']
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 1774, in coalesceLeaf
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] self._coalesceLeaf(vdi)
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 2048, in _coalesceLeaf
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] if not self._snapshotCoalesce(vdi):
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 2153, in _snapshotCoalesce
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] self._coalesce(tempSnap)
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 1962, in _coalesce
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] self.deleteVDI(vdi)
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 2469, in deleteVDI
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] self._checkSlaves(vdi)
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 2482, in _checkSlaves
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] self._checkSlave(hostRef, vdi)
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 2491, in _checkSlave
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] text = _host.call_plugin(*call)
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 264, in __call__
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] return self.__send(self.__name, args)
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 160, in xenapi_request
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] result = _parse_result(getattr(self, methodname)(*full_params))
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 238, in _parse_result
Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] raise Failure(result['ErrorDescription'])

./SMlog.4.gz:Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] * E X C E P T I O N *
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] leaf-coalesce: EXCEPTION <class 'XenAPI.Failure'>, ['INTERNAL_ERROR', 'Invalid argument: chop']
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 1774, in coalesceLeaf
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] self._coalesceLeaf(vdi)
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 2048, in _coalesceLeaf
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] if not self._snapshotCoalesce(vdi):
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 2153, in _snapshotCoalesce
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] self._coalesce(tempSnap)
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 1962, in _coalesce
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] self.deleteVDI(vdi)
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 2469, in deleteVDI
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] self._checkSlaves(vdi)
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 2482, in _checkSlaves
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] self._checkSlave(hostRef, vdi)
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 2491, in _checkSlave
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] text = _host.call_plugin(*call)
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 264, in __call__
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] return self.__send(self.__name, args)
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 160, in xenapi_request
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] result = _parse_result(getattr(self, methodname)(*full_params))
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 238, in _parse_result
Oct 19 00:22:04 fen-xcp-01 SMGC: [3729] raise Failure(result['ErrorDescription'])

./SMlog.4.gz:Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] * E X C E P T I O N *
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] gc: EXCEPTION <class 'XenAPI.Failure'>, ['INTERNAL_ERROR', 'Invalid argument: chop']
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 3388, in gc
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] _gc(None, srUuid, dryRun)
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 3273, in _gc
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] _gcLoop(sr, dryRun)
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 3214, in _gcLoop
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] sr.garbageCollect(dryRun)
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 1794, in garbageCollect
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] self.deleteVDIs(vdiList)
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 2374, in deleteVDIs
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] SR.deleteVDIs(self, vdiList)
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 1808, in deleteVDIs
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] self.deleteVDI(vdi)
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 2469, in deleteVDI
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] self._checkSlaves(vdi)
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 2482, in _checkSlaves
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] self._checkSlave(hostRef, vdi)
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] File "/opt/xensource/sm/cleanup.py", line 2491, in _checkSlave
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] text = _host.call_plugin(*call)
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 264, in __call__
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] return self.__send(self.__name, args)
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 160, in xenapi_request
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] result = _parse_result(getattr(self, methodname)(*full_params))
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 238, in _parse_result
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] raise Failure(result['ErrorDescription'])
Oct 19 00:22:11 fen-xcp-01 SMGC: [3729]

./SMlog.5.gz:Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] * E X C E P T I O N *
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] GC process exiting, no work left
Oct 18 21:51:23 fen-xcp-01 SM: [30714] lock: released /var/lock/sm/0cff5362-5c89-2241-2207-a1d736d9ef5e/gc_active
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] In cleanup
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] SR 0cff ('fen-nfs-03 - DR (Diaster Recovery Storage ZFS/NFS)') (608 VDIs in 524 VHD trees): no changes
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] ***********************
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] * E X C E P T I O N *
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] ***********************
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] gc: EXCEPTION <class 'XenAPI.Failure'>, ['INTERNAL_ERROR', 'Invalid argument: chop']
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] File "/opt/xensource/sm/cleanup.py", line 3388, in gc
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] _gc(None, srUuid, dryRun)
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] File "/opt/xensource/sm/cleanup.py", line 3273, in _gc
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] _gcLoop(sr, dryRun)
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] File "/opt/xensource/sm/cleanup.py", line 3214, in _gcLoop
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] sr.garbageCollect(dryRun)
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] File "/opt/xensource/sm/cleanup.py", line 1794, in garbageCollect
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] self.deleteVDIs(vdiList)
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] File "/opt/xensource/sm/cleanup.py", line 2374, in deleteVDIs
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] SR.deleteVDIs(self, vdiList)
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] File "/opt/xensource/sm/cleanup.py", line 1808, in deleteVDIs
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] self.deleteVDI(vdi)
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] File "/opt/xensource/sm/cleanup.py", line 2469, in deleteVDI
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] self._checkSlaves(vdi)
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] File "/opt/xensource/sm/cleanup.py", line 2482, in _checkSlaves
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] self._checkSlave(hostRef, vdi)
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] File "/opt/xensource/sm/cleanup.py", line 2491, in _checkSlave
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] text = _host.call_plugin(*call)
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 264, in __call__
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] return self.__send(self.__name, args)
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 160, in xenapi_request
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] result = _parse_result(getattr(self, methodname)(*full_params))
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 238, in _parse_result
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] raise Failure(result['ErrorDescription'])
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714] * * * * * SR 0cff5362-5c89-2241-2207-a1d736d9ef5e: ERROR
Oct 18 21:51:23 fen-xcp-01 SMGC: [30714]
Oct 18 21:51:28 fen-xcp-01 SM: [26746] lock: opening lock file /var/lock/sm/894e5d0d-c100-be00-4fc4-b0c6db478a26/sr
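
Anyone repeating this digging can pull the one-line summaries (`<context>: EXCEPTION <class ...>, [...]`) out of `SMlog` and its rotated `.gz` copies with a short script. This is a hypothetical Python 3 helper, not part of SM or XAPI. It assumes the SMGC line format seen in the traces above and, by default, that the logs live in `/var/log`. Adjust `log_dir` if you copied them elsewhere.

```python
import gzip
import re
from collections import Counter
from pathlib import Path

# Hypothetical helper, not part of SM or XAPI. Assumes SMGC summary lines
# shaped like the ones in this post, e.g.
#   Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] leaf-coalesce: EXCEPTION <class 'XenAPI.Failure'>, [...]
EXC_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) SMGC: \[(?P<pid>\d+)\] "
    r"(?P<ctx>[\w-]+): EXCEPTION <class '(?P<cls>[^']+)'>, (?P<detail>.*)$"
)


def open_log(path):
    """Open a plain or gzip-rotated log file as text."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def scan_lines(lines):
    """Yield (timestamp, pid, context, exception class, detail) per EXCEPTION line."""
    for line in lines:
        m = EXC_RE.match(line.rstrip("\n"))
        if m:
            yield m["ts"], m["pid"], m["ctx"], m["cls"], m["detail"]


def scan_dir(log_dir="/var/log"):
    """Count (context, detail) pairs across SMlog, SMlog.1, SMlog.4.gz, ..."""
    counts = Counter()
    for path in sorted(Path(log_dir).glob("SMlog*")):
        with open_log(path) as fh:
            for _ts, _pid, ctx, _cls, detail in scan_lines(fh):
                counts[(ctx, detail)] += 1
    return counts


# Usage on two summary lines copied from the traces in this post.
sample = [
    "Oct 19 00:19:36 fen-xcp-01 SMGC: [3729] leaf-coalesce: EXCEPTION "
    "<class 'XenAPI.Failure'>, ['INTERNAL_ERROR', 'Invalid argument: chop']",
    "Oct 19 00:22:11 fen-xcp-01 SMGC: [3729] gc: EXCEPTION "
    "<class 'XenAPI.Failure'>, ['INTERNAL_ERROR', 'Invalid argument: chop']",
]
for ts, pid, ctx, cls, detail in scan_lines(sample):
    print(ts, pid, ctx, cls, detail)
```

On a host (or a copy of its logs), `scan_dir().most_common()` gives a per-context tally. That makes it easy to see whether the "chop" failures cluster in `leaf-coalesce`, `gc`, or both.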