@ronan-a My pool is mostly working now, though I'm still seeing some quirks. I believe the problems I've seen thus far are due to the hardware instability of one of my hosts, as noted before. While frustrating, it has given me the opportunity to work through some host and SR recovery processes. That's the silver lining.
The main issue that I'm still seeing occurs when I attempt to start a specific stopped VM on a specific host (xcp-ng3). The VM fails to start and throws this error:
SR_BACKEND_FAILURE_46(, The VDI is not available [opterr=VDI a9d53abc-f492-4826-b659-744ee87d4d94 not detached cleanly], )
This VM will start without issue on the other two hosts. XO is not showing any other issues with this VM and is not showing any orphaned VDIs.
I am also seeing these errors in the /var/log/linstor-controller directory:
Host xcp-ng1:
ERROR REPORT 648FDBE6-00000-000000
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.21.1
Build ID: a677db312062add13e9b230b8b902d43a69caf13
Build time: 2023-03-22T14:05:41+00:00
Error time: 2023-06-19 00:39:06
Node: xcp-ng1
============================================================
Reported error:
===============
Category: Exception
Class name: SocketException
Class canonical name: java.net.SocketException
Generated at: Method 'bind0', Source file 'Net.java, Unknown line number
Error message: Protocol family unavailable
Call backtrace:
Method Native Class:Line number
bind0 Y sun.nio.ch.Net:unknown
bind N sun.nio.ch.Net:461
bind N sun.nio.ch.Net:453
bind N sun.nio.ch.ServerSocketChannelImpl:222
bind N sun.nio.ch.ServerSocketAdaptor:85
bindToChannelAndAddress N org.glassfish.grizzly.nio.transport.TCPNIOBindingHandler:107
bind N org.glassfish.grizzly.nio.transport.TCPNIOBindingHandler:64
bind N org.glassfish.grizzly.nio.transport.TCPNIOTransport:215
bind N org.glassfish.grizzly.nio.transport.TCPNIOTransport:195
bind N org.glassfish.grizzly.nio.transport.TCPNIOTransport:186
start N org.glassfish.grizzly.http.server.NetworkListener:711
start N org.glassfish.grizzly.http.server.HttpServer:256
start N com.linbit.linstor.api.rest.v1.config.GrizzlyHttpService:314
initialize N com.linbit.linstor.systemstarter.GrizzlyInitializer:88
startSystemServices N com.linbit.linstor.core.ApplicationLifecycleManager:87
start N com.linbit.linstor.core.Controller:365
main N com.linbit.linstor.core.Controller:613
END OF ERROR REPORT.
Host xcp-ng2 (each report truncated to fit the post limit):
ERROR REPORT 648E4932-00000-000012
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.21.1
Build ID: a677db312062add13e9b230b8b902d43a69caf13
Build time: 2023-03-22T14:05:41+00:00
Error time: 2023-06-19 00:38:51
Node: xcp-ng2
Peer: Node: 'xcp-ng1'
============================================================
Reported error:
===============
Category: RuntimeException
Class name: TransactionException
Class canonical name: com.linbit.linstor.transaction.TransactionException
Generated at: Method 'startTransaction', Source file 'ControllerSQLTransactionMgrGenerator.java', Line #32
Error message: Failed to start transaction
Error context:
Failed to start transaction
Asynchronous stage backtrace:
Error has been observed at the following site(s):
|_ checkpoint ⇢ Execute single-stage API UpdateFreeCapacity
|_ checkpoint ⇢ Fallback error handling wrapper
Stack trace:
Call backtrace:
Method Native Class:Line number
startTransaction N com.linbit.linstor.transaction.manager.ControllerSQLTransactionMgrGenerator:32
ERROR REPORT 648E4932-00000-000013
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.21.1
Build ID: a677db312062add13e9b230b8b902d43a69caf13
Build time: 2023-03-22T14:05:41+00:00
Error time: 2023-06-19 00:38:51
Node: xcp-ng2
Peer: Node: 'xcp-ng3'
============================================================
Reported error:
===============
Category: RuntimeException
Class name: ErrorCallbackNotImplemented
Class canonical name: reactor.core.Exceptions.ErrorCallbackNotImplemented
Generated at: <UNKNOWN>
Error message: com.linbit.linstor.transaction.TransactionException: Failed to start transaction
Call backtrace:
Method Native Class:Line number
ERROR REPORT 648E4932-00000-000014
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.21.1
Build ID: a677db312062add13e9b230b8b902d43a69caf13
Build time: 2023-03-22T14:05:41+00:00
Error time: 2023-06-19 00:38:51
Node: xcp-ng2
Peer: RestClient(127.0.0.1; 'PythonLinstor/1.15.1 (API1.0.4): Client 1.15.1')
============================================================
Reported error:
===============
Category: RuntimeException
Class name: TransactionException
Class canonical name: com.linbit.linstor.transaction.TransactionException
Generated at: Method 'startTransaction', Source file 'ControllerSQLTransactionMgrGenerator.java', Line #32
Error message: Failed to start transaction
Error context:
Modification of node 'xcp-ng1' failed due to an unknown exception.
Asynchronous stage backtrace:
Error has been observed at the following site(s):
|_ checkpoint ⇢ Auto-Quorum and -Tiebreaker after node create
|_ checkpoint ⇢ Reconnect node(s)
Stack trace:
Call backtrace:
Method Native Class:Line number
startTransaction N com.linbit.linstor.transaction.manager.ControllerSQLTransactionMgrGenerator:32
ERROR REPORT 648E4932-00000-000015
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.21.1
Build ID: a677db312062add13e9b230b8b902d43a69caf13
Build time: 2023-03-22T14:05:41+00:00
Error time: 2023-06-19 00:38:51
Node: xcp-ng2
Peer: Node: 'xcp-ng2'
============================================================
Reported error:
===============
Category: RuntimeException
Class name: TransactionException
Class canonical name: com.linbit.linstor.transaction.TransactionException
Generated at: Method 'startTransaction', Source file 'ControllerSQLTransactionMgrGenerator.java', Line #32
Error message: Failed to start transaction
Error context:
Failed to start transaction
Asynchronous stage backtrace:
Error has been observed at the following site(s):
|_ checkpoint ⇢ Execute single-stage API ApplyPropsFromStlt
|_ checkpoint ⇢ Fallback error handling wrapper
Stack trace:
Call backtrace:
Method Native Class:Line number
startTransaction N com.linbit.linstor.transaction.manager.ControllerSQLTransactionMgrGenerator:32
ERROR REPORT 648E4932-00000-000016
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.21.1
Build ID: a677db312062add13e9b230b8b902d43a69caf13
Build time: 2023-03-22T14:05:41+00:00
Error time: 2023-06-19 00:38:51
Node: xcp-ng2
Peer: RestClient(127.0.0.1; 'PythonLinstor/1.15.1 (API1.0.4): Client 1.15.1')
============================================================
Reported error:
===============
Category: RuntimeException
Class name: TransactionException
Class canonical name: com.linbit.linstor.transaction.TransactionException
Generated at: Method 'startTransaction', Source file 'ControllerSQLTransactionMgrGenerator.java', Line #32
Error message: Failed to start transaction
Error context:
Modification of node 'xcp-ng1' failed due to an unknown exception.
Asynchronous stage backtrace:
Error has been observed at the following site(s):
|_ checkpoint ⇢ Reconnect node(s)
Stack trace:
Call backtrace:
Method Native Class:Line number
startTransaction N com.linbit.linstor.transaction.manager.ControllerSQLTransactionMgrGenerator:32
ERROR REPORT 648E4932-00000-000017
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.21.1
Build ID: a677db312062add13e9b230b8b902d43a69caf13
Build time: 2023-03-22T14:05:41+00:00
Error time: 2023-06-19 00:38:51
Node: xcp-ng2
============================================================
Reported error:
===============
Category: RuntimeException
Class name: ErrorCallbackNotImplemented
Class canonical name: reactor.core.Exceptions.ErrorCallbackNotImplemented
Generated at: <UNKNOWN>
Error message: com.linbit.linstor.transaction.TransactionException: Failed to start transaction
Error context:
Exception thrown by connection observer when outbound connection established
Call backtrace:
Method Native Class:Line number
ERROR REPORT 648E4932-00000-000018
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.21.1
Build ID: a677db312062add13e9b230b8b902d43a69caf13
Build time: 2023-03-22T14:05:41+00:00
Error time: 2023-06-19 00:38:51
Node: xcp-ng2
Peer: Node: 'xcp-ng1'
============================================================
Reported error:
===============
Category: RuntimeException
Class name: IllegalStateException
Class canonical name: java.lang.IllegalStateException
Generated at: Method 'assertOpen', Source file 'BaseGenericObjectPool.java', Line #759
Error message: Pool not open
Asynchronous stage backtrace:
Error has been observed at the following site(s):
|_ checkpoint ⇢ Execute single-stage API NotifyDevMgrRunCompleted
|_ checkpoint ⇢ Fallback error handling wrapper
Stack trace:
Call backtrace:
Method Native Class:Line number
assertOpen N org.apache.commons.pool2.impl.BaseGenericObjectPool:759
ERROR REPORT 648E4932-00000-000019
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.21.1
Build ID: a677db312062add13e9b230b8b902d43a69caf13
Build time: 2023-03-22T14:05:41+00:00
Error time: 2023-06-19 00:38:51
Node: xcp-ng2
============================================================
Reported error:
===============
Category: RuntimeException
Class name: TransactionException
Class canonical name: com.linbit.linstor.transaction.TransactionException
Generated at: Method 'startTransaction', Source file 'ControllerSQLTransactionMgrGenerator.java', Line #32
Error message: Failed to start transaction
Asynchronous stage backtrace:
Error has been observed at the following site(s):
|_ checkpoint ⇢ Restore node
Stack trace:
Call backtrace:
Method Native Class:Line number
startTransaction N com.linbit.linstor.transaction.manager.ControllerSQLTransactionMgrGenerator:32
ERROR REPORT 648E4932-00000-000020
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.21.1
Build ID: a677db312062add13e9b230b8b902d43a69caf13
Build time: 2023-03-22T14:05:41+00:00
Error time: 2023-06-19 00:38:51
Node: xcp-ng2
============================================================
Reported error:
===============
Category: RuntimeException
Class name: TransactionException
Class canonical name: com.linbit.linstor.transaction.TransactionException
Generated at: Method 'startTransaction', Source file 'ControllerSQLTransactionMgrGenerator.java', Line #32
Error message: Failed to start transaction
Asynchronous stage backtrace:
Error has been observed at the following site(s):
|_ checkpoint ⇢ Fetch thin capacity info
Stack trace:
Call backtrace:
Method Native Class:Line number
startTransaction N com.linbit.linstor.transaction.manager.ControllerSQLTransactionMgrGenerator:32
ERROR REPORT 648E4932-00000-000021
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.21.1
Build ID: a677db312062add13e9b230b8b902d43a69caf13
Build time: 2023-03-22T14:05:41+00:00
Error time: 2023-06-19 00:38:51
Node: xcp-ng2
============================================================
Reported error:
===============
Category: RuntimeException
Class name: TransactionException
Class canonical name: com.linbit.linstor.transaction.TransactionException
Generated at: Method 'startTransaction', Source file 'ControllerSQLTransactionMgrGenerator.java', Line #32
Error message: Failed to start transaction
Asynchronous stage backtrace:
Error has been observed at the following site(s):
|_ checkpoint ⇢ Fetch thin capacity info
Stack trace:
Call backtrace:
Method Native Class:Line number
startTransaction N com.linbit.linstor.transaction.manager.ControllerSQLTransactionMgrGenerator:32
ERROR REPORT 648E4932-00000-000023
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.21.1
Build ID: a677db312062add13e9b230b8b902d43a69caf13
Build time: 2023-03-22T14:05:41+00:00
Error time: 2023-06-19 00:38:51
Node: xcp-ng2
Peer: RestClient(127.0.0.1; 'PythonLinstor/1.15.1 (API1.0.4): Client 1.15.1')
============================================================
Reported error:
===============
Category: RuntimeException
Class name: TransactionException
Class canonical name: com.linbit.linstor.transaction.TransactionException
Generated at: Method 'startTransaction', Source file 'ControllerSQLTransactionMgrGenerator.java', Line #32
Error message: Failed to start transaction
Error context:
Modification of node 'xcp-ng1' failed due to an unknown exception.
Asynchronous stage backtrace:
Error has been observed at the following site(s):
|_ checkpoint ⇢ Reconnect node(s)
Stack trace:
Call backtrace:
Method Native Class:Line number
startTransaction N com.linbit.linstor.transaction.manager.ControllerSQLTransactionMgrGenerator:32
ERROR REPORT 648E4932-00000-000024
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.21.1
Build ID: a677db312062add13e9b230b8b902d43a69caf13
Build time: 2023-03-22T14:05:41+00:00
Error time: 2023-06-19 00:38:51
Node: xcp-ng2
============================================================
Reported error:
===============
Category: RuntimeException
Class name: TransactionException
Class canonical name: com.linbit.linstor.transaction.TransactionException
Generated at: Method 'startTransaction', Source file 'ControllerSQLTransactionMgrGenerator.java', Line #32
Error message: Failed to start transaction
Asynchronous stage backtrace:
Error has been observed at the following site(s):
|_ checkpoint ⇢ Fetch thin capacity info
Stack trace:
Call backtrace:
Method Native Class:Line number
startTransaction N com.linbit.linstor.transaction.manager.ControllerSQLTransactionMgrGenerator:32
Host xcp-ng3:
ERROR REPORT 64905A1B-00000-000000
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.21.1
Build ID: a677db312062add13e9b230b8b902d43a69caf13
Build time: 2023-03-22T14:05:41+00:00
Error time: 2023-06-19 09:37:36
Node: xcp-ng3
============================================================
Reported error:
===============
Category: Exception
Class name: SocketException
Class canonical name: java.net.SocketException
Generated at: Method 'bind0', Source file 'Net.java, Unknown line number
Error message: Protocol family unavailable
Call backtrace:
Method Native Class:Line number
bind0 Y sun.nio.ch.Net:unknown
bind N sun.nio.ch.Net:461
bind N sun.nio.ch.Net:453
bind N sun.nio.ch.ServerSocketChannelImpl:222
bind N sun.nio.ch.ServerSocketAdaptor:85
bindToChannelAndAddress N org.glassfish.grizzly.nio.transport.TCPNIOBindingHandler:107
bind N org.glassfish.grizzly.nio.transport.TCPNIOBindingHandler:64
bind N org.glassfish.grizzly.nio.transport.TCPNIOTransport:215
bind N org.glassfish.grizzly.nio.transport.TCPNIOTransport:195
bind N org.glassfish.grizzly.nio.transport.TCPNIOTransport:186
start N org.glassfish.grizzly.http.server.NetworkListener:711
start N org.glassfish.grizzly.http.server.HttpServer:256
start N com.linbit.linstor.api.rest.v1.config.GrizzlyHttpService:314
initialize N com.linbit.linstor.systemstarter.GrizzlyInitializer:88
startSystemServices N com.linbit.linstor.core.ApplicationLifecycleManager:87
start N com.linbit.linstor.core.Controller:365
main N com.linbit.linstor.core.Controller:613
END OF ERROR REPORT.
Sorry, that was a lot of logs, but I don't know exactly which ones you'd be interested in seeing. It's possible that all of the reports from host xcp-ng2 date from the last time host xcp-ng1 melted down. I'm still trying to determine exactly when that happened overnight.
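To narrow down when things went wrong, something like the following could condense the reports into a per-timestamp summary. This is only a rough sketch: `parse_reports` and `timeline` are hypothetical helper names of my own, not part of LINSTOR, and it assumes each report keeps the `ERROR REPORT <id>`, `Error time:`, `Node:`, and `Class name:` lines exactly as pasted above.

```python
import re
from collections import Counter

# Header line that opens each report, e.g. "ERROR REPORT 648E4932-00000-000012".
# "END OF ERROR REPORT." does not match because of the line-start anchor.
_HEADER = re.compile(r"^ERROR REPORT (\S+)", re.MULTILINE)

# Only the three fields needed for a timeline. The line-start anchor keeps
# "Peer: Node: ..." and "Class canonical name: ..." from matching.
_FIELD = re.compile(r"^(Error time|Node|Class name):[ \t]*(.*)$", re.MULTILINE)


def parse_reports(text):
    """Split pasted report text into one dict per 'ERROR REPORT <id>' header."""
    headers = list(_HEADER.finditer(text))
    reports = []
    for i, header in enumerate(headers):
        # A report body runs until the next header (or the end of the text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        fields = {"id": header.group(1)}
        for name, value in _FIELD.findall(text[header.end():end]):
            # Keep the first occurrence of each field within a report.
            fields.setdefault(name, value.strip())
        reports.append(fields)
    return reports


def timeline(reports):
    """Count reports per (error time, node, exception class), sorted by time."""
    counts = Counter(
        (r.get("Error time", "?"), r.get("Node", "?"), r.get("Class name", "?"))
        for r in reports
    )
    return sorted(counts.items())
```

Passing the concatenated text of the reports from /var/log/linstor-controller to `parse_reports` and then printing `timeline(...)` should show at a glance which node logged which exception class at which second, which may make it easier to line the xcp-ng2 errors up against the xcp-ng1 outage.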
Let me know if you need to see anything else! Thank you!