Adding Server to Pool Hangs/Fails
-
I am standing up a new cluster and have run into a snag. We are setting up a bonded network using LACP and moving the management interface onto the bonded link. When we attempt to add another host to the pool, the server_init task hangs in XOA, and the host never becomes available for use. All hosts have identical hardware configurations. It looks like the join is failing to replicate the bonded network, but I am not seeing any error messages that indicate what specifically is failing. All xe commands also appear to hang on the target host. Removing the bonded network allows hosts to join the pool without problems, but obviously this is not ideal when adding resources in the future. Any ideas to get this working?
-
@sgroel When joining a new host to a pool, you should have only the primary management interface enabled, and without a bond! Any configuration involving bonds or multipathing should be done after the host joins the pool, if it is not picked up automatically by the network synchronization that takes place when the new host synchronizes with the pool master.
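To make the ordering concrete, here is a minimal Python sketch that only builds the xe command lines in the order described above (join first, bond afterwards) and does not run anything. The function name and every value passed in are hypothetical placeholders, and whether the bond has to be created by hand at all depends on whether the pool synchronization already created it, so treat this as an illustration rather than a tested procedure.

```python
def join_then_bond_commands(master_address, network_uuid, pif_uuids):
    """Return xe command lines in the suggested order.

    All arguments are placeholders supplied by the caller; nothing here
    talks to a real host. The password and the bond PIF UUID are left as
    literal placeholders because they are only known at run time.
    """
    return [
        # 1. Join the pool while the host still has a plain (unbonded)
        #    management interface.
        ["xe", "pool-join",
         f"master-address={master_address}",
         "master-username=root",
         "master-password=<password>"],
        # 2. Only after the join: create the LACP bond from the new host's
        #    PIFs, if the pool synchronization did not already create it.
        ["xe", "bond-create",
         f"network-uuid={network_uuid}",
         f"pif-uuids={','.join(pif_uuids)}",
         "mode=lacp"],
        # 3. Finally move the management interface onto the bond PIF.
        ["xe", "host-management-reconfigure",
         "pif-uuid=<bond-pif-uuid>"],
    ]


def as_shell_lines(commands):
    """Join each argument list into a single printable line for review."""
    return [" ".join(cmd) for cmd in commands]
```

Printing `as_shell_lines(join_then_bond_commands(...))` gives a checklist that can be reviewed before anything is typed on the host.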
-
@tjkreidl Yes, that is what I am attempting to do, but the system fails to create the bond on the target host being added to the pool if the pool has the management network on the bonded network.
I have moved the management interface to a separate NIC for the pool, and creating all networks and adding the host to the pool works perfectly. It seems it is only an issue when the management interfaces for the pool are on a bonded interface.
edit: Also, the target host becomes completely unresponsive and cannot execute any API calls or command-line operations.
-
@sgroel Curious. Are all the NICs the same on the existing pool hosts as well as the new host you add? Do all hosts have the same hotfix levels and the same overall hardware architecture? Seems odd as I ran servers this way for years. Maybe also look at the network switches to make sure they are all configured correctly. Finally, any clues in the /var/log files?
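For the log check, a small filter like the following Python sketch can narrow things down. The keyword list is only a guess at what might be relevant to this problem, and the function names are made up for illustration; adjust both to whatever actually shows up on the host.

```python
import re

# Assumed keywords of interest; this list is a guess, not an official one.
KEYWORDS = re.compile(r"bond|pool[-_.]join|server_init|error", re.IGNORECASE)


def matching_lines(lines):
    """Return the lines that mention any of the assumed keywords."""
    return [line.rstrip("\n") for line in lines if KEYWORDS.search(line)]


def scan_file(path):
    """Read one log file and return its matching lines.

    Undecodable bytes are replaced so a partly binary log does not
    abort the scan.
    """
    with open(path, "r", errors="replace") as handle:
        return matching_lines(handle)
```

On the affected host this could be pointed at a file under /var/log, for example `scan_file("/var/log/xensource.log")`, and the result printed or saved for comparison with the same scan on the pool master.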