Hi, I use a startup script that starts my VMs in a specific order every time my single XCP-ng host is restarted.
For the past few days I've been getting random failures: initially the XOA VM just loses connectivity to the host toolstack, even though all VMs are up and the host is functional (I can SSH in).
The script was configured like this:
#!/bin/bash
# xe vm-list for name-label, add in start order
vms=(vm1 vm2 vm3 etc...)
wait=30s
# No need to modify below
initwait=3m
vmslength=${#vms[@]}
log=/root/scripts/startup.log
start_vm () {
    # Log the attempt, then start the VM by its name-label
    echo -n "[$(date +"%Y-%m-%d %H:%M:%S")] Starting $1 ... " >> "${log}"
    if /opt/xensource/bin/xe vm-start "name-label=$1"
    then
        echo "Success" >> "${log}"
    else
        echo "FAILED" >> "${log}"
    fi
    # Wait if not the last vm
    if [ "$1" != "${vms[${vmslength}-1]}" ]
    then
        echo "Waiting ${wait}" >> "${log}"
        sleep "${wait}"
    fi
}
echo "[$(date +"%Y-%m-%d %H:%M:%S")] Running autostart script (Waiting ${initwait})" > "${log}"
sleep "${initwait}"
for vm in "${vms[@]}"
do
    start_vm "${vm}"
done
echo "[$(date +"%T")] Startup complete." >> ${log}
echo
As you can see, initwait is set to 3m so that the script waits for the XCP-ng toolstack to become ready. I've had no issues with this config for the past year.
Now I've noticed that the toolstack takes about 10 minutes to start, where it used to take about 2. I have no idea what's going wrong, because I haven't installed any updates in the meantime.
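Since the toolstack's startup time evidently varies, the fixed initwait could be replaced by polling until the toolstack answers. What follows is only a minimal sketch: `wait_until` is a hypothetical helper that is not part of the script above, and the assumption that `xe host-list` keeps failing until the toolstack is reachable has not been verified on this host.

```shell
#!/bin/bash
# Sketch: poll until a probe command succeeds instead of sleeping a fixed time.
# Usage: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# (wait_until is a hypothetical helper, not part of the original script)
wait_until () {
    local timeout=$1 interval=$2
    shift 2
    # SECONDS is a bash builtin counting seconds since the shell started
    local deadline=$(( SECONDS + timeout ))
    until "$@" >/dev/null 2>&1
    do
        if [ "${SECONDS}" -ge "${deadline}" ]
        then
            return 1    # gave up: the probe never succeeded in time
        fi
        sleep "${interval}"
    done
    return 0
}

# Possible use in place of "sleep ${initwait}" (the probe is an assumption):
# if ! wait_until 900 10 /opt/xensource/bin/xe host-list
# then
#     echo "Toolstack not ready after 15 minutes" >> "${log}"
#     exit 1
# fi
```

With a 900 second timeout and a 10 second interval, the script would start the VMs as soon as the probe succeeds and give up after 15 minutes instead of starting VMs against a toolstack that is not ready.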
Does anyone have an idea where I should look to see what's causing this 10-minute hang?
Even after rebooting the host, once the XOA VM is up, it can't connect to the toolstack for some reason:
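For reference, this is roughly what could be collected on the host after a slow boot. It is a sketch under assumptions: the XAPI log is taken to be /var/log/xensource.log and the systemd unit is taken to be named xapi, and `boot_report` is a hypothetical helper name. Each tool is skipped if it is not installed.

```shell
#!/bin/bash
# Sketch: gather boot-timing hints on the host.
xapi_log=${XAPI_LOG:-/var/log/xensource.log}   # assumed XAPI log location
xapi_unit=${XAPI_UNIT:-xapi}                    # assumed systemd unit name

boot_report () {
    echo "== Slowest units at boot =="
    if command -v systemd-analyze >/dev/null 2>&1
    then
        systemd-analyze blame 2>/dev/null | head -n 15
    else
        echo "systemd-analyze not available"
    fi

    echo "== ${xapi_unit} messages since boot =="
    if command -v journalctl >/dev/null 2>&1
    then
        journalctl -b -u "${xapi_unit}" --no-pager 2>/dev/null | head -n 40
    else
        echo "journalctl not available"
    fi

    echo "== First error lines in ${xapi_log} =="
    if [ -r "${xapi_log}" ]
    then
        # Case-insensitive match, stop after 20 hits
        grep -i -m 20 'error' "${xapi_log}"
    else
        echo "${xapi_log} not readable"
    fi
    return 0
}

# Example: boot_report > /root/scripts/boot_report.txt
```

Comparing the timestamps in that output against the moment the host came up should at least show whether the 10 minutes are spent before the toolstack service starts or inside it.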
connect ETIMEDOUT host-ip:443
Update: the XOA error is due to a kernel issue. With kernel 5.10.0-25-amd64 it works; with 5.10.0-26-amd64 it cannot connect to any XCP-ng host. This still leaves me wondering why the XCP-ng host toolstack startup time has increased so drastically.