XOA loses connection to hosts during VM migration / creation on XOSTOR SR
Hello,
I’m experiencing an issue on an XCP-ng cluster using XOSTOR.
Environment:
- 3-node XCP-ng cluster
- XOSTOR distributed storage (2x 2 TB NVMe on each host)
- XOA for management
- Management network: 1 Gb/s
- Storage network: 10 Gb/s
- MTU 1500 everywhere (no jumbo frames)
During VM migrations, creations and deletions, XOA loses its connection to my host pool. VMs keep running normally and the hosts remain reachable (SSH, HTTPS and ping all OK). The connection comes back on its own after 30 seconds to 1 minute.
Observations:
- No significant CPU or RAM saturation
- No obvious disk latency issues (iostat looks normal)
- No errors reported on NICs
- xapi process remains active (no crash or freeze)
- The problem is intermittent and seems random.
- I've monitored the NICs with iftop: there is no bandwidth bottleneck, and XOSTOR traffic goes over the 10 Gb/s network only.
Has anyone experienced similar behavior with XOSTOR? If so, how did you fix it?
Thanks in advance for your help.
This is a tricky one, and I'm a bit out of my depth on the XOSTOR internals, but the pattern you describe (XO losing the pool for 30 to 60 seconds while SSH and ping to the hosts stay perfectly fine) reads more like XAPI stalling on a storage call during the LINSTOR coordination than a real network drop.
One detail caught my eye: you mention XOSTOR is on the 10Gb storage network only. The XOSTOR docs actually recommend the satellites stay on the XAPI management interface and call a dedicated network for them "not recommended" for pool robustness (https://docs.xcp-ng.org/xostor/), which feels counterintuitive with a fast 10Gb link sitting right there, so I may well be misreading your setup.
If you're able to grab /var/log/SMlog and /var/log/xensource.log on the master host during one of the disconnect windows, that should show whether XAPI is timing out waiting on a storage operation. Nobody's chimed in yet, and this is pretty XOSTOR-specific, so it might be worth a mention to @Team-Storage; they'll know far more than I do about whether the satellite-network config is what's triggering it.
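To line those two logs up against a disconnect window, a small Python 3 sketch like the one below might help. It assumes both files use the usual syslog prefix (e.g. `Mar  5 10:12:34 host ...`), which carries no year, so you pass the year in. `lines_in_window` and `dump_window` are just hypothetical helper names I made up, not anything shipped with XCP-ng or XOSTOR. The idea is to merge both logs chronologically and keep everything within a margin around the time XO showed the pool as disconnected, so you can see which storage operation (if any) was in flight while XAPI went quiet.

```python
import os
import re
from contextlib import ExitStack
from datetime import datetime, timedelta

# Assumption: each log line starts with a classic syslog timestamp such as
# "Mar  5 10:12:34". Syslog omits the year, so the caller supplies it.
SYSLOG_TS = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2}) (\d{2}:\d{2}:\d{2})")


def parse_ts(line, year):
    """Return the datetime at the start of a syslog-style line, or None."""
    m = SYSLOG_TS.match(line)
    if not m:
        return None
    month, day, clock = m.groups()
    # %b expects English month abbreviations (C/English locale assumed).
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def lines_in_window(sources, start, end, year, margin=timedelta(seconds=30)):
    """Merge several logs and keep lines within [start - margin, end + margin].

    `sources` maps a label (e.g. "SMlog") to an iterable of raw lines.
    Lines without a parseable timestamp (continuations, tracebacks) inherit
    the timestamp of the previous line from the same source.
    """
    lo, hi = start - margin, end + margin
    hits = []
    for label, lines in sources.items():
        last = None
        for line in lines:
            ts = parse_ts(line, year) or last
            last = ts
            if ts is not None and lo <= ts <= hi:
                hits.append((ts, label, line.rstrip("\n")))
    # Python's sort is stable, so lines sharing a timestamp keep their order.
    hits.sort(key=lambda h: h[0])
    return hits


def dump_window(start, end, year,
                paths=("/var/log/SMlog", "/var/log/xensource.log")):
    """Print the merged, time-filtered lines from the given log files."""
    with ExitStack() as stack:
        sources = {
            os.path.basename(p): stack.enter_context(open(p, errors="replace"))
            for p in paths
        }
        for _, label, line in lines_in_window(sources, start, end, year):
            print(f"[{label}] {line}")
```

For example, if XO dropped the pool between 10:12 and 10:13 (made-up times), you could copy both files off the master and run `dump_window(datetime(2024, 3, 5, 10, 12), datetime(2024, 3, 5, 10, 13), 2024, paths=("SMlog", "xensource.log"))` from the directory holding the copies. The 30-second margin is an arbitrary default; widen it if the stall seems to start well before XO notices.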