Created a new bond interface and lost management connectivity to hosts
-
Hi all, I have a fresh greenfield install of Xen Orchestra with 3 x 8.3.0 servers.
The servers have 8 interfaces. I configured the management IP and VLAN on eth0, then connected all 3 to XO and added them to a pool.
I then created a bond for eth0 and eth1 per https://xcp-ng.org/blog/2022/09/13/network-bonds-in-xcp-ng/. The master failed to take any network config, and the 2 slaves both lost connectivity, even after a reboot.
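In case it helps, this is roughly the CLI equivalent of what I did (I actually went through the XO UI; the UUIDs and the bond mode below are just placeholders):

```
# create the network that will carry the bond, then bond eth0+eth1 onto it
xe network-create name-label=bond0 MTU=9000
xe pif-list host-uuid=<master-uuid> device=eth0 params=uuid    # repeat for eth1
xe bond-create network-uuid=<bond0-network-uuid> \
    pif-uuids=<eth0-pif-uuid>,<eth1-pif-uuid> mode=active-backup
```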
Am I doing this correctly or did I miss something? -
Hi,
Do you have any specific MTU or all is by default?
-
@olivierlambert I have 9000 everywhere on the network so I set the MTU to that.
I did notice in a previous POC that I used 1500 in XO/XCP-NG though....
Thoughts?
How can I recover? Just delete/re-install the hypervisor and reconnect? -
That's probably the issue: an MTU mismatch. It's very likely that "MTU 9000 everywhere" isn't actually true, and that's what is causing the problem. Please check first that the MTU is set to the same value everywhere.
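A quick way to verify it end to end is a don't-fragment ping from a host console (the sizes account for the 28 bytes of ICMP/IP headers):

```
ping -M do -s 8972 <ip-of-another-host-or-gateway>   # should succeed if 9000 really is end to end
ping -M do -s 8973 <ip-of-another-host-or-gateway>   # should fail with "message too long"
```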
-
@olivierlambert Thanks, I have tried using xe-reset-networking and the emergency network reset function in the XCP-ng UI, and neither seems to reset the management interface: after rebooting, the hosts still have no interfaces at all (which is what happened when the failed uplink bond was created).
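For reference, the reset attempt on each host console looked roughly like this (I followed the prompts and rebooted as instructed; exact answers omitted):

```
xe-reset-networking
reboot
# after the reboot, checking what XAPI sees for the management interface:
xe pif-list params=uuid,device,network-name-label,IP,management
```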
I will probably need to wipe and re-install to get traction again.
Also I am confused with the MTU.
The servers we are using are configured with a server profile template that we also use with VMware, and the uplinks there are all MTU 9000, both in the vNIC templates (i.e. the hardware side) and in the distributed vSwitches in vCenter (the equivalent of bonded interfaces in XO?).
If there was a problem with the MTU, I would have thought they would not work either.
I will keep at it, thanks for the help. -
Keep us posted, and let us know if you are stuck again.
-
@olivierlambert I have rebuilt the servers and configured pool networking successfully, keeping the MTU at 1500.
However, when I created a VIF on one link for use in a VM, I kept getting data timeouts in SSH sessions and in server (VM) access to resources; setting the VIF and the underlying PIF to MTU 9000 resolved the issue.

I am assuming I will need to do the same to the rest of my links, including the management one, but I don't want to end up in the same situation as before and lose my cluster. Any suggestions?!
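For what it's worth, here is roughly the CLI equivalent of what I did on that one link (I went through XO; UUIDs are placeholders, and my understanding is that the MTU actually lives on the network object, with the PIF/VIF inheriting it):

```
xe network-param-get uuid=<vm-network-uuid> param-name=MTU   # was 1500
xe network-param-set uuid=<vm-network-uuid> MTU=9000
# replug the PIF on each host so the bridge picks up the new MTU
xe pif-unplug uuid=<pif-uuid>
xe pif-plug uuid=<pif-uuid>
# inside the guest the interface MTU also needs to match
# (a VM reboot or VIF replug may be needed for the VIF itself):
ip link set dev eth0 mtu 9000
```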
Thanks, Brett.
-
Ping @bleader
-
The original problem could be a known issue when creating a bond that includes the management interface, which we still have to investigate. The emergency network reset should have fixed that, though, so maybe it is a mix of the bond creation issue and the MTU issue.
In the reinstalled pool, did you create the bonds already? If so, I would think changing the MTU should be fine, especially as it worked on another PIF, but MTU issues are often quite sneaky, so I would not make any promises either.
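If you do change it on the management network, a cautious sequence would be something along these lines, run from a local or out-of-band console on each host in case connectivity drops (UUIDs are placeholders; this is only a sketch, not a guarantee):

```
xe network-param-set uuid=<management-network-uuid> MTU=9000
# non-management PIFs can simply be replugged to apply the change:
xe pif-unplug uuid=<pif-uuid>
xe pif-plug uuid=<pif-uuid>
# the management PIF usually cannot be unplugged, so a host reboot
# (one host at a time, master first) is the safer way to apply it there
```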