All NICs on XCP-NG Node Running in Promiscuous Mode
-
This is more of an information-gathering post: I have scoured the forums and asked the AIs, but really haven't gotten much feedback.
XCP-NG 8.3.0
I've set up a 3-node cluster managed by XOA with NFS shared storage. While working through some tuning options for the storage, we noticed that all of the parent NICs on the hypervisors run in promiscuous mode, and that this traffic is passed on to the VMs' VIFs, where it can be seen with tcpdump or nethogs. This may well be expected behavior, but it seems odd to flood the VMs with this traffic. We have a similar setup on a VMware environment that we're migrating off, and we don't see this behavior there. Do the interfaces need to run in promiscuous mode, or can it be disabled somehow? I tried some xe commands to disable it, but nothing changed.
The NICs in use are Mellanox ConnectX-5 MT27800 dual-port 25GbE. One port is an open trunk used for creating VLAN networks on the hypervisors; the other is a single native-VLAN config used for storage traffic.
I also noticed some drops registering in the OS RX counters (we've only deployed Alma Linux 8.10 VMs so far). I haven't tried to track down where those drops are coming from, but given the minimal traffic and load on this environment, it's surprising to see them.
We do have Professional support and can open a ticket, but I figured I'd ask these forums before going that route. Thanks in advance everyone.
-
Not sure about this, asking @bleader
-
I think the promiscuous mode is due to the fact that the interfaces end up in OVS bridges; without it, traffic coming from the outside to the VMs' MAC addresses would be dropped by the NIC.
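To make that concrete, here is a simplified Python model of a NIC's receive filter. It assumes the NIC accepts only its own MAC, broadcast, subscribed multicast groups, or everything when promiscuous; real NICs such as the ConnectX-5 typically have more elaborate filtering, so treat this as a sketch. The class name `NicFilter` and the MAC addresses in it are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set

BROADCAST = "ff:ff:ff:ff:ff:ff"


def is_multicast(mac: str) -> bool:
    # The least significant bit of the first octet marks a group (multicast) address.
    return (int(mac.split(":")[0], 16) & 0x01) == 1


@dataclass
class NicFilter:
    """Hypothetical, simplified model of a NIC receive filter."""
    own_mac: str
    promiscuous: bool = False
    multicast_groups: Set[str] = field(default_factory=set)  # lowercase MACs

    def accepts(self, dst_mac: str) -> bool:
        dst = dst_mac.lower()
        if self.promiscuous:
            return True  # hand every frame up to the host (and to the OVS bridge)
        if dst in (self.own_mac.lower(), BROADCAST):
            return True  # frames for the NIC itself, or broadcast
        # Multicast only if some group membership was registered on the NIC.
        return is_multicast(dst) and dst in self.multicast_groups
```

With `promiscuous=False`, a frame addressed to a VM's MAC fails `accepts()` on the host NIC, which is the drop described above; with `promiscuous=True` it reaches the bridge, and forwarding decisions are left to OVS.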
Once traffic reaches the OVS bridge the interface is in, it is up to OVS to act as a switch and only forward packets to the MACs it knows on its ports, so not all traffic should be forwarded to all the VIFs.
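That forwarding behavior is ordinary MAC learning. Below is a heavily simplified Python model, assuming a single bridge with no VLANs, no flow rules, no IGMP snooping and no entry aging; it is not how OVS is implemented, and `LearningSwitch` and the port names are hypothetical. Known unicast destinations go out exactly one port, while broadcast, multicast and unknown unicast are flooded. The same flooding rule is why an upstream switch that has no entry for a destination MAC sends the frame to every port in the VLAN.

```python
from typing import Dict, List

BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Hypothetical minimal MAC-learning bridge (no VLANs, no aging)."""

    def __init__(self, ports: List[str]):
        self.ports = ports
        self.mac_table: Dict[str, str] = {}  # MAC address -> port it was last seen on

    def receive(self, in_port: str, src_mac: str, dst_mac: str) -> List[str]:
        """Return the ports a frame is sent out of."""
        self.mac_table[src_mac] = in_port  # learn where the sender lives
        out = self.mac_table.get(dst_mac)
        if dst_mac != BROADCAST and out is not None:
            # Known unicast: exactly one port (none if it lives on the ingress port).
            return [] if out == in_port else [out]
        # Broadcast, multicast, or unknown unicast: flood everywhere but the ingress port.
        return [p for p in self.ports if p != in_port]


if __name__ == "__main__":
    sw = LearningSwitch(["pif", "vif1", "vif2"])
    print(sw.receive("vif1", "02:00:00:00:00:01", BROADCAST))            # ['pif', 'vif2']
    print(sw.receive("vif2", "02:00:00:00:00:02", "02:00:00:00:00:01"))  # ['vif1']
    print(sw.receive("pif", "02:00:00:00:00:99", "02:00:00:00:00:02"))   # ['vif2']
    print(sw.receive("pif", "02:00:00:00:00:99", "02:00:00:00:00:77"))   # ['vif1', 'vif2']
```

The last call shows the case that would make a VM see unicast meant for someone else: the destination is unknown to the bridge, so every VIF gets a copy.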
I just tested on 8.2 and 8.3:
- running tcpdump for ICMP on 2 VMs: pinging VM1 shows no traffic on VM2, pinging VM2 shows no traffic on VM1, and pinging the host shows no traffic on either VM
- running tcpdump on everything, ignoring only SSH (as I was logged in to both VMs over SSH): the only traffic I see is the multicast traffic on the network.
So to answer your question: yes, it is normal that the NICs are in promiscuous mode, but that should not lead to all traffic going to all the VMs.
-
Thanks for the test @bleader
If you think it's worth documenting somewhere, let us know and I'll ask Thomas.
-
@carldotcliff if you are 100% sure you see traffic on the VM that should not reach it, it is worth opening a ticket, as this is not intended behavior. If you do, mention in the ticket that this was discussed in the forum with David (me), so our support team can assign it to me if they want to.
For the dropped packets, I do not see any on my home setup, which is a pretty "small" network; in our lab, we do have some on our hosts. On a bigger network the cause could be pretty much anything: broadcast or multicast reaching the host that the NIC chooses to drop itself, or discovery protocol frames, which some NICs also drop. Unfortunately it would be hard to identify, but it would not worry me as long as the count is not high and performance is not impacted.
-
Thanks for the quick replies! Below are some tcpdump examples of what I am seeing on the XCP-NG nodes as well as the VMs:
Small tcpdump from one of the trunk ports on an XCP-NG server:
09:41:18.814082 IP 10.10.20.1 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 20, prio 200, authtype none, intvl 1s, length 20
09:41:18.814500 ARP, Request who-has 21-WEST-SCANNER.belvederetrading.com tell chivmprdtemp002.belvederetrading.com, length 46
09:41:18.821352 ARP, Request who-has chisrvpbx001.belvederetrading.com tell 10.0.1.78, length 46
09:41:18.827963 IP 10.10.158.10 > chisrvgerrit001man.belvederetrading.com: exptest-253 46
09:41:18.836743 IP chisrvprdflx004.belvederetrading.com.d-s-n > chisrvdev213man.belvederetrading.com.37330: Flags [.], ack 1116740055, win 1574, length 0
09:41:18.844230 ARP, Request who-has CHIWKSDEV589.belvederetrading.com tell chiqfx5200-001.belvederetrading.com, length 46
09:41:18.852142 IP chiqfx5200-001.belvederetrading.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 25, prio 200, authtype none, intvl 1s, length 20
09:41:18.853338 IP 192.168.130.48.mdns > mdns.mcast.net.mdns: 0 [5q] PTR (QU)? _hap._tcp.local. PTR (QU)? _companion-link._tcp.local. PTR (QU)? _rdlink._tcp.local. PTR (QU)? _hap._udp.local. PTR (QU)? _sleep-proxy._udp.local. (104)
09:41:18.853880 IP6 fe80::4d8:5005:7c5:9c77.mdns > ff02::fb.mdns: 0 [5q] PTR (QU)? _hap._tcp.local. PTR (QU)? _companion-link._tcp.local. PTR (QU)? _rdlink._tcp.local. PTR (QU)? _hap._udp.local. PTR (QU)? _sleep-proxy._udp.local. (104)
09:41:18.868284 ARP, Reply CHIWKSDEV445.belvederetrading.com is-at 58:6c:25:ca:4c:d9 (oui Unknown), length 46
09:41:18.878062 IP 10.10.158.10 > chisrvgerrit001man.belvederetrading.com: exptest-253 46
09:41:18.888001 ARP, Request who-has 192.168.130.190 tell chiqfx5200-002.belvederetrading.com, length 46
09:41:18.888014 ARP, Request who-has chivmdevtst088.belvederetrading.com tell chiqfx5200-002.belvederetrading.com, length 46
09:41:18.888022 ARP, Request who-has CHIWKSDEV407.belvederetrading.com tell chiqfx5200-001.belvederetrading.com, length 46
09:41:18.888025 ARP, Request who-has CHIWKSADM211.belvederetrading.com tell chiqfx5200-002.belvederetrading.com, length 46
09:41:18.888193 IP 192.168.130.187.mdns > mdns.mcast.net.mdns: 0 PTR (QM)? lb._dns-sd._udp.local. (39)
09:41:18.888322 IP6 fe80::18f4:8194:26e4:aa31.mdns > ff02::fb.mdns: 0 PTR (QM)? lb._dns-sd._udp.local. (39)
09:41:18.894247 IP chisrvprdflx004.belvederetrading.com.d-s-n > 10.10.208.124.38420: Flags [P.], seq 1578138028:1578138277, ack 3622223145, win 2132, length 249
09:41:18.894292 IP chisrvprdflx004.belvederetrading.com.d-s-n > 10.10.208.124.38420: Flags [F.], seq 249, ack 1, win 2132, length 0
09:41:18.894828 STP 802.1w, Rapid STP, Flags [Learn, Forward, Agreement], bridge-id 8020.00:1c:73:ac:15:57.8033, length 42
09:41:18.905751 IP chisrvprdflx004.belvederetrading.com.d-s-n > 10.10.208.86.36826: Flags [.], ack 2910688230, win 1492, length 0
09:41:18.905804 IP chisrvprdflx004.belvederetrading.com.d-s-n > 10.10.208.86.36826: Flags [.], ack 16, win 1492, length 0
09:41:18.907760 IP chisrvprdflx004.belvederetrading.com.d-s-n > 10.10.208.86.36826: Flags [.], ack 262, win 1491, length 0
09:41:18.907802 IP chisrvprdflx004.belvederetrading.com.d-s-n > 10.10.208.86.36826: Flags [.], ack 754, win 1488, length 0
As you can see, traffic from all subnets is visible to that NIC, which is expected since the NIC runs in promiscuous mode and the port is open to all VLANs.
Small tcpdump from one VM with an interface using a VIF on VLAN 208 from the NIC mentioned above:
09:47:32.030953 IP chisrvprdflx004.belvederetrading.com.d-s-n > 10.10.208.124.38530: Flags [.], ack 62746, win 1619, length 0
09:47:32.030954 IP chisrvprdflx004.belvederetrading.com.d-s-n > 10.10.208.124.38530: Flags [.], ack 63272, win 1619, length 0
09:47:32.030955 IP chisrvprdflx004.belvederetrading.com.d-s-n > 10.10.208.124.38530: Flags [.], ack 63277, win 1619, length 0
09:47:32.031001 IP chisrvprdflx004.belvederetrading.com.d-s-n > chisrvdev210man.belvederetrading.com.57040: Flags [.], ack 104812, win 5836, length 0
09:47:32.031047 IP chisrvprdflx004.belvederetrading.com.d-s-n > chisrvdev210man.belvederetrading.com.57040: Flags [.], ack 106534, win 5852, length 0
09:47:32.031177 IP chisrvprdflx004.belvederetrading.com.d-s-n > chisrvdev210man.belvederetrading.com.57040: Flags [.], ack 109470, win 5830, length 0
09:47:32.031210 IP chisrvprdflx004.belvederetrading.com.d-s-n > chisrvdev210man.belvederetrading.com.57040: Flags [.], ack 110716, win 5863, length 0
09:47:32.041909 IP chisrvprdflx004.belvederetrading.com.d-s-n > chisrvdev203man.belvederetrading.com.57316: Flags [.], ack 262, win 2826, length 0
09:47:32.041942 IP chisrvprdflx004.belvederetrading.com.d-s-n > chisrvdev203man.belvederetrading.com.57316: Flags [.], ack 508, win 2825, length 0
09:47:32.041975 IP chisrvprdflx004.belvederetrading.com.d-s-n > chisrvdev203man.belvederetrading.com.57316: Flags [.], ack 754, win 2824, length 0
09:47:32.162042 IP chisrvprdflx004.belvederetrading.com.d-s-n > chisrvdev199man.belvederetrading.com.52024: Flags [.], ack 81426, win 2340, length 0
09:47:32.162052 IP chisrvprdflx004.belvederetrading.com.d-s-n > chisrvdev199man.belvederetrading.com.52024: Flags [.], ack 82426, win 2340, length 0
09:47:32.162073 IP chisrvprdflx004.belvederetrading.com.d-s-n > chisrvdev199man.belvederetrading.com.52024: Flags [.], ack 85378, win 2340, length 0
09:47:32.162330 IP chisrvprdflx004.belvederetrading.com.d-s-n > chisrvdev199man.belvederetrading.com.52024: Flags [.], ack 98908, win 2235, length 0
09:47:32.162474 IP chisrvprdflx004.belvederetrading.com.d-s-n > chisrvdev199man.belvederetrading.com.52024: Flags [.], ack 101959, win 2212, length 0
All of that traffic is from other hosts on VLAN 208, but looking at the NIC config, promiscuity is set to 0, so it should not be getting that traffic passed to it:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 0e:2a:8d:d8:3b:b0 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 addrgenmode none numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535 parentbus xen parentdev vif-0
While typing this up, though, I got in touch with one of my network admins. It appears our QFX switches have a bug where ARP entries are not being discovered properly, leading to this kind of behavior. We were able to add entries manually and prevent this from happening, so I'll likely be hunting for missing ARP entries to clean up what I am seeing, and will go from there if the issue persists.
Thanks again for the follow up!
-
Running tcpdump switches the interface into promiscuous mode so that all traffic reaching the NIC can be captured. So I assume the issue on your switches let traffic reach the host, which forwarded it to the VMs, and it wasn't dropped there because tcpdump had switched the VIF into promiscuous mode.
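Since a capture can itself change an interface's promiscuous state, it helps to read that state both while tcpdump is running and while it isn't (or while running `tcpdump -p`, which tells tcpdump not to put the interface into promiscuous mode). The minimal sketch below parses `ip -d link show` output, which reports both the flag list in angle brackets and a `promiscuity` counter. The counter is a reference count, and a capture may raise it without `PROMISC` appearing in the flag list, so the sketch reads both. The function names are hypothetical, and `read_promisc_state` assumes iproute2 is installed.

```python
import re
import subprocess
from typing import Optional, Tuple


def promisc_state(ip_link_output: str) -> Tuple[bool, Optional[int]]:
    """Parse `ip -d link show` text into (PROMISC in flag list, promiscuity counter)."""
    flags_match = re.search(r"<([^>]*)>", ip_link_output)
    flags = set(flags_match.group(1).split(",")) if flags_match else set()
    counter_match = re.search(r"\bpromiscuity (\d+)", ip_link_output)
    counter = int(counter_match.group(1)) if counter_match else None
    return "PROMISC" in flags, counter


def read_promisc_state(iface: str) -> Tuple[bool, Optional[int]]:
    """Run `ip -d link show dev IFACE` and parse the result."""
    result = subprocess.run(
        ["ip", "-d", "link", "show", "dev", iface],
        capture_output=True, text=True, check=True,
    )
    return promisc_state(result.stdout)
```

Calling `read_promisc_state("eth0")` inside the VM once with a capture active and once without should show whether the counter moves with tcpdump.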
If it seems resolved, great; otherwise let us know if we need to investigate this further.