@r1 Yup, see https://gist.github.com/vegarnilsen/dce2b5c17cf188f1fa2c7615dc6fefc4 for the modinfo and lsmod output.
@tuxen Since we're not using Fibre Channel, I disabled fcoe before the latest test; see the gist above for details.
@r1 We're not using the bnxt_en driver; we're using the bnx2x driver. But given your request, I looked for and installed the alternate qlogic driver:
[10:29 oslo5pool3h03 etc]$ rpm -qa | grep qlogic
qlogic-qla2xxx-firmware-8.03.02-1.xcpng8.1.x86_64
qlogic-netxtreme2-4.19.0+1-modules-7.14.53-1.1.xcpng8.1.x86_64
qlogic-qla2xxx-10.01.00.54.80.0_k-1.xcpng8.1.x86_64
qlogic-fastlinq-8.37.30.0-3.xcpng8.1.x86_64
qlogic-netxtreme2-7.14.53-1.1.xcpng8.1.x86_64
[10:29 oslo5pool3h03 etc]$ rpm -qil qlogic-netxtreme2-4.19.0+1-modules-7.14.53-1.1.xcpng8.1.x86_64
Name        : qlogic-netxtreme2-4.19.0+1-modules
Version     : 7.14.53
Release     : 1.1.xcpng8.1
Architecture: x86_64
Install Date: Tue 22 Sep 2020 06:04:01 PM CEST
Group       : System Environment/Kernel
Size        : 3048296
License     : GPL
Signature   : RSA/SHA1, Wed 12 Feb 2020 01:27:25 PM CET, Key ID cd75783a3fd3ac9e
Source RPM  : qlogic-netxtreme2-7.14.53-1.1.xcpng8.1.src.rpm
Build Date  : Wed 12 Feb 2020 01:13:59 PM CET
Build Host  : koji.xcp-ng.org
Relocations : (not relocatable)
Packager    : XCP-ng
Vendor      : XCP-ng
Summary     : Qlogic netxtreme2 device drivers
Description :
Qlogic netxtreme2 device drivers for the Linux Kernel
version 4.19.0+1.
/etc/modprobe.d/qlogic-netxtreme2.conf
/lib/modules/4.19.0+1/updates/bnx2.ko
/lib/modules/4.19.0+1/updates/bnx2fc.ko
/lib/modules/4.19.0+1/updates/bnx2i.ko
/lib/modules/4.19.0+1/updates/bnx2x.ko
/lib/modules/4.19.0+1/updates/cnic.ko
[10:29 oslo5pool3h03 etc]$ yum search qlogic
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Excluding mirror: updates.xcp-ng.org
* xcp-ng-base: mirrors.xcp-ng.org
Excluding mirror: updates.xcp-ng.org
* xcp-ng-updates: mirrors.xcp-ng.org
====================================================== N/S matched: qlogic =======================================================
qlogic-fastlinq.x86_64 : Qlogic fastlinq device drivers
qlogic-fastlinq-debuginfo.x86_64 : Debug information for package qlogic-fastlinq
qlogic-netxtreme2.x86_64 : Qlogic NetXtreme II iSCSI, 1-Gigabit and 10-Gigabit ethernet drivers
qlogic-netxtreme2-4.19.0+1-modules.x86_64 : Qlogic netxtreme2 device drivers
qlogic-netxtreme2-alt.x86_64 : Qlogic NetXtreme II iSCSI, 1-Gigabit and 10-Gigabit ethernet drivers
qlogic-netxtreme2-alt-4.19.0+1-modules.x86_64 : Qlogic netxtreme2 device drivers
qlogic-netxtreme2-alt-debuginfo.x86_64 : Debug information for package qlogic-netxtreme2-alt
qlogic-netxtreme2-debuginfo.x86_64 : Debug information for package qlogic-netxtreme2
qlogic-qla2xxx.x86_64 : Qlogic qla2xxx device drivers
qlogic-qla2xxx-debuginfo.x86_64 : Debug information for package qlogic-qla2xxx
qlogic-qla2xxx-firmware.x86_64 : Qlogic qla2xxx firmware
qlogic-qla2xxx-firmware-debuginfo.x86_64 : Debug information for package qlogic-qla2xxx-firmware
Name and summary matches only, use "search all" for everything.
[10:30 oslo5pool3h03 etc]$ yum info qlogic-netxtreme2-alt-4.19.0+1-modules.x86_64
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Excluding mirror: updates.xcp-ng.org
* xcp-ng-base: mirrors.xcp-ng.org
Excluding mirror: updates.xcp-ng.org
* xcp-ng-updates: mirrors.xcp-ng.org
Available Packages
Name        : qlogic-netxtreme2-alt-4.19.0+1-modules
Arch        : x86_64
Version     : 7.14.63
Release     : 2.xcpng8.1
Size        : 1.2 M
Repo        : xcp-ng-base
Summary     : Qlogic netxtreme2 device drivers
License     : GPL
Description : Qlogic netxtreme2 device drivers for the Linux Kernel
            : version 4.19.0+1.
[10:30 oslo5pool3h03 etc]$ sudo yum install qlogic-netxtreme2-alt-4.19.0+1-modules.x86_64
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Excluding mirror: updates.xcp-ng.org
* xcp-ng-base: mirrors.xcp-ng.org
Excluding mirror: updates.xcp-ng.org
* xcp-ng-updates: mirrors.xcp-ng.org
Resolving Dependencies
--> Running transaction check
---> Package qlogic-netxtreme2-alt-4.19.0+1-modules.x86_64 0:7.14.63-2.xcpng8.1 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
==================================================================================================================================
 Package                                   Arch        Version               Repository     Size
==================================================================================================================================
Installing:
 qlogic-netxtreme2-alt-4.19.0+1-modules    x86_64      7.14.63-2.xcpng8.1    xcp-ng-base    1.2 M
Transaction Summary
==================================================================================================================================
Install 1 Package
Total download size: 1.2 M
Installed size: 2.9 M
Is this ok [y/d/N]: y
Downloading packages:
qlogic-netxtreme2-alt-4.19.0+1-modules-7.14.63-2.xcpng8.1.x86_64.rpm | 1.2 MB 00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : qlogic-netxtreme2-alt-4.19.0+1-modules-7.14.63-2.xcpng8.1.x86_64 1/1
Verifying : qlogic-netxtreme2-alt-4.19.0+1-modules-7.14.63-2.xcpng8.1.x86_64 1/1
Installed:
qlogic-netxtreme2-alt-4.19.0+1-modules.x86_64 0:7.14.63-2.xcpng8.1
Complete!
[10:32 oslo5pool3h03 etc]$
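As a side note, a quick way to double-check which driver and version an interface is actually using (the interface name here is just a placeholder):

# Sketch: show the driver bound to the interface and its version.
ethtool -i eth0
# And the version of the module that would be loaded:
modinfo bnx2x | grep -i version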
I rebooted the server, booted up a couple of the VMs I'm having issues with, and then ran ping from one of the internal servers to an external site:
64 bytes from www.vg.no (195.88.54.16): icmp_seq=1338 ttl=248 time=2.88 ms
64 bytes from www.vg.no (195.88.54.16): icmp_seq=1339 ttl=248 time=3.04 ms
64 bytes from www.vg.no (195.88.54.16): icmp_seq=1340 ttl=248 time=3.17 ms
64 bytes from www.vg.no (195.88.54.16): icmp_seq=1341 ttl=248 time=2.91 ms
client_loop: send disconnect: Broken pipe
client_loop: send disconnect: Broken pipe
However, as you can see, this crashed the host after a while and resulted in a host with no network.
We are in the process of migrating our VMs from a XenServer 6.5 pool to a new pool running XCP-ng 8.1. After we migrated some VMs that act as gateways / firewalls for internal networks, the host those VMs run on loses its network within a few minutes, sometimes within seconds, of the VM booting up. (The host itself keeps running, and if I log in on the console everything except the network is working.)
The new pool runs on HP BL460c Gen8 blades with 10Gb FlexFabric NICs, using the bnx2x driver.
When the host loses network these messages appear in kern.log:
[09:34 oslo5pool3h03 log]$ sudo grep bnx2x kern.log | grep timeout
Nov 9 10:24:52 oslo5pool3h03 kernel: [ 1537.425714] bnx2x: [bnx2x_stats_comp:211(eth0)]timeout waiting for stats finished
Nov 9 10:24:54 oslo5pool3h03 kernel: [ 1538.584055] bnx2x: [bnx2x_stats_comp:211(eth0)]timeout waiting for stats finished
Nov 9 10:25:24 oslo5pool3h03 kernel: [ 1568.785236] bnx2x: [bnx2x_stats_comp:211(eth0)]timeout waiting for stats finished
Nov 9 10:25:25 oslo5pool3h03 kernel: [ 1569.940934] bnx2x: [bnx2x_stats_comp:211(eth0)]timeout waiting for stats finished
Some pages I found through Google hint at the IOMMU being the problem. I tried disabling the IOMMU through grub parameters; with that change the host rebooted immediately when the test VM triggered the problem.
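For reference, this is roughly what I used for that test (a sketch; I'm assuming the Xen-level iommu=no parameter is the relevant knob here, set via the xen-cmdline helper rather than by editing grub.cfg by hand):

# Sketch: disable the IOMMU on the Xen command line, then reboot for it to take effect.
/opt/xensource/libexec/xen-cmdline --set-xen iommu=no
# Check what is currently set, and revert the change if needed:
/opt/xensource/libexec/xen-cmdline --get-xen iommu
/opt/xensource/libexec/xen-cmdline --delete-xen iommu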
The NICs seem to be on the XenServer HCL, and since these are blade servers I can't swap them for a different chipset: all HPE NICs for this blade generation use the same chipset.
"Regular" VMs are working fine, but VMs with multiple virtual NICs that pass traffic from one interface to another seem to reliably crash the host.
I've applied all available updates to XCP-ng; this didn't make any difference.
Since we're not using Fibre Channel, I tried disabling that module, and I also tried disabling some offloading:
[09:40 oslo5pool3h03 log]$ cat /etc/modprobe.d/qlogic-netxtreme2.conf
options bnx2x num_vfs=0
options bnx2x disable_tpa=1
[09:40 oslo5pool3h03 log]$ cat /etc/modprobe.d/blacklist-fc.conf
blacklist bnx2fc
Neither of these made any difference.
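In case it's useful to anyone following along, offloads can apparently also be turned off per physical interface instead of via module options. A sketch of that approach (the PIF UUID is a placeholder, and I'm assuming the other-config:ethtool-* keys from XenServer still apply on XCP-ng 8.1):

# Sketch: disable checksum/TSO/GSO offloads on the PIF backing eth0.
# <pif-uuid> is a placeholder; look it up with "xe pif-list device=eth0".
xe pif-param-set uuid=<pif-uuid> other-config:ethtool-tx=off
xe pif-param-set uuid=<pif-uuid> other-config:ethtool-rx=off
xe pif-param-set uuid=<pif-uuid> other-config:ethtool-tso=off
xe pif-param-set uuid=<pif-uuid> other-config:ethtool-gso=off
# The settings take effect the next time the PIF is plugged (e.g. after a host reboot).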
@gn_ro Will there be any CPU masking if they all have the same feature set though?
In my scenario there's only a speed difference between the CPUs, nothing else.
I'm planning a new XCP-ng pool where we intend to use HP Gen8 blades. These are available with a number of different CPU options, so I'd like to know whether it would be reasonable to have e.g. half the blades with a CPU model that has fewer cores but a higher clock speed, and half with a CPU that has more cores and a lower clock speed. All of the CPUs would be Xeon E5 26xx v2, so they should all have the same CPU features.
Would I be able to live-migrate guests between the fast and slow hosts, or would only cold-migrate be possible in this situation?
With such a setup, could I designate the slow hosts as the default for new guests? I would prefer to reserve the fast hosts for guests that actually need the higher clock speed, typically for single-thread workloads.
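For context, I assume the feature sets can be compared with something like this (a sketch; the host UUIDs are placeholders, and I'm assuming the cpu_info host parameter lists the feature flags):

# Sketch: dump each host's CPU info and compare the feature flags.
# Host UUIDs are placeholders; list them with "xe host-list".
xe host-param-get uuid=<fast-host-uuid> param-name=cpu_info
xe host-param-get uuid=<slow-host-uuid> param-name=cpu_info
# If only the model and speed fields differ and the feature flags are identical,
# I'd expect live migration to work without any masking.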
Cheers, Vegar
Aha, that helped. Thanks.
I've set up a new XCP-ng 8 pool with two hosts and a shared NFS server on a VLAN separate from the management network. I can't figure out how to give each host an IP address on this NFS network; there's nothing about it in the documentation as far as I can tell.
I ended up booting Windows and installing the latest XCP-ng Center, where it's easy: Open each host's Network tab, choose configure in the management network section, and add a new IP address on the correct VLAN.
Once I had added the IP addresses via XCP-ng Center, they showed up in the network list in XO, and I can edit the address and netmask there if I want to.
This is with XOA, "Current version: 5.43.2".
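For completeness, it looks like the same thing can also be done from the CLI on each host (a sketch; the PIF UUID and addresses are placeholders, and the management_purpose tag is just how XenServer used to label dedicated storage interfaces):

# Sketch: assign a static IP on the NFS VLAN's PIF.
# <pif-uuid> and the addresses are placeholders; find the VLAN PIF with "xe pif-list".
xe pif-reconfigure-ip uuid=<pif-uuid> mode=static IP=192.0.2.11 netmask=255.255.255.0
# Optionally mark it as a dedicated storage interface so it stays plugged:
xe pif-param-set uuid=<pif-uuid> disallow-unplug=true other-config:management_purpose="NFS storage"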
Thanks,
Vegar