XCP-ng 8.3 betas and RCs feedback 🚀

exetico

@stormi Updated a single lab with no problems.

I'm not sure whether purely positive reports are wanted in this thread, but I'll report it anyway.

Pierre

@stormi For us, the openvswitch-2.17.7-2.1.xcpng8.3 update has definitively solved the problem of overlay networks that weren't working (thanks David M. for all your help and hard work).

Unfortunately, the sm-3.2.3-1.1.xcpng8.3 update seems to have broken something in our iSCSI sessions. We now have 3 sessions instead of the usual 4 (2 interfaces on the host to two controllers on the storage array).

We have a less-than-ideal configuration with a large, flat network (we have a project planned to put everything back the way it should be) with interfaces named c_eth4 and c_eth5, as recommended in the documentation in such cases.

It seems that the new version honors the configured interfaces for one of the two targets, but not for the other, which falls back to the default interface.

# iscsiadm -m session -P3
iSCSI Transport Class version 2.0-870
version 6.2.0.874-7
Target: iqn.2006-08.********************:21008041266ebf8e::20602:***.***.***.*** (non-flash)
        Current Portal: ***.***.***.***:3260,1539
        Persistent Portal: ***.***.***.***:3260,1539
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.2024-06.*********xcp-01.lan.mgmt:7203c069
                Iface IPaddress: ***.***.20.155
                Iface HWaddress: <empty>
                Iface Netdev: <empty>
                SID: 1
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
				[...]

Target: iqn.2006-08.********************:21008041266ebf8e::1020602:***.***.***.*** (non-flash)
        Current Portal: ***.***.***.***:3260,1549
        Persistent Portal: ***.***.***.***:3260,1549
                **********
                Interface:
                **********
                Iface Name: c_eth5
                Iface Transport: tcp
                Iface Initiatorname: iqn.2024-06.*********xcp-01.lan.mgmt:7203c069
                Iface IPaddress: ***.***.21.155
                Iface HWaddress: <empty>
                Iface Netdev: xenbr5
                SID: 2
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
				[...]

                **********
                Interface:
                **********
                Iface Name: c_eth4
                Iface Transport: tcp
                Iface Initiatorname: iqn.2024-06.*********xcp-01.lan.mgmt:7203c069
                Iface IPaddress: ***.***.20.155
                Iface HWaddress: <empty>
                Iface Netdev: xenbr4
                SID: 3
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
				[...]
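
To make this kind of regression easier to spot, here is a minimal Python sketch that groups the "Iface Name" values by target from the text printed by iscsiadm -m session -P3. It only parses text; it does not call iscsiadm. The function names and the sample data (example IQNs and documentation addresses) are made up for illustration, and it assumes the layout shown in the listing above.

```python
from collections import defaultdict
from typing import Dict, List


def ifaces_by_target(p3_output: str) -> Dict[str, List[str]]:
    """Map each 'Target:' line to the 'Iface Name:' values listed under it."""
    result: Dict[str, List[str]] = defaultdict(list)
    current = None
    for raw in p3_output.splitlines():
        line = raw.strip()
        if line.startswith("Target:"):
            current = line[len("Target:"):].strip()
        elif line.startswith("Iface Name:") and current is not None:
            result[current].append(line[len("Iface Name:"):].strip())
    return dict(result)


def targets_on_default_iface(mapping: Dict[str, List[str]]) -> List[str]:
    """Return the targets that have at least one session on the 'default' iface."""
    return [target for target, names in mapping.items() if "default" in names]


# Hypothetical sample mimicking the structure of the listing (not real data).
SAMPLE = """\
Target: iqn.2006-08.example:target-a (non-flash)
        Current Portal: 192.0.2.1:3260,1539
                Iface Name: default
                SID: 1
Target: iqn.2006-08.example:target-b (non-flash)
        Current Portal: 192.0.2.2:3260,1549
                Iface Name: c_eth5
                SID: 2
                Iface Name: c_eth4
                SID: 3
"""

mapping = ifaces_by_target(SAMPLE)
for target, names in mapping.items():
    print(target, "->", names)
print("Targets using the default iface:", targets_on_default_iface(mapping))
```

On a host, the same functions could be fed the saved output of the command instead of SAMPLE.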

stormi

@ph7 said in XCP-ng 8.3 betas and RCs feedback :

@stormi
I ran yum update a few minutes ago and got this:
warning: %triggerin(sm-3.2.3-1.1.xcpng8.3.x86_64) scriptlet failed, exit status 1 Non-fatal <unknown> scriptlet failure in rpm package sm-3.2.3-1.1.xcpng8.3.x86_64

The server started and the VMs seem to be running.
I'm not running iSCSI, so I hope it's OK.

It comes from a trigger added by XenServer which updates the /etc/cgrules.conf file. The trigger attempts the update even when the file is already patched, so it fails.

It has no consequences other than this warning.

I fixed it and reported it upstream: https://github.com/xapi-project/sm/issues/705
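
For readers wondering what "attempts to do so even when already patched" means in practice, here is a minimal Python sketch of an idempotent configuration update. It is not the actual fix applied to the package and not what the RPM trigger does; it only illustrates the general idea of checking before modifying. The function name and the rule line are hypothetical.

```python
import os
import tempfile


def ensure_line(path: str, line: str) -> bool:
    """Append `line` to `path` unless an identical line is already present.

    Returns True if the file was modified, False if it was already up to date.
    """
    with open(path, "r", encoding="utf-8") as f:
        content = f.read()
    if line in content.splitlines():
        return False  # already patched: nothing to do, and no failure
    with open(path, "a", encoding="utf-8") as f:
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(line + "\n")
    return True


# Demonstration on a temporary file, never on the real /etc/cgrules.conf.
with tempfile.TemporaryDirectory() as tmp:
    conf = os.path.join(tmp, "cgrules.conf")
    with open(conf, "w", encoding="utf-8") as f:
        f.write("# existing rules\n")
    rule = "*:example-daemon  blkio  example-group/"  # hypothetical rule line
    print(ensure_line(conf, rule))  # True: first run modifies the file
    print(ensure_line(conf, rule))  # False: second run is a no-op
```

Running the update twice succeeds both times, which is the property the failing scriptlet lacked.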

brezlord

I upgraded an 8.2 host to 8.3 via ISO install. Everything is working as it should, but I get the error below on the host's Advanced tab, under PCI passthrough. I had an Nvidia GPU passed through to a RHEL 9 VM on the 8.2 host. This was done via the command line.

I will load 8.3 on a new host with the same hardware config later in the week to confirm whether it is related to the 8.2 --> 8.3 upgrade.

Screenshot from 2024-08-19 19-33-53.png

Let me know if you need more info.

olivierlambert

That's an interesting one. Maybe a bug in the way we parse all the devices? Can you copy/paste the output of lspci here?

brezlord

Info as requested.

[21:08 xcp-ng-01 ~]# lspci
00:00.0 Host bridge: Intel Corporation 10th Gen Core Processor Host Bridge/DRAM Registers (rev 05)
00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 05)
00:02.0 VGA compatible controller: Intel Corporation CometLake-S GT2 [UHD Graphics 630] (rev 05)
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 05)
00:12.0 Signal processing controller: Intel Corporation Comet Lake PCH Thermal Controller
00:14.0 USB controller: Intel Corporation Comet Lake USB 3.1 xHCI Host Controller
00:14.2 RAM memory: Intel Corporation Comet Lake PCH Shared SRAM
00:16.0 Communication controller: Intel Corporation Comet Lake HECI Controller
00:16.3 Serial controller: Intel Corporation Comet Lake Keyboard and Text (KT) Redirection
00:17.0 SATA controller: Intel Corporation Comet Lake SATA AHCI Controller
00:1b.0 PCI bridge: Intel Corporation Comet Lake PCI Express Root Port #21 (rev f0)
00:1d.0 PCI bridge: Intel Corporation Comet Lake PCI Express Root Port #9 (rev f0)
00:1f.0 ISA bridge: Intel Corporation Q470 Chipset LPC/eSPI Controller
00:1f.3 Audio device: Intel Corporation Comet Lake PCH cAVS
00:1f.4 SMBus: Intel Corporation Comet Lake PCH SMBus Controller
00:1f.5 Serial bus controller: Intel Corporation Comet Lake PCH SPI Controller
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (11) I219-LM
01:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P600] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
03:00.0 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)
03:00.1 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)
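
When comparing an upgraded host with a fresh install, it can help to pick the GPU functions out of such a listing programmatically. The following Python sketch splits default lspci lines into address, class and description. It is not how XAPI or Xen Orchestra parse devices; the names PciDevice and parse_lspci are made up for this example, and it assumes the plain lspci format shown above.

```python
from typing import List, NamedTuple


class PciDevice(NamedTuple):
    address: str      # e.g. "01:00.0"
    dev_class: str    # e.g. "VGA compatible controller"
    description: str  # vendor and device text, including any "(rev xx)"


def parse_lspci(output: str) -> List[PciDevice]:
    """Parse lines of the form '<address> <class>: <description>'."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        address, _, rest = line.partition(" ")
        dev_class, sep, description = rest.partition(": ")
        if not sep:
            continue  # skip lines that do not match the expected layout
        devices.append(PciDevice(address, dev_class, description))
    return devices


# Three lines taken from the listing above.
SAMPLE = """\
00:02.0 VGA compatible controller: Intel Corporation CometLake-S GT2 [UHD Graphics 630] (rev 05)
01:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P600] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
"""

nvidia = [d.address for d in parse_lspci(SAMPLE) if "NVIDIA" in d.description]
print(nvidia)  # ['01:00.0', '01:00.1']
```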

The error is present in both:

Current version: 5.97.0 - XOA build: 20240401
Xen Orchestra from source code, commit 70014 / Master, commit e5c53

rmaclachlan

Seems to be running fine on our 12 hosts. iSCSI came up fine too, with all our paths.

olivierlambert

@brezlord I think it might be related to some recent XAPI code; I don't think it's XO related. It might be worth a specific investigation. I'm not sure it's due to the upgrade itself.

brezlord

@olivierlambert I'll report back at the end of the week when I deploy a new host with the exact same hardware config and a fresh install.

Pierre

Concerning our problem with iSCSI sessions, I have just tried a downgrade of sm followed by a reboot:

yum downgrade sm sm-fairlock

I can confirm that it works again on 4 paths via both interfaces after downgrade.

# iscsiadm -m session -o show
tcp: [2] ***.***22.1:3260,1539 iqn.2006-08.********************:21008041266ebf8e::20602:***.***22.1 (non-flash)
tcp: [3] ***.***22.1:3260,1539 iqn.2006-08.********************:21008041266ebf8e::20602:***.***22.1 (non-flash)
tcp: [4] ***.***23.1:3260,1549 iqn.2006-08.********************:21008041266ebf8e::1020602:***.***23.1 (non-flash)
tcp: [5] ***.***23.1:3260,1549 iqn.2006-08.********************:21008041266ebf8e::1020602:***.***23.1 (non-flash)

# iscsiadm -m session -P3
iSCSI Transport Class version 2.0-870
version 6.2.0.874-7
Target: iqn.2006-08.********************:21008041266ebf8e::20602:***.***22.1 (non-flash)
        Current Portal: ***.***22.1:3260,1539
        Persistent Portal: ***.***22.1:3260,1539
                **********
                Interface:
                **********
                Iface Name: c_eth5
                Iface Transport: tcp
                Iface Initiatorname: iqn.2024-06.*********xcp-01*********:7203c069
                Iface IPaddress: ***.***21.155
                Iface HWaddress: <empty>
                Iface Netdev: xenbr5
				[...]
                **********
                Interface:
                **********
                Iface Name: c_eth4
                Iface Transport: tcp
                Iface Initiatorname: iqn.2024-06.*********xcp-01*********:7203c069
                Iface IPaddress: ***.***20.155
                Iface HWaddress: <empty>
                Iface Netdev: xenbr4
				[...]
Target: iqn.2006-08.********************:21008041266ebf8e::1020602:***.***23.1 (non-flash)
        Current Portal: ***.***23.1:3260,1549
        Persistent Portal: ***.***23.1:3260,1549
                **********
                Interface:
                **********
                Iface Name: c_eth5
                Iface Transport: tcp
                Iface Initiatorname: iqn.2024-06.*********xcp-01*********:7203c069
                Iface IPaddress: ***.***21.155
                Iface HWaddress: <empty>
                Iface Netdev: xenbr5
				[...]
                **********
                Interface:
                **********
                Iface Name: c_eth4
                Iface Transport: tcp
                Iface Initiatorname: iqn.2024-06.*********xcp-01*********:7203c069
                Iface IPaddress: ***.***20.155
                Iface HWaddress: <empty>
                Iface Netdev: xenbr4
                SID: 5
				[...]

So we do have a problem with the new version of sm.
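
The "4 paths" expectation (2 host interfaces to each of 2 targets) can be checked with a small Python sketch that counts sessions per target in the short listing format of iscsiadm -m session -o show. It parses text only; the function names, the regular expression and the sample lines (example IQNs and documentation addresses) are assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Dict

# Matches lines such as: "tcp: [2] 192.0.2.1:3260,1539 iqn.example:target (non-flash)"
SESSION_RE = re.compile(r"^\w+: \[\d+\] \S+ (?P<target>\S+)")


def sessions_per_target(listing: str) -> Counter:
    """Count the sessions listed for each target IQN."""
    counts: Counter = Counter()
    for line in listing.splitlines():
        match = SESSION_RE.match(line.strip())
        if match:
            counts[match.group("target")] += 1
    return counts


def missing_paths(counts: Counter, expected_per_target: int) -> Dict[str, int]:
    """Return how many sessions each target is short of the expected number.

    A target with no session at all does not appear in the listing, so it
    cannot be detected here.
    """
    return {
        target: expected_per_target - n
        for target, n in counts.items()
        if n < expected_per_target
    }


# Hypothetical listings: one healthy (2 sessions per target), one degraded.
HEALTHY = """\
tcp: [2] 192.0.2.1:3260,1539 iqn.2006-08.example:target-a (non-flash)
tcp: [3] 192.0.2.1:3260,1539 iqn.2006-08.example:target-a (non-flash)
tcp: [4] 192.0.2.2:3260,1549 iqn.2006-08.example:target-b (non-flash)
tcp: [5] 192.0.2.2:3260,1549 iqn.2006-08.example:target-b (non-flash)
"""
DEGRADED = """\
tcp: [1] 192.0.2.1:3260,1539 iqn.2006-08.example:target-a (non-flash)
tcp: [2] 192.0.2.2:3260,1549 iqn.2006-08.example:target-b (non-flash)
tcp: [3] 192.0.2.2:3260,1549 iqn.2006-08.example:target-b (non-flash)
"""

print(missing_paths(sessions_per_target(HEALTHY), 2))   # {}
print(missing_paths(sessions_per_target(DEGRADED), 2))  # {'iqn.2006-08.example:target-a': 1}
```

The expected count per target is the number of host interfaces bound to iSCSI, which is 2 in the setup described here.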

Anonabhar

@pierre-c I don't use iSCSI in my lab anymore, but yesterday when I did a "yum update" of 8.3 Beta, I got a script error while yum was running through all the updates. The script that failed was part of the sm update.

To be honest, I didn't bother doing anything about the error, as everything seemed to work OK for me afterwards, so I just ignored it. But maybe I should have said something. Sorry, guys.

stormi

@Anonabhar We discussed the script error above. It's benign and will be fixed.

brezlord

@olivierlambert I can confirm that a host with the exact same hardware config and a fresh install of 8.3-RC does not have this issue. Only the host upgraded from 8.2 LTS is affected.

olivierlambert

That's interesting. Thanks for the feedback!

brezlord

@olivierlambert Do you need any further information before I re-image the host with 8.3?

ecoutinho

While installing 8.3 RC1 it says BIOS boot mode is deprecated and I should use UEFI. But if I boot in UEFI, when I choose to upgrade the existing installation, an error occurs: "Installer mode (UEFI) is mismatched with target host mode (legacy)".

To be clear, is it not possible to migrate from BIOS mode to UEFI during an upgrade?

In that case, the only option would be to remove the host from the pool, install XCP-ng (not upgrade) and then add it to the pool. Am I correct? Thanks.

stormi

@ecoutinho at the moment, reinstalling is indeed the only solution to switch from BIOS to UEFI.

Note that the deprecation announcement comes with 8.3, but BIOS boot mode will still work and is still tested for this release. So you have some time to plan the switch.
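
Since the installer must be booted in the same mode as the existing installation, it can be useful to check a host's current boot mode before starting. A minimal Python sketch follows. It relies on the common Linux convention that /sys/firmware/efi exists only when the system was booted via UEFI; that this also holds in the XCP-ng control domain is an assumption here, and the function name is made up.

```python
import os


def boot_mode(efi_dir: str = "/sys/firmware/efi") -> str:
    """Return 'UEFI' if the EFI sysfs directory exists, else 'BIOS (legacy)'.

    The path is a parameter only so the check can be exercised in tests.
    """
    return "UEFI" if os.path.isdir(efi_dir) else "BIOS (legacy)"


print("Current boot mode:", boot_mode())
```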

ecoutinho

@stormi I've upgraded the master in BIOS mode, and proceeded to reinstall one of the other hosts in UEFI mode, after having removed it from the pool. It went fine until I tried to add the host back into the pool. This was not possible because the reinstalled host has certificate verification enabled, while the pool doesn't. Even if I run host-emergency-disable-tls-verification, it's not possible to add the server back into the pool.

It seems I'll have to upgrade the other hosts in BIOS mode, enable certificate verification on the pool and then add this host. I guess I'll reinstall the other hosts in UEFI mode on a future upgrade.

stormi

@ecoutinho You can enable TLS verification on the pool, then join the new host. Or you can disable it on the new host, but that would downgrade this new security feature, which is meant to protect against MITM attacks.

ecoutinho

@stormi Thanks for your suggestions. I've tried to enable it on the pool:

# xe pool-enable-tls-verification
This operation is not supported during an upgrade.

I have to finish the upgrade of the other hosts before enabling it on the pool.

As for disabling it on the new host, I didn't find any way to do it permanently. I only found the host-emergency-disable-tls-verification option, which does not disable it completely and doesn't allow adding the host to a pool without TLS verification. Could you clarify how to disable it on the new host?

I will enable it on the pool when the upgrade is finished.