XCP-ng 8.2.1 (maintenance update) - ready for testing

Andrew

@stormi I upgraded my normal pool from 8.2.0 to 8.2.1 (staging) using yum. It took some work: because of the version change, my pool master got unhappy with the order I did it in. My mistake with the process... I ended up upgrading and rebooting all pool members, and then things were good. I abused the upgrade process and things still worked out in the end. No trouble, and no stuck, damaged, or lost VMs (or other resources). Things are working as they should, including shared SR on NFS, ISO on NFS, VxLAN, migration, replication, and S3 delta backups. I'm not testing USB/GPU/pass-thru.

JeffBerntsen

@stormi said in XCP-ng 8.2.1 (maintenance update) - ready for testing:

@jeffberntsen probably not. We'll need to rebuild some packages, like sm, on top of the latest versions, or else you will lose needed specific patches that are not merged in the main branch yet.

CC @ronan-a

That's what I thought but figured it wouldn't hurt to ask.

stormi

I updated the announcement above with details about the changes and on what to focus tests on if possible.

Andrew

@stormi I'm not sure what I was doing at the time....

host.isHyperThreadingEnabled
{
  "id": "b9aaf368-7be4-4b5f-ae9d-867e7e83d1e3"
}
{
  "code": "-1",
  "params": [
    "'module' object has no attribute 'run'",
    "",
    "Traceback (most recent call last):
  File \"/etc/xapi.d/plugins/xcpngutils/__init__.py\", line 98, in wrapper
    return func(*args, **kwds)
  File \"/etc/xapi.d/plugins/hyperthreading.py\", line 14, in get_hyperthreading
    result = run_command(['xl', 'info', 'threads_per_core'])
  File \"/etc/xapi.d/plugins/xcpngutils/__init__.py\", line 67, in run_command
    res = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
AttributeError: 'module' object has no attribute 'run'
"
  ],
  "call": {
    "method": "host.call_plugin",
    "params": [
      "OpaqueRef:6d554a61-ec51-49b0-b58d-e002ea93ce54",
      "hyperthreading.py",
      "get_hyperthreading",
      {}
    ]
  },
  "message": "-1('module' object has no attribute 'run', , Traceback (most recent call last):
  File \"/etc/xapi.d/plugins/xcpngutils/__init__.py\", line 98, in wrapper
    return func(*args, **kwds)
  File \"/etc/xapi.d/plugins/hyperthreading.py\", line 14, in get_hyperthreading
    result = run_command(['xl', 'info', 'threads_per_core'])
  File \"/etc/xapi.d/plugins/xcpngutils/__init__.py\", line 67, in run_command
    res = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
AttributeError: 'module' object has no attribute 'run'
)",
  "name": "XapiError",
  "stack": "XapiError: -1('module' object has no attribute 'run', , Traceback (most recent call last):
  File \"/etc/xapi.d/plugins/xcpngutils/__init__.py\", line 98, in wrapper
    return func(*args, **kwds)
  File \"/etc/xapi.d/plugins/hyperthreading.py\", line 14, in get_hyperthreading
    result = run_command(['xl', 'info', 'threads_per_core'])
  File \"/etc/xapi.d/plugins/xcpngutils/__init__.py\", line 67, in run_command
    res = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
AttributeError: 'module' object has no attribute 'run'
)
    at Function.wrap (/opt/xo/xo-builds/xen-orchestra-202201310821/packages/xen-api/src/_XapiError.js:16:12)
    at /opt/xo/xo-builds/xen-orchestra-202201310821/packages/xen-api/src/transports/json-rpc.js:41:27
    at AsyncResource.runInAsyncScope (node:async_hooks:199:9)
    at cb (/opt/xo/xo-builds/xen-orchestra-202201310821/node_modules/bluebird/js/release/util.js:355:42)
    at tryCatcher (/opt/xo/xo-builds/xen-orchestra-202201310821/node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (/opt/xo/xo-builds/xen-orchestra-202201310821/node_modules/bluebird/js/release/promise.js:547:31)
    at Promise._settlePromise (/opt/xo/xo-builds/xen-orchestra-202201310821/node_modules/bluebird/js/release/promise.js:604:18)
    at Promise._settlePromise0 (/opt/xo/xo-builds/xen-orchestra-202201310821/node_modules/bluebird/js/release/promise.js:649:10)
    at Promise._settlePromises (/opt/xo/xo-builds/xen-orchestra-202201310821/node_modules/bluebird/js/release/promise.js:729:18)
    at _drainQueueStep (/opt/xo/xo-builds/xen-orchestra-202201310821/node_modules/bluebird/js/release/async.js:93:12)
    at _drainQueue (/opt/xo/xo-builds/xen-orchestra-202201310821/node_modules/bluebird/js/release/async.js:86:9)
    at Async._drainQueues (/opt/xo/xo-builds/xen-orchestra-202201310821/node_modules/bluebird/js/release/async.js:102:5)
    at Immediate.Async.drainQueues [as _onImmediate] (/opt/xo/xo-builds/xen-orchestra-202201310821/node_modules/bluebird/js/release/async.js:15:14)
    at processImmediate (node:internal/timers:464:21)
    at process.topLevelDomainCallback (node:domain:152:15)
    at process.callbackTrampoline (node:internal/async_hooks:128:24)"
}
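
For context: subprocess.run only exists since Python 3.5, and the wording of the AttributeError above is what Python 2 prints, so the plugin appears to be running under Python 2, where the subprocess module has no run. Below is a minimal sketch of a helper that works on both Python 2.7 and Python 3 by using Popen instead. The function body and its return shape are my own assumptions for illustration, not the actual xcpngutils code.

```python
import subprocess
import sys


def run_command(command):
    """Run a command and return its exit code, stdout and stderr.

    Hypothetical stand-in for the helper named in the traceback. It uses
    Popen, which exists in both Python 2.7 and Python 3, rather than
    subprocess.run, which was only added in Python 3.5.
    """
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = proc.communicate()
    if proc.returncode != 0:
        # Mirror check=True: raise on a non-zero exit status.
        raise subprocess.CalledProcessError(proc.returncode, command, output=stdout)
    return {"returncode": proc.returncode, "stdout": stdout, "stderr": stderr}
```

Calling run_command(['xl', 'info', 'threads_per_core']) on a host would then return the captured output as bytes, without depending on subprocess.run.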

stormi

@andrew Thanks. We'll investigate.

Andrew

@stormi I did a fresh install from the new ISO... after reboot I get an error reported in user.log.

Feb  2 18:19:39 xcp4 kdump: Loaded crash kernel
Feb  2 18:19:43 xcp4 fcoe_driver INFO: eth0 is FCoE capable
Feb  2 18:19:43 xcp4 fcoe_driver INFO: eth1 is FCoE capable
Feb  2 18:19:43 xcp4 fcoe_driver CRITICAL:
Feb  2 18:19:43 xcp4 fcoe_driver CRITICAL: ['Traceback (most recent call last):\n', '  File "/opt/xensource/libexec/fcoe_driver", line 34, in execute\n    output = subprocess.check_output(cmd)\n', '  File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output\n    raise CalledProcessError(retcode, cmd, output=output)\n', "CalledProcessError: Command '['fcoeadm', '-i']' returned non-zero exit status 2\n"]
Feb  2 18:19:43 xcp4 fcoe_driver INFO: Applying config on interface: eth0
Feb  2 18:19:44 xcp4 fcoe_driver INFO: Applying config on interface: eth1

Andrew

@stormi Here's more info from the xensource.log

I found this error happens when you use XO, click on a HOST and then the ADVANCED tab.

Feb  2 18:40:32 xcp4 xapi: [debug||741 HTTPS 192.168.1.131->:::80|host.get_sched_gran R:cdd533230ce9|audit] Host.get_sched_gran: host='a87516dc-1363-450d-8384-10e9e4a131b4 (xcp4)'
Feb  2 18:40:32 xcp4 xapi: [debug||741 HTTPS 192.168.1.131->:::80|host.get_sched_gran R:cdd533230ce9|helpers] about to call script: /opt/xensource/libexec/xen-cmdline
Feb  2 18:40:32 xcp4 xapi: [debug||742 HTTPS 192.168.1.131->:::80|host.call_plugin R:4f64bd0de6ba|audit] Host.call_plugin host = 'a87516dc-1363-450d-8384-10e9e4a131b4 (xcp4)'; plugin = 'hyperthreading.py'; fn = 'get_hyperthreading' args = [ 'hidden' ]
Feb  2 18:40:32 xcp4 xapi: [ warn||740 HTTPS 192.168.1.131->:::80|event.from D:b61f8cdc98d8|xapi_message] get_since_for_events: no in_memory_cache!
Feb  2 18:40:32 xcp4 xapi: [debug||741 HTTPS 192.168.1.131->:::80|host.get_sched_gran R:cdd533230ce9|helpers] /opt/xensource/libexec/xen-cmdline --get-xen sched-gran succeeded [ output = '\x0A' ]
Feb  2 18:40:32 xcp4 xapi: [error||742 :::80||backtrace] host.call_plugin R:4f64bd0de6ba failed with exception Server_error(-1, [ 'module' object has no attribute 'run'; ; Traceback (most recent call last):\x0A  File "/etc/xapi.d/plugins/xcpngutils/__init__.py", line 98, in wrapper\x0A    return func(*args, **kwds)\x0A  File "/etc/xapi.d/plugins/hyperthreading.py", line 14, in get_hyperthreading\x0A    result = run_command(['xl', 'info', 'threads_per_core'])\x0A  File "/etc/xapi.d/plugins/xcpngutils/__init__.py", line 67, in run_command\x0A    res = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)\x0AAttributeError: 'module' object has no attribute 'run'\x0A ])
Feb  2 18:40:32 xcp4 xapi: [error||742 :::80||backtrace] Raised Server_error(-1, [ 'module' object has no attribute 'run'; ; Traceback (most recent call last):\x0A  File "/etc/xapi.d/plugins/xcpngutils/__init__.py", line 98, in wrapper\x0A    return func(*args, **kwds)\x0A  File "/etc/xapi.d/plugins/hyperthreading.py", line 14, in get_hyperthreading\x0A    result = run_command(['xl', 'info', 'threads_per_core'])\x0A  File "/etc/xapi.d/plugins/xcpngutils/__init__.py", line 67, in run_command\x0A    res = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)\x0AAttributeError: 'module' object has no attribute 'run'\x0A ])
Feb  2 18:40:32 xcp4 xapi: [error||742 :::80||backtrace] 1/6 xapi Raised at file ocaml/xapi/rbac.ml, line 231
Feb  2 18:40:32 xcp4 xapi: [error||742 :::80||backtrace] 2/6 xapi Called from file ocaml/xapi/server_helpers.ml, line 103
Feb  2 18:40:32 xcp4 xapi: [error||742 :::80||backtrace] 3/6 xapi Called from file ocaml/xapi/server_helpers.ml, line 121
Feb  2 18:40:32 xcp4 xapi: [error||742 :::80||backtrace] 4/6 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
Feb  2 18:40:32 xcp4 xapi: [error||742 :::80||backtrace] 5/6 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
Feb  2 18:40:32 xcp4 xapi: [error||742 :::80||backtrace] 6/6 xapi Called from file lib/backtrace.ml, line 177
Feb  2 18:40:32 xcp4 xapi: [error||742 :::80||backtrace]
Feb  2 18:40:32 xcp4 xapi: [ warn||743 HTTPS 19.168.1.131->:::80|event.from D:6e6288e090db|xapi_message] get_since_for_events: no in_memory_cache!
Feb  2 18:40:32 xcp4 xapi: [ warn||744 HTTPS 192.168.1.131->:::80|event.from D:6c41ed917a6a|xapi_message] get_since_for_events: no in_memory_cache!

stormi

@andrew said in XCP-ng 8.2.1 (maintenance update) - ready for testing:

@stormi I did a fresh install from the new ISO... after reboot I get an error reported in user.log.

Feb  2 18:19:39 xcp4 kdump: Loaded crash kernel
Feb  2 18:19:43 xcp4 fcoe_driver INFO: eth0 is FCoE capable
Feb  2 18:19:43 xcp4 fcoe_driver INFO: eth1 is FCoE capable
Feb  2 18:19:43 xcp4 fcoe_driver CRITICAL:
Feb  2 18:19:43 xcp4 fcoe_driver CRITICAL: ['Traceback (most recent call last):\n', '  File "/opt/xensource/libexec/fcoe_driver", line 34, in execute\n    output = subprocess.check_output(cmd)\n', '  File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output\n    raise CalledProcessError(retcode, cmd, output=output)\n', "CalledProcessError: Command '['fcoeadm', '-i']' returned non-zero exit status 2\n"]
Feb  2 18:19:43 xcp4 fcoe_driver INFO: Applying config on interface: eth0
Feb  2 18:19:44 xcp4 fcoe_driver INFO: Applying config on interface: eth1

Is this something you'd reproduce with XCP-ng 8.2 at first boot?

Andrew

@stormi No, I did not see it before. Also it's not an error on hosts upgraded from 8.2.0 to 8.2.1.

stormi

@andrew said in XCP-ng 8.2.1 (maintenance update) - ready for testing:

@stormi Here's more info from the xensource.log

I found this error happens when you use XO, click on a HOST and then the ADVANCED tab.

We reproduced and will fix this one. Thanks!

stormi

@andrew said in XCP-ng 8.2.1 (maintenance update) - ready for testing:

@stormi No, I did not see it before. Also it's not an error on hosts upgraded from 8.2.0 to 8.2.1.

At first glance, I don't see what could have changed here. Does the error appear to have consequences?

Andrew

@stormi I'll say no consequences because I don't use FCoE. It looks like a timing issue with exactly when the script runs and the status of the ethernet interfaces. As I was looking at the code, the error vanished, so it now runs correctly (without changes).
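
If it really is a timing issue, one generic way to tolerate it is to retry the command a few times before giving up. This is only a sketch under that assumption: the function name, the number of attempts and the delay are made up for illustration, and it is not what fcoe_driver does or how the issue was resolved.

```python
import subprocess
import sys
import time


def check_output_with_retry(cmd, attempts=3, delay=1.0):
    """Call subprocess.check_output, retrying when the command exits non-zero.

    Hypothetical helper: attempts and delay are illustrative defaults.
    """
    for attempt in range(1, attempts + 1):
        try:
            return subprocess.check_output(cmd)
        except subprocess.CalledProcessError:
            if attempt == attempts:
                raise  # Out of retries: surface the original error.
            time.sleep(delay)  # Give the interfaces time to settle.
```

With this, a command that fails on the first call but succeeds a moment later returns its output normally, while a command that keeps failing still raises CalledProcessError after the last attempt.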

stormi

Update! New ISOs, and various updates to the previous 8.2.1 test packages.

Hello dear testers.

It's the final sprint (~1 week!)

I have uploaded new installation ISO images to https://updates.xcp-ng.org/tmp/

Changes since last time

kernel-alt updated to 4.19.227

@r1 has updated our alternate kernel to a much more recent maintenance release of kernel.org's 4.19 branch.
To test: installation on real hardware using the alternate kernel (check the docs for instructions). Booting and using the installed system with the alternate kernel (it's not the default boot entry).

xcp-ng-xapi-plugins: regressions fixed

Issues reported by @Andrew were fixed.

Includes the latest security fixes we released to XCP-ng 8.2

Nothing to add about it. Check the blog.

UEFI: uefistored was updated

This component, necessary for UEFI support and Secure Boot, was updated with minor fixes: the logs used to contain error messages that added nothing useful, only noise. They have been removed.

net-install now available

You can find the netinstall ISO at https://updates.xcp-ng.org/tmp/

xcp-ng-release-config

Fixed the cipher list in SSH configuration for people updating from an already up to date XCP-ng 8.2 (a bug in Citrix' patching of the configuration files. Reported upstream too.)
Fixed a systemd symlink that was not updated properly in previous 8.2.1 packages. Nothing to do for you, just update.

How to test

If you already have an XCP-ng 8.2.1 from previous testing, just run yum update --enablerepo=xcp-ng-staging and reboot.

Else refer to the original post.

What to test

As usual, anything that you need XCP-ng for.

We also would like you to give special focus to the following items:

UEFI VMs, without Secure Boot
UEFI VMs, with Secure Boot (check the docs. There's a manual command to run once on the pool, to download and install the certificates from Microsoft.)
On Windows installed from a not too recent image (otherwise the test is impossible), installation of update KB4535680, which updates the list of revoked certificates for Secure Boot. It should work without Secure Boot on, but we had reports of failures in this situation, so I'm interested in finding a way to reproduce them. It should also work with Secure Boot on.
Log rotation if you have a way to trigger very verbose logs.
The installer (installation, upgrade, backup restore...).
Active Directory connectivity, if you know how to make it work.
The alternate kernel. Having it tested on a large variety of hardware would be good.
Like @Andrew in this very thread, keeping an eye on new error messages in XCP-ng logs and XOA logs can also be useful.
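
For those who want to automate that last item, here is a minimal sketch of a filter that picks error lines out of a log. The patterns are based only on the log excerpts quoted earlier in this thread (the xapi "[error||" marker and the fcoe_driver "CRITICAL" level), so treat them as assumptions rather than a complete list, and the function name is made up.

```python
import re

# Patterns taken only from the log excerpts quoted in this thread;
# other components may format their errors differently.
ERROR_PATTERN = re.compile(r"\[\s*error\|\||\bCRITICAL\b")


def find_error_lines(lines):
    """Return the lines that look like errors, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if ERROR_PATTERN.search(line)]
```

You would pass it an open log file (for example the xensource.log or user.log mentioned above) and compare the result before and after the update.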

Time window before final release

~ one week according to my latest estimates.

nikade

Hi everyone,

I just installed 8.2.1 on a Dell R630 and the installer was very smooth.
We chose to do a new installation since the host was running XS 7.2 and we wanted a fresh install. With that came the opportunity to leverage EXT and thin provisioning, which seems to work just as it should.

We're also mounting an NFS SR for VM disks, which works fine as well. I'll have to wait and see, but hopefully the problem with /var/log/snmpd.log is resolved now and there will be no more alerts regarding disk usage.

olivierlambert

Thanks a lot @nikade for your feedback! (also I love your avatar!)

nikade

@olivierlambert said in XCP-ng 8.2.1 (maintenance update) - ready for testing:

Thanks a lot @nikade for your feedback! (also I love your avatar!)

Yeah, I gotta have my suit on my tux

I wanted to let you know that I've tried iSCSI with multipathing as well and it works fine.
So far everything we need in our general production environment seems to be working as it should.

Andrew

I ran the 8.2.1-test5 full installer, booted by Ventoy-1.0.69 from USB, on a nice ThinkPad T430 laptop to install to another USB SSD stick. I then built XO from source on Bullseye as a VM. Devices (except wifi/BT/TPM) work fine on the old machine and performance is as expected. If you need a test "server", the old T430 can be dual or quad core with 16 GB of memory and two internal SATA drives (supports Intel VT).

I also have 8.2.1 (updated with staging) on all my other servers.

@stormi New error report: I do get a logged ERROR several times a minute in Dom0 on all host servers. There is more than enough free memory in Dom0:

xcp-rrdd-plugins.log:Feb 17 14:01:29 xcp-ng-rncjnand xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds...

stormi

@Andrew said in XCP-ng 8.2.1 (maintenance update) - ready for testing:

@stormi New error report: I do get a logged ERROR several times a minute in Dom0 on all host servers. There is more than enough free memory in Dom0:
xcp-rrdd-plugins.log:Feb 17 14:01:29 xcp-ng-rncjnand xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds...

Luckily, this one is not a new error. It's an unfortunate but benign side effect of building a stub version of Citrix' gpumon, which would require a proprietary NVIDIA toolkit to build properly.

apz

@stormi I installed a fresh host from the 8.2.1 Test 5 full CD image in my homelab. This host has software RAID1, which was problematic with the last installer; this time it went without issues.

Now that I've installed a newer version than what's in the repos, will this work correctly in the future with regard to updates?