XCP-ng 8.3 updates announcements and testing

Andrew

@rzr Updated and running on most systems now. Rolling pool reboot worked correctly (backups need to be disabled first). VMs and CR are running normally.

After a large S3 backup session (which succeeded), GC on the master got stuck in a loop and failed (lock issue). A reboot of the pool master was required to resolve the problem; GC then coalesced the VHDs without further action. Other pools did not have the same problem. I have not seen that issue before, but everything seems fine now, so I can't blame the update.

Greg_E

@Andrew

Why did you need to disable backups before applying this?

Andrew

@Greg_E It's not a patch/update issue, it's the rolling pool reboot. If a VM is being backed up when XO wants to migrate it, the migration can't happen, so the host evacuation stops and the reboot does not happen.

dthenot

@Andrew Hello,

I'm also getting errors on some VMs while trying to export a disk, and even when trying to start some VMs from NFS (that were fine before).

Is it still a problem? It might have multiple causes. If it's still an issue, could you share the logs /var/log/{SMlog,xensource.log,daemon.log}? They contain information that could help us investigate.

The "VDI not detached cleanly" error means that the VDI still holds a reference to the host it was previously running on.
It might be caused by the earlier xenopsd error, or it might be the cause of it.

To be sure no tapdisk is using the VDI, you can look at the tap-ctl list output on each host.
If none is, you can clear this reference with the script /opt/xensource/sm/resetvdis.py single <VDI UUID>.
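For anyone who prefers to script the check described above, here is a minimal Python sketch, kept compatible with Python 3.6. It assumes the VDI UUID appears in each tapdisk's image path in the tap-ctl list output, which holds for the usual file-based (<VDI UUID>.vhd) and LVM-based (VHD-<VDI UUID>) paths. The function names are hypothetical. It only reports what it finds on the local host and builds the resetvdis.py command quoted in the post; running that command is left as a manual step once every host has been checked.

```python
import subprocess
from typing import List


def tapdisk_lines_for_vdi(tap_ctl_output: str, vdi_uuid: str) -> List[str]:
    """Return the lines of `tap-ctl list` output that mention vdi_uuid.

    Assumption: each tapdisk line contains the image path, and that path
    contains the VDI UUID, so a case-insensitive substring match is enough.
    """
    needle = vdi_uuid.strip().lower()
    return [line for line in tap_ctl_output.splitlines() if needle in line.lower()]


def resetvdis_command(vdi_uuid: str) -> List[str]:
    """Build the cleanup command quoted in the post (it is not executed here)."""
    return ["/opt/xensource/sm/resetvdis.py", "single", vdi_uuid]


def check_this_host(vdi_uuid: str) -> bool:
    """Run `tap-ctl list` locally; return True if no tapdisk uses the VDI.

    This only covers the host it runs on, so repeat it on every host.
    """
    out = subprocess.run(["tap-ctl", "list"], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    users = tapdisk_lines_for_vdi(out, vdi_uuid)
    for line in users:
        print("still in use: " + line)
    return not users
```

Only when check_this_host() returns True on every host in the pool would the command from resetvdis_command() be worth running by hand.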

Greg_E

@Andrew

Thanks. My updates and my RPU normally happen at different times. I also don't back up multiple times per day like I probably should.

rzr

@ovicz said:

I get this in dmesg after the latest updates:

[   54.673443] python3[3691]: segfault at 200000 ip 00007f16eb8eca9f sp 00007ffdb84e9ff0 error 4 in libpython3.6m.so.1.0[7f16eb804000+28d000]
[   54.673450] Code: 01 00 00 8d 5f ff 48 8d 2d de 3a 3c 00 c1 eb 03 44 8d 24 1b 4e 8b 44 e5 00 49 8b 70 10 49 39 f0 74 5f 49 8b 40 08 41 83 00 01 <48> 8b 38 48 85 ff 49 89 78 08 74 0d 48 83 c4 10 5b 5d 41 5c c3 0f
[   84.587661] xapi[3697]: segfault at 7f28cacaea40 ip 00007f28c6df0ec2 sp 00007f289a5b8af0 error 6 in libjemalloc.so.2[7f28c6d85000+85000]
[   84.587669] Code: 48 2b 73 08 44 8b 4d 84 ba 01 00 00 00 49 83 c2 01 49 0f af f1 4c 8d 0d ac 72 42 00 48 89 f1 48 c1 ee 26 48 c1 e9 20 48 d3 e2 <48> 31 54 f3 40 48 8b 8d 58 ff ff ff 48 8b 33 48 8d be 00 00 00 10

Hi, if possible, could you share more information to help us troubleshoot, such as a xen-bugtool --yestoall output?
https://docs.xcp-ng.org/troubleshooting/log-files/

If you can also run the usual hardware checks (e.g. memtest), that would help: my intuition is that this is not related to this particular update, but we'd like to be sure.

ovicz

@rzr Hi. It appeared only once, after a host reboot. If it shows up again, I'll let you know.

marcoi

@rzr said:

New security update candidates for XCP-ng 8.3 LTS (kernel, xen, intel-microcode)

I tested on one of my extra pools and so far I haven't seen any issues. Granted, I don't do much on that pool, but I created an Ubuntu 26 VM without issues.

I'll wait for the full release before doing my main pool.

stormi

We pushed the updates to the xcp-ng-updates repository: https://xcp-ng.org/blog/2026/05/21/may-2026-updates-3-for-xcp-ng-8-3-lts/

Changes since the initial announcement: xen was updated with the proper vulnerability fix, and an sm update was added to fix an issue on LVM-based SRs with CBT enabled.

Thanks everyone for your feedback!

marcoi

@stormi said:

We pushed the updates to the xcp-ng-updates repository: https://xcp-ng.org/blog/2026/05/21/may-2026-updates-3-for-xcp-ng-8-3-lts/

Changes since the initial announcement: xen was updated with the proper vulnerability fix, and an sm update was added to fix an issue on LVM-based SRs with CBT enabled.

Thanks everyone for your feedback!

Updated my main pool. So far so good: the updates went well and VMs are running.

Greg_E

Did my production pool this morning... Not a smooth upgrade this time.

The master took so long that the process timed out. I can't tell you much more because I was doing other work and doing this remotely.

I handled the next host manually, and that took so long that I grabbed my hearing protection and went over to the rack. It had been stuck in the reboot phase for about 20 minutes without shutting down, so I held my finger on the power button. It booted back up, with a couple of reboots in the process, and was finally ready.

I moved the VMs off the third host, manually triggered the update, and sat watching the VGA output to see what was happening. In the reboot phase, the XCP-ng animation progressed all the way to an empty bar, but then sat there for another 5-10 minutes until I again held my finger on the power button to shut it off. I powered it up, and after a while it was ready again. All in all, updating three hosts took me around 2 hours today.

Here's the one condition that I think could contribute to this issue:

I normally have two NAS units connected, with an ISO and an NFS connection on each. One of the servers is powered down for construction, but I did not disconnect it from the hosts. Could this severed connection be why my updates took so long, something to do with not being able to purge or drain the state before the reboot?

I've disconnected those SRs from the hosts, and I'll probably do a rolling pool reboot later today or next week to see if things go better.

All that said, my XO from sources is not happy with this update. XO6 isn't fetching the data, so dashboards are blank or show spinning wheels for a long time while gathering it. XO5 is immediate. One of my XO-from-sources instances was updated this morning and the other yesterday, so they should both be pretty close to current.

olivierlambert

@Greg_E said:

I normally have two NAS units connected, with an ISO and an NFS connection on each. One of the servers is powered down for construction, but I did not disconnect it from the hosts. Could this severed connection be why my updates took so long, something to do with not being able to purge or drain the state before the reboot?

Look no further, that's exactly the issue. The reboot would eventually have occurred after 30 minutes (timeout), and all other operations would be extremely slow.

You must disconnect an SR before taking it down for maintenance, otherwise you enter a world of pain.
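As a rough illustration of what disconnecting an SR can look like from the CLI, the sketch below assumes it amounts to unplugging every PBD of that SR (one per connected host) with xe pbd-unplug, after listing them with xe pbd-list sr-uuid=<SR UUID> --minimal, which prints comma-separated UUIDs. The helper names are hypothetical, and disconnect_sr() defaults to a dry run that only prints the commands it would run.

```python
import subprocess
from typing import List


def parse_minimal(output: str) -> List[str]:
    """Split `xe ... --minimal` output (comma-separated values) into a list."""
    return [item for item in output.strip().split(",") if item]


def unplug_commands(pbd_uuids: List[str]) -> List[List[str]]:
    """One `xe pbd-unplug` call per PBD, i.e. per host connected to the SR."""
    return [["xe", "pbd-unplug", "uuid=" + uuid] for uuid in pbd_uuids]


def disconnect_sr(sr_uuid: str, dry_run: bool = True) -> List[List[str]]:
    """Unplug every PBD of the SR; with dry_run=True, only print the commands."""
    out = subprocess.run(["xe", "pbd-list", "sr-uuid=" + sr_uuid, "--minimal"],
                         stdout=subprocess.PIPE, universal_newlines=True,
                         check=True).stdout
    cmds = unplug_commands(parse_minimal(out))
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return cmds
```

Calling disconnect_sr("<SR UUID>") first, and only then disconnect_sr("<SR UUID>", dry_run=False), gives a chance to review which PBDs would be unplugged.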

acebmxer

I have an issue with the rolling pool update on 1 of my 3 pools at work. It was the last pool to be updated. Host 1 updated with no issues. VMs stopped migrating off host 2, which blocked its updates.

Support ticket opened: Ticket#7758427. Found 1 VM with its CPU stuck at 100% and unresponsive. Force-rebooted the VM and proceeded with the updates on host 2.

probain

Now receiving UUID_INVALID when trying to disable CBT on a VDI.
Perhaps a result of fixing the "List index out of range" bug?

XO Source: 5811d
Node 24

vdi.set
{
  "id": "57e0db3e-3131-40df-a620-c1118047b9d4",
  "cbt": false
}
{
  "code": "UUID_INVALID",
  "params": [
    "VDI",
    "7b179964-dec6-4e24-a13b-8c5c56efcd95"
  ],
  "call": {
    "duration": 2,
    "method": "VDI.get_by_uuid",
    "params": [
      "* session id *",
      "7b179964-dec6-4e24-a13b-8c5c56efcd95"
    ]
  },
  "message": "UUID_INVALID(VDI, 7b179964-dec6-4e24-a13b-8c5c56efcd95)",
  "name": "XapiError",
  "stack": "XapiError: UUID_INVALID(VDI, 7b179964-dec6-4e24-a13b-8c5c56efcd95)
    at XapiError.wrap (file:///opt/xen-orchestra/packages/xen-api/_XapiError.mjs:16:12)
    at file:///opt/xen-orchestra/packages/xen-api/transports/json-rpc.mjs:38:21
    at runNextTicks (node:internal/process/task_queues:65:5)
    at processImmediate (node:internal/timers:472:9)"
}

Greg_E

@olivierlambert

That's what I thought. I have it disconnected now, and I'll try a rolling reboot when I'm back at work.

acebmxer

@acebmxer said:

I have an issue with the rolling pool update on 1 of my 3 pools at work. It was the last pool to be updated. Host 1 updated with no issues. VMs stopped migrating off host 2, which blocked its updates.

Support ticket opened: Ticket#7758427. Found 1 VM with its CPU stuck at 100% and unresponsive. Force-rebooted the VM and proceeded with the updates on host 2.

Well, I think I found the source of my problems. After continuing to have other odd issues with this remote pool, I decided to reboot everything. That's when every VM started to fail. I logged into the Synology RS1221+ and it was very sluggish and unresponsive, with no new error alerts or anything else to explain the odd behavior. I rebooted it, and even after the boot the behavior was still odd until it finally reported a disk error. Then the system started to respond.

Luckily I have a spare drive onsite, but I can't get access until Monday, possibly Tuesday. Fingers crossed. Thank goodness for backups: it looks like the important VMs had a successful backup as of yesterday, so that's good.

stormi

@probain said:

Now receiving UUID_INVALID when trying to disable CBT on a VDI.
Perhaps a result of fixing the "List index out of range" bug?

Let me call @Team-Storage about this.

dthenot

@probain Hello,

It's likely linked to the "List index out of range" bug.
That bug was linked to the SR scan failing to introduce the CBT_metadata VDI into the XAPI database. Could you try running xe sr-scan uuid=<SR UUID> and then try disabling CBT again?
If that does not work, could you share /var/log/SMlog from around the time you are trying to disable CBT?
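The suggested retry can also be done from the CLI instead of XO. This is a hedged sketch, not a documented procedure: it assumes the xe commands vdi-param-get (to look up the VDI's SR), sr-scan, and vdi-disable-cbt are reasonable CLI equivalents of the steps above, and the helper names are hypothetical. The runner is injectable so the call sequence can be checked without a real host; with the default runner, a failing xe call raises CalledProcessError, at which point the SMlog is the next thing to look at.

```python
import subprocess
from typing import Callable, List

# A runner takes xe arguments (without the leading "xe") and returns stdout.
Runner = Callable[[List[str]], str]


def run_xe(args: List[str]) -> str:
    """Run an xe command and return its stdout; raises on a non-zero exit."""
    return subprocess.run(["xe"] + args, stdout=subprocess.PIPE,
                          universal_newlines=True, check=True).stdout


def rescan_then_disable_cbt(vdi_uuid: str, run: Runner = run_xe) -> List[List[str]]:
    """Rescan the VDI's SR, then retry disabling CBT; return the calls made."""
    calls = []  # type: List[List[str]]

    def call(args: List[str]) -> str:
        calls.append(args)
        return run(args)

    # Find which SR holds the VDI so the scan targets the right SR.
    sr_uuid = call(["vdi-param-get", "uuid=" + vdi_uuid, "param-name=sr-uuid"]).strip()
    # Per the explanation above, the scan may introduce the CBT metadata VDI
    # that was missing from the XAPI database.
    call(["sr-scan", "uuid=" + sr_uuid])
    # Retry the operation that previously failed with UUID_INVALID.
    call(["vdi-disable-cbt", "uuid=" + vdi_uuid])
    return calls
```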

acebmxer

@acebmxer said:

Well, I think I found the source of my problems. After continuing to have other odd issues with this remote pool, I decided to reboot everything. That's when every VM started to fail. I logged into the Synology RS1221+ and it was very sluggish and unresponsive, with no new error alerts or anything else to explain the odd behavior. I rebooted it, and even after the boot the behavior was still odd until it finally reported a disk error. Then the system started to respond.

Luckily I have a spare drive onsite, but I can't get access until Monday, possibly Tuesday. Fingers crossed. Thank goodness for backups: it looks like the important VMs had a successful backup as of yesterday, so that's good.

Still having issues with this remote pool. The Synology is still rebuilding the storage pool, but the estimated time to complete, 80+ days, seems unreal. It keeps dropping and increasing... I also tried to migrate a VM from the NFS SR to local storage, and the VM is having issues booting. I'm trying to determine the cause, but I think I have multiple issues, I'm just not sure which ones.

probain

@dthenot said:

@probain Hello,

It's likely linked to the "List index out of range" bug.
That bug was linked to the SR scan failing to introduce the CBT_metadata VDI into the XAPI database. Could you try running xe sr-scan uuid=<SR UUID> and then try disabling CBT again?
If that does not work, could you share /var/log/SMlog from around the time you are trying to disable CBT?

I've sent you a DM to share the logs. Unfortunately, I "solved" the issue by deleting all snapshots related to each VM, including the CBT ones. That did allow me to toggle CBT on the VDIs again.

But I've collected the logs for you.

This also seems like a good time to raise my suggestion of having a place at Vates where we could upload debug details, similar to how TrueNAS does it. Suggested here: https://feedback.vates.tech/posts/69/suggesting-to-add-a-debug-file-option