XCP-ng

    XCP-ng 8.3 updates announcements and testing

      Andrew Top contributor @rzr

      @rzr Updated and running on most systems now. Rolling pool reboot worked correctly (backups need to be disabled first). VMs and CR are running normally.

      After a large S3 backup session (which succeeded), GC on the master got stuck in a loop and failed (lock issue). Rebooting the pool master resolved the problem, and GC then coalesced the VHDs without further action. Other pools did not have the same problem. I have not seen that issue before, but everything seems fine now, so I can't blame the update.

        Greg_E @Andrew

        @Andrew

        Why did you need to disable backups before applying this?

          Andrew Top contributor @Greg_E

          @Greg_E It's not a patch/update issue, it's the rolling pool reboot. If a VM is being backed up when XO wants to migrate it, the migration can't happen, so the host evacuation stops and the reboot does not happen.
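To make this failure mode concrete: during a rolling pool reboot each host must be fully evacuated before it reboots, so a single VM that cannot be migrated holds up that host. Below is a toy Python model of a pre-flight check, purely illustrative (it is not how XO implements this, and the Host type and function names are hypothetical), that lists the VMs on a host that currently have a backup running.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical, minimal view of a host and the VMs resident on it."""
    name: str
    vms: list[str] = field(default_factory=list)


def evacuation_blockers(host: Host, vms_in_backup: set[str]) -> list[str]:
    """Return the VMs on `host` that currently have a backup job running.

    Assumption: a VM under backup cannot be migrated, so each VM returned
    here would stop the host evacuation (and with it, the host reboot).
    """
    return [vm for vm in host.vms if vm in vms_in_backup]


def safe_to_start_rpu_step(host: Host, vms_in_backup: set[str]) -> bool:
    # Only proceed with this host once nothing would block its evacuation.
    return not evacuation_blockers(host, vms_in_backup)


if __name__ == "__main__":
    host = Host("host1", ["web", "db", "dns"])
    print(evacuation_blockers(host, {"db"}))     # ['db']
    print(safe_to_start_rpu_step(host, {"db"}))  # False
```

This is the reasoning behind scheduling backups and rolling pool reboots at different times: with no backup in flight, the blocker list is empty.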

            dthenot Vates 🪐 XCP-ng Team @Andrew

            @Andrew Hello,

            I'm also getting errors on some VMs when trying to export a disk, and even when trying to start some VMs from NFS (which were fine before).

            Is it still a problem? It might have multiple causes. If it's still an issue, could you share the logs: /var/log/{SMlog,xensource.log,daemon.log}? They contain information that could help us investigate.

            The "VDI not detached cleanly" error means that the VDI still holds a reference to the host it was previously running on.
            It might be caused by the earlier xenopsd error, or it might be the cause of it.

            First make sure no tapdisk is using the VDI: you can check the tap-ctl list output on each host.
            If none is, you can clear this reference with the script /opt/xensource/sm/resetvdis.py single <VDI UUID>.
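The tapdisk check can be scripted once the tap-ctl list output has been captured from every host. A minimal sketch, assuming the VDI UUID appears somewhere in the tapdisk's image path for VHD-based SRs; the function name and the sample line format are illustrative assumptions, not taken from the tap-ctl documentation.

```python
def tapdisks_using_vdi(tap_ctl_output: dict[str, str], vdi_uuid: str) -> dict[str, list[str]]:
    """Map each host name to the `tap-ctl list` lines that mention vdi_uuid.

    tap_ctl_output: host name -> raw text captured from `tap-ctl list` on that host.
    Assumption: a tapdisk serving the VDI has the VDI UUID in its image path.
    """
    needle = vdi_uuid.lower()
    hits: dict[str, list[str]] = {}
    for host, output in tap_ctl_output.items():
        matches = [line.strip() for line in output.splitlines() if needle in line.lower()]
        if matches:
            hits[host] = matches
    return hits


if __name__ == "__main__":
    # Illustrative captures only; the exact line format is an assumption.
    vdi = "11111111-2222-3333-4444-555555555555"
    captured = {
        "host1": f"pid=4242 minor=3 state=0 args=vhd:/var/run/sr-mount/sr/{vdi}.vhd\n",
        "host2": "",
    }
    in_use = tapdisks_using_vdi(captured, vdi)
    if in_use:
        print("Still in use, do not reset:", in_use)
    else:
        print("No tapdisk found on any host")
```

Only an empty result for every host in the pool suggests it is reasonable to consider resetvdis.py; a simple substring match errs on the side of reporting the VDI as in use.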

              Greg_E @Andrew

              @Andrew

              Thanks. My updates and my RPU normally happen at different times. I also don't back up multiple times per day like I probably should.

                rzr Vates 🪐 XCP-ng Team @ovicz

                @ovicz said:

                I get this in dmesg after the latest updates:

                [   54.673443] python3[3691]: segfault at 200000 ip 00007f16eb8eca9f sp 00007ffdb84e9ff0 error 4 in libpython3.6m.so.1.0[7f16eb804000+28d000]
                [   54.673450] Code: 01 00 00 8d 5f ff 48 8d 2d de 3a 3c 00 c1 eb 03 44 8d 24 1b 4e 8b 44 e5 00 49 8b 70 10 49 39 f0 74 5f 49 8b 40 08 41 83 00 01 <48> 8b 38 48 85 ff 49 89 78 08 74 0d 48 83 c4 10 5b 5d 41 5c c3 0f
                [   84.587661] xapi[3697]: segfault at 7f28cacaea40 ip 00007f28c6df0ec2 sp 00007f289a5b8af0 error 6 in libjemalloc.so.2[7f28c6d85000+85000]
                [   84.587669] Code: 48 2b 73 08 44 8b 4d 84 ba 01 00 00 00 49 83 c2 01 49 0f af f1 4c 8d 0d ac 72 42 00 48 89 f1 48 c1 ee 26 48 c1 e9 20 48 d3 e2 <48> 31 54 f3 40 48 8b 8d 58 ff ff ff 48 8b 33 48 8d be 00 00 00 10
                

                Hi, if possible, could you share more information so we can troubleshoot, such as the output of xen-bugtool --yestoall?
                https://docs.xcp-ng.org/troubleshooting/log-files/

                Running the usual hardware checks (e.g. memtest) would also help: my intuition is that this is not related to this particular update, but we'd like to be sure.
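The dmesg segfault lines quoted above follow the kernel's usual format. A minimal Python sketch, assuming exactly that format (other kernel versions may print extra fields), that derives two values often used when triaging such crashes: the faulting instruction's offset inside the mapping shown in brackets (ip minus the base address), and the meaning of the low three bits of the x86 page-fault error code (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The function name is hypothetical.

```python
import re

# Matches e.g. "xapi[3697]: segfault at ADDR ip IP sp SP error N in lib[BASE+SIZE]"
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line: str) -> dict:
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    offset = ip - base       # offset of the faulting instruction in the mapping
    return {
        "process": m["proc"],
        "library": m["lib"],
        "fault_addr": hex(int(m["addr"], 16)),
        "ip_offset": hex(offset),
        "ip_in_library": 0 <= offset < size,
        "cause": "protection violation" if err & 0x1 else "page not present",
        "access": "write" if err & 0x2 else "read",
        "mode": "user" if err & 0x4 else "kernel",
    }


if __name__ == "__main__":
    line = ("[   84.587661] xapi[3697]: segfault at 7f28cacaea40 ip 00007f28c6df0ec2 "
            "sp 00007f289a5b8af0 error 6 in libjemalloc.so.2[7f28c6d85000+85000]")
    print(decode_segfault(line))
```

On the two lines above this gives an offset of 0xe8a9f in libpython (error 4: user-mode read of a page that is not present) and 0x6bec2 in libjemalloc (error 6: user-mode write to a page that is not present). Comparing offsets across occurrences can help separate a reproducible software crash from scattered faults, which may point to hardware instead.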

                  ovicz @rzr

                  @rzr Hi. It appeared only once, after a host reboot. If it shows again, I'll let you know.

                    marcoi

                    @rzr said:

                    New security update candidates for XCP-ng 8.3 LTS (kernel, xen, intel-microcode)

                    I tested on one of my extra pools and so far I haven't seen any issues. Granted, I don't do much on that pool, but I created an Ubuntu 26 VM without issues.

                    I'll wait for the full release before doing my main pool.

                      stormi Vates 🪐 XCP-ng Team

                      We pushed the updates to the xcp-ng-updates repository: https://xcp-ng.org/blog/2026/05/21/may-2026-updates-3-for-xcp-ng-8-3-lts/

                      Changes since the initial announcement: xen was updated with the proper vulnerability fix, and an sm update was added to fix an issue on LVM-based SRs with CBT enabled.

                      Thanks everyone for your feedback!

                        marcoi @stormi

                        Updated the main pool. So far so good: the updates went well and the VMs are running.

                          Greg_E

                          Did my production pool this morning... Not a smooth upgrade this time.

                          The master took so long that the process timed out. I can't tell you much more because I was doing other work and running this remotely.

                          I handled the next host manually, and it took so long that I grabbed my hearing protection and went over to the rack. It was stuck in the reboot phase and still hadn't shut down after about 20 minutes, so I held my finger on the power button. It booted back up, with a couple of reboots along the way, and was finally ready.

                          I moved the VMs off the third host, manually triggered the update, and watched the VGA output to see what was happening. During the reboot phase the XCP-ng animation progressed all the way to an empty bar, then sat there for another 5-10 minutes until I again held the power button to shut it off. After powering it back up, it was ready again after a bit. All in all, updating three hosts took me around 2 hours today.

                          Here's the one condition that I think could contribute to this issue:

                          I normally have two NAS units connected, with an ISO SR and an NFS SR on each. One of them is powered down for construction, but I did not disconnect it from the hosts. Could this severed connection be why my updates took so long, perhaps because the hosts couldn't purge or drain state before rebooting?

                          I've disconnected those SRs from the hosts, and I'll probably do a rolling pool reboot later today or next week to see if things go better.

                          All that said, my XO from sources is not happy with this update. XO6 isn't fetching the data, so dashboards are blank or show spinning wheels for a long time while gathering data. XO5 is immediate. One of my XO from sources instances was updated this morning and the other yesterday, so both should be pretty close to current.

                            olivierlambert Vates 🪐 Co-Founder CEO @Greg_E

                            @Greg_E said:

                            I have two NAS normally connected, an ISO and NFS connection on each. One of the servers is powered down for construction, but I did not disconnect it from the hosts. Could this severed connection be the reason why my updates took so long, something around not being able to purge or drain the state before the reboot?

                            Look no further, that's exactly the issue. The reboot would eventually have happened after the 30-minute timeout, and all other operations would be extremely slow.

                            You must disconnect an SR before taking its storage down for maintenance, otherwise you enter a world of pain.
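Disconnecting an SR from its hosts amounts to unplugging its PBDs, the per-host connections to that SR. For those who prefer the CLI, here is a minimal dry-run sketch, assuming the PBD UUIDs were listed with xe pbd-list sr-uuid=<SR UUID> params=uuid --minimal (which prints them comma-separated on one line). The helper name is hypothetical, and it only prints the xe pbd-unplug commands rather than running them.

```python
def pbd_unplug_commands(minimal_pbd_list: str) -> list[str]:
    """Build one `xe pbd-unplug` command per PBD UUID.

    minimal_pbd_list: output of
        xe pbd-list sr-uuid=<SR UUID> params=uuid --minimal
    which prints the matching UUIDs on one line, separated by commas.
    """
    uuids = [u.strip() for u in minimal_pbd_list.split(",") if u.strip()]
    return [f"xe pbd-unplug uuid={u}" for u in uuids]


if __name__ == "__main__":
    # Dry run only: review the commands before running any of them.
    # Placeholder UUIDs, not from a real pool.
    sample = "aaaaaaaa-0000-0000-0000-000000000001,aaaaaaaa-0000-0000-0000-000000000002\n"
    for cmd in pbd_unplug_commands(sample):
        print(cmd)
```

Keeping it as a dry run makes it easy to double-check that every PBD belongs to the SR being taken offline before anything is unplugged.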

                              acebmxer

                              I had an issue with the rolling pool update on 1 of my 3 pools at work, the last pool to be updated. Host 1 updated with no issues, but VMs stopped migrating off host 2, so its updates could not complete.

                              Support ticket opened: Ticket#7758427. I found 1 VM with its CPU stuck at 100% and unresponsive. I force-rebooted the VM and proceeded with the updates on host 2.

                                probain

                                Now receiving UUID_INVALID when trying to disable CBT on a VDI.
                                Perhaps a side effect of the fix for the "List index out of range" bug?

                                XO Source: 5811d
                                Node 24

                                vdi.set
                                {
                                  "id": "57e0db3e-3131-40df-a620-c1118047b9d4",
                                  "cbt": false
                                }
                                {
                                  "code": "UUID_INVALID",
                                  "params": [
                                    "VDI",
                                    "7b179964-dec6-4e24-a13b-8c5c56efcd95"
                                  ],
                                  "call": {
                                    "duration": 2,
                                    "method": "VDI.get_by_uuid",
                                    "params": [
                                      "* session id *",
                                      "7b179964-dec6-4e24-a13b-8c5c56efcd95"
                                    ]
                                  },
                                  "message": "UUID_INVALID(VDI, 7b179964-dec6-4e24-a13b-8c5c56efcd95)",
                                  "name": "XapiError",
                                  "stack": "XapiError: UUID_INVALID(VDI, 7b179964-dec6-4e24-a13b-8c5c56efcd95)
                                    at XapiError.wrap (file:///opt/xen-orchestra/packages/xen-api/_XapiError.mjs:16:12)
                                    at file:///opt/xen-orchestra/packages/xen-api/transports/json-rpc.mjs:38:21
                                    at runNextTicks (node:internal/process/task_queues:65:5)
                                    at processImmediate (node:internal/timers:472:9)"
                                }
                                
                                  Greg_E @olivierlambert

                                  @olivierlambert

                                  That's what I thought. I have it disconnected now, and I'll try a rolling pool reboot when I'm back at work.

                                    acebmxer @acebmxer

                                    Well, I think I found the source of my problems. After continuing to have other odd issues with this remote pool, I decided to reboot everything. That's when every VM started to fail. I logged into the Synology RS1221+ and it was very sluggish and unresponsive, with no new error alerts or anything else to explain the odd behavior. I rebooted it, and it kept behaving oddly even after boot until it finally reported a disk error. Then the system started to respond.

                                    Luckily I have a spare drive onsite, but I can't get access until Monday, possibly Tuesday. Fingers crossed. Thank goodness for backups: it looks like the important VMs had a successful backup as of yesterday, so that's good.

                                      stormi Vates 🪐 XCP-ng Team @probain

                                      @probain said:

                                      Now receiving UUID_INVALID when trying to disable CBT on a VDI.
                                      Perhaps a result of fixing the "List index out of range"-bug?

                                      Let me call @Team-Storage about this.

                                        dthenot Vates 🪐 XCP-ng Team @probain

                                        @probain Hello,

                                        It's likely linked to the "List index out of range" bug.
                                        That bug was related to the SR scan failing to introduce the CBT metadata VDI into the XAPI database. Could you try running xe sr-scan uuid=<SR UUID> and then try disabling CBT again?
                                        If that doesn't work, could you share /var/log/SMlog from around the time you try to disable CBT?
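The error probain posted came from XO calling VDI.get_by_uuid on a UUID that XAPI did not know, presumably the CBT metadata VDI that the scan failed to introduce. A minimal sketch that inspects such an error object and, for UUID_INVALID on a VDI, returns the rescan hint given above; it only reads the "code" and "params" fields, the function name is hypothetical, and <SR UUID> stays a placeholder because the error does not say which SR holds the VDI.

```python
import json
from typing import Optional


def suggest_rescan(error_json: str) -> Optional[str]:
    """Return a hint for UUID_INVALID errors on a VDI, else None.

    Only the "code" and "params" fields of the XO/XAPI error are used.
    """
    err = json.loads(error_json)
    params = err.get("params") or []
    if err.get("code") != "UUID_INVALID" or params[:1] != ["VDI"]:
        return None
    missing = params[1] if len(params) > 1 else "<unknown>"
    # <SR UUID> is a placeholder: the error does not identify the SR.
    return (f"XAPI does not know VDI {missing}; "
            "run 'xe sr-scan uuid=<SR UUID>' on the SR, then retry disabling CBT")


if __name__ == "__main__":
    sample = json.dumps({
        "code": "UUID_INVALID",
        "params": ["VDI", "7b179964-dec6-4e24-a13b-8c5c56efcd95"],
    })
    print(suggest_rescan(sample))
```

Note that the error as pasted in the thread is not strict JSON (its "stack" string spans several lines), so this sketch expects just the code/params part.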

                                          acebmxer @acebmxer

                                          Still having issues with this remote pool. The Synology is still rebuilding the storage pool, but the estimated time to complete seems unreal: 80+ days. It keeps dropping and increasing... I also tried to migrate a VM from the NFS SR to local storage, and the VM is having issues booting. I'm trying to figure it out, but I think I have multiple issues; I'm just not sure which ones.

                                            probain @dthenot

                                            I've sent you a DM to share the logs. Unfortunately, I "solved" the issue by deleting all snapshots related to each VM, including the CBT ones. That did allow me to toggle CBT on the VDIs again.

                                            But I've collected the logs for you.

                                            This also seems like a good time to raise my suggestion of having a place at Vates where we could upload debug details, similar to how TrueNAS does it. Suggested here: https://feedback.vates.tech/posts/69/suggesting-to-add-a-debug-file-option

