XCP-ng 8.3 betas and RCs feedback 🚀

stormi

Given the diversity of issues reported above, and the fact that XCP-ng 8.3 is now released, I strongly suggest you all create new topics in the forum. You can reference them once in this thread to help them get attention, no problem with that.

stormi

@jhansen said in XCP-ng 8.3 betas and RCs feedback :

My 8.2.1 servers were still running on BIOS boot. When updating to 8.3 RC, I got a message that there was still a DOS partition table on the hard drive, and the update from the USB stick was aborted.
A newly installed 8.3 server with UEFI cannot be integrated into the old 8.2 pool.

Here is a small workaround:
2 USB sticks, one with 8.2.1 and one with 8.3 RC.
Take one server out of the pool, not the master.
Reinstall with UEFI on 8.2.1.
Integrate server into the old pool.
Make this server the new pool master.
The server can now be updated to 8.3 without any problems; the master is on UEFI and all pool information and VMs are on 8.3.
Now just take all other servers out of the pool, do a clean 8.3 UEFI installation and put them back into the pool.

Let's add that there must be no VMs on local storage on the server you reinstall, of course. Other than that, I like this workaround!

CC @yann, do you think it's worth adding this to the release notes?

jhansen

@stormi
Sorry, I was a bit confused this morning when I saw all the errors. Of course it wasn't an update to 8.3 RC but to 8.3.0 Official.
Very strange thing, I checked my test lab server, it's the only pool that doesn't have the error. Full backup and several delta backups, all OK.
The only difference between this and the other pools is that this one was running with 8.3 RC2 and was only updated to 8.3.0 official with yum update.
I have now cleaned up a faulty pool: removed all snapshots, removed NBD and CBT from the backup, and removed the NBD interface from the pool. I'm currently doing a full backup on this pool and will then do several deltas to narrow down the error a bit.
Regards, Joerg

Ajmind 0

@Finallf

Already done, but with the same result.

jhansen

@stormi
Of course, you shouldn't have a VM on the local storage.
It's also a good idea to write down the IPs, MACs and the number of the NICs beforehand. It can happen during the new installation that the order of the network cards changes.
Final tip: if you use iSCSI storage and use the iSCSI IQN as identification, you should also write this down beforehand and set it on the newly installed server before you put the server in the pool, otherwise you will have problems with the iSCSI storage.
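
The "write it down beforehand" advice could be sketched roughly like this: record the device-to-MAC mapping before the reinstall and compare it afterwards. This is only an illustration. `changed_nics` is a hypothetical helper and the MAC addresses are invented sample data; how you collect the real values (for example from Xen Orchestra or `xe pif-list`) is up to you.

```python
def changed_nics(before: dict[str, str], after: dict[str, str]) -> dict[str, tuple]:
    """Return the devices whose MAC differs between the two snapshots.

    Hypothetical helper. Keys are device names (e.g. "eth0"), values are
    MAC addresses. A device missing on one side is reported with None
    for that side.
    """
    changes = {}
    for device in sorted(set(before) | set(after)):
        old = before.get(device)
        new = after.get(device)
        # Compare case-insensitively, since tools print MACs in either case.
        if (old or "").lower() != (new or "").lower():
            changes[device] = (old, new)
    return changes


if __name__ == "__main__":
    # Invented sample data: eth0 and eth1 swapped places after the reinstall.
    before = {"eth0": "aa:bb:cc:00:00:01", "eth1": "aa:bb:cc:00:00:02"}
    after = {"eth0": "aa:bb:cc:00:00:02", "eth1": "aa:bb:cc:00:00:01"}
    for device, (old, new) in changed_nics(before, after).items():
        print(f"{device}: was {old}, now {new}")
```

An empty result means the NIC order survived the reinstall; anything else should be sorted out before the server goes back into the pool.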

jhansen

@stormi
Test with NBD and CBT removed from the backup and the NBD interface removed from the pool:
Full backup: OK.
Multiple delta backups: all OK, no errors.
So the error lies in NBD or CBT.
But why not on the pool updated with yum from 8.3 RC2?

Can someone create a new topic?

Tristis Oris

@jhansen yum update is not the intended way.

jhansen

@Tristis-Oris Not in your case?

In the Release Information about 8.3 you can read:
"Users who installed a prerelease of XCP-ng 8.3 must upgrade to the final 8.3.0 version using the installation ISO image. The only exception is for users who installed XCP-ng 8.3 RC2 or have already upgraded to RC2 using the installation ISO image. These users can simply update their system without needing the ISO image."
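
The rule quoted above can be written down as a small decision function. This is just a sketch of the release-note wording, not an official tool: the name `upgrade_method` and the free-form version labels are made up for illustration.

```python
def upgrade_method(installed: str) -> str:
    """Return 'yum' or 'iso' for going from an 8.3 prerelease to 8.3.0.

    Hypothetical helper that mirrors the release-note rule quoted above.
    `installed` is a free-form label such as "8.3 beta 2" or "8.3 RC2".
    Assumption: an "RC2" label means RC2 was installed from, or upgraded
    to with, the installation ISO, as the release note requires.
    """
    label = installed.strip().lower().replace(" ", "")
    # Only RC2 may be brought to 8.3.0 with a plain update.
    if label in ("8.3rc2", "rc2"):
        return "yum"
    # Every other 8.3 prerelease has to go through the installation ISO.
    return "iso"


if __name__ == "__main__":
    for version in ("8.3 beta 1", "8.3 RC1", "8.3 RC2"):
        print(version, "->", upgrade_method(version))
```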

Tristis Oris

@jhansen Ah, maybe only in that case.

jhansen

Status quo:
I have now reduced the problem to the NBD.
With an NBD full backup, everything seems to be working.
With a delta backup, NBD leaves snapshots that are blocked by the control domain of the various Xen servers. These can only be disconnected/forgotten and then deleted after a reboot of the whole pool or an individual reboot of every Xen server in the pool. The last option at least gives you the chance of a reboot without shutting down all VMs, if the pool has several Xen servers. But all servers in the pool must be rebooted.
Unfortunately, a toolstack restart (xe-toolstack-restart) is not enough.
The activation/deactivation of CBT does not matter.
All backups without NBD run without errors.
This error occurs on all Xen servers installed from the official 8.3 USB image. My lab pool, which was updated from 8.3 RC2 to 8.3 official via yum, doesn't seem to have the problem: 1 full backup and 4 delta backups ran without errors and without blocked snapshots.
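
To make "blocked by the control domain" a bit more concrete, here is a rough sketch of the check: a disk counts as stuck when its VBD is still plugged into a control domain. The record layout (`vdi_name`, `is_control_domain`, `currently_attached`) is a simplified, hypothetical flattening for illustration, and the sample records are invented; this is not how XAPI or Xen Orchestra actually return the data.

```python
def vdis_held_by_dom0(vbds: list[dict]) -> list[str]:
    """Return names of VDIs whose VBD is still plugged into a control domain.

    Hypothetical helper working on simplified, hand-made records.
    """
    return [
        vbd["vdi_name"]
        for vbd in vbds
        if vbd["is_control_domain"] and vbd["currently_attached"]
    ]


if __name__ == "__main__":
    # Invented sample data: one snapshot disk left attached to Dom0.
    sample = [
        {"vdi_name": "vm1-disk", "is_control_domain": False, "currently_attached": True},
        {"vdi_name": "vm1-snapshot", "is_control_domain": True, "currently_attached": True},
        {"vdi_name": "vm2-snapshot", "is_control_domain": True, "currently_attached": False},
    ]
    print(vdis_held_by_dom0(sample))
```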

As I urgently need a current backup, I deactivated all NBD backups and set the server interface back to NO NBD. The standard delta backups are now running without errors again.

I left my LAB pool on NBD with CBT to see if the error occurs there in the next few days.

I actually wanted to create a new topic for the problem, but I'm not really sure which category I should put it under.

Tristis Oris

I also removed one 8.3 pool from the NBD backup and created a regular one. It looks like it works fine.

Mathieu

Hello,

Indeed, after disabling NBD for the delta backups, there are no more VDIs attached to Dom0.

My upgrade path was from 8.2.1 stable to 8.3 using the ISO installation.

jhansen

@Tristis-Oris @Mathieu
Hello, I'm glad I could help you, but it would be nice if NBD worked. I was able to save a lot of space on the storage with NBD and CBT. I hope that in the not too distant future it will work again like in 8.2.1.
My 8.3 lab pool, updated with yum, made an error-free delta backup with NBD and CBT last night, just like my last 8.2.1 pool.
Regards, Joerg

olivierlambert

Since you are able to reproduce the issue consistently, it would be useful to get more logs/debug output to find the root cause. Pinging @dthenot and all relevant people who could help here.

yann

@Ajmind-0 there may be more details in the log a bit before those last lines, and they would not be visible because of the screen size and volume of logs. You can switch to the console with a shell and have a look into /tmp/install-log, there may be some more useful info in there.

jhansen

@olivierlambert
Unfortunately, I don't have much time in the next few days.
On Sunday I will reset one of the pools with the error to NBD / CBT delta backup, then pull the logs and compare them with the ones from the working 8.3 pool. Let's see if I can spot anything there. If not, I will send you the logs. I will also take a closer look at which processes are stuck there. It looks to me like there are ghost processes running that don't terminate and continue to block the snapshot. That also seems to be the reason why the snapshot cannot be deleted, except after a reboot.
If I have time, I will also set up a new pool with 8.3 RC2 and then update it to 8.3 official via yum, just to see if it can then also do error-free NBD deltas.
Regards Joerg

dthenot

@jhansen
Hello,
I created a thread about the NBD issue where VBDs are left connected to Dom0.
I added what I already know about the situation on my side.
Anyone observing the error can help us by sharing what they observed in the thread.

https://xcp-ng.org/forum/topic/9864/vdi-staying-connected-to-dom0-when-using-nbd-backups

Thanks

Ajmind 0

@yann

The system freezes after the last screen, and it is difficult to catch the log just before the freeze...

yann

@Ajmind-0 Does it still freeze if you add atexit=shell to the Linux command line? In some error situations the installer attempts to reboot the system after a few seconds; this option makes it drop into a shell instead.

Finallf

@Ajmind-0 try using the save option at the beginning of the installation.