XCP-ng 8.3 public alpha
-
Hello, here are my test results for version 8.3 from the last few weeks:
I tried to put everything I could think of in or on the machine: different hardware, different versions of Windows and Linux, backups with XO, with and without Xen tools, etc.
So far there have been no crashes or major problems. Even when I drove the machine to the limits of its resilience, everything ran smoothly for days. Thumbs up for that.
Hardware:
Dell Poweredge 730, CPU 2x E5-2698 V4, 512GB RAM
2 x Intel I350 1Gb adapters
2 x Intel X540-AT2 10Gb adapters
1 x Dell H730P mono Raid Controller (5 x 8TB Disk in Raid5)
2 x SSD in Raid 1 as Boot Drive
1 x PCIe NVMe adapter (4 x 2TB NVMe disks in software RAID 5)
(Yummy, over 1 GByte/sec write speed in a VM with 2 TB of data)
1 x Nvidia K80 GPU card
This is without a doubt the biggest test machine I've ever had.
I have tested so far:
All standard functions (Copy, Move, Migrate, Snapshot etc.)
Use of GPU (Windows VM)
PCI passthrough (Windows, Linux - NetCard, USB, PCI card)
SR-IOV (see comments below)
Backup with XO
Heavy network load (copying 26 TB of data via a 10 Gb network card)
Heavy CPU and GPU load (8 VMs with CPU at maximum for hours)
Fast copying of large amounts of data between the different SRs in the system.
With the exception of SR-IOV, no problems were encountered and the performance was excellent in all respects.
What made me very happy was the installation of Xen-Tools under Windows 2022. I have often had the experience that after a Windows update the server no longer wanted to start due to a driver update. The problem seems to have disappeared completely, all drivers were installed automatically when the server was installed and have so far survived all updates from Microsoft without any problems. I only had to manually add the management agent.
SR-IOV:
The two Intel I350 1Gb adapters no longer show up as SR-IOV adapters and have lost that functionality. They still worked under 8.2.
The Intel X540-AT2 adapters still offer the SR-IOV function, but when I use it, the adapter port shuts down after a short time. The Xen server still shows the card as connected, but the network function is gone, and the connected switch shows the port as deactivated. In between, the network function comes back for a short time, and the switch shows this too. The second 10 Gb port runs error-free the whole time. If I switch off SR-IOV on the port, it works without any problems, both as a normal Xen NIC and in PCI passthrough. I copied terabytes of data over the port without errors.
It must somehow be due to SR-IOV, which apparently no longer works under 8.3.
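One way to check from dom0 whether the kernel still advertises SR-IOV for a port is to read the standard Linux sysfs attributes sriov_totalvfs and sriov_numvfs. This is only a minimal sketch: it assumes these attributes are exposed under /sys/class/net/<iface>/device in the XCP-ng dom0, which has not been verified on 8.3, and the helper name sriov_status is made up for illustration.

```python
from pathlib import Path


def sriov_status(iface, sysfs_root="/sys/class/net"):
    """Return (total_vfs, active_vfs) for a NIC, or None if SR-IOV is not advertised.

    Reads the standard Linux PCI sysfs attributes. Whether they are present
    in a given XCP-ng dom0 is an assumption, not something tested here.
    """
    device = Path(sysfs_root) / iface / "device"
    total = device / "sriov_totalvfs"
    active = device / "sriov_numvfs"
    if not total.is_file() or not active.is_file():
        # No sriov_* attributes: the driver or kernel does not expose SR-IOV here.
        return None
    return int(total.read_text().strip()), int(active.read_text().strip())
```

Calling sriov_status("eth0") (the interface name is a placeholder) on 8.2 and on 8.3 for the same card would show whether the capability disappears at the kernel/driver level or only in the toolstack.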
I would be interested to know if others have experienced something similar or if everything works there.
I would like to test the "VM snapshot with disk exclusion" but somehow I can't get it to work. Both the snapshot and XO always back up the entire VM with all disks. I'm sure it's error 50 (50cm in front of the keyboard). Is there a detailed description of how to set it up somewhere?
Unfortunately I haven't been able to test any pool functions yet, since I have to set up a second machine for that. I can't get a second machine of that size. I will probably have to build 2 smaller systems with shared storage and test them there.
If anyone thinks of anything else I could test, let me know.
So far everything looks very good, you did a great job.
Greetings, Joerg
-
Thanks @jhansen for all your test work done on this alpha release!
- SR-IOV: that's interesting, we will probably pass that feedback on to the XS team too. I wonder why it's broken.
- VM snap disk exclusion: that should be straightforward, but I'll try to see if I can reproduce it myself.
Thanks again!
-
@olivierlambert said in XCP-ng 8.3 public alpha :
snap disk exclusion
Thank you too.
I use SR-IOV under 8.2 and it works there on the same network card. I don't think the cards are suddenly broken.
A description for the snap disk exclusion would be great, I hate error 50.
-
In theory, [NOBAK] in the disk name should be enough, but I didn't test it myself.
-
@olivierlambert
Hmmm. I tried renaming the disk "SRV-File-Debian-10.10" to "[NOBAK] SRV-File-Debian-10.10".
It does not work!
-
Okay thanks, I'll take a look in my home lab and see if I can reproduce it.
-
@jhansen About the disk exclusion: I can confirm it's not a PEBKAC. At least, not on your side or mine, since I can reproduce the problem. I can't find anything in the XO code base even though we announced it, so maybe the issue is a PEBKAC elsewhere in the XO team. I'll keep you posted!
-
@ajpri1998 said in XCP-ng 8.3 public alpha :
I have a minor feature request…
Can we get the xen-cmdline (/opt/xensource/libexec/xen-cmdline) added to the default PATH? I don't use it too often, but having it would save me a Google search to remember it. I've also added it to my bashrc under the name xcl.
I would rather have it symlinked from /usr/sbin or such rather than altering the default PATH. However, it's minor enough for now for me not to want to diverge from XenServer on this, so ideally we should try to push this change upstream. Unfortunately, this is a packaging matter and it's not the most open part of XenServer, so contributing there is hard. Not impossible, but it requires effort that may outweigh the gain here.
If you want, you could still open an enhancement request on XCP-ng's bugtracker. Maybe others will join you, and at some point maybe the gain will be bigger than the effort.
Sorry, sometimes we must be lazy to keep time for bigger topics.
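For anyone who wants to try the symlink idea locally in the meantime, here is a minimal sketch. It deliberately links into a scratch directory rather than the real /usr/sbin, and the helper name link_tool is made up for illustration; nothing here reflects how the XCP-ng or XenServer packages would actually ship it.

```python
import os
import tempfile


def link_tool(target, bin_dir, name=None):
    """Create bin_dir/<name> as a symlink to target and return the link path."""
    link = os.path.join(bin_dir, name or os.path.basename(target))
    os.symlink(target, link)
    return link


# Demonstrate in a scratch directory instead of the real /usr/sbin.
scratch = tempfile.mkdtemp()
link = link_tool("/opt/xensource/libexec/xen-cmdline", scratch)
print(link, "->", os.readlink(link))
```

Pointing bin_dir at a directory that is already on PATH gives the same convenience as the bashrc alias without changing the default PATH itself.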
-
Great relief, I was beginning to have doubts about myself
-
@jhansen said in XCP-ng 8.3 public alpha :
Great relief, I was beginning to have doubts about myself
So, it works, but ONLY for snapshots taken during a backup, not for manual snapshots.
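To make the difference concrete, here is a rough sketch of the selection logic. It assumes a disk is skipped whenever the [NOBAK] tag appears anywhere in its name (XO's real matching rule may be stricter), and the disk name "SRV-System" is a made-up example next to the renamed disk from earlier in the thread.

```python
NOBAK_TAG = "[NOBAK]"


def split_by_nobak(disk_names):
    """Split disk names into (backed_up, skipped) based on the [NOBAK] tag.

    Assumption: a disk is skipped when the tag appears anywhere in its name.
    """
    backed_up = [n for n in disk_names if NOBAK_TAG not in n]
    skipped = [n for n in disk_names if NOBAK_TAG in n]
    return backed_up, skipped


# "SRV-System" is a hypothetical system disk name.
disks = ["SRV-System", "[NOBAK] SRV-File-Debian-10.10"]
backed_up, skipped = split_by_nobak(disks)
print("backup snapshot includes:", backed_up)
print("backup snapshot skips:   ", skipped)
# The tag is ignored for manual snapshots, so they still include every disk.
print("manual snapshot includes:", disks)
```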
-
@olivierlambert
I'll try that this weekend.
Thanks in advance.
-
@olivierlambert
Here, as promised, is a test of the [NOBAK] function.
About the SR:
NVME Raid
Size: 2.4 TB used of 5.5 TB total (2.9 TB allocated)
About the "SRV-Test" VM:
0 "SRV-Test" "NVME Raid" 50GB /dev/xvda
1 "[NOBAK] SRV-Test Drive" "NVME Raid" 1.5 TB /dev/xvdb
A manual snapshot takes both drives and ignores the [NOBAK] tag.
Backup via XO works; only the first drive is backed up.
You can follow this under TASKS in XO.
Under "File Restore" in XO there is no data for the second drive.
So far, so good. The actual idea would be to partially back up a server with very large drives.
So I changed my VM to the following:
0 "SRV-Test" "NVME Raid" 50GB /dev/xvda
1 "[NOBAK] SRV-Test Drive" "NVME Raid" 1.5 TB /dev/xvdb
2 "[NOBAK] SRV-Test second test drive" "NVME Raid" 2 TB /dev/xvdc
My SR now looks like this:
NVME Raid
4.4 TB used of 5.5 TB total (4.9 TB allocated)
Exactly the same result as in the first test, both with the manual snapshot and with the XO backup.
Actually, the manual snapshot should not work at all, because there is not enough space on the SR to include all the drives, but thanks to thin provisioning it works because the drives are almost empty.
Now I'm filling the drives with data so I can be sure there isn't enough space on the SR for a full snapshot of all the data.
As expected, I get the error message "The specified storage repository has insufficient space" when taking a manual snapshot. This is understandable, since the manual snapshot saves the entire VM and there is no space for the 3.5 TB snapshot on the SR.
Now I make another backup with XO, which previously only backed up the 50 GB of the first drive.
Oops ... After a few seconds I got the error "SR_BACKEND_FAILURE_44(, There is insufficient space, )" and the backup was aborted.
So XO seems to first check whether a complete snapshot with all drives can be created, and only then backs up the 50 GB of the first drive.
In my opinion this is a bit suboptimal, because it doesn't remove the need to always have twice the used space available on the SR, even if I only want to back up the system disk of a VM with a few GB, for example.
Conclusion of the test:
A) Manual snapshot with only some of the drives ([NOBAK]) does not work.
B) Backup with XO with only some of the drives ([NOBAK]) works, but only if there is enough space on the SR for a complete snapshot of the VM.
I hope my little test is of some use to you.
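The behaviour described in B) can be put into numbers with a small sketch. It is a deliberately crude model of what the test suggests, not XAPI's or XO's actual logic: it treats the space needed for a snapshot as the sum of the included disk sizes, and FREE_GB is a hypothetical value rather than a measurement from my SR.

```python
NOBAK_TAG = "[NOBAK]"

# (name, size in GB) as listed in the test above.
DISKS = [
    ("SRV-Test", 50),
    ("[NOBAK] SRV-Test Drive", 1500),
    ("[NOBAK] SRV-Test second test drive", 2000),
]


def space_needed_gb(disks, honour_nobak):
    """Worst-case space for a snapshot: the sum of the included disk sizes."""
    return sum(
        size for name, size in disks
        if not (honour_nobak and NOBAK_TAG in name)
    )


def snapshot_fits(disks, free_gb, honour_nobak):
    """True if the modelled snapshot fits into the given free space."""
    return space_needed_gb(disks, honour_nobak) <= free_gb


FREE_GB = 1000  # hypothetical free space on the SR, not a measured value

print(space_needed_gb(DISKS, honour_nobak=False))  # 3550
print(space_needed_gb(DISKS, honour_nobak=True))   # 50
print(snapshot_fits(DISKS, FREE_GB, honour_nobak=False))  # False
print(snapshot_fits(DISKS, FREE_GB, honour_nobak=True))   # True
```

With the tag ignored, the check fails on the 3.5 TB of excluded drives. With the tag honoured, only the 50 GB system disk would have to fit.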
-
Do you mean that the VM snapshot space is computed with all disks even though we decided to ignore one? If that's the issue, then it's a XAPI bug, or something that we should add there in the first place.
-
@olivierlambert
Exactly, it looks like the space for a snapshot of the entire VM is always calculated first. If there is enough space, only the selected drive is included in the backup; if not, the backup is aborted.
In other words, XO's backup will only back up the selected drive if you could also take a manual snapshot of the VM.
It would be good if the backup only checked whether there was enough space for a snapshot of the selected drive.
-
Okay, so it's not in XO but in XAPI then. @psafont and @BenjiReis I assume it's a basic check in VM.snapshot that's not doing the right computation, since we can ignore disks.
-
@olivierlambert
Seems like that to me too. Let me know if you find a solution. I'll leave the test VM on my Xen server so that I can test again at any time.
-
@olivierlambert Snapshot is in essence a VDI clone, I don't see any checks being done before the filtering for ignored VDIs is done. And that is done pretty early on, not sure why there are operations affecting virtual block devices from ignored VDIs: https://github.com/xapi-project/xen-api/blob/master/ocaml/xapi/xapi_vm_clone.ml#L416
-
I will try to reproduce it locally with xe and keep you posted, thanks!
-
Okay, so it works with xe. I'll see with XO doing a backup now.
-
I can't reproduce the bug with XO. Here is my test setup @jhansen :
1x VM with 2 disks:
- 1x 10GiB disk on a SR with plenty of space
- 1x [NOBAK] 900GiB local LVM disk (thick) on a 1TiB SR (you can't snapshot the disk without SR disk space failure)
I did a delta backup with XO and it worked.