Hello,
I'm a new developer on XCP-ng, I'll work on the Xen side to improve performance.
I'm a recent graduate of the University of Versailles Saint-Quentin with a specialty in parallel computing and HPC, and I have a strong interest in operating systems.
Hello,
As some of you may know, there is currently a problem with disks using a 4 KiB block size: they cannot be used as SR disks.
It is an error in the vhd-util utilities that is not easily fixed.
As such, we quickly developed a SMAPI driver that uses the ability of losetup to emulate another sector size, as a workaround for the moment.
The real solution will involve SMAPIv3, for which the first driver is available to test: https://xcp-ng.org/blog/2024/04/19/first-smapiv3-driver-is-available-in-preview/
To go back to the LargeBlock driver, it is available in 8.3 in sm 3.0.12-12.2.
Setting it up is as simple as creating an EXT SR with the xe CLI, but with type=largeblock.
xe sr-create host-uuid=<host UUID> type=largeblock name-label="LargeBlock SR" device-config:device=/dev/nvme0n1
It does not support using multiple devices because of quirks with LVM and the EXT SR driver.
It automatically creates a loop device with a 512 B sector size on top of the 4 KiB device and then creates an EXT SR on top of this emulated device.
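Under the hood, what the driver does is roughly equivalent to the following (illustrative only; the device path is an example, it assumes a losetup recent enough to support --sector-size, and the driver handles all of this for you when you run sr-create):
losetup -f --show --sector-size 512 /dev/nvme0n1   # exposes the 4 KiB disk as a 512 B loop device and prints its name, e.g. /dev/loop0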
This driver is a workaround; we have automated tests, but they can't catch everything.
If you have any feedback or problems, don't hesitate to share it here.
@ph7 It's only enabled for the two yum commands where --enablerepo is explicitly used.
It's disabled in the config otherwise.
No need to do anything.
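If you want to verify it yourself, something like this should come back empty when the repo is only enabled per-command (the grep pattern is just an example matching the repo name used above):
yum repolist enabled | grep -i qcow2   # no output means the QCOW2 repo is not permanently enabled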
@gb.123 Hello,
The instructions in the first post are still the way to go.
For people testing the QCOW2 preview, please be aware that you need to update with the QCOW2 repo enabled. If you install the new non-QCOW2 version, you risk QCOW2 VDIs being dropped from the XAPI database until you have reinstalled the QCOW2 version and re-scanned the SR.
Being dropped from XAPI means losing the name-label, the description and, worse, the links to a VM for these VDIs.
There should be a blktap, sm and sm-fairlock update of the same version as above in the QCOW2 repo.
If you have correctly added the QCOW2 repo linked here: https://xcp-ng.org/forum/post/90287
You can update like this:
yum clean metadata --enablerepo=xcp-ng-testing,xcp-ng-qcow2
yum update --enablerepo=xcp-ng-testing,xcp-ng-qcow2
reboot
blktap: 3.55.4-1.1.0.qcow2.1.xcpng8.3
sm: 3.2.12-3.1.0.qcow2.1.xcpng8.3
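To double-check that the QCOW2 builds were picked up after updating, you can query the installed packages (names taken from the list above) and look for the .qcow2. suffix in the versions:
rpm -q blktap sm sm-fairlock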
@jhansen Hello,
I created a thread about the NBD issue where VBDs are left connected to Dom0.
I added what I already know about the situation on my side.
Anyone observing the error can help us by sharing what they observed in the thread.
https://xcp-ng.org/forum/topic/9864/vdi-staying-connected-to-dom0-when-using-nbd-backups
Thanks
Hello again,
It is now available in 8.2.1 with the testing packages; you can install them by enabling the testing repository and updating.
Available in sm 2.30.8-10.2.
yum update --enablerepo=xcp-ng-testing sm xapi-core xapi-xe xapi-doc
You then need to restart the toolstack.
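On an XCP-ng host this is typically done with:
xe-toolstack-restart   # restarts XAPI and the other toolstack services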
Afterwards, you can create the SR with the command in the post above.
@olivierlambert @S-Pam Indeed, it's normal: Dom0 doesn't see the NUMA information, and the hypervisor handles the compute and memory allocation. You can see the wiki about manipulating VM allocation on NUMA architectures if you want, but for normal use cases it's not worth the effort.
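If you're curious, you can still query the hypervisor itself about the topology from Dom0, for example (illustrative; see the wiki for pinning guidance):
xl info -n      # shows the NUMA node layout as seen by Xen
xl vcpu-list    # shows which physical CPUs the vCPUs of each domain currently run on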
@bufanda You just need to make sure you have QCOW2 versions of sm and blktap.
Otherwise, a normal sm version would drop the QCOW2 VDIs from the XAPI database, and you would lose the VBDs linking them to VMs as well as the VDI names.
So it could be painful depending on how many you have.
But if you were to install a non-QCOW2 sm version, you would only lose the QCOW2 VDIs from the database; they would not be deleted or anything. Reinstalling a QCOW2 version and then rescanning the SR would make them reappear, but you would then have to identify them again (lost name-label) and relink them to their VMs.
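For reference, renaming and relinking after the rescan can be done with standard xe commands; a rough sketch, where the UUIDs, name and device number are placeholders you would fill in yourself:
xe vdi-param-set uuid=<VDI UUID> name-label="my-disk"             # give the VDI a meaningful name back
xe vbd-create vm-uuid=<VM UUID> vdi-uuid=<VDI UUID> device=1      # relink the VDI to its VM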
We try to keep our QCOW2 version on top of the XCP-ng testing branch, but we could miss an update.
@bufanda Hello,
There are equivalent sm
packages in the QCOW2 repo for testing; XAPI will be coming soon.
You can update with the QCOW2 repo enabled to get the sm and blktap QCOW2 versions, and get the XAPI version later if you want.
@JeffBerntsen That's what I meant: the install procedure described in the first post still works in 8.3, and the script still works as expected too; it basically only creates the VG/LV needed on the hosts before you create the SR.
@nvoss said in VDI Chain on Deltas:
What would make the force restart work when the scheduled regular runs dont?
I'm not sure what you mean.
The backup needs to take a snapshot to have a reference point to compare against before exporting data.
This snapshot creates a new level of VHD that will need to be coalesced, but there is a limit on the number of VHDs in the chain, so it fails.
This is caused by the garbage collector not being able to run, because it can't edit the corrupted VDI.
Since there is a corrupted VDI, it doesn't run, to avoid creating more problems on the VDI chains.
Sometimes, corruption means that we don't know whether a VHD has a parent, for example; in that case we can't know what the chain looks like, i.e. which VHDs belong to which chain in the SR (Storage Repository).
VDI: Virtual Disk Image in this context
VHD: the format of VDI we use at the moment in XCP-ng
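To see what a chain looks like on a file-based SR, you can ask each VHD for its parent and walk up from there; for example (the path is only an illustration):
vhd-util query -n /var/run/sr-mount/<SR UUID>/<VDI UUID>.vhd -p   # prints the parent VHD; repeat on the parent to walk the chain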
After removing the corrupted VDI (maybe it will be removed automatically by the migration process, maybe you'll have to do it by hand), you can run an sr-scan
on the SR and it will launch the GC again.
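A minimal example, assuming you have the SR UUID at hand:
xe sr-scan uuid=<SR UUID>   # rescans the SR, which also kicks the garbage collector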
@nvoss No, the GC is blocked even though only one VDI is corrupted: the one flagged by the check.
All the other VDIs are on a long chain because they couldn't coalesce.
Sorry, the BATMAP is the block allocation table; it's the metadata of the VHD that records which blocks exist locally.
Migrating the VDI might indeed work; I can't really be sure.
@nvoss The VHD is reported as corrupted in its batmap. You can try to repair it with vhd-util repair,
but it will likely not work.
I have seen people recover from this kind of error by doing a vdi-copy.
You could try a VM copy or a VDI copy, link the new VDI to the VM again, and see if it's alright.
The corrupted VDI is blocking the garbage collector, so the chains are long, and that's the error you see on the XO side.
Removing the chain by hand might be needed to resolve the issue.
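For reference, those steps would look something like this (the path and UUIDs are placeholders):
vhd-util repair -n /var/run/sr-mount/<SR UUID>/<VDI UUID>.vhd    # attempts to repair the VHD metadata, but often isn't enough
xe vdi-copy uuid=<corrupted VDI UUID> sr-uuid=<destination SR UUID>   # creates a fresh copy that can be re-attached to the VM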
@nvoss Could you try to run vhd-util check -n /var/run/sr-mount/f23aacc2-d566-7dc6-c9b0-bc56c749e056/3a3e915f-c903-4434-a2f0-cfc89bbe96bf.vhd?
@nvoss Hello, the UNDO LEAF-COALESCE
usually has a cause that is listed in the error above it. Could you share that part, please?