dthenot

Hello,

I'm a new developer on XCP-ng, and I'll be working on the Xen side to improve performance.
I recently graduated from the University of Versailles Saint-Quentin with a specialty in parallel computing and HPC, and I have a strong interest in operating systems.

dthenot

Hello,

As some of you may know, disks with a 4KiB block size currently cannot be used as SR disks.
This is caused by an issue in the vhd-util utilities that is not easily fixed.
As a workaround for the moment, we quickly developed a SMAPI driver that uses losetup's ability to emulate another sector size.

The real solution will involve SMAPIv3, whose first driver is available to test: https://xcp-ng.org/blog/2024/04/19/first-smapiv3-driver-is-available-in-preview/

Back to the LargeBlock driver: it is available in 8.3 in sm 3.0.12-12.2.

Setting it up is as simple as creating an EXT SR with the xe CLI, but with type=largeblock.

xe sr-create host-uuid=<host UUID> type=largeblock name-label="LargeBlock SR" device-config:device=/dev/nvme0n1

It does not support using multiple devices because of quirks with LVM and the EXT SR driver.

It automatically creates a loop device with a sector size of 512 bytes on top of the 4KiB device and then creates an EXT SR on top of this emulated device.
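
For illustration, here is a rough sketch of what that emulation could look like if done by hand. It is not the driver's actual code: the helper name needs_largeblock is hypothetical, and the device path simply reuses the one from the sr-create example. It assumes the blockdev and losetup tools from util-linux, where blockdev --getss reports the logical sector size and losetup --sector-size sets the sector size of the loop device.

```shell
#!/bin/sh
# Hypothetical helper (not part of the sm package): succeeds when a logical
# sector size, given in bytes, is larger than 512, which is the case the
# largeblock SR type is meant for.
needs_largeblock() {
    [ "$1" -gt 512 ]
}

# Illustrative use on a host (needs root, so it is left commented out):
#   ss=$(blockdev --getss /dev/nvme0n1)      # logical sector size, e.g. 4096
#   if needs_largeblock "$ss"; then
#       # Roughly what the driver automates: a loop device exposing
#       # 512-byte sectors on top of the 4KiB device.
#       losetup --find --show --sector-size 512 /dev/nvme0n1
#   fi
```

In normal use none of this is needed, since creating the SR with type=largeblock sets up the loop device for you.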

This driver is a workaround; we have automated tests, but they can't catch everything.
If you have any feedback or problems, don't hesitate to share them here.

dthenot

@ph7 It's only enabled for the two yum commands where --enablerepo is explicitly used.
It's disabled in the config otherwise.
No need to do anything.

dthenot

@gb.123 Hello,
The instructions in the first post are still the way to go.

dthenot

@manilx Hi,

yum install plug-late-sr

Should do the trick to install it

dthenot

@olivierlambert In 8.2 yes, the linstor sm version is separate; that's no longer the case in 8.3.

dthenot

@olivierlambert I am

dthenot

For people testing the QCOW2 preview: please be aware that you need to update with the QCOW2 repo enabled. If you install the new non-QCOW2 version, you risk QCOW2 VDIs being dropped from the XAPI database until you have installed the QCOW2 version and re-scanned the SR.
Being dropped from XAPI means losing the name-label, the description and, worse, the links to a VM for these VDIs.
There should be blktap, sm and sm-fairlock updates of the same version as above in the QCOW2 repo.

If you have correctly added the QCOW2 repo linked here: https://xcp-ng.org/forum/post/90287

You can update like this:

yum clean metadata --enablerepo=xcp-ng-testing,xcp-ng-qcow2
yum update --enablerepo=xcp-ng-testing,xcp-ng-qcow2
reboot

Versions:

blktap: 3.55.4-1.1.0.qcow2.1.xcpng8.3
sm: 3.2.12-3.1.0.qcow2.1.xcpng8.3
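
As a quick sanity check after updating, something like the sketch below could confirm that the installed packages are QCOW2 builds. It is only an illustration: the helper name is_qcow2_build is hypothetical, and it assumes that QCOW2 builds always carry the "qcow2" marker in their version-release string, as the versions listed above do.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a package version-release string
# contains the ".qcow2." marker (an assumption based on the versions above).
is_qcow2_build() {
    case "$1" in
        *.qcow2.*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Illustrative use on a host (left commented out, needs rpm and the packages):
#   for pkg in sm blktap; do
#       vr=$(rpm -q --qf '%{VERSION}-%{RELEASE}' "$pkg")
#       is_qcow2_build "$vr" || echo "$pkg is not a QCOW2 build: $vr"
#   done
```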

dthenot

@jhansen
Hello,
I created a thread about the NBD issue where VBDs are left connected to Dom0.
I added what I already know about the situation on my side.
Anyone observing the error can help us by sharing what they observed in the thread.

https://xcp-ng.org/forum/topic/9864/vdi-staying-connected-to-dom0-when-using-nbd-backups

Thanks

dthenot

Hello again,

It is now available in 8.2.1 with the testing packages; you can install them by enabling the testing repository and updating.
It is available in sm 2.30.8-10.2.

yum update --enablerepo=xcp-ng-testing sm xapi-core xapi-xe xapi-doc

You then need to restart the toolstack.
Afterwards, you can create the SR with the command in the post above.

dthenot

@Razor_648 While writing my previous message, I was reminded that there are also issues with LVHDoISCSI SRs and CBT. You should disable CBT on your backup job and on all VDIs on the SR; it might help with the issue.

dthenot

@Razor_648 Hi,

The log you showed only means that it couldn't compare two VDIs using their CBT.
It sometimes happens that a CBT chain becomes disconnected.

Disabling leaf-coalesce means it won't run on leaves, so the VHD chain will always be 2 levels deep.

You migrated 200 VMs, and every disk of those VMs had a snapshot made that then needs to be coalesced, which can take a while.
Each run of your backup also takes a snapshot that then needs to be coalesced.

There is a GC in both XCP-ng 8.2 and 8.3.
The GC runs independently of auto-scan. If you really want to disable it, you can do so temporarily using /opt/xensource/sm/cleanup.py -x -u <SR UUID>, which will stop the GC until you press enter. I guess you could run it in tmux to keep the GC stopped until the next reboot. But it would be better to find the problem or, if there really is no problem, to let the GC work until it's finished.
Needing 15 minutes to take a snapshot is a bit weird, though; it would point to a problem.
Do you have any errors other than the CBT one in your SMlog?
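
To make the tmux idea above more concrete, here is a minimal sketch. The helper name gc_pause_cmd and the session name gc-pause are made up for this example; the helper only prints the command line so that it can be reviewed before being run by hand.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a tmux command that would keep
# "cleanup.py -x" waiting inside a detached session named "gc-pause".
# $1 is the SR UUID.
gc_pause_cmd() {
    printf "tmux new-session -d -s gc-pause '/opt/xensource/sm/cleanup.py -x -u %s'\n" "$1"
}

# Illustrative use on a host (left commented out):
#   gc_pause_cmd "<SR UUID>"      # review the printed command, then run it
#   tmux attach -t gc-pause       # later: attach and press enter to resume the GC
```

This is only meant as a temporary measure; as said above, letting the GC finish is the better option.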

dthenot

@bufanda You just need to make sure you have the qcow2 versions of sm and blktap.
Otherwise, a normal sm version would drop the QCOW2 VDIs from the XAPI database, and you would lose the VBDs to the VMs as well as the names of the VDIs.
So it could be painful depending on how many you have.
But if you did install a non-QCOW2 sm version, you would only lose the QCOW2 VDIs from the DB; they would not be deleted or anything. Reinstalling a QCOW2 version and then rescanning the SR would make them reappear. But you would then have to identify them again (lost name-label) and relink them to their VMs.
We try to keep our QCOW2 version on top of the testing branch of XCP-ng, but we may miss an update.

dthenot

@bufanda Hello,

There are equivalent sm packages in the qcow2 repo for testing; XAPI will be coming soon.
You can update with the QCOW2 repo enabled to get the QCOW2 versions of sm and blktap, and get the XAPI version later if you want.

dthenot

@JeffBerntsen That's what I meant: the installation method written in the first post still works in 8.3, and the script still works as expected too. It basically only creates the VG/LV needed on the hosts before you create the SR.

dthenot

@nvoss said in VDI Chain on Deltas:

What would make the force restart work when the scheduled regular runs dont?

I'm not sure what you mean.
The backup needs to take a snapshot to have a point of comparison before exporting data.
This snapshot creates a new level of VHD that would need to be coalesced, but the number of VHDs in the chain is limited, so it fails.
This happens because the garbage collector can't run, since it can't edit the corrupted VDI.
As long as there is a corrupted VDI, the GC does not run, so as not to create more problems on the VDI chains.
Sometimes corruption means that we don't know whether a VHD has a parent, for example. In that case we can't know what the chain looks like, meaning we don't know which VHDs are in which chain in the SR (Storage Repository).
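
As a toy illustration of that last point (this is not how the GC is implemented), the sketch below walks a chain from a child/parent list and gives up as soon as a parent field is unreadable. The function name chain_depth and the input format are made up for the example: each line is "<vhd> <parent>", where "-" means no parent and "?" means the parent could not be read.

```shell
#!/bin/sh
# Toy model only. $1 is the child/parent list, $2 the VHD to start from.
# Prints the number of VHDs in the chain, or "unknown" if the walk reaches
# a VHD whose parent is unreadable ("?") or missing from the list.
chain_depth() {
    map=$1
    cur=$2
    depth=1
    while :; do
        parent=$(printf '%s\n' "$map" | awk -v v="$cur" '$1 == v { print $2; exit }')
        case "$parent" in
            -)     echo "$depth"; return 0 ;;
            ""|\?) echo "unknown"; return 1 ;;
        esac
        cur=$parent
        depth=$((depth + 1))
    done
}

# Illustrative use:
#   chain_depth "leaf mid
#   mid base
#   base -" leaf
```

With a healthy list the walk ends at the base VHD; with a "?" anywhere on the way, the chain the VHD belongs to can no longer be determined.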

VDI: Virtual Disk Image in this context
VHD: the VDI format we use at the moment in XCP-ng

After removing the corrupted VDI, maybe automatically through the migration process (or maybe you'll have to do it by hand), you can run an sr-scan on the SR and it will launch the GC again.

dthenot

@nvoss No, only one VDI is corrupted, the one with the check, and that is what blocks the GC.
All other VDIs are on a long chain because they couldn't coalesce.
Sorry, BATMAP is the block allocation table; it's the information in the VHD that tells which blocks exist locally.
Migrating the VDI might indeed work, but I can't really be sure.
