@olivierlambert In 8.2, yes, the linstor sm version is separate; that's not the case in 8.3 anymore.

Posts
-
RE: Unable to add new node to pool using XOSTOR
-
RE: SR Garbage Collection running permanently
@Razor_648 While I was writing my previous message, I was reminded that there are also issues with LVHDoISCSI SR and CBT; you should disable CBT on your backup job and on all VDIs on the SR (a sketch for the SR part is below). It might help with the issue.
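If it helps, a rough sketch of the VDI part from the host CLI (the backup job part is done in XO itself), assuming you substitute your SR UUID:
for VDI in $(xe vdi-list sr-uuid=<SR UUID> --minimal | tr ',' ' '); do xe vdi-disable-cbt uuid=$VDI; done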
-
RE: SR Garbage Collection running permanently
@Razor_648 Hi,
The log you showed only means that it couldn't compare two VDIs using their CBT.
It sometimes happens that a CBT chain becomes disconnected. Disabling leaf-coalesce means it won't run on leaves, so VHD chains will always be at least 2 levels deep.
You migrated 200 VMs; every disk of those VMs had a snapshot taken that then needs to be coalesced, so it can take a while.
Your backup also takes a snapshot each time it runs, which then needs to be coalesced. There is a GC in both XCP-ng 8.2 and 8.3.
The GC runs independently of auto-scan; if you really want to disable it, you can do it temporarily using /opt/xensource/sm/cleanup.py -x -u <SR UUID>
It will stop the GC until you press Enter. I guess you could run it in a tmux session to keep it stopped until the next reboot, for example as below. But it would be better to find the problem, or, if there really is no problem, to let the GC work until it's finished.
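Something like this, as a sketch (the session name is arbitrary, and killing the session releases the pause):
tmux new-session -d -s pause-gc '/opt/xensource/sm/cleanup.py -x -u <SR UUID>'
tmux kill-session -t pause-gc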
It's a bit weird to need 15 minutes to take a snapshot though; that would point to a problem.
Do you have any other errors than the CBT one in your SMlog? -
RE: XCP-ng 8.3 updates announcements and testing
@bufanda You just need to make sure to have sm and blktap QCOW2 versions.
Otherwise, having a normal sm version would drop the QCOW2 VDIs from the XAPI database and you would lose the VBDs to the VMs as well as the names of the VDIs.
So it could be painful depending on how many you have.
But in the case where you would install a non-QCOW2 sm version, you would only lose the QCOW2 VDIs from the DB; they would not be deleted or anything. Reinstalling a QCOW2 version and then rescanning the SR would make them re-appear. But then you would have to identify them again (lost name-label) and relink them to their VMs.
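To check which build you currently have installed, the version strings are enough (the QCOW2 builds carry a qcow2 tag):
rpm -q sm blktap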
We try to keep our QCOW2 version on top of the testing branch of XCP-ng, but we could miss one. -
RE: XCP-ng 8.3 updates announcements and testing
@bufanda Hello,
There are equivalent sm packages in the QCOW2 repo for testing; XAPI will be coming soon.
You can update while enabling the QCOW2 repo to get the sm and blktap QCOW2 versions, and get the XAPI version later if you want. -
RE: XOSTOR hyperconvergence preview
@JeffBerntsen That's what I meant: the installation method written in the first post still works in 8.3, and the script still works as expected too; it basically only creates the VG/LV needed on the hosts before you create the SR. You can double-check what it prepared with the sketch below.
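A quick way to verify on a host before creating the SR, assuming the default group name from the first post (linstor_group; adjust if you used another one):
vgs linstor_group
lvs linstor_group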
-
RE: XOSTOR hyperconvergence preview
@gb.123 Hello,
The instructions in the first post are still the way to go. -
RE: VDI Chain on Deltas
@nvoss said in VDI Chain on Deltas:
What would make the force restart work when the scheduled regular runs dont?
I'm not sure what you mean.
The backup needs to take a snapshot to have a point of comparison before exporting data.
This snapshot creates a new level of VHD that will need to be coalesced, but the number of VHDs allowed in the chain is limited, so the backup fails.
This is caused by the fact that the garbage collector can't run, because it can't edit the corrupted VDI.
Since there is a corrupted VDI, it doesn't run, to avoid creating more problems on the VDI chains.
Sometimes corruption means that we don't know whether a VHD has a parent, for example, and in that case we can't know what the chain looks like, i.e. which VHDs are in which chain in the SR (Storage Repository).
VDI: Virtual Disk Image in this context.
VHD: the format of VDI we use at the moment in XCP-ng.
After removing the corrupted VDI, maybe automatically by the migration process (maybe you'll have to do it by hand), you can run an sr-scan on the SR and it will launch the GC again, as shown below.
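A minimal example of that rescan, assuming you substitute your SR UUID:
xe sr-scan uuid=<SR UUID>
-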
RE: VDI Chain on Deltas
@nvoss No, the GC is blocked because only one VDI is corrupted: the one from the check.
All other VDIs are on a long chain because they couldn't coalesce.
Sorry, BATMAP is the block allocation table; it's the information in the VHD that says which blocks exist locally.
Migrating the VDI might work indeed, but I can't really be sure. -
RE: VDI Chain on Deltas
@nvoss The VHD is reported corrupted on the BATMAP. You can try to repair it with vhd-util repair, but it'll likely not work.
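For reference, a repair attempt would look something like this (adapt the path to your SR mount point and VDI UUID):
vhd-util repair -n /var/run/sr-mount/<SR UUID>/<VDI UUID>.vhd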
I have seen people recover from this kind of error by doing a vdi-copy.
You could try a VM copy, or a VDI copy and then linking the new VDI to the VM again, and see if it's alright.
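A hedged sketch of the VDI copy route (fill in the UUIDs yourself; the new VDI then has to be attached in place of the original one):
xe vdi-copy uuid=<corrupted VDI UUID> sr-uuid=<destination SR UUID>
xe vbd-create vm-uuid=<VM UUID> vdi-uuid=<new VDI UUID> device=<device position of the original disk>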
The corrupted VDI is blocking the garbage collector, so the chains are long, and that's the error you see on the XO side.
It might be necessary to remove the chain by hand to resolve the issue. -
RE: VDI Chain on Deltas
@nvoss Could you try to run
vhd-util check -n /var/run/sr-mount/f23aacc2-d566-7dc6-c9b0-bc56c749e056/3a3e915f-c903-4434-a2f0-cfc89bbe96bf.vhd
? -
RE: VDI Chain on Deltas
@nvoss Hello, the UNDO LEAF-COALESCE usually has a cause that is listed in the error above it in SMlog. Could you share this part please?
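One way to find that part, assuming the message is still in the current SMlog (it rotates):
grep -B 40 "UNDO LEAF-COALESCE" /var/log/SMlog
-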
RE: LargeBlockSR for 4KiB blocksize disks
@yllar Maybe it was indeed because the loop device was not completely created yet.
No error for this GC run. Everything should be OK then.
-
RE: LargeBlockSR for 4KiB blocksize disks
Sorry, I missed the first ping.
May 2 08:31:40 a1 SM: [18985] ['/sbin/vgs', '--readonly', 'VG_XenStorage-07ab18c4-a76f-d1fc-4374-babfe21fd679']
May 2 08:32:24 a1 SM: [18985] pread SUCCESS
May 2 08:32:24 a1 SM: [18985] ***** Long LVM call of 'vgs' took 43.6255850792
That would explain why it took a long time to create: 43 seconds for a call to vgs.
Can you try to do a vgs call yourself on your host? Does it take a long time? (See the example call below.)
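For example, timing the exact call from your log:
time vgs --readonly VG_XenStorage-07ab18c4-a76f-d1fc-4374-babfe21fd679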
This exception is "normal":
May 2 08:32:25 a1 SMGC: [19336] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
May 2 08:32:25 a1 SMGC: [19336] ***********************
May 2 08:32:25 a1 SMGC: [19336] * E X C E P T I O N *
May 2 08:32:25 a1 SMGC: [19336] ***********************
May 2 08:32:25 a1 SMGC: [19336] gc: EXCEPTION <class 'util.SMException'>, SR 42535e39-4c98-22c6-71eb-303caa3fc97b not attached on this host
May 2 08:32:25 a1 SMGC: [19336] File "/opt/xensource/sm/cleanup.py", line 3388, in gc
May 2 08:32:25 a1 SMGC: [19336] _gc(None, srUuid, dryRun)
May 2 08:32:25 a1 SMGC: [19336] File "/opt/xensource/sm/cleanup.py", line 3267, in _gc
May 2 08:32:25 a1 SMGC: [19336] sr = SR.getInstance(srUuid, session)
May 2 08:32:25 a1 SMGC: [19336] File "/opt/xensource/sm/cleanup.py", line 1552, in getInstance
May 2 08:32:25 a1 SMGC: [19336] return FileSR(uuid, xapi, createLock, force)
May 2 08:32:25 a1 SMGC: [19336] File "/opt/xensource/sm/cleanup.py", line 2334, in __init__
May 2 08:32:25 a1 SMGC: [19336] SR.__init__(self, uuid, xapi, createLock, force)
May 2 08:32:25 a1 SMGC: [19336] File "/opt/xensource/sm/cleanup.py", line 1582, in __init__
May 2 08:32:25 a1 SMGC: [19336] raise util.SMException("SR %s not attached on this host" % uuid)
May 2 08:32:25 a1 SMGC: [19336]
May 2 08:32:25 a1 SMGC: [19336] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
May 2 08:32:25 a1 SMGC: [19336] * * * * * SR 42535e39-4c98-22c6-71eb-303caa3fc97b: ERROR
May 2 08:32:25 a1 SMGC: [19336]
It's the garbage collector trying to run on the SR while it is still in the process of attaching.
It's weird though, because it's the call to sr_attach that launched the GC.
Does the GC run normally on this SR on the next attempts? Otherwise, I don't see anything worrying in the logs you shared.
It should be safe to use. -
RE: Matching volume/resource/lvm on disk to VDI/VHD?
@cmd Hello,
It's described here in the documentation https://docs.xcp-ng.org/xostor/#map-linstor-resource-names-to-xapi-vdi-uuids
It might be possible to add a parameter in the sm-config of the VDI to ease this link; I'll put a card in our backlog to see if it's doable. -
RE: non-zero exit, , File "/opt/xensource/sm/EXTSR", line 78 except util.CommandException, inst: ^ SyntaxError: invalid syntax
@FMOTrust Hello,
Good news you found the problem.
Yes, in XCP-ng 8.3 python should point to a 2.7.5 version while python3 will point to 3.6.8 at the moment.
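A quick way to check what each name points to on your host:
python --version
python3 --version
readlink -f /usr/bin/python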
I imagine you are on 8.2.1 though, since the SMAPI runs on Python 3 in 8.3, while on 8.2.1 it is Python 2 only and so expects python to point to the 2.7.5 version. -
RE: non-zero exit, , File "/opt/xensource/sm/EXTSR", line 78 except util.CommandException, inst: ^ SyntaxError: invalid syntax
@FMOTrust Hello,
Could you give us the output of yum info sm, please? -
RE: XCP-ng 8.3 updates announcements and testing
For people testing the QCOW2 preview, please be informed that you need to update with the QCOW2 repo enabled. If you install the new non-QCOW2 version, you risk QCOW2 VDIs being dropped from the XAPI database until you have reinstalled a QCOW2 version and re-scanned the SR.
Being dropped from XAPI means losing the name-label, the description and, worse, the links to a VM for these VDIs.
There should be blktap, sm and sm-fairlock updates of the same version as above in the QCOW2 repo.
If you have correctly added the QCOW2 repo linked here: https://xcp-ng.org/forum/post/90287
You can update like this:
yum clean metadata --enablerepo=xcp-ng-testing,xcp-ng-qcow2
yum update --enablerepo=xcp-ng-testing,xcp-ng-qcow2
reboot
Versions:
blktap: 3.55.4-1.1.0.qcow2.1.xcpng8.3
sm: 3.2.12-3.1.0.qcow2.1.xcpng8.3
-
RE: Issue with SR and coalesce
Hi, this XAPI plugin multi is called on another host but is failing with an IOError.
It's doing a few things on a host related to LVM handling.
It's failing on one of them; you should look at the host having the error to get the full error in its SMlog, for example as below.
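One way to locate it, assuming the failure left an IOError trace in that host's SMlog around the time of the call:
grep -B 5 -A 20 "IOError" /var/log/SMlog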
The plugin itself is located in /etc/xapi.d/plugins/on-slave; it's the function named multi.