FIX to XCP-ng
-
Is it possible to apply this fix in XCP-ng 8 as well?
Live migration, storage live migration, and VDI migration can fail for VMs that have no attached VIFs. After this failure, the VM hangs in shutdown mode. (CA-327906)
https://github.com/xapi-project/xenopsd/commit/8c3756b952476ff82f9bcbb9ab11ea027bc5ccbb
-
If having that fix is worth the trouble of updating a pool (and xenopsd updates can cause migration failures between hosts of a pool being updated), we could.
-
I think I'm running into this problem. We are migrating from CH8 to XCP-ng, and during migration I have noticed failures (the migration stalls at 100% and never completes) or unexpected shutdowns of the virtual server.
Does this bug affect migration after a pool is 100% updated to XCP8 (with all updates applied)?
-
Was it stuck at
Caught Xs_protocol.Enoent("directory"): cleaning up VM stat
like the commit message says?
@_danielgurgel said in FIX to XCP-ng:
Does this bug affect migration after a pool is 100% updated to XCP8 (with all updates applied)?
Which bug exactly?
-
@stormi Described in CA-327906
-
@stormi said in FIX to XCP-ng:
cleaning up VM stat
Yes, I received the error, see below:
xensource.log:Nov 27 08:20:32 SECH82 xenopsd-xc: [debug|SECH82|24 |Async.VM.pool_migrate R:8472b9c00966|xenops_server] Caught Xenops_interface.Xenopsd_error([S(Storage_backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available [opterr=VDI e3222f55-10ce-4d85-b6a8-a7c81f1a5a1d not detached cleanly]);S()]]]): cleaning up VM state
[10:14 SECH82 log]# cat /etc/xensource-inventory
PRIMARY_DISK='/dev/disk/by-id/scsi-36d0946606f4911002317966d118d6a4f'
DOM0_VCPUS='16'
PRODUCT_VERSION='8.0.0'
DOM0_MEM='8192'
CONTROL_DOMAIN_UUID='8320160d-3a65-4051-8d55-2e619ad4875f'
MANAGEMENT_ADDRESS_TYPE='IPv4'
COMPANY_NAME_SHORT='Open Source'
PARTITION_LAYOUT='ROOT,BACKUP,LOG,BOOT,SWAP,SR'
PRODUCT_VERSION_TEXT='8.0'
INSTALLATION_UUID='a394d22c-94a9-4e83-89c0-fd366b191216'
PRODUCT_BRAND='XCP-ng'
BRAND_CONSOLE='XCP-ng Center'
PRODUCT_VERSION_TEXT_SHORT='8.0'
MANAGEMENT_INTERFACE='xenbr2'
PRODUCT_NAME='xenenterprise'
STUNNEL_LEGACY='true'
BUILD_NUMBER='release/naples/master/45'
PLATFORM_VERSION='3.0.0'
COMPANY_PRODUCT_BRAND='XCP-ng'
PLATFORM_NAME='XCP'
BACKUP_PARTITION='/dev/disk/by-id/scsi-36d0946606f4911002317966d118d6a4f-part2'
BRAND_CONSOLE_URL='https://xcp-ng.org'
INSTALLATION_DATE='2019-11-27 01:22:15.630698'
COMPANY_NAME='Open Source'
-
I'm not completely sure it's the same issue. Have you double-checked that you ejected the VM's CD before migrating?
-
@olivierlambert Yes, all VMs have their CDs ejected.
-
Ideally, we could make a patch and see if it fixes your issue.
-
I've built xenopsd with that patch.
You can download the xenopsd and xenopsd-xc packages (both need to be updated) here:
- https://koji.xcp-ng.org/kojifiles/work/tasks/362/10362/xenopsd-0.101.0-2.0.1.xcpng8.0.x86_64.rpm
- https://koji.xcp-ng.org/kojifiles/work/tasks/362/10362/xenopsd-xc-0.101.0-2.0.1.xcpng8.0.x86_64.rpm
I believe a toolstack restart is enough afterwards.
-
What amazing support
-
@stormi After applying the new patch, migrations are no longer failing. We just tested on a new cluster with 20 servers.
Thank you so much for your help.
Will this patch be included as an official update?
-
We can do that yes.
-
Did the patch fix migrations that were failing before, for the same VMs? Otherwise, we can't be sure that it fixed the issue you had previously.
Your error message was
Caught Xenops_interface.Xenopsd_error([S(Storage_backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available [opterr=VDI e3222f55-10ce-4d85-b6a8-a7c81f1a5a1d not detached cleanly]);S()]]])
In the commit message the error message that the patch fixes is:
Caught Xs_protocol.Enoent("directory")
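To tell the two cases apart in a log, a small script can count how often each signature appears. This is only a rough sketch: the two patterns are the messages quoted in this thread, while the label names and the `classify` and `summarize` helpers are made up for illustration and are not part of xenopsd or xapi.

```python
import re

# Error signatures quoted in this thread. The labels are illustrative
# names chosen for this sketch, not xenopsd terminology.
SIGNATURES = {
    "enoent_directory": re.compile(r'Caught Xs_protocol\.Enoent\("directory"\)'),
    "vdi_not_detached": re.compile(r"SR_BACKEND_FAILURE_46"),
}


def classify(line):
    """Return the label of the first signature found in the line, or None."""
    for label, pattern in SIGNATURES.items():
        if pattern.search(line):
            return label
    return None


def summarize(lines):
    """Count how many lines match each known signature."""
    counts = {label: 0 for label in SIGNATURES}
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts
```

For example, `summarize(open(path, errors="replace"))` could be run against a copy of xensource.log, where `path` is wherever that copy lives. A non-zero `enoent_directory` count would suggest the failure the commit message describes; only `vdi_not_detached` hits would suggest a different problem.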
-
Note: the patch only fixes migration for VMs without any network interface. Was that the case for you?
Edit: looks like this patch could be a fix for https://github.com/xcp-ng/xcp/issues/269
-
@stormi It seems to be the same problem.
Sorry, it may have been the "placebo effect", but after applying the update we no longer got the error when migrating.
I'm going to continue testing, including with VMs without a network card, as described in the link.
-
Before applying the patch, in a new pool (during the CH8-to-XCP8 update process), I received the error below when moving any VM to the updated host.
After installing the patch on the master host and restarting the toolstack on the master host and the source host, we no longer see the error, and the migration completes successfully.
The error condition was the failed shutdown of the VM during live migration. So I think the patch actually solves the problem in question (for VMs with and without network interfaces).
Nov 30 12:54:48 SECH82 xapi: [error|SECH82|946189 ||backtrace] Async.VM.pool_migrate R:29b4b31d74de failed with exception Server_error(INTERNAL_ERROR, [ xenopsd internal error: Device_common.QMP_Error(135, "{\"error\":{\"class\":\"GenericError\",\"desc\":\"Unable to open /dev/fdset/0: No such file or directory\",\"data\":{}},\"id\":\"qmp-000029-135\"}") ])
Nov 30 12:54:48 SECH82 xapi: [error|SECH82|946189 ||backtrace] Raised Server_error(INTERNAL_ERROR, [ xenopsd internal error: Device_common.QMP_Error(135, "{\"error\":{\"class\":\"GenericError\",\"desc\":\"Unable to open /dev/fdset/0: No such file or directory\",\"data\":{}},\"id\":\"qmp-000029-135\"}") ])
Nov 30 12:54:48 SECH82 xapi: [error|SECH82|946189 ||backtrace] 1/1 xapi @ SECH82 Raised at file (Thread 946189 has no backtrace table. Was with_backtraces called?, line 0
Nov 30 12:54:48 SECH82 xapi: [error|SECH82|946189 ||backtrace]