FIX to XCP-ng

_danielgurgel

Is it possible to also apply this fix to XCP-ng 8?

Live migration, storage live migration, and VDI migration can fail for VMs that have no attached
VIFs. After this failure, the VM hangs in shutdown mode. (CA-327906)

https://github.com/xapi-project/xenopsd/commit/8c3756b952476ff82f9bcbb9ab11ea027bc5ccbb

stormi

If having that fix is worth the trouble of updating a pool (and xenopsd updates can cause migration failures between hosts of a pool being updated), we could.

_danielgurgel

I think I'm running into this problem. We are migrating from CH8 to XCP, and during migration I have noticed failures: either the migration stalls at 100% and never completes, or the virtual server shuts down unexpectedly.

Does this bug affect migration after a pool is 100% updated to XCP8 (with all updates applied)?

stormi

Was it stuck at Caught Xs_protocol.Enoent("directory"): cleaning up VM state, like the commit message says?

@_danielgurgel said in FIX to XCP-ng:

Does this bug affect migration after a pool is 100% updated to XCP8 (with all updates applied)?

Which bug exactly?

_danielgurgel

@stormi Described in CA-327906

_danielgurgel

@stormi said in FIX to XCP-ng:

cleaning up VM state

Yes, I received the error, see below:

xensource.log:Nov 27 08:20:32 SECH82 xenopsd-xc: [debug|SECH82|24 |Async.VM.pool_migrate R:8472b9c00966|xenops_server] Caught Xenops_interface.Xenopsd_error([S(Storage_backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available [opterr=VDI e3222f55-10ce-4d85-b6a8-a7c81f1a5a1d not detached cleanly]);S()]]]): cleaning up VM state

[10:14 SECH82 log]# cat /etc/xensource-inventory
PRIMARY_DISK='/dev/disk/by-id/scsi-36d0946606f4911002317966d118d6a4f'
DOM0_VCPUS='16'
PRODUCT_VERSION='8.0.0'
DOM0_MEM='8192'
CONTROL_DOMAIN_UUID='8320160d-3a65-4051-8d55-2e619ad4875f'
MANAGEMENT_ADDRESS_TYPE='IPv4'
COMPANY_NAME_SHORT='Open Source'
PARTITION_LAYOUT='ROOT,BACKUP,LOG,BOOT,SWAP,SR'
PRODUCT_VERSION_TEXT='8.0'
INSTALLATION_UUID='a394d22c-94a9-4e83-89c0-fd366b191216'
PRODUCT_BRAND='XCP-ng'
BRAND_CONSOLE='XCP-ng Center'
PRODUCT_VERSION_TEXT_SHORT='8.0'
MANAGEMENT_INTERFACE='xenbr2'
PRODUCT_NAME='xenenterprise'
STUNNEL_LEGACY='true'
BUILD_NUMBER='release/naples/master/45'
PLATFORM_VERSION='3.0.0'
COMPANY_PRODUCT_BRAND='XCP-ng'
PLATFORM_NAME='XCP'
BACKUP_PARTITION='/dev/disk/by-id/scsi-36d0946606f4911002317966d118d6a4f-part2'
BRAND_CONSOLE_URL='https://xcp-ng.org'
INSTALLATION_DATE='2019-11-27 01:22:15.630698'
COMPANY_NAME='Open Source'
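
When comparing hosts in a pool that is partway through a CH8 to XCP-ng upgrade, it can help to read the brand and version out of this file programmatically. The inventory is plain KEY='value' lines, so a minimal sketch could look like the following. The function names `parse_inventory` and `describe_host` are made up for illustration, and the sample text just reuses two values from the output above.

```python
import shlex


def parse_inventory(text):
    """Parse KEY='value' lines in the style of /etc/xensource-inventory."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split removes the surrounding single quotes and keeps
        # quoted values containing spaces as a single item.
        parts = shlex.split(raw)
        result[key.strip()] = parts[0] if parts else ""
    return result


def describe_host(inventory):
    """Return a short 'brand version' string (hypothetical helper)."""
    return "{} {}".format(
        inventory.get("PRODUCT_BRAND", "unknown"),
        inventory.get("PRODUCT_VERSION", "unknown"),
    )


# Sample values copied from the inventory shown above. On a real host you
# would pass in the contents of /etc/xensource-inventory instead.
SAMPLE = "PRODUCT_VERSION='8.0.0'\nPRODUCT_BRAND='XCP-ng'\n"
print(describe_host(parse_inventory(SAMPLE)))
```

Running the same check on each host would show which ones are still on the old product before a migration is attempted.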

olivierlambert

I'm not completely sure it's the same issue. Have you double-checked that you ejected the VM's CD before migrating?

_danielgurgel

@olivierlambert Yes, all VMs have their CDs ejected.

olivierlambert

Ideally, we could make a patch and see if it fixes your issue.

stormi

I've built xenopsd with that patch.

You can download the xenopsd and xenopsd-xc packages (both need to be updated) there:

I believe a toolstack restart is enough afterwards.

maxcuttins

What amazing support

_danielgurgel

@stormi After applying the new patch, migrations are no longer failing. We just tested on a new cluster with 20 servers.

Thank you so much for your help.
Will this patch be included as an official update?

olivierlambert

We can do that, yes.

stormi

Did the patch fix migrations that were failing before, for the same VMs? Otherwise we can't be sure that the patch is what fixed the issue you had previously.

Your error message was
Caught Xenops_interface.Xenopsd_error([S(Storage_backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available [opterr=VDI e3222f55-10ce-4d85-b6a8-a7c81f1a5a1d not detached cleanly]);S()]]])

In the commit message the error message that the patch fixes is:
Caught Xs_protocol.Enoent("directory")
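
To tell the two cases apart across a whole xensource.log rather than a single line, a small sketch like this could count how often each signature appears. The substrings are taken from the two messages quoted above. The labels and function names are made up for illustration, and matching on a plain substring is an assumption about how stable these messages are between versions.

```python
# Signature substrings copied from the two error messages quoted above.
# The labels on the left are arbitrary names chosen for this sketch.
SIGNATURES = [
    ("enoent_directory", 'Xs_protocol.Enoent("directory")'),
    ("sr_backend_failure_46", "SR_BACKEND_FAILURE_46"),
]


def classify(line):
    """Return the label of the first signature found in the line, or None."""
    for label, needle in SIGNATURES:
        if needle in line:
            return label
    return None


def count_signatures(lines):
    """Count how many lines match each known signature."""
    counts = {}
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] = counts.get(label, 0) + 1
    return counts
```

Only lines counted under the Enoent("directory") label would match the error described in the commit message. The SR_BACKEND_FAILURE_46 lines are a different error and may have a different cause.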

stormi

Note: the patch only fixes migration for VMs without any network interface. Was that the case for you?

Edit: looks like this patch could be a fix for https://github.com/xcp-ng/xcp/issues/269
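
To check whether a given VM falls into the "no network interface" case, one option is to ask the xe CLI for its VIFs and see whether the list is empty. The sketch below assumes it runs somewhere the `xe` command can reach the pool (for example in dom0) and that `xe vif-list` with `--minimal` prints a comma-separated list of UUIDs. The helper names are made up for illustration.

```python
import subprocess


def parse_minimal_list(output):
    """Split the comma-separated output of an `xe ... --minimal` call."""
    return [item.strip() for item in output.strip().split(",") if item.strip()]


def vm_has_no_vifs(vm_uuid):
    """Return True if xe reports no VIFs for the VM (needs the xe CLI)."""
    # Assumption: xe is on PATH and already able to talk to the pool.
    output = subprocess.check_output(
        ["xe", "vif-list", "vm-uuid=" + vm_uuid, "params=uuid", "--minimal"],
        universal_newlines=True,
    )
    return len(parse_minimal_list(output)) == 0
```

A VM for which `vm_has_no_vifs` returns True would be a candidate for the bug described in CA-327906. A VM with at least one VIF would point to a different cause.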

_danielgurgel

@stormi it seems to be the same problem.

Sorry, it may have been a "placebo effect", but after applying the update we no longer got the error when migrating.

I'm going to continue testing, including with VMs that have no network card, as described in the link.

_danielgurgel

Before applying the patch, in a new pool (during the CH8 to XCP8 update process), I received the error below when moving any VM to the updated host.

After installing the patch on the master host and restarting the toolstack on both the master and the source host, we no longer see the error, and the migration completes successfully.

The error condition was the failed shutdown of the VM during live migration. So I think the patch that was made available actually solves the problem in question, for VMs both with and without network interfaces.

Nov 30 12:54:48 SECH82 xapi: [error|SECH82|946189 ||backtrace] Async.VM.pool_migrate R:29b4b31d74de failed with exception Server_error(INTERNAL_ERROR, [ xenopsd internal error: Device_common.QMP_Error(135, "{\"error\":{\"class\":\"GenericError\",\"desc\":\"Unable to open /dev/fdset/0: No such file or directory\",\"data\":{}},\"id\":\"qmp-000029-135\"}") ])
Nov 30 12:54:48 SECH82 xapi: [error|SECH82|946189 ||backtrace] Raised Server_error(INTERNAL_ERROR, [ xenopsd internal error: Device_common.QMP_Error(135, "{\"error\":{\"class\":\"GenericError\",\"desc\":\"Unable to open /dev/fdset/0: No such file or directory\",\"data\":{}},\"id\":\"qmp-000029-135\"}") ])
Nov 30 12:54:48 SECH82 xapi: [error|SECH82|946189 ||backtrace] 1/1 xapi @ SECH82 Raised at file (Thread 946189 has no backtrace table. Was with_backtraces called?, line 0
Nov 30 12:54:48 SECH82 xapi: [error|SECH82|946189 ||backtrace]