XCP-ng 7.6 RC1 available
-
@olivierlambert
Same Pool. It does not seem to matter if it is idle or not.
-
Interesting update: I installed the latest PV tools by enabling Windows Update; this fixed 4 out of my 5 test VMs running Windows. More testing is underway with Linux PV and HVM.
-
That may be the reason why I can't reproduce it. It seems like old tools lead to migration problems, but more recent ones don't.
-
@olivierlambert said in XCP-ng 7.6 RC1 available:
That may be the reason why I can't reproduce it. It seems like old tools lead to migration problems, but more recent ones don't.
HVM guests also have migration problems.
-
I'm testing with PVHVM guests; I can't reproduce it since I have the latest tools installed.
-
I don't know if this helps, but I have also been doing some tests on 7.5 and it looks like VMs that are using a large percentage of their memory have problems migrating.
e.g. CentOS 7 HVM with the Atlassian Bitbucket Java processes using about 80%.
From top:
PID  USER      PR NI VIRT    RES  SHR  S %CPU %MEM TIME+   COMMAND
1874 atlbitb+  20 0  3695600 1.3g 6440 S 0.3  42.8 0:52.40 java
1891 atlbitb+  20 0  4154320 1.1g 8124 S 1.3  36.5 6:40.68 java
Live migration stops at 99% and stays there until you cancel it; I have also seen it go to 100% and the VM crash.
You then need to restart the toolstack on the receiving server, or
/var/log/xcp-rrdd-plugins.log
fills up with these messages:
xcp-rrdd-squeezed: [ warn|xen1-3|0 ||rrdd-plugins] Couldn't find cached dynamic-max value for domain 66, using 0
Shut down the two Java processes and it migrates OK.
I have also seen the same problem with migration of virtual disks between iSCSI SRs
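A rough way to reproduce this kind of memory pressure without Bitbucket (purely a sketch, assuming the stress package is installed in the guest; scale --vm-bytes to roughly 80% of the VM's RAM, here assumed to be about 3 GiB):
# inside the guest: allocate and hold ~2.4 GiB of memory
stress --vm 1 --vm-bytes 2400M --vm-keep
# on the source host: watch the migration progress
tail -f /var/log/xensource.log | grep -i progress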
-
@KFC-Netearth I edited your post to make the top output easier to read
-
@KFC-Netearth That's really interesting. Do you have the latest tools version installed in your VM?
edit: I'll try to use 80%+ RAM on a test VM and see if it triggers the problem
-
I have this version of the guest tools on the VM:
# rpm -aq | grep xen
xe-guest-utilities-xenstore-7.10.0-1.x86_64
-
Seems to be the latest tools.
Do you have anything during the failed migration in /var/log/xensource.log? Or in /var/log/SMlog?
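In case it helps to gather that, something like this on the sending host (standard XCP-ng log locations) should pull out the relevant lines:
grep -i migrate /var/log/xensource.log | tail -n 200
tail -n 200 /var/log/SMlog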
-
Possible iSCSI problem:
I installed XCP-ng 7.6 RC1 on a 2-node testing pool and I get an error when I try to attach an iSCSI SR:
"The SR could not be connected because the driver gfs2 was not recognized. Check your settings and try again."
I created a THICK volume. With Citrix XenServer 7.6 this works fine on the same QNAP.
Any idea what I am doing wrong, or have I encountered a bug in XCP-ng 7.6?
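In case it helps with debugging, I could also try attaching it from the CLI with the plain LVM-over-iSCSI driver, which bypasses the gfs2 path. A rough sketch (all device-config values are placeholders for my QNAP target):
xe sr-create name-label="QNAP iSCSI" type=lvmoiscsi shared=true \
  device-config:target=<QNAP-IP> \
  device-config:targetIQN=<target-IQN> \
  device-config:SCSIid=<LUN-SCSI-id>
If that works while the GUI path still fails, the issue would seem to be in how the SR type is selected rather than in the iSCSI connection itself.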
-
@olivierlambert first test results:
Situation
- test pool at work with 3 nodes
- XCP-ng 7.5 with latest regular updates
- VM with Debian 8.9, 2 vCPUs, 2 GB RAM, latest updates, xe-guest-utilities 7.4.0-1, x64, PVHVM
Test 1 - VM is idle
- VM migrates just fine
Test 2 - VM stressed with
stress --cpu 1
- VM migration crashes with an error and the VM ends up stopped
Oct 28 16:17:52 xen5 xenopsd-xc: [debug|xen5|32 ||xenops_server] TASK.signal 192582 = ["Failed",["Internal_error","End_of_file"]]
...
Oct 28 16:17:52 xen5 xapi: [debug|xen5|78975 INET :::80|Async.VM.pool_migrate R:e90b0ed1692e|xapi] xenops: will retry migration: caught Xenops_interface.Internal_error("End_of_file") from sync_with_task in attempt 1 of 3.
...
Oct 28 16:17:52 xen5 xenopsd-xc: [ info|xen5|28 |Async.VM.pool_migrate R:e90b0ed1692e|xenops_server] Caught Xenops_interface.Does_not_exist(_) executing ["VM_migrate",["0c19e8aa-24d2-a6a4-5677-8ef4af6dd592",{},{},{},"http://xxx.xxx.xxx.xxx/services/xenops?session_id=OpaqueRef:89ad861c-f0fd-4d70-bb0d-4bf65741f13a"]]: triggering cleanup actions
(full log: https://schulzalex.de/nextcloud/index.php/s/pT456wEZkBoJrTw)
I had to execute
xe-toolstack-restart
on the whole pool to get migration working again.
Test 3 - VM is idle
- migrates just fine, like in Test 1
Test 4 - VM is executing
apt upgrade
- migration gets stuck at
... VM = 0c19e8aa-24d2-a6a4-5677-8ef4af6dd592; domid = 26; progress = 99 / 100...
until the upgrade inside the VM is finished and the VM is idle
Partial output of
tail -f /var/log/xensource.log | grep -i migrate | grep -i progress
...
Oct 28 16:57:56 xen5 xenopsd-xc: [debug|xen5|14 |Async.VM.pool_migrate R:ccb8ffcf5ce5|xenops] VM = 0c19e8aa-24d2-a6a4-5677-8ef4af6dd592; domid = 26; progress = 99 / 100
Oct 28 16:57:57 xen5 xenopsd-xc: [debug|xen5|14 |Async.VM.pool_migrate R:ccb8ffcf5ce5|xenops] VM = 0c19e8aa-24d2-a6a4-5677-8ef4af6dd592; domid = 26; progress = 99 / 100
Oct 28 16:57:58 xen5 xenopsd-xc: [debug|xen5|14 |Async.VM.pool_migrate R:ccb8ffcf5ce5|xenops] VM = 0c19e8aa-24d2-a6a4-5677-8ef4af6dd592; domid = 26; progress = 99 / 100
Oct 28 16:58:00 xen5 xenopsd-xc: [debug|xen5|14 |Async.VM.pool_migrate R:ccb8ffcf5ce5|xenops] VM = 0c19e8aa-24d2-a6a4-5677-8ef4af6dd592; domid = 26; progress = 99 / 100
(the same "progress = 99 / 100" line repeats several times per second)
...
- in the end the VM migrates just fine
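For anyone wanting to repeat these tests, the migrations can be triggered from the CLI roughly like this (a sketch; the UUID/host values are placeholders, not the exact commands I used):
# inside the guest (Test 2)
stress --cpu 1
# on the pool master: live-migrate the VM to another pool member
xe vm-migrate vm=<vm-uuid-or-name-label> host=<destination-host> live=true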
-
Test 5 - VM idle (with latest guest tools xe-guest-utilities 7.10.0-1)
- migrated just fine
Test 6 - VM stressed with
stress --cpu 1
(with latest guest tools xe-guest-utilities 7.10.0-1)
- same error as in Test 2
- logfile from receiving side: https://schulzalex.de/nextcloud/index.php/s/XCPPzN5qcda59wL
-
Just spotted a difference in my log files.
I migrated my test VM in idle and in stressed state and recorded the log files on both the sender and the receiver host. On the sender side there was a difference, see here:
https://schulzalex.de/nextcloud/index.php/s/aeoEFAD7TNWjfqk
Here is the part that is different:
Oct 28 18:08:04 xen5 xenopsd-xc: [error|xen5|26 |Async.VM.pool_migrate R:8d24f91b7cb1|emu-manager] Memory F 24499440 KiB S 0 KiB T 32766 MiB
Oct 28 18:08:04 xen5 xenopsd-xc: [ info|xen5|26 |Async.VM.pool_migrate R:8d24f91b7cb1|xenops_server] Caught End_of_file executing ["VM_migrate",["0c19e8aa-24d2-a6a4-5677-8ef4af6dd592",{},{},{},"http://xxx.xxx.xxx.xxx/services/xenops?session_id=OpaqueRef:efd310cd-3d07-4d0b-92f5-531fe4a26614"]]: triggering cleanup actions
Oct 28 18:08:04 xen5 xenopsd-xc: [debug|xen5|26 |Async.VM.pool_migrate R:8d24f91b7cb1|xenops_server] Task 157 reference Async.VM.pool_migrate R:8d24f91b7cb1: ["VM_check_state","0c19e8aa-24d2-a6a4-5677-8ef4af6dd592"]
Oct 28 18:08:04 xen5 xenopsd-xc: [ warn|xen5|26 |Async.VM.pool_migrate R:8d24f91b7cb1|xenops_server] VM 0c19e8aa-24d2-a6a4-5677-8ef4af6dd592 has unexpectedly suspended
Oct 28 18:08:04 xen5 xenopsd-xc: [debug|xen5|26 |Async.VM.pool_migrate R:8d24f91b7cb1|xenops_server] VM.shutdown 0c19e8aa-24d2-a6a4-5677-8ef4af6dd592
Oct 28 18:08:04 xen5 xenopsd-xc: [debug|xen5|26 |Async.VM.pool_migrate R:8d24f91b7cb1|xenops_server] Performing: ["VM_destroy_device_model","0c19e8aa-24d2-a6a4-5677-8ef4af6dd592"]
Oct 28 18:08:04 xen5 xenopsd-xc: [debug|xen5|26 |Async.VM.pool_migrate R:8d24f91b7cb1|xenops_server] VM.destroy_device_model 0c19e8aa-24d2-a6a4-5677-8ef4af6dd592
Oct 28 18:08:04 xen5 xenopsd-xc: [debug|xen5|26 |Async.VM.pool_migrate R:8d24f91b7cb1|xenops] qemu-dm: stopping qemu-dm with SIGTERM (domid = 31)
Oct 28 18:08:04 xen5 xenopsd-xc: [ info|xen5|26 |Async.VM.pool_migrate R:8d24f91b7cb1|xenops] removing core files from /var/xen/qemu: ignoring exception (Unix.Unix_error "No such file or directory" rmdir /var/xen/qemu/21226)
Oct 28 18:08:04 xen5 xenopsd-xc: [debug|xen5|26 |Async.VM.pool_migrate R:8d24f91b7cb1|xenops_server] TASK.signal 157 (object deleted)
Oct 28 18:08:04 xen5 xenopsd-xc: [debug|xen5|26 |Async.VM.pool_migrate R:8d24f91b7cb1|xenops_server] Performing: ["Parallel","0c19e8aa-24d2-a6a4-5677-8ef4af6dd592","VBD.unplug vm=0c19e8aa-24d2-a6a4-5677-8ef4af6dd592",[["VBD_unplug",["0c19e8aa-24d2-a6a4-5677-8ef4af6dd592","xvda"],true],["VBD_unplug",["0c19e8aa-24d2-a6a4-5677-8ef4af6dd592","xvdd"],true]]]
Oct 28 18:08:04 xen5 xenopsd-xc: [debug|xen5|26 |Async.VM.pool_migrate R:8d24f91b7cb1|xenops_server] begin_Parallel:task=157.atoms=2.(VBD.unplug vm=0c19e8aa-24d2-a6a4-5677-8ef4af6dd592)
Oct 28 18:08:04 xen5 xenopsd-xc: [debug|xen5|26 |Async.VM.pool_migrate R:8d24f91b7cb1|xenops_server] queue_atomics_and_wait: Parallel:task=157.atoms=2.(VBD.unplug vm=0c19e8aa-24d2-a6a4-5677-8ef4af6dd592): chunk of 2 atoms
There seems to be an issue in
emu-manager
if the VM is under load. I want to test this again with XCP-ng 7.6,
but I don't have any other hardware lying around and I can't update the test pool at work for various non-technical reasons.
Edit:
I just built a test pool (XCP-ng 7.6) with the same VM to test migration:
- VM idle: no problems
- VM stressed with
stress --cpu 1
shows the same behaviour
Oct 28 21:40:30 xen4 xenopsd-xc: [error|xen4|33 |Async.VM.pool_migrate R:27b04736b62f|emu-manager] Memory F 32865556 KiB S 0 KiB T 40958 MiB
The VM is killed and left powered off.
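A quick way to spot this failure pattern on the sending host (standard log path; the pattern simply matches the lines shown above) might be:
grep -i emu-manager /var/log/xensource.log | grep -iE 'error|Memory F'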
-
Pinging @johnelse regarding your last post, I think it's very interesting
-
@borzel said in XCP-ng 7.6 RC1 available:
stress --cpu 1
Indeed, I can reproduce the bug with this command. Thanks @borzel, now that we have a way to reproduce it, it will be easier to test fixes.
-
7.6 has been released: https://xcp-ng.org/blog/2018/10/31/xcp-ng-7-6/
The migration issue that is being discussed here will be addressed by an update as soon as we have a fix.
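Once the fix is published, the usual XCP-ng update flow should be enough to pick it up; roughly, on each host (a generic sketch, not the exact package list):
yum update
# restart the toolstack afterwards so the updated components are reloaded
xe-toolstack-restart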
-
Congratulations #TeamXCPNG
-
@olivierlambert how in the bloody hell could it happen that you released XCP-ng 7.6 with this bug? Also, why not say a word about it here: https://xcp-ng.org/blog/2018/10/31/xcp-ng-7-6/?
-
That's because the bug is likely already in 7.5, and we plan to patch it ASAP (next week?)