XCP-ng 7.5.0 final is here

borzel

Maybe this is something someone could test with XS 7.5.

It looks to me as if the PV drivers from XS 7.4 or earlier (we have never shipped Windows drivers ourselves) are causing problems on the newer Xen version in XCP-ng 7.5. So the same issue may be happening in XS 7.5 when the drivers are not upgraded. As I recall from Citrix's upgrade instructions, not upgrading the drivers is not recommended.

It's just a gut feeling, though.

By the way, I just found the smiley bar:

dsiminiuk

I exported all my VMs to XVA. Wiped disks, installed anew from 7.4, updated to latest with yum, imported VMs.
Everything works as it should; live migrations, everything.

olivierlambert

Yeah, but 7.4 won't be maintained, so it would be better to check whether it's an XS issue or not.

john205

I had a live migration issue with Windows too.

The secondary host was running XCP-ng 7.5 and the primary XS 7.1. I live-migrated the Win 2K12 VM from the primary host to the secondary without problems; it took 14 minutes. I then updated the primary from XS 7.1 to XCP-ng 7.5 and installed a new Dell BIOS on it. The migration back hung at 99%. I left it for about 45 minutes (three times as long as the original migration took) and noted it was still sending data between the two. I wonder if it was stuck in some form of loop.

The logs on the source host had this a lot:

Aug 16 13:55:05 xen52 xenopsd-xc: [debug|xen52|37 |Async.VM.migrate_send R:c6ab32165e47|xenops_server] TASK.signal 1305 = ["Pending",0.99]
Aug 16 13:55:05 xen52 xapi: [debug|xen52|389 |org.xen.xapi.xenops.classic events D:333a26fc4942|xenops] Processing event: ["Task","1305"]
Aug 16 13:55:05 xen52 xapi: [debug|xen52|389 |org.xen.xapi.xenops.classic events D:333a26fc4942|xenops] xenops event on Task 1305
Aug 16 13:55:05 xen52 xenopsd-xc: [debug|xen52|37 |Async.VM.migrate_send R:c6ab32165e47|xenops] VM = 66ebe30b-2bd2-ae59-c225-54a625655d52; domid = 2; progress = 99 / 100
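If anyone wants to check their own logs for the same pattern, here is a minimal Python sketch that pulls the progress value out of lines like the last one above and flags a run of identical values. The regular expression is based only on the line format shown in this post, and the function names and the `min_repeats` threshold are my own assumptions, not anything from xenopsd itself.

```python
import re

# Pattern derived only from the xenopsd-xc progress line quoted in this thread.
PROGRESS_RE = re.compile(
    r"VM = (?P<vm>[0-9a-f-]{36}); domid = (?P<domid>\d+); "
    r"progress = (?P<done>\d+) / (?P<total>\d+)"
)

# Sample line copied from the log excerpt above.
SAMPLE = (
    "Aug 16 13:55:05 xen52 xenopsd-xc: [debug|xen52|37 |Async.VM.migrate_send "
    "R:c6ab32165e47|xenops] VM = 66ebe30b-2bd2-ae59-c225-54a625655d52; "
    "domid = 2; progress = 99 / 100"
)


def parse_progress(line):
    """Return (vm_uuid, domid, percent) for a progress line, else None."""
    m = PROGRESS_RE.search(line)
    if m is None or int(m.group("total")) == 0:
        return None
    percent = 100.0 * int(m.group("done")) / int(m.group("total"))
    return m.group("vm"), int(m.group("domid")), percent


def looks_stuck(lines, min_repeats=3):
    """True if the last `min_repeats` progress lines report the same percent.

    `min_repeats` is an arbitrary illustrative threshold, not a xenopsd value.
    """
    values = [p[2] for p in map(parse_progress, lines) if p is not None]
    return len(values) >= min_repeats and len(set(values[-min_repeats:])) == 1
```

A repeated value only shows that the reported progress is not moving; it does not say why, so treat it as a hint rather than a diagnosis.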

I read a bit on the Citrix forums and decided to shut down the VM. Doing so from XCP-ng Center didn't work, so I did it from within the VM.

Upon shutdown it seemed to sort itself out, but the VM came up as 'paused' on the primary node and couldn't be resumed. A force restart from XCP-ng Center just seemed to hang. I tried to cancel the hard_reboot task with xe task-cancel, which didn't work, so I ran xe-toolstack-restart and this reset the VM state back to shut down. It then booted normally.

The secondary host was still running an old BIOS with earlier microcode, so I'm not sure whether it was related to that in any way.

john

borzel

@john205 If you have these problems with Windows, can you please test your case with our test-signed drivers? Only if it's not production:

https://xcp-ng.org/forum/topic/309/test-xcp-ng-7-5-0-windows-pv-drivers-and-management-agent

john205

@borzel Unfortunately this is a production client server, so I can't do that easily. The two Linux VMs moved over fine. Most of the VMs we have are Linux-based too. I might be able to do it with an internal one on a different host, but I need to see if it can be moved easily as it has multiple network interfaces.

I think the one with the issue I mentioned before had XS7.1 drivers installed.

john205

OK, I've got one running XS 6.2 drivers that I can test, although it has 100GB of disk so it will take some time. Both ends are XCP-ng 7.5. I'll live-migrate it there and back with this driver first as a test, then try some others and see how it goes.

Edit: this VM is Win 2008R2, whereas the previous one that failed was 2K12.

john205

OK, the migration of W2K8R2 with 6.2 drivers from XCP-ng 7.5 to XCP-ng 7.5 failed with this message:

Storage_Interface.Does_not_exist(_)

It also crashed the VM.

I think this could be related to https://bugs.xenserver.org/browse/XSO-785, as there is only 10GB free on the source storage repository. Because of that I don't think I can actually do the migration (it's recommended to have 2x space available to migrate a VM, something that becomes increasingly harder with large VMs!).
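To make the arithmetic concrete, here is a minimal Python sketch of that rule of thumb. It assumes "2x space" means free space on the source SR of at least twice the VM's disk size, which is my reading of the recommendation and not an official formula; the function name and the `factor` parameter are hypothetical.

```python
GIB = 1024 ** 3


def has_room_for_migration(free_bytes, vm_disk_bytes, factor=2.0):
    """Rule-of-thumb check: is free space at least `factor` times the VM disk size?

    The default factor of 2.0 is an assumption taken from the "2x space"
    recommendation mentioned in this thread.
    """
    return free_bytes >= factor * vm_disk_bytes


# Figures from this thread: a 100GB VM disk with only 10GB free on the source SR.
enough = has_room_for_migration(10 * GIB, 100 * GIB)
```

With the numbers from this test VM, `enough` is False, which matches the suspicion that the migration could not have succeeded here.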

olivierlambert

Ah damn! I think you spotted the issue then!

john205

@olivierlambert said in XCP-ng 7.5.0 final is here:

Ah damn! I think you spotted the issue then!

I think that is different to the original problem as there was enough disk space in that instance, but this non-production VM I was using to test doesn't have the space on the source host.

I've come across a new problem since that failed migration: the message-switch process is now eating approximately half of the memory on the server:

[root@host log]# free -m
              total        used        free      shared  buff/cache   available
Mem:           1923        1601          27          12         295         221
Swap:           511          26         485
[root@host log]# ps axuww | grep message
root 1920 0.2 52.4 1094032 1033144 ? Ssl Aug16 2:36 /usr/sbin/message-switch --config /etc/message-switch.conf
root 32669 0.0 0.1 112656 2232 pts/24 S+ 12:43 0:00 grep --color=auto message
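As a cross-check of the "half of the memory" figure, here is a small Python sketch that reads the RSS column from a `ps aux`-style line and compares it with the total reported by `free -m`. It assumes the standard `ps aux` column order (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND) with RSS in KiB; the function names are my own.

```python
# Line copied from the ps output above.
PS_LINE = (
    "root 1920 0.2 52.4 1094032 1033144 ? Ssl Aug16 2:36 "
    "/usr/sbin/message-switch --config /etc/message-switch.conf"
)


def parse_ps_aux_line(line):
    """Split one `ps aux` line into the fields we care about.

    Assumes the usual 11-column layout; the command may contain spaces,
    so only the first 10 separators are split on.
    """
    fields = line.split(None, 10)
    return {
        "user": fields[0],
        "pid": int(fields[1]),
        "mem_percent": float(fields[3]),
        "rss_kib": int(fields[5]),
        "command": fields[10],
    }


def rss_share_percent(rss_kib, total_mib):
    """Resident memory as a percentage of total RAM (total as shown by `free -m`)."""
    return 100.0 * (rss_kib / 1024.0) / total_mib


proc = parse_ps_aux_line(PS_LINE)
share = rss_share_percent(proc["rss_kib"], total_mib=1923)
```

The result lands close to the %MEM column that `ps` prints; small differences are expected because `free -m` rounds the total to whole MiB.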

Anyone know how to restart that?

olivierlambert

Restart the toolstack?

john205

Hm. I did a toolstack restart and it still had the high memory usage afterwards, but it has since dropped:

root 1920 0.2 2.0 108860 39516 ? Ssl Aug16 2:39 /usr/sbin/message-switch --config /etc/message-switch.conf

Perhaps it takes a bit of time to clear it out.

dsiminiuk

@olivierlambert said in XCP-ng 7.5.0 final is here:

Yeah, but 7.4 won't be maintained, so it would be better to check whether it's an XS issue or not.

My bleeding edge attempt has been dulled. I'll try again later after seeing if the root cause is identified.

DBLogic

Hi
Has there been any update on this please?

I just tried updating our pool. After updating the master, a VM (Ubuntu) migration failed, and then I was unable to get the VM to start (missing VDI error). The only solution was to roll the master back to 7.4, after which everything worked again.

Thanks

olivierlambert

Sorry, an update on what exactly? It might be better to create a dedicated topic so we can see things more clearly.

Danp

@dblogic said in XCP-ng 7.5.0 final is here:

I just tried updating our pool. After updating the master, a VM (Ubuntu) migration failed, and then I was unable to get the VM to start (missing VDI error). The only solution was to roll the master back to 7.4, after which everything worked again.

I ran into a similar situation when I upgraded to 7.5 this week. Did you check whether a non-existent ISO was mounted in the CD-ROM drive of the VM in question?

DBLogic

Hi. Apologies for the delay in responding.
Yes, I did check for that and there was none. It occurred on at least 2 VMs I tried. In one case, just shutting a VM down and trying to restart it gave the missing VDI error. Everything I checked (with my limited knowledge of the CLI) looked fine. As soon as I rolled back to 7.4 on the master, the VMs started without any issue.
It's as if the upgrade applied some schema changes to the VDI linking in some way, perhaps?

I did try a second time, updating just the master, but with the same result: stopping and trying to restart a VM on any host (we have 4) gave the missing VDI error. Again, rolling back fixed the problem, but it leaves me worried about upgrading.

DracoAn

@danp said in XCP-ng 7.5.0 final is here:

@dblogic said in XCP-ng 7.5.0 final is here:

I just tried updating our pool. After updating the master, a VM (Ubuntu) migration failed, and then I was unable to get the VM to start (missing VDI error). The only solution was to roll the master back to 7.4, after which everything worked again.

I ran into a similar situation when I upgraded to 7.5 this week. Did you check whether a non-existent ISO was mounted in the CD-ROM drive of the VM in question?

Thank you. That was my problem. I emptied the CD drive and everything works.