Unable to copy a machine on the same host.
-
Hi Folks,
I'm having a world of problems trying to copy a machine on a test setup I'm evaluating. I have a linux box running on the machine and using XO have selected copy. The snapshot happens and tasks are started.. but nothing more happens. They sit at 0% completed until they eventually vanish.I have started two today and here is the screen shot of where they sit several hours later.
This is XO running from source and has the latest patches. XCP-NG is patched on the machine and is the latest version. The SR is a shared nfs storage pool.
I have another host running in another DC and have managed to start a machine there based on a backup just fine.
I have asked on the XO forum and @olivierlambert has tested this on his setup and its working, so has suggested it might be an issue with XCP itself.
I also just ran xe-toolstack-restart which removed the hung tasks, but restarting the copy produced the same thing (I also tried creating a machine from a snapshot with the same result).
I have also just discovered that there are two machines with that name, one halted and one running (is the second a snapshot?). Neither should be named that if it is the failed copy.
[root@xcp-oldyella log]# xe vm-list uuid=548bf883-7e2b-511a-1847-eecde0b16ab3 uuid ( RO) : 548bf883-7e2b-511a-1847-eecde0b16ab3 name-label ( RW): DarlingDownsPPDB power-state ( RO): halted [root@xcp-oldyella log]# xe vm-list uuid=5f90dcd2-f92a-bd55-634a-84d1872736d0 uuid ( RO) : 5f90dcd2-f92a-bd55-634a-84d1872736d0 name-label ( RW): DarlingDownsPPDB power-state ( RO): running
The command line produces the same result... sigh
xe vm-copy uuid=5f90dcd2-f92a-bd55-634a-84d1872736d0 sr-uuid=b7288376-5c37-5fcb-a3cc-fef74364bfef new-name-label=DarlingDownsDB
More tasks in the task list with 0% against them, and the command never returns.
I'm not sure where to go from here, but this is a bit of a show stopper.
Ideas?
Peter.
-
If it's triggered also by
xe
, it's indeed not a XO issue.Try to take a look at the logs when you start the command to see if you spot any error message.
-
OK, tried to run the command line version again and this time got the following in xensource.log
Apr 3 22:15:58 xcp-oldyella xapi: [ info|xcp-oldyella|27595 UNIX /var/lib/xcp/xapi||cli] xe vm-copy uuid=5f90dcd2-f92a-bd55-634a-84d1872736d0 sr-uuid=b7288376-5c37-5fcb-a3cc-fef74364bfef new-name-label=DarlingDownsDB username=root password=(omitted) Apr 3 22:15:58 xcp-oldyella xapi: [ info|xcp-oldyella|27595 UNIX /var/lib/xcp/xapi|session.login_with_password D:c816a2aa768d|xapi] Session.create trackid=e11966f780034f77cdd9c07de291314f pool=false uname=root originator=cli is_local_superuser=true auth_user_sid= parent=trackid=9834f5af41c964e225f24279aefe4e49 Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27595 UNIX /var/lib/xcp/xapi|session.login_with_password D:c816a2aa768d|mscgen] xapi=>xapi [label="pool.get_all"]; Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27596 UNIX /var/lib/xcp/xapi||dummytaskhelper] task dispatch:pool.get_all D:918652c94061 created by task D:c816a2aa768d Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27595 UNIX /var/lib/xcp/xapi|VM.copy R:72edf37079f1|audit] VM.copy: VM = '5f90dcd2-f92a-bd55-634a-84d1872736d0 (DarlingDownsPPDB)'; new_name = 'DarlingDownsDB'; SR = 'b7288376-5c37-5fcb-a3cc-fef74364bfef (TheBeast1)' Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27595 UNIX /var/lib/xcp/xapi|VM.copy R:72edf37079f1|xapi] Copying disks to SR: b7288376-5c37-5fcb-a3cc-fef74364bfef Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27595 UNIX /var/lib/xcp/xapi|VM.copy R:72edf37079f1|mscgen] xapi=>xapi [label="session.slave_login"]; Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27597 UNIX /var/lib/xcp/xapi||dummytaskhelper] task dispatch:session.slave_login D:92adcd309180 created by task R:72edf37079f1 Apr 3 22:15:58 xcp-oldyella xapi: [ info|xcp-oldyella|27597 UNIX /var/lib/xcp/xapi|session.slave_login D:0bc07e2bf822|xapi] Session.create trackid=feedcfb9da2e88d058dfa9003edadc77 pool=true uname= originator=xapi is_local_superuser=true auth_user_sid= parent=trackid=9834f5af41c964e225f24279aefe4e49 Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27597 UNIX /var/lib/xcp/xapi|session.slave_login D:0bc07e2bf822|mscgen] xapi=>xapi [label="pool.get_all"]; Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27598 UNIX /var/lib/xcp/xapi||dummytaskhelper] task dispatch:pool.get_all D:de732102bd57 created by task D:0bc07e2bf822 Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27595 UNIX /var/lib/xcp/xapi|VM.copy R:72edf37079f1|mscgen] xapi=>xapi [label="VBD.get_record"]; Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27599 UNIX /var/lib/xcp/xapi||dummytaskhelper] task dispatch:VBD.get_record D:de8a36833970 created by task R:72edf37079f1 Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27595 UNIX /var/lib/xcp/xapi|VM.copy R:72edf37079f1|mscgen] xapi=>xapi [label="VBD.get_record"]; Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27600 UNIX /var/lib/xcp/xapi||dummytaskhelper] task dispatch:VBD.get_record D:749b6a23bce2 created by task R:72edf37079f1 Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27595 UNIX /var/lib/xcp/xapi|VM.copy R:72edf37079f1|mscgen] xapi=>xapi [label="Async.VDI.copy"]; Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27601 UNIX /var/lib/xcp/xapi||dummytaskhelper] task dispatch:VDI.copy D:c6a64ceda36f created by task R:72edf37079f1 Apr 3 22:15:58 xcp-oldyella xapi: [ info|xcp-oldyella|27601 UNIX /var/lib/xcp/xapi|dispatch:VDI.copy D:c6a64ceda36f|taskhelper] task Async.VDI.copy R:99082f096367 (uuid:adb3644f-fa2e-eaa4-c5ce-24504241e114) created (trackid=feedcfb9da2e88d058dfa9003edadc77) by task R:72edf37079f1 Apr 3 22:15:58 xcp-oldyella xapi: [ info|xcp-oldyella|27602 |Async.VDI.copy R:99082f096367|dispatcher] spawning a new thread to handle the current task (trackid=feedcfb9da2e88d058dfa9003edadc77) Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27602 |Async.VDI.copy R:99082f096367|audit] VDI.copy: VDI = 'ffc7a041-ff8d-42e1-8194-75327de3a184'; SR = 'b7288376-5c37-5fcb-a3cc-fef74364bfef (TheBeast1)'; base_vdi = 'invalid'; into_vdi = 'invalid' Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27602 |Async.VDI.copy R:99082f096367|xapi] Caught exception while marking VDI for VDI.copy in message forwarder: OTHER_OPERATION_IN_PROGRESS: [ VDI; OpaqueRef:4fb03903-4d95-4249-87cd-b927f71f086b ] Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27602 |Async.VDI.copy R:99082f096367|helpers] VDI.copy locking failed: caught transient failure OTHER_OPERATION_IN_PROGRESS: [ VDI; OpaqueRef:4fb03903-4d95-4249-87cd-b927f71f086b ] Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27602 |Async.VDI.copy R:99082f096367|helpers] Waiting for up to 6.735385 seconds before retrying... Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27595 UNIX /var/lib/xcp/xapi|VM.copy R:72edf37079f1|mscgen] xapi=>xapi [label="session.slave_login"]; Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27603 UNIX /var/lib/xcp/xapi||dummytaskhelper] task dispatch:session.slave_login D:55e483ef9b42 created by task R:72edf37079f1 Apr 3 22:15:58 xcp-oldyella xapi: [ info|xcp-oldyella|27603 UNIX /var/lib/xcp/xapi|session.slave_login D:4d715d4d32cf|xapi] Session.create trackid=54457b3cb62c69c750d1baa6510f1616 pool=true uname= originator=xapi is_local_superuser=true auth_user_sid= parent=trackid=9834f5af41c964e225f24279aefe4e49 Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27603 UNIX /var/lib/xcp/xapi|session.slave_login D:4d715d4d32cf|mscgen] xapi=>xapi [label="pool.get_all"]; Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27604 UNIX /var/lib/xcp/xapi||dummytaskhelper] task dispatch:pool.get_all D:28f7d18ac003 created by task D:4d715d4d32cf Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27595 UNIX /var/lib/xcp/xapi|VM.copy R:72edf37079f1|mscgen] xapi=>xapi [label="task.get_record"]; Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27605 UNIX /var/lib/xcp/xapi||dummytaskhelper] task dispatch:task.get_record D:0f2f88a646d2 created by task R:72edf37079f1 Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27595 UNIX /var/lib/xcp/xapi|VM.copy R:72edf37079f1|mscgen] xapi=>xapi [label="task.get_record"]; Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27606 UNIX /var/lib/xcp/xapi||dummytaskhelper] task dispatch:task.get_record D:62a20eb48fe0 created by task R:72edf37079f1 Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27595 UNIX /var/lib/xcp/xapi|VM.copy R:72edf37079f1|mscgen] xapi=>xapi [label="event.from"]; Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27607 UNIX /var/lib/xcp/xapi||dummytaskhelper] task dispatch:event.from D:d549c967696c created by task R:72edf37079f1 Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27595 UNIX /var/lib/xcp/xapi|VM.copy R:72edf37079f1|mscgen] xapi=>xapi [label="event.from"]; Apr 3 22:15:58 xcp-oldyella xapi: [debug|xcp-oldyella|27608 UNIX /var/lib/xcp/xapi||dummytaskhelper] task dispatch:event.from D:3ac54939abd2 created by task R:72edf37079f1 Apr 3 22:16:04 xcp-oldyella xapi: [debug|xcp-oldyella|27120 INET :::80||dummytaskhelper] task dispatch:event.from D:99c1c67946fb created by task D:ed4ac73754eb Apr 3 22:16:04 xcp-oldyella xapi: [debug|xcp-oldyella|27602 |Async.VDI.copy R:99082f096367|xapi] Caught exception while marking VDI for VDI.copy in message forwarder: OTHER_OPERATION_IN_PROGRESS: [ VDI; OpaqueRef:4fb03903-4d95-4249-87cd-b927f71f086b ] Apr 3 22:16:04 xcp-oldyella xapi: [debug|xcp-oldyella|27602 |Async.VDI.copy R:99082f096367|helpers] VDI.copy locking failed: caught transient failure OTHER_OPERATION_IN_PROGRESS: [ VDI; OpaqueRef:4fb03903-4d95-4249-87cd-b927f71f086b ] Apr 3 22:16:04 xcp-oldyella xapi: [debug|xcp-oldyella|27602 |Async.VDI.copy R:99082f096367|helpers] Waiting for up to 14.295223 seconds before retrying...
It looks as though there is already a task running somewhere... sigh... this is now really getting nasty.
P. -
@pnunn said in Unable to copy a machine on the same host.:
base_vdi = 'invalid'; into_vdi = 'invalid'
Not sure, but this doesn't look right. Are you having issues cloning all VMs or just a single one?
-
It seems to be a problem with all of the vm's (atleast the ones in the backup rotation, I haven't tried any of the test ones yet but will).
Peter.
-
Hi Guys,
OK, I finally managed to clone a machine using fast clone with the machine off. Will continue digging into this to try and work out why a simple copy wouldn't work.
-
It seems it could be related to something weird happening on your SR.
-
Yeh but unfortunate "something weird" could well be a show stopper. We are in the process of trying to work out weather to go with XCP-NG using our NFS shared storage or go with VMWare on the same storage. My strong preference is XCP, but I have to convince some others that its better than VMWare which is what they are used to. Something weird events are really not helping.
I'm not for a second saying its anything other than my lack of knowledge, but I don't want to paint us into a corner that I will cause me pain once I have production hardware and loads going.
-
All I can tell is that's not an usual issue. Never heard of problem of this kind before… So yeah, there is something weird. Could be related to NFS permissions maybe, or something between XCP-ng and your NFS host. It's not XCP-ng code directly, because otherwise everyone else using NFS will endure the same problems. So it's more a "combo" between XCP-ng and your NFS within your infrastructure.
Please read the wiki on where to check logs (especially SMlog) to spot potential hints on where's the issue.