arc1

Hi,
I have a question about VM migration.
If I put a host into maintenance mode, the VMs on that host are migrated one by one, and it takes quite a while to migrate all of them.
But if I do a manual migration, I can select multiple (or all) VMs, and when I click "migrate" they all start migrating at the same time. That is much faster than one-by-one migration.
Is there a particular reason why maintenance mode doesn't migrate all VMs at once?

arc1

Thank you!
So in this case, the better option for me is to join the hosts to the existing pool, then shut down the VMs and start them on the hosts with older CPUs.
Thank you again!

arc1

Thank you both for the fast answers.
Currently I prefer the option with a separate pool, as I find it more transparent.
Just one question: I currently have iSCSI storage on the cluster with the newer hosts.
Can I attach the same storage pools to the new pool with older CPUs? Or do I need a separate iSCSI storage pool for the new pool in XCP-ng?

Thank you!

arc1

Hi,
What about joining hosts with older Intel CPUs to a pool?
If I have one pool with VMs running on hosts with newer Intel CPUs (Intel Xeon Gold 6XXX), can I join two hosts with older Intel CPUs (Intel Xeon E5-2699 v3) to that pool?
Or is it better practice to create a new pool with the older-CPU hosts and then migrate the VMs over?

And can migration from newer to older CPUs even be done? Or should I shut down the VM and then start it on a host with the older CPU?

arc1

@nikade Yes, the VM is frozen without CPU activity.

arc1

@nikade 4 CPUs, 16 GB RAM and roughly 200 GB disk.
The 10-ping downtime was in the test environment with slower links between hosts, which explains the longer freeze.
But in production, with 2x 25 Gb LACP, there is still a noticeable freeze on VMs running more sensitive software (keepalived/etcd). Nothing too terrible; we were just curious whether this is normal behaviour.

arc1

@nikade @planedrop @zmk Thank you all for answering.
We did the test with Rocky Linux, CentOS 7, Ubuntu 22.04 and Windows Server 2022.
On Windows Server we only lose a few pings (10 pings in the testing environment); on Linux we also see log messages about the VM freeze.
The Windows VM isn't busy at all, it is only a test VM, but we still lose about 10 pings.

Vates support said that "depending on the load and the Ram size you can have some freeze of the VM during migration, unfortunately at the moment there is not a lot that can be done about that".

I'm just curious why @nikade and @planedrop don't get any freeze.

arc1

@nikade Yes, we usually lose 4 (±1) pings.
The freeze also occurs on the Xen Orchestra VM, which runs Debian 11 (the only Debian-based VM in our environment).
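
As a rough way to translate lost pings into downtime, here is a minimal sketch. It assumes the usual default ping interval of one second; the interval actually used in these tests is not stated, and `outage_range_seconds` is just an illustrative helper name.

```python
def outage_range_seconds(lost_pings, tolerance=0, interval_s=1.0):
    """Rough (min, max) outage implied by a count of lost pings.

    interval_s is an assumption: 1 second is the usual ping default,
    but the interval used in these tests was not stated.
    """
    low = max(lost_pings - tolerance, 0) * interval_s
    high = (lost_pings + tolerance) * interval_s
    return low, high


# Production case from this thread: 4 (+-1) lost pings.
print(outage_range_seconds(4, tolerance=1))
# Test environment case from this thread: about 10 lost pings.
print(outage_range_seconds(10))
```

This only bounds the network-visible outage; it says nothing about how long the guest itself was paused.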

arc1

Hi @nikade, thank you for the fast answer. Which OS are you using?

Here is the information about my setup:
1. Are VM tools installed?
Yes, they are: version 7.30.0-7.el9

2. What network speed do you have?
2x 25 Gb in LACP mode on hosts with 4 paths to iSCSI storage.

3. How much RAM does that XCP-NG host have assigned to dom0?
8 GB; the host has 768 GB.

4. Are you using dynamic memory?
No, dynamic memory is not enabled (in the VM's advanced settings, dynamic is 16/16 GiB).

Thank you for the help again!

arc1

Hi,
We have XCP-ng 8.2.1 hosts with the latest Xen Orchestra.
When we migrate VMs (mostly Rocky Linux 9 hosts, some CentOS 7 too), there is a brief freeze of the VM. VMs running databases, etcd or other sensitive programs report an error for a short moment. The VMs continue to work without any issue, but still, is there any solution to that freeze?
Kind regards!

Rocky Linux 9 error (on CentOS 7 we get a similar error):

Aug 08 13:46:38 rocky9linux kernel: Freezing user space processes ... (elapsed 0.003 seconds) done.
Aug 08 13:46:38 rocky9linux kernel: OOM killer disabled.
Aug 08 13:46:38 rocky9linux kernel: Freezing remaining freezable tasks ... (elapsed 0.006 seconds) done.
Aug 08 13:46:38 rocky9linux kernel: ------------[ cut here ]------------
Aug 08 13:46:38 rocky9linux kernel: WARNING: CPU: 1 PID: 2176896 at kernel/workqueue.c:3162 __flush_work.isra.0+0x212/0x230
Aug 08 13:46:38 rocky9linux kernel: Modules linked in: tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink vfat fat ppdev joydev pcspkr bochs drm_vram_helper drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt parport_pc parport i2c_piix4 drm fuse xfs libcrc32c sr_mod cdrom sg ata_generic ata_piix libata xen_netfront xen_blkfront crc32c_intel serio_raw dm_mirror dm_region_hash dm_log dm_mod
Aug 08 13:46:38 rocky9linux kernel: CPU: 1 PID: 2176896 Comm: kworker/u128:4 Kdump: loaded Tainted: G        W         -------  ---  5.14.0-362.8.1.el9_3.x86_64 #1
Aug 08 13:46:38 rocky9linux kernel: Hardware name: Xen HVM domU, BIOS 4.13 01/31/2024
Aug 08 13:46:38 rocky9linux kernel: Workqueue: events_unbound async_run_entry_fn
Aug 08 13:46:38 rocky9linux kernel: RIP: 0010:__flush_work.isra.0+0x212/0x230
Aug 08 13:46:38 rocky9linux kernel: Code: 8b 4d 00 4c 8b 45 08 89 ca 48 c1 e9 04 83 e2 08 83 e1 0f 83 ca 02 89 c8 48 0f ba 6d 00 03 e9 25 ff ff ff 0f 0b e9 4e ff ff ff <0f> 0b 45 31 ed e9 44 ff ff ff e8 df 89 b2 00 66 66 2e 0f 1f 84 00
Aug 08 13:46:38 rocky9linux kernel: RSP: 0018:ffffa2f1850afcb8 EFLAGS: 00010246
Aug 08 13:46:38 rocky9linux kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffa9b929b7
Aug 08 13:46:38 rocky9linux kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8d6487f2cb30
Aug 08 13:46:38 rocky9linux kernel: RBP: ffff8d6487f2cb30 R08: 0000000000000000 R09: ffff8d638e1021f4
Aug 08 13:46:38 rocky9linux kernel: R10: 000000000000000f R11: 000000000000000f R12: ffff8d6487f2cb30
Aug 08 13:46:38 rocky9linux kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
Aug 08 13:46:38 rocky9linux kernel: FS:  0000000000000000(0000) GS:ffff8d648a640000(0000) knlGS:0000000000000000
Aug 08 13:46:38 rocky9linux kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 08 13:46:38 rocky9linux kernel: CR2: 00007fd03d4be2a2 CR3: 000000000302c006 CR4: 00000000000206e0
Aug 08 13:46:38 rocky9linux kernel: Call Trace:
Aug 08 13:46:38 rocky9linux kernel:  <TASK>
Aug 08 13:46:38 rocky9linux kernel:  ? show_trace_log_lvl+0x1c4/0x2df
Aug 08 13:46:38 rocky9linux kernel:  ? show_trace_log_lvl+0x1c4/0x2df
Aug 08 13:46:38 rocky9linux kernel:  ? __cancel_work_timer+0x103/0x190
Aug 08 13:46:38 rocky9linux kernel:  ? __flush_work.isra.0+0x212/0x230
Aug 08 13:46:38 rocky9linux kernel:  ? __warn+0x81/0x110
Aug 08 13:46:38 rocky9linux kernel:  ? __flush_work.isra.0+0x212/0x230
Aug 08 13:46:38 rocky9linux kernel:  ? report_bug+0x10a/0x140
Aug 08 13:46:38 rocky9linux kernel:  ? handle_bug+0x3c/0x70
Aug 08 13:46:38 rocky9linux kernel:  ? exc_invalid_op+0x14/0x70
Aug 08 13:46:38 rocky9linux kernel:  ? asm_exc_invalid_op+0x16/0x20
Aug 08 13:46:38 rocky9linux kernel:  ? __flush_work.isra.0+0x212/0x230
Aug 08 13:46:38 rocky9linux kernel:  __cancel_work_timer+0x103/0x190
Aug 08 13:46:38 rocky9linux kernel:  ? set_next_entity+0xda/0x150
Aug 08 13:46:38 rocky9linux kernel:  drm_kms_helper_poll_disable+0x1e/0x40 [drm_kms_helper]
Aug 08 13:46:38 rocky9linux kernel:  drm_mode_config_helper_suspend+0x1c/0x80 [drm_kms_helper]
Aug 08 13:46:38 rocky9linux kernel:  pci_pm_freeze+0x53/0xc0
Aug 08 13:46:38 rocky9linux kernel:  ? __pfx_pci_pm_freeze+0x10/0x10
Aug 08 13:46:38 rocky9linux kernel:  dpm_run_callback+0x4c/0x140
Aug 08 13:46:38 rocky9linux kernel:  __device_suspend+0x112/0x470
Aug 08 13:46:38 rocky9linux kernel:  async_suspend+0x1b/0x90
Aug 08 13:46:38 rocky9linux kernel:  async_run_entry_fn+0x30/0x130
Aug 08 13:46:38 rocky9linux kernel:  process_one_work+0x1e5/0x3b0
Aug 08 13:46:38 rocky9linux kernel:  worker_thread+0x50/0x3a0
Aug 08 13:46:38 rocky9linux kernel:  ? __pfx_worker_thread+0x10/0x10
Aug 08 13:46:38 rocky9linux kernel:  kthread+0xe0/0x100
Aug 08 13:46:38 rocky9linux kernel:  ? __pfx_kthread+0x10/0x10
Aug 08 13:46:38 rocky9linux kernel:  ret_from_fork+0x2c/0x50
Aug 08 13:46:38 rocky9linux kernel:  </TASK>
Aug 08 13:46:38 rocky9linux kernel: ---[ end trace 18c4db6d6eef5f95 ]---
Aug 08 13:46:38 rocky9linux kernel: suspending xenstore...
Aug 08 13:46:38 rocky9linux kernel: xen:grant_table: Grant tables using version 1 layout
Aug 08 13:46:38 rocky9linux kernel: xen: --> irq=9, pirq=16
Aug 08 13:46:38 rocky9linux kernel: xen: --> irq=8, pirq=17
Aug 08 13:46:38 rocky9linux kernel: xen: --> irq=12, pirq=18
Aug 08 13:46:38 rocky9linux kernel: xen: --> irq=1, pirq=19
Aug 08 13:46:38 rocky9linux kernel: xen: --> irq=6, pirq=20
Aug 08 13:46:38 rocky9linux kernel: xen: --> irq=4, pirq=21
Aug 08 13:46:38 rocky9linux kernel: xen: --> irq=7, pirq=22
Aug 08 13:46:38 rocky9linux kernel: xen: --> irq=23, pirq=23
Aug 08 13:46:38 rocky9linux kernel: xen: --> irq=28, pirq=24
Aug 08 13:46:38 rocky9linux kernel: usb usb1: root hub lost power or was reset
Aug 08 13:46:38 rocky9linux kernel: ata2: found unknown device (class 0)
Aug 08 13:46:38 rocky9linux kernel: usb 1-2: reset full-speed USB device number 2 using uhci_hcd
Aug 08 13:46:38 rocky9linux kernel: OOM killer enabled.
Aug 08 13:46:38 rocky9linux kernel: Restarting tasks ... done.
Aug 08 13:46:38 rocky9linux NetworkManager[687]: <info>  [1723117598.8391] device (enX0): carrier: link connected
Aug 08 13:46:38 rocky9linux kernel: Setting capacity to 41943040
Aug 08 13:46:39 rocky9linux xe-daemon[669]: Trigger refresh after system resume
