Extremely slow backup speed = about a few MB/s
-
I'm observing this issue on my XCP-ng setup. My virtual machines keep growing, and in the beginning this wasn't a problem, but the backup speed with XCP-ng and Xen Orchestra is terribly slow. Currently my tests show a transfer speed of about 1-3 MB/s. XO is built from source.
XCP-ng 7.6 + all updates, 2x Dell R510, 128 GB RAM, PERC H700 with 512 MB cache, and 16x 8 TB WD Red disks. The servers are connected to the same gigabit switch (D-Link DGS-1210-52).
The raw storage speed tested with Windows 10 installed on those Dell R510 servers was about 350/300 MB/s (read/write). Why on earth is the transfer speed using XO so slow? To transfer about 500 GB from one host to another (Disaster Recovery), XO needs EIGHT HOURS.
63_untangle (XCP02)
Snapshot: Start: Sep 29, 2019, 06:10:00 AM, End: Sep 29, 2019, 06:10:04 AM
x1-ext-sr01 (12.88 TiB free) - XCP01
transfer: Start: Sep 29, 2019, 06:10:05 AM, End: Sep 29, 2019, 06:57:37 AM, Duration: an hour, Size: 5.4 GiB, Speed: 1.94 MiB/s
Start: Sep 29, 2019, 06:10:05 AM, End: Sep 29, 2019, 06:58:06 AM, Duration: an hour
Start: Sep 29, 2019, 06:10:00 AM, End: Sep 29, 2019, 06:58:07 AM, Duration: an hour

41_bookstack (XCP02)
Snapshot: Start: Sep 29, 2019, 06:58:07 AM, End: Sep 29, 2019, 06:58:11 AM
x1-ext-sr01 (12.88 TiB free) - XCP01
transfer: Start: Sep 29, 2019, 06:58:12 AM, End: Sep 29, 2019, 07:44:29 AM, Duration: an hour, Size: 6.93 GiB, Speed: 2.55 MiB/s
Start: Sep 29, 2019, 06:58:12 AM, End: Sep 29, 2019, 07:45:01 AM, Duration: an hour
Start: Sep 29, 2019, 06:58:07 AM, End: Sep 29, 2019, 07:45:02 AM, Duration: an hour

43_RemoteOffice (XCP02)
Snapshot: Start: Sep 29, 2019, 07:45:02 AM, End: Sep 29, 2019, 07:45:06 AM
x1-ext-sr01 (12.88 TiB free) - XCP01
transfer: Start: Sep 29, 2019, 07:45:07 AM, End: Sep 29, 2019, 08:37:55 AM, Duration: an hour, Size: 9.99 GiB, Speed: 3.23 MiB/s
Start: Sep 29, 2019, 07:45:07 AM, End: Sep 29, 2019, 08:37:55 AM, Duration: an hour
Start: Sep 29, 2019, 07:45:02 AM, End: Sep 29, 2019, 08:37:56 AM, Duration: an hour

22_dc02 (XCP02)
Snapshot: Start: Sep 29, 2019, 06:10:00 AM, End: Sep 29, 2019, 06:10:32 AM
x1-ext-sr01 (12.88 TiB free) - XCP01
transfer: Start: Sep 29, 2019, 06:10:33 AM, End: Sep 29, 2019, 09:34:27 AM, Duration: 3 hours, Size: 31.01 GiB, Speed: 2.6 MiB/s
Start: Sep 29, 2019, 06:10:33 AM, End: Sep 29, 2019, 09:37:19 AM, Duration: 3 hours
Start: Sep 29, 2019, 06:10:00 AM, End: Sep 29, 2019, 09:37:38 AM, Duration: 3 hours

51_kmswroclaw-DEV (XCP02)
Snapshot: Start: Sep 29, 2019, 08:37:56 AM, End: Sep 29, 2019, 08:38:05 AM
x1-ext-sr01 (12.88 TiB free) - XCP01
transfer: Start: Sep 29, 2019, 08:38:05 AM, End: Sep 29, 2019, 11:06:03 AM, Duration: 2 hours, Size: 1.23 GiB, Speed: 145.25 kiB/s
Start: Sep 29, 2019, 08:38:05 AM, End: Sep 29, 2019, 11:06:04 AM, Duration: 2 hours
Start: Sep 29, 2019, 08:37:56 AM, End: Sep 29, 2019, 11:06:06 AM, Duration: 2 hours
-
- If it's from the sources, it's not XOA
XOA means "Xen Orchestra virtual Appliance"
- Remember your data has to transfer from host to XO, then to the remote.
- We don't know what kind of remote you are using. SMB, NFS? Ask @nikade about transfer speed, but clearly, there's something wrong with your setup
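One way to see where the time goes is to take XO and the remote out of the equation and pull an export straight from the host. A rough sketch (host address, VM UUID and password are placeholders, and it assumes XAPI's HTTP export handler accepts basic auth; otherwise do the same measurement with xe vm-export on the host):

# Time the raw XVA export stream coming out of XAPI, discarding the data.
curl -k -u root:<password> -o /dev/null \
  "https://<host>/export?uuid=<vm-uuid>" \
  -w 'average download speed: %{speed_download} bytes/s\n'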
-
- How many VMs do you have running on this host?
- What does "top" show on the two hosts when a backup-job is running?
- XO is very much dependent on the storage you are backing up to. What kind of remote are you using, and are you able to benchmark it with something other than XCP-ng?
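For example, a crude write test of the backup target from dom0 (the mount point is only an example; oflag=direct bypasses the page cache so the result isn't inflated by RAM):

# Replace the path with your actual NFS/SMB remote or local SR mount.
dd if=/dev/zero of=/run/sr-mount/<SR-UUID>/ddtest bs=1M count=2048 oflag=direct conv=fsync
rm -f /run/sr-mount/<SR-UUID>/ddtest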
-
@nikade said in Extremely slow backup speed = about a few MB/s:
benchmark it with something other than XCP-ng?
Windows 10: 350 MB/s read, 300 MB/s write on the local storage (RAID10), so it's super fast under Windows. I'm performing a DR backup, i.e. a VM export, from one R510 to the second R510 (the same hardware specs).
Typical usage on the servers:
XCP01:
[root@XCP01 ~]# top
top - 13:07:10 up 16 days, 9:15, 1 user, load average: 0.18, 0.26, 0.33
Tasks: 398 total, 1 running, 397 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.7 us, 1.2 sy, 0.2 ni, 97.2 id, 0.6 wa, 0.0 hi, 0.0 si, 0.1 st
KiB Mem : 16338996 total, 141804 free, 1804124 used, 14393068 buff/cache
KiB Swap: 1048572 total, 1029712 free, 18860 used. 14351156 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8870 root 20 0 157948 4648 3640 R 15.8 0.0 0:00.06 top
2916 root 10 -10 1304628 156244 9808 S 5.3 1.0 336:20.12 ovs-vswitchd
20715 65540 20 0 251116 14496 9308 S 5.3 0.1 262:15.90 qemu-system-i38
26488 65594 20 0 223468 15744 9420 S 5.3 0.1 65:29.21 qemu-system-i38
1 root 20 0 41828 4876 3092 S 0.0 0.0 17:09.84 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:01.68 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:15.35 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
7 root 20 0 0 0 0 S 0.0 0.0 47:51.95 rcu_sched
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
9 root rt 0 0 0 0 S 0.0 0.0 0:02.70 migration/0
10 root rt 0 0 0 0 S 0.0 0.0 0:08.26 watchdog/0
11 root rt 0 0 0 0 S 0.0 0.0 0:08.63 watchdog/1
12 root rt 0 0 0 0 S 0.0 0.0 0:04.06 migration/1
13 root 20 0 0 0 0 S 0.0 0.0 0:21.84 ksoftirqd/1
15 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/1:0H
16 root rt 0 0 0 0 S 0.0 0.0 0:08.18 watchdog/2
17 root rt 0 0 0 0 S 0.0 0.0 0:01.97 migration/2
18 root 20 0 0 0 0 S 0.0 0.0 0:15.07 ksoftirqd/2
20 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/2:0H
21 root rt 0 0 0 0 S 0.0 0.0 0:08.68 watchdog/3
22 root rt 0 0 0 0 S 0.0 0.0 0:03.98 migration/3
23 root 20 0 0 0 0 S 0.0 0.0 0:12.62 ksoftirqd/3
25 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/3:0H
26 root rt 0 0 0 0 S 0.0 0.0 0:08.18 watchdog/4
27 root rt 0 0 0 0 S 0.0 0.0 0:02.07 migration/4
28 root 20 0 0 0 0 S 0.0 0.0 0:21.16 ksoftirqd/4
30 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/4:0H
31 root rt 0 0 0 0 S 0.0 0.0 0:08.54 watchdog/5
32 root rt 0 0 0 0 S 0.0 0.0 0:03.54 migration/5
33 root 20 0 0 0 0 S 0.0 0.0 0:14.32 ksoftirqd/5
35 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/5:0H
36 root rt 0 0 0 0 S 0.0 0.0 0:08.19 watchdog/6
37 root rt 0 0 0 0 S 0.0 0.0 0:02.19 migration/6
38 root 20 0 0 0 0 S 0.0 0.0 0:32.86 ksoftirqd/6
40 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/6:0H
41 root rt 0 0 0 0 S 0.0 0.0 0:08.65 watchdog/7
42 root rt 0 0 0 0 S 0.0 0.0 0:03.60 migration/7
43 root 20 0 0 0 0 S 0.0 0.0 0:24.38 ksoftirqd/7
45 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/7:0H
46 root rt 0 0 0 0 S 0.0 0.0 0:08.16 watchdog/8
47 root rt 0 0 0 0 S 0.0 0.0 0:02.05 migration/8
48 root 20 0 0 0 0 S 0.0 0.0 0:24.35 ksoftirqd/8
XCP02:
[root@XCP02 ~]# top
top - 13:08:01 up 17 days, 4:38, 1 user, load average: 0.95, 0.48, 0.38
Tasks: 375 total, 1 running, 374 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.6 us, 1.1 sy, 0.0 ni, 97.1 id, 1.0 wa, 0.0 hi, 0.0 si, 0.1 st
KiB Mem : 8126876 total, 82472 free, 1221544 used, 6822860 buff/cache
KiB Swap: 1048572 total, 1033124 free, 15448 used. 6705140 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10337 root 20 0 157948 4604 3660 R 16.7 0.1 0:00.06 top
2809 root 10 -10 1304492 154952 9808 S 5.6 1.9 319:01.78 ovs-vswitchd
6600 65537 20 0 170220 10520 5180 S 5.6 0.1 260:22.48 qemu-system-i38
15215 65540 20 0 231660 11432 5208 S 5.6 0.1 348:15.28 qemu-system-i38
1 root 20 0 41896 5200 3516 S 0.0 0.1 17:40.30 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:02.38 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:09.02 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
7 root 20 0 0 0 0 S 0.0 0.0 52:24.75 rcu_sched
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
9 root rt 0 0 0 0 S 0.0 0.0 0:02.10 migration/0
10 root rt 0 0 0 0 S 0.0 0.0 0:08.56 watchdog/0
11 root rt 0 0 0 0 S 0.0 0.0 0:09.10 watchdog/1
12 root rt 0 0 0 0 S 0.0 0.0 0:03.78 migration/1
13 root 20 0 0 0 0 S 0.0 0.0 0:06.55 ksoftirqd/1
15 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/1:0H
16 root rt 0 0 0 0 S 0.0 0.0 0:08.58 watchdog/2
17 root rt 0 0 0 0 S 0.0 0.0 0:01.64 migration/2
18 root 20 0 0 0 0 S 0.0 0.0 0:12.46 ksoftirqd/2
20 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/2:0H
21 root rt 0 0 0 0 S 0.0 0.0 0:09.03 watchdog/3
22 root rt 0 0 0 0 S 0.0 0.0 0:03.76 migration/3
23 root 20 0 0 0 0 S 0.0 0.0 0:16.58 ksoftirqd/3
25 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/3:0H
26 root rt 0 0 0 0 S 0.0 0.0 0:08.59 watchdog/4
27 root rt 0 0 0 0 S 0.0 0.0 0:01.94 migration/4
28 root 20 0 0 0 0 S 0.0 0.0 0:07.93 ksoftirqd/4
30 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/4:0H
31 root rt 0 0 0 0 S 0.0 0.0 0:09.13 watchdog/5
32 root rt 0 0 0 0 S 0.0 0.0 0:03.46 migration/5
33 root 20 0 0 0 0 S 0.0 0.0 0:04.52 ksoftirqd/5
35 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/5:0H
36 root rt 0 0 0 0 S 0.0 0.0 0:08.55 watchdog/6
37 root rt 0 0 0 0 S 0.0 0.0 0:01.84 migration/6
38 root 20 0 0 0 0 S 0.0 0.0 0:16.49 ksoftirqd/6
40 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/6:0H
41 root rt 0 0 0 0 S 0.0 0.0 0:09.18 watchdog/7
42 root rt 0 0 0 0 S 0.0 0.0 0:03.76 migration/7
43 root 20 0 0 0 0 S 0.0 0.0 0:03.27 ksoftirqd/7
45 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/7:0H
46 root rt 0 0 0 0 S 0.0 0.0 0:08.55 watchdog/8
47 root rt 0 0 0 0 S 0.0 0.0 0:01.76 migration/8
48 root 20 0 0 0 0 S 0.0 0.0 0:09.96 ksoftirqd/8
50 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/8:0H
51 root rt 0 0 0 0 S 0.0 0.0 0:08.99 watchdog/9
52 root rt 0 0 0 0 S 0.0 0.0 0:03.81 migration/9
-
Thanks for the output. Is this "top" running while you are doing a DR? I don't see any processes related to copying VDIs, or even sparse_dd.
-
This is at idle; I'll run it during a DR backup and post the results later.
@olivierlambert "Remember your data has to transfer from host to XO, then to the remote." - is there any data write on XenOrchestra VM while backup is in progres or strict data transfer in RAM?
-
There's no intermediate storage on XOA at all. It's a stream
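You can emulate that stream without XO in the middle by piping an export straight onto the destination; a sketch with placeholder host, UUID, password and SR path (it exercises roughly the same read path as the DR job, minus XO and the VM import on the destination):

# Export from XAPI's HTTP handler directly onto the target SR mount.
curl -k -u root:<password> "https://<host>/export?uuid=<vm-uuid>" \
  -o /run/sr-mount/<raid5-SR-UUID>/stream-test.xva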
-
So why is it so slow? Xen Orchestra is on the same host as the source VMs. When I export a VM from the command line, the export speed is about 210 MB/s within the server's local storage. For the test I've set up this DR backup:
- the test is within one Dell R510 server with a PERC H700
- local storage (8x 8 TB WD Reds)
- two RAID volumes: RAID10 and RAID5
- Xen Orchestra sits on the RAID10 volume
- DR backup test: from the RAID10 volume to the RAID5 volume
- two Intel X5675 CPUs, 128 GB RAM
The results:
top at idle (11 VMs running on the test R510 host):
Last login: Fri Oct 4 13:07:07 2019 from 10.0.0.102
[root@XCP01 ~]# top
top - 09:57:08 up 19 days, 6:05, 1 user, load average: 0.12, 0.16, 0.29
Tasks: 400 total, 1 running, 399 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.5 sy, 0.0 ni, 99.0 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16338996 total, 4881352 free, 1815788 used, 9641856 buff/cache
KiB Swap: 1048572 total, 1024492 free, 24080 used. 14357448 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2916 root 10 -10 1304628 156508 9808 S 1.6 1.0 395:05.17 ovs-vswitchd
4093 65545 20 0 234732 15512 9212 S 1.6 0.1 382:26.90 qemu-system-i38
26488 65594 20 0 225516 15692 9368 S 1.6 0.1 128:18.99 qemu-system-i38
27402 65595 20 0 208108 15644 9356 S 1.6 0.1 129:11.60 qemu-system-i38
22063 65542 20 0 237804 14724 9428 S 1.3 0.1 405:49.63 qemu-system-i38
22834 65544 20 0 207084 14328 9248 S 1.3 0.1 312:12.40 qemu-system-i38
582 65601 20 0 215276 14716 9388 S 1.0 0.1 50:00.18 qemu-system-i38
1873 root 20 0 63156 10240 5092 S 1.0 0.1 348:53.94 forkexecd
12452 65589 20 0 260332 14856 9524 S 1.0 0.1 114:15.44 qemu-system-i38
20715 65540 20 0 251116 14492 9304 S 1.0 0.1 311:07.99 qemu-system-i38
30977 root 20 0 157952 4680 3640 R 1.0 0.0 0:00.14 top
31518 65551 20 0 251116 14628 9448 S 1.0 0.1 278:13.51 qemu-system-i38
2463 root 20 0 1067864 64084 4508 S 0.7 0.4 168:41.81 xcp-rrdd
20150 65569 20 0 255212 15400 9288 S 0.7 0.1 117:21.64 qemu-system-i38
7 root 20 0 0 0 0 S 0.3 0.0 56:30.75 rcu_sched
1930 root 20 0 295760 9124 1424 S 0.3 0.1 51:41.59 xapi-storage-sc
2733 root 20 0 64744 13248 3444 S 0.3 0.1 23:31.58 xcp-rrdd-xenpm
2880 root 10 -10 46960 5084 3536 S 0.3 0.0 32:57.58 ovsdb-server
3041 root 20 0 528088 21560 4556 S 0.3 0.1 52:53.53 xcp-networkd
4495 root 20 0 195716 10584 1112 S 0.3 0.1 19:33.88 mpathalert
20196 root 20 0 42428 6204 2832 S 0.3 0.0 21:35.79 tapdisk
1 root 20 0 41828 4856 3072 S 0.0 0.0 20:19.98 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:02.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:20.00 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
The top output during the DR test:
top - 10:00:03 up 19 days, 6:08, 1 user, load average: 1.85, 0.59, 0.41
Tasks: 407 total, 1 running, 406 sleeping, 0 stopped, 0 zombie
%Cpu(s): 7.6 us, 4.8 sy, 0.0 ni, 78.1 id, 7.8 wa, 0.0 hi, 1.1 si, 0.5 st
KiB Mem : 16338996 total, 2498564 free, 1834260 used, 12006172 buff/cache
KiB Swap: 1048572 total, 1024492 free, 24080 used. 14336504 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4421 root 20 0 3589768 9664 3924 S 81.2 0.1 628:48.72 stunnel
3510 root 20 0 5924260 1.010g 12020 S 64.5 6.5 1847:36 xapi
20408 root 20 0 0 0 0 S 17.1 0.0 162:52.33 vif34.0-q0-gues
32544 root 20 0 46716 10520 2800 S 16.4 0.1 0:04.61 tapdisk
20415 root 20 0 0 0 0 S 10.2 0.0 21:33.92 vif34.0-q3-deal
3723 root 20 0 45164 8716 3620 S 6.2 0.1 23:42.15 tapdisk
32664 root 20 0 40800 4604 2772 D 5.9 0.0 0:02.07 tapdisk
2916 root 10 -10 1304628 156508 9808 S 2.3 1.0 395:08.61 ovs-vswitchd
4093 65545 20 0 234732 15512 9212 S 2.0 0.1 382:29.57 qemu-system-i38
22063 65542 20 0 237804 14724 9428 S 2.0 0.1 405:52.32 qemu-system-i38
582 65601 20 0 215276 14716 9388 S 1.6 0.1 50:02.25 qemu-system-i38
20715 65540 20 0 251116 14492 9304 S 1.6 0.1 311:10.05 qemu-system-i38
26488 65594 20 0 225516 15692 9368 S 1.6 0.1 128:21.63 qemu-system-i38
27402 65595 20 0 208108 15644 9356 S 1.6 0.1 129:14.28 qemu-system-i38
31518 65551 20 0 251116 14628 9448 S 1.6 0.1 278:15.57 qemu-system-i38
12452 65589 20 0 260332 14856 9524 S 1.3 0.1 114:17.52 qemu-system-i38
20150 65569 20 0 255212 15400 9288 S 1.3 0.1 117:23.76 qemu-system-i38
22834 65544 20 0 207084 14328 9248 S 1.3 0.1 312:14.46 qemu-system-i38
32754 root 20 0 157952 4640 3596 R 1.3 0.0 0:00.30 top
1 root 20 0 41828 4856 3072 S 1.0 0.0 20:20.14 systemd
1791 root 20 0 272072 3388 2904 S 1.0 0.0 39:36.45 rsyslogd
and a little later tapdisk is at ~100% CPU usage:
top - 10:00:45 up 19 days, 6:08, 1 user, load average: 2.69, 0.98, 0.55
Tasks: 407 total, 2 running, 405 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.7 us, 8.1 sy, 0.0 ni, 86.3 id, 3.8 wa, 0.0 hi, 0.0 si, 0.1 st
KiB Mem : 16338996 total, 94932 free, 1828464 used, 14415600 buff/cache
KiB Swap: 1048572 total, 1024492 free, 24080 used. 14343268 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32544 root 20 0 46716 10560 2840 R 95.4 0.1 0:41.81 tapdisk
3510 root 20 0 5932456 1.010g 12020 S 40.9 6.5 1847:56 xapi
105 root 20 0 0 0 0 S 4.6 0.0 204:37.94 kswapd0
4093 65545 20 0 234732 15512 9212 S 2.0 0.1 382:30.26 qemu-system-i38
2916 root 10 -10 1304628 156508 9808 S 1.7 1.0 395:09.41 ovs-vswitchd
582 65601 20 0 215276 14716 9388 S 1.3 0.1 50:02.78 qemu-system-i38
668 root 20 0 157952 4744 3696 R 1.3 0.0 0:00.16 top
1950 root 20 0 70392 8136 3548 S 1.3 0.0 64:25.71 oxenstored
20715 65540 20 0 251116 14492 9304 S 1.3 0.1 311:10.58 qemu-system-i38
22063 65542 20 0 237804 14724 9428 S 1.3 0.1 405:52.99 qemu-system-i38
22834 65544 20 0 207084 14328 9248 S 1.3 0.1 312:15.00 qemu-system-i38
26488 65594 20 0 225516 15692 9368 S 1.3 0.1 128:22.28 qemu-system-i38
27402 65595 20 0 208108 15644 9356 S 1.3 0.1 129:14.96 qemu-system-i38
31292 root 20 0 270420 55960 13336 S 1.3 0.3 3:40.25 xcp-rrdd-iostat
1873 root 20 0 63156 10292 5144 S 1.0 0.1 348:56.48 forkexecd
12452 65589 20 0 260332 14856 9524 S 1.0 0.1 114:18.05 qemu-system-i38
20150 65569 20 0 255212 15400 9288 S 1.0 0.1 117:24.31 qemu-system-i38
31518 65551 20 0 251116 14628 9448 S 1.0 0.1 278:16.10 qemu-system-i38
1022 root 20 0 28616 2460 2240 S 0.7 0.0 29:25.36 systemd-journal
1791 root 20 0 272072 3388 2904 S 0.3 0.0 39:36.65 rsyslogd
1926 root 20 0 522448 11260 3244 S 0.3 0.1 21:13.46 squeezed
1930 root 20 0 295760 9124 1424 S 0.3 0.1 51:42.04 xapi-storage-sc
20408 root 20 0 0 0 0 S 0.3 0.0 162:53.10 vif34.0-q0-gues
26200 root 20 0 42336 6112 2832 S 0.3 0.0 15:24.10 tapdisk
30924 root 20 0 139916 9036 7740 S 0.3 0.1 0:00.15 sshd
1 root 20 0 41828 4856 3072 S 0.0 0.0 20:20.17 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:02.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:20.08 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
7 root 20 0 0 0 0 S 0.0 0.0 56:31.34 rcu_sched
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
9 root rt 0 0 0 0 S 0.0 0.0 0:03.30 migration/0
10 root rt 0 0 0 0 S 0.0 0.0 0:09.76 watchdog/0
11 root rt 0 0 0 0 S 0.0 0.0 0:10.25 watchdog/1
So it looks like tapdisk is the issue? The export speed in Xen Orchestra is a huge problem.
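If it helps, the busy tapdisk can be mapped to the VDI it is serving from dom0 (a sketch; take the PID from the top output above):

# Show which VHD/VDI each running tapdisk instance has open.
tap-ctl list
tap-ctl list | grep <pid-from-top>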
-
- Do you have compression enabled?
- It's unclear: you are talking about DR, so you are using a DR backup job, right?
You have something wrong somewhere, but it's hard to spot right now "as is".
-
@olivierlambert
No compression; this is from the backup pane:
-
- Try to disable HTTPS: in Settings/Servers, add the host with the URL http://<IP ADDR>, disconnect/reconnect, and check the DR speed after
- I don't understand, are you DRing to the same host?
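To check whether TLS is the limiting factor, it may be worth grabbing the same process snapshot during an HTTPS run and again after switching to HTTP (an illustrative one-liner, run on the host while the DR job is active):

# Compare stunnel/xapi/tapdisk CPU between the HTTPS and HTTP runs.
top -b -n 1 | grep -E 'stunnel|xapi|tapdisk'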
-
@olivierlambert
Yes, for the test I'm doing the DR backup within one server to eliminate any network-related issues. As I wrote before, a raw file copy from the RAID10 to the RAID5 volume runs at about 210 MB/s on this host, so I expected that the DR backup would be done at a similar speed. Olivier, do you know anything about tapdisk high CPU usage in XCP-ng? Could this be the cause of the slow backups?
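For reference, the two baselines mentioned above can be reproduced roughly like this (the VM UUID and SR paths are placeholders):

# CLI export of a (halted or snapshotted) VM to the RAID10 SR mount, bypassing XO completely.
time xe vm-export vm=<vm-uuid> filename=/run/sr-mount/<raid10-SR-UUID>/test.xva
# Raw file copy from the RAID10 SR to the RAID5 SR.
dd if=/run/sr-mount/<raid10-SR-UUID>/test.xva of=/run/sr-mount/<raid5-SR-UUID>/test.xva bs=1M oflag=direct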
-
This is related to VM storage. It's likely the VDI attached to the dom0 for the export.
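A sketch for checking what is plugged into dom0 while the export runs (it assumes the standard xe list filters):

# List the VBDs currently attached to the control domain.
DOM0=$(xe vm-list is-control-domain=true params=uuid --minimal)
xe vbd-list vm-uuid=$DOM0 params=uuid,vdi-name-label,currently-attached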
-
Dell R510 connected to Xen Orchestra through http://<IP>
top - 10:39:43 up 19 days, 6:47, 1 user, load average: 2.02, 1.25, 1.48
Tasks: 409 total, 3 running, 406 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.7 us, 9.0 sy, 0.0 ni, 84.5 id, 3.7 wa, 0.0 hi, 0.0 si, 0.1 st
KiB Mem : 16338996 total, 91748 free, 1847752 used, 14399496 buff/cache
KiB Swap: 1048572 total, 1024492 free, 24080 used. 14326732 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19196 root 20 0 46716 10520 2800 R 95.7 0.1 1:06.16 tapdisk
3510 root 20 0 7906732 1.025g 13808 S 45.1 6.6 1862:07 xapi
19913 root 20 0 127984 17948 7296 R 16.1 0.1 0:00.49 ISOSR
105 root 20 0 0 0 0 S 4.9 0.0 206:12.14 kswapd0
1873 root 20 0 63156 10292 5144 S 2.3 0.1 349:22.90 forkexecd
2916 root 10 -10 1304628 156508 9808 S 2.3 1.0 395:58.63 ovs-vswitc+
4093 65545 20 0 234732 15512 9212 S 1.6 0.1 383:06.39 qemu-syste+
22063 65542 20 0 237804 14724 9428 S 1.6 0.1 406:29.39 qemu-syste+
27402 65595 20 0 208108 15644 9356 S 1.6 0.1 129:51.15 qemu-syste+
582 65601 20 0 215276 14716 9388 S 1.3 0.1 50:30.96 qemu-syste+
12452 65589 20 0 260332 14856 9524 S 1.3 0.1 114:46.29 qemu-syste+
16530 65602 20 0 215276 15340 9256 S 1.3 0.1 0:16.89 qemu-syste+
22834 65544 20 0 207084 14328 9248 S 1.3 0.1 312:43.08 qemu-syste+
26488 65594 20 0 225516 15692 9368 S 1.3 0.1 128:58.17 qemu-syste+
31292 root 20 0 258436 46792 13336 S 1.3 0.3 3:59.62 xcp-rrdd-i+
31518 65551 20 0 251116 14628 9448 S 1.3 0.1 278:44.21 qemu-syste+
2463 root 20 0 1067872 64076 4500 S 1.0 0.4 168:56.56 xcp-rrdd
18264 root 20 0 157952 4688 3644 R 1.0 0.0 0:01.58 top
20715 65540 20 0 251116 14492 9304 S 1.0 0.1 311:38.49 qemu-syste+
1950 root 20 0 70392 8136 3548 S 0.7 0.0 64:41.95 oxenstored
7 root 20 0 0 0 0 S 0.3 0.0 56:39.08 rcu_sched
1022 root 20 0 28616 2460 2240 S 0.3 0.0 29:31.77 systemd-jo+
1791 root 20 0 272072 3388 2904 S 0.3 0.0 39:45.22 rsyslogd
1915 root 20 0 296096 7024 2572 S 0.3 0.0 18:17.79 v6d
1930 root 20 0 295760 9124 1424 S 0.3 0.1 51:47.13 xapi-stora+
3021 root 20 0 16272 2424 2192 S 0.3 0.0 5:38.92 lldpad
3041 root 20 0 528088 22684 4556 S 0.3 0.1 52:58.77 xcp-networ+
8374 root 20 0 0 0 0 S 0.3 0.0 0:24.18 cifsd
13065 root 20 0 0 0 0 S 0.3 0.0 0:00.04 kworker/14+
20196 root 20 0 42672 6308 2824 S 0.3 0.0 21:38.64 tapdisk
22399 root 20 0 43792 7568 2832 S 0.3 0.0 33:32.07 tapdisk
23958 root 20 0 0 0 0 S 0.3 0.0 1:42.64 vif7.0-q1-+
25141 root 20 0 0 0 0 S 0.3 0.0 0:19.15 kworker/7:2
26200 root 20 0 42336 6112 2832 S 0.3 0.0 15:30.53 tapdisk
31278 root 20 0 44192 7968 2832 S 0.3 0.0 3:37.06 tapdisk
1 root 20 0 41828 4856 3072 S 0.0 0.0 20:22.00 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:02.00 kthreadd
-
Can you give me the numbers in plain text so I don't have to find the values in screenshots? Thanks
-
Everything except the nmon output is in text; here is nmon:
These are the values during the DR backup problem:
nmon 16e [H for help]  Hostname=XCP01  Refresh=2 secs  11:59.13

CPU Utilisation
CPU  User%  Sys%  Wait%  Steal
  1    1.5   5.1    8.1    0.5
  2    0.5   0.0    0.0   99.5
  3    2.5   7.1    6.6   83.8
  4    1.5   2.0    0.0   96.5
  5    5.1  51.8    1.0   42.1
  6    0.5   1.0    0.0   98.5
  7    4.5  17.0    7.0    0.5
  8    5.0   6.5   12.4    0.5
  9    6.2  29.2   12.3   52.3
 10    0.0   1.0    0.0   99.0
 11    1.5   3.5    4.0    0.5
 12    0.0   0.5    0.0   99.5
 13    0.5   4.0    0.5   95.0
 14    0.5   2.0    0.0   97.5
 15    2.0   4.5    5.0    0.5
 16    0.5   0.5    0.0   99.0
Avg    2.0   8.5    3.6    0.1

Network I/O
I/F Name   Recv=KB/s  Trans=KB/s  packin  packout  insize  outsize  Peak->Recv  Trans
vif54.0          0.0         2.3     0.0     25.4     0.0     93.6        13.4     8.6
lo               0.0         0.0     0.0      0.0     0.0      0.0         0.0     0.0
eth3             0.9         0.1     8.0      1.5   115.0     66.0         4.6     1.6
vif10.0          0.0         2.3     0.0     25.4     0.0     93.6         0.2     5.8
vif60.0          0.0         2.3     0.0     25.4     0.0     93.6         0.0     5.5
eth2             1.4         0.1     7.5      1.0   186.3    137.0        13.1     1.0
vif9.0           0.0         2.3     0.0     25.4     0.0     93.6         0.6     9.0
vif5.0           0.0         2.3     0.0     25.4     0.0     93.6         0.0     5.5
vif67.0          0.4         3.3     2.0     27.9   204.8    122.2      1038.5  1050.0
vif59.0          0.2         2.5     2.0     27.4   118.5     92.3         1.0     6.2
vif16.0          0.0         2.3     0.0     25.4     0.0     93.6         0.0     5.5
ovs-syste        0.0         0.0     0.0      0.0     0.0      0.0         0.0     0.0
xenbr0           2.4         9.6    30.9      8.0    78.7   1229.2      1051.3  1064.4
eth5             0.9         0.0    10.0      0.0    91.0      0.0         2.3  1085.7
vif66.0          0.0         2.3     0.0     25.4     0.0     93.6         0.0     5.5
eth1             0.0         0.0     0.0      0.0     0.0      0.0         0.0     0.0
xenbr1           0.0         0.0     0.0      0.0     0.0      0.0         0.0     0.0
xapi0            2.0         0.0    25.4      0.0    79.6      0.0         4.6     0.0
eth4             0.4         0.5     5.5      1.5    74.6    311.0      1080.7    13.0
eth0             2.7         8.7    27.9      8.0    97.6   1122.5      1101.5  1119.1
vif7.0           0.0         2.3     0.0     25.4     0.0     93.6         0.0     5.5

Network Error Counters
I/F Name   iErrors  iDrop  iOverrun  iFrame  oErrors  oDrop  oOverrun  oCarrier  oColls
vif54.0          0      0         0       0        0      0         0         0       0
lo               0      0         0       0        0      0         0         0       0
eth3            33     32        32      16        0      0         0         0       0
vif10.0          0      0         0       0        0      0         0         0       0
vif60.0          0      0         0       0        0      0         0         0       0
eth2             4      0         0       2        0      0         0         0       0
vif9.0           0      0         0       0        0      0         0         0       0
-
That's not what I meant.
Not everybody is used to these tools, so try to just sum up the values you want to highlight, and explain why they aren't what you expect.
-
For me the 100% CPU of tapdisk is the problem. I don't know what else to check. Below is the result of a DR test (backup within the same server) - backup speed: 3.61 MiB/s.
28_Venus (XCP01)
Snapshot: Start: Oct 7, 2019, 11:56:48 AM, End: Oct 7, 2019, 11:56:54 AM
x1-ext-sr02 (14.43 TiB free) - XCP01
transfer: Start: Oct 7, 2019, 11:57:11 AM, End: Oct 7, 2019, 12:30:24 PM, Duration: 33 minutes, Size: 7.02 GiB, Speed: 3.61 MiB/s
Start: Oct 7, 2019, 11:57:11 AM, End: Oct 7, 2019, 12:30:45 PM, Duration: 34 minutes
Start: Oct 7, 2019, 11:56:48 AM, End: Oct 7, 2019, 12:30:45 PM, Duration: 34 minutes
Olivier, what do you think about this:
https://portal.hiveio.com/kb/KBA-001506-URaAAO.aspx
Root Cause:
When executing locally, XenServer's VIF uses up too much CPU, causing CPU contention with tapdisk. As a result, I/O throughput is much lower.
Solution:
Enable TSO/GSO on the Volume VM via rc.local. This will result in the VIF using up much less CPU and giving tapdisk more resources to process I/O.
Instructions on how to change TSO/GSO settings can be found here: Performance tuning your USX environment.
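For what it's worth, the equivalent check on XCP-ng dom0 would look something like this (the interface names are examples only; treat it as a sketch, not a recommendation for production):

# Show the current offload settings on a physical NIC and on a guest VIF.
ethtool -k eth0 | grep -E 'segmentation-offload|checksumming'
ethtool -k vif34.0 | grep -E 'segmentation-offload|checksumming'
# Enable TSO/GSO on a VIF (illustrative).
ethtool -K vif34.0 tso on gso on
-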
Anything in dmesg?
-
Filtering for 'errors', only this:
[ 28.275518] Could not initialize VPMU for cpu 0, error -95
[ 28.275569] Performance Events: running under Xen, no PMU driver, software events only.
[ 28.276154] NMI watchdog: disabled (cpu0): hardware events not enabled
[ 28.276155] NMI watchdog: Shutting down hard lockup detector on all cpus
xentop: Domain-0 at 210% CPU usage?
[root@XCP01 ~]# xentop
xentop - 14:22:32   Xen 4.7.6-6.6.xcpng
11 domains: 1 running, 10 blocked, 0 paused, 0 crashed, 0 dying, 0 shutdown
Mem: 134204508k total, 78431792k used, 55772716k free    CPUs: 24 @ 3066MHz
NAME STATE CPU(sec) CPU(%) MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO VBD_RD VBD_WR VBD_RSECT VBD_WSECT SSID
221_rc01 --b--- 16765 3.9 8388600 6.3 8389632 6.3 4 0 0 0 2 0 610009 1090329 29411455 34377572 0
223_rfs01 --b--- 53419 9.0 8388600 6.3 8389632 6.3 4 0 0 0 2 0 662551 1308614 35627468 50098513 0
225_gtlas --b--- 44368 2.4 8388600 6.3 8389632 6.3 4 0 0 0 2 0 4857299 1985548 374377550 77974490 0
228_henfs --b--- 20814 2.9 4194296 3.1 4195328 3.1 2 0 0 0 2 0 16850 2694099 668534 42791106 0
320_dsaomete --b--- 8001 0.7 4194296 3.1 4195328 3.1 2 0 0 0 2 0 405243 1103612 33437483 27560602 0
333_judsater --b--- 551 0.3 4194292 3.1 4195328 3.1 2 0 0 0 1 0 4590 4065 169846 53784 0
414_wd4310_s --b--- 52670 2.5 4194296 3.1 4195328 3.1 2 0 0 0 1 0 986147 2221354 39669481 69579321 0
445_dd3en9 --b--- 3430 0.3 4194292 3.1 4195328 3.1 2 0 0 0 1 0 6911 130167 313526 6857488 0
619_und13i_c --b--- 41818 3.1 4194292 3.1 4195328 3.1 2 0 0 0 1 0 11867 3998049 424982 69774736 0
18_xoa_x1 --b--- 1724 94.6 8388596 6.3 8389632 6.3 4 0 0 0 1 0 28555 2058 511456 52712 0
Domain-0 -----r 1022492 189.0 16777216 12.5 16777216 12.5 16 0 0 0 0 0 0 0 0 0 0
Delay  Networks  vBds  Tmem  VCPUs  Repeat header  Sort order  Quit
Also, I've got a lot of errors at the network level, but in the above-mentioned example the data never leaves the XCP-ng host during the DR backup, I presume?
tcpdump -i eth0 -v -nn | grep -i incorrect
222099 packets received by filter
1966 packets dropped by kernel
But I don't see any transfer drops to and from the VMs within our network - everything runs at over 100 MB/s. I don't know if this is the way to find the root cause of the slow backup issue, but the switch does not report major problems on the network.
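From what I've read, "cksum incorrect" on packets sent by the capturing host is usually just TX checksum offload (the NIC fills in the checksum after tcpdump sees the packet), not real corruption. A quick check (a sketch; -Q needs a reasonably recent tcpdump):

# Is TX checksum offload enabled on this NIC?
ethtool -k eth0 | grep checksumming
# Capture only inbound traffic so offloaded outbound checksums don't skew the count.
tcpdump -i eth0 -nn -v -Q in 2>/dev/null | grep -i incorrect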
Simple hdparm (RAID10 and RAID5 volumes):
/dev/sda:
 Timing cached reads: 10966 MB in 1.95 seconds = 5625.90 MB/sec
 Timing buffered disk reads: 1194 MB in 3.00 seconds = 397.78 MB/sec
[root@XCP01 ~]# hdparm -tT /dev/sdc
/dev/sdc:
 Timing cached reads: 11722 MB in 1.97 seconds = 5941.87 MB/sec
 Timing buffered disk reads: 1210 MB in 3.00 seconds = 402.90 MB/sec