Extremely slow backup speed = about a few MB/s


  • XCP-ng Team

    Anything in dmesg?



  • Filtering for 'errors', there is only this:

    [   28.275518] Could not initialize VPMU for cpu 0, error -95
    [   28.275569] Performance Events: running under Xen, no PMU driver, software events only.
    [   28.276154] NMI watchdog: disabled (cpu0): hardware events not enabled
    [   28.276155] NMI watchdog: Shutting down hard lockup detector on all cpus
    

    xentop: Domain-0 at 210% CPU usage?

    
    [root@XCP01 ~]# xentop
    xentop - 14:22:32   Xen 4.7.6-6.6.xcpng
    11 domains: 1 running, 10 blocked, 0 paused, 0 crashed, 0 dying, 0 shutdown
    Mem: 134204508k total, 78431792k used, 55772716k free    CPUs: 24 @ 3066MHz
          NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)  MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS   VBD_OO   VBD_RD   VBD_WR  VBD_RSECT  VBD_WSECT SSID
       221_rc01 --b---      16765    3.9    8388600    6.3    8389632       6.3     4    0        0        0    2        0   610009  1090329   29411455   34377572    0
      223_rfs01 --b---      53419    9.0    8388600    6.3    8389632       6.3     4    0        0        0    2        0   662551  1308614   35627468   50098513    0
      225_gtlas --b---      44368    2.4    8388600    6.3    8389632       6.3     4    0        0        0    2        0  4857299  1985548  374377550   77974490    0
      228_henfs --b---      20814    2.9    4194296    3.1    4195328       3.1     2    0        0        0    2        0    16850  2694099     668534   42791106    0
    320_dsaomete --b---       8001    0.7    4194296    3.1    4195328       3.1     2    0        0        0    2        0   405243  1103612   33437483   27560602    0
    333_judsater --b---        551    0.3    4194292    3.1    4195328       3.1     2    0        0        0    1        0     4590     4065     169846      53784    0
    414_wd4310_s --b---      52670    2.5    4194296    3.1    4195328       3.1     2    0        0        0    1        0   986147  2221354   39669481   69579321    0
    445_dd3en9 --b---       3430    0.3    4194292    3.1    4195328       3.1     2    0        0        0    1        0     6911   130167     313526    6857488    0
    619_und13i_c --b---      41818    3.1    4194292    3.1    4195328       3.1     2    0        0        0    1        0    11867  3998049     424982   69774736    0
     18_xoa_x1 --b---       1724   94.6    8388596    6.3    8389632       6.3     4    0        0        0    1        0    28555     2058     511456      52712    0
      Domain-0 -----r    1022492  189.0   16777216   12.5   16777216      12.5    16    0        0        0    0        0        0        0          0          0    0
    
    
      Delay  Networks  vBds  Tmem  VCPUs  Repeat header  Sort order  Quit
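As a rough aid for reading the xentop table above, here is a minimal Python sketch that ranks domains by CPU(%) and divides by the vCPU count. It assumes xentop reports 100% per fully busy CPU, so 189% on Domain-0 would mean roughly two busy cores out of its 16 vCPUs. The sample rows are copied from the output above (truncated to the first nine columns), and the function name `cpu_by_domain` is made up for illustration.

```python
# Rows copied from the xentop output in this post, truncated to nine columns.
SAMPLE = """\
      NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)  MAXMEM(k) MAXMEM(%) VCPUS
  221_rc01 --b---      16765    3.9    8388600    6.3    8389632       6.3     4
 18_xoa_x1 --b---       1724   94.6    8388596    6.3    8389632       6.3     4
  Domain-0 -----r    1022492  189.0   16777216   12.5   16777216      12.5    16
"""


def cpu_by_domain(text):
    """Return (name, cpu_pct, vcpus, cpu_pct_per_vcpu) tuples, busiest first."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header line and anything that is too short to be a data row.
        if len(fields) < 9 or fields[0] == "NAME":
            continue
        name = fields[0]
        cpu_pct = float(fields[3])
        vcpus = int(fields[8])
        rows.append((name, cpu_pct, vcpus, cpu_pct / vcpus))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, cpu_pct, vcpus, per_vcpu in cpu_by_domain(SAMPLE):
        print(f"{name:>12}  {cpu_pct:6.1f}%  {vcpus:2d} vCPUs  {per_vcpu:5.1f}% per vCPU")
```

Read this way, the Domain-0 figure is not necessarily alarming on its own; it would be consistent with one or two single-threaded processes each saturating a core.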
    
    

    I've also got a lot of errors at the network level, but in the example above the data never leaves the XCP-ng host during the DR backup, I presume?

    tcpdump -i eth0 -v -nn| grep -i incorrect
    

    222099 packets received by filter
    1966 packets dropped by kernel
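To put the two tcpdump counters above in proportion, here is a small Python sketch that computes their ratio. According to the tcpdump man page, "packets dropped by kernel" counts packets the capture mechanism dropped for lack of buffer space, so this figure says more about the capture itself than about loss on the wire. The function name `drop_percent` is made up for illustration.

```python
# Counters copied from the tcpdump summary in this post.
RECEIVED_BY_FILTER = 222099
DROPPED_BY_KERNEL = 1966


def drop_percent(received, dropped):
    """Dropped packets as a percentage of the packets received by the filter."""
    if received <= 0:
        raise ValueError("received must be positive")
    return 100.0 * dropped / received


if __name__ == "__main__":
    pct = drop_percent(RECEIVED_BY_FILTER, DROPPED_BY_KERNEL)
    print(f"capture drops: {pct:.2f}% of packets received by filter")
```

A value below 1% during a capture piped through grep is probably not, by itself, evidence of a network problem.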

    But I don't see any transfer drops to or from VMs within our network; everything runs at over 100 MB/s. I don't know if this is the way to find the root cause of the slow backup issue, but the switch does not report any major problems on the network:
    084798e5-db3d-4bc5-a58a-1f27743edfa2-image.png

    Simple hdparm test (RAID10 and RAID5 volumes):

    /dev/sda:
     Timing cached reads:   10966 MB in  1.95 seconds = 5625.90 MB/sec
     Timing buffered disk reads: 1194 MB in  3.00 seconds = 397.78 MB/sec
    [root@XCP01 ~]# hdparm -tT /dev/sdc
    /dev/sdc:
     Timing cached reads:   11722 MB in  1.97 seconds = 5941.87 MB/sec
     Timing buffered disk reads: 1210 MB in  3.00 seconds = 402.90 MB/sec
    
    


  • @olivierlambert

    Can you take a look at this: when I copy the VM within the same server it runs fast as hell and tapdisk shows no high CPU usage, so why does tapdisk hit 100% CPU during a Disaster Recovery backup?

    90da95e4-3ce2-4db7-a6b7-4d8173a5355e-image.png


  • XCP-ng Team

    VDI copy and DR don't use the same mechanism. It's probably XVA creation that has a problem on your setup. Try Continuous Replication to see if you get better performance.



  • For sure something is wrong, but what? How is the DR backup copy process different from VM Copy? Can you give me more info? I don't know how to troubleshoot this, and it is a major problem for us...


  • XCP-ng Team

    If this is blocking production, I would suggest opening a support ticket; it's more efficient for us to assist there than here 🙂

    Regarding copy, you said you used VM copy: you mean the copy button in the VM view? Or fast clone? (last one isn't the same). Because indeed, VM copy is very similar to DR.

    Please try with CR to see if there's a perf change (that might be helpful to pinpoint the problem)



  • @olivierlambert said in Extremely slow backup speed = about a few MB/s:

    Regarding copy, you said you used VM copy: you mean the copy button in the VM view? Or fast clone? (last one isn't the same). Because indeed, VM copy is very similar to DR.

    Please :) Of course, a full clone.


  • XCP-ng Team

    So did you use the "Clone" button or the "Copy" button?



  • I don't have a Clone button, only Copy.
    5f5e0309-adbf-4d81-92ae-e1120934ac8c-image.png


  • XCP-ng Team

    You do have a Clone button, but it is in the Advanced tab, and only if the VM is halted 🙂

    But okay, so you used Copy, and you selected the same destination SR as with the DR feature? Have you tested with CR to see the difference?



  • @olivierlambert I'll test the CR copy by the end of the day.



  • OK, so here it is:

    In the CR test, it is fast enough for me:

    • Test done on the same server as above (Dell R510)
    • CR backup from the RAID10 volume to the RAID5 volume within one server
    • Speeds are about 100-140 MB/s

    This is good for me. So there is an issue with DR backup. TAPDISK is at 7% of CPU.

    1f4a2ce2-cf82-4018-a896-67d6d99ac66b-image.png

    nmon-16e -- Hostname=XCP01 -- Refresh= 2secs -- 13:06.26
     Disk I/O -- /proc/diskstats -- mostly in KB/s -- Warning:contains duplicates
    DiskName Busy  Read WriteMB|0          |25         |50          |75       100|
    loop0      0%    0.0    0.0|>                                                |
    sda        0%    0.0    0.0|      >                                          |
    sda1       0%    0.0    0.0|      >                                          |
    sda2       0%    0.0    0.0|>                                                |
    sda3       0%    0.0    0.0|>                                                |
    sda5       0%    0.0    0.0|>                                                |
    sda6       0%    0.0    0.0|>                                                |
    sdb        9%   19.3    0.1|RRRRR        >                                   |
    sdc       60%    0.0   43.3|WWWWWWWWWWWWWWWWWWWWWWWWWWWWWW              >    |
    sr0        0%    0.0    0.0|>                                                |
    dm-0       9%   19.3    0.1|RRRRR        >                                   |
    dm-1      60%    0.0   43.6|WWWWWWWWWWWWWWWWWWWWWWWWWWWWWW              >    |
    tdl        0%    0.0    0.0|>                                                |
    tdn        0%    0.0    0.0|>                                                |
    tde        0%    0.0    0.0|>                                                |
    tdd        0%    0.0    0.0|>                                                |
    tdm        0%    0.0    0.0|>                                                |
    tdk        0%    0.0    0.0|>                                                |
    tdj        0%    0.0    0.0|>                                                |
    tdb        0%    0.0    0.0|>                                                |
    tdc        0%    0.0    0.0|>                                                |
    tdf        0%    0.0    0.0|>                                                |
    tdg        0%    0.0    0.0|>                                                |
    tdo        0%    0.0    0.0|>                                                |
    tda        0%    0.0    0.0|>                                                |
    tdh       19%   23.7    0.0|RRRRRRRRRR>                                      |
    tdp       66%    0.0   21.0|WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW               >
    Totals Read-MB/s=62.4     Writes-MB/s=108.1    Transfers/sec=1966.4
     Top Processes Procs=0 mode=3 (1=Basic, 3=Perf 4=Size 5=I/O)
      PID       %CPU    Size     Res    Res     Res     Res    Shared    Faults   Command
                 Used      KB     Set    Text    Data     Lib    KB     Min   Maj
         8241    69.1 3392956    8284     128 3346492       0  4088      0      0 stunnel
        16752    15.7       0       0       0       0       0     0      0      0 vif67.0-q0-gues
        23244    12.9   41192    4996     228    2056       0  2800      0      0 tapdisk
        23476    10.5   39780    3584     228     672       0  2772      0      0 tapdisk
        16759     8.1       0       0       0       0       0     0      0      0 vif67.0-q3-deal
        22397     5.2   20436    6956     152    6748       0  2208    380      0 nmon
        23276     4.3  173288   30696   10188  114616       0 14884      0      0 vhd-tool
        23532     2.9   90684   28144   10188   32012       0 15036      0      0 vhd-tool
        27402     2.4  215276   15664    5052   53752       0  9356      0      0 qemu-system-i38
         7346     1.9 5604708  359784   24020 5507644       0 24092      2      0 xapi
          582     1.4  215276   14716    5052   66040       0  9388      0      0 qemu-system-i38
         2916     1.4 1304628  156508    1768 1255364       0  9808      0      0 ovs-vswitchd
         4093     1.4  235756   15556    5052   53752       0  9212      0      0 qemu-system-i38
        12452     1.4  260332   14864    5052   66040       0  9524      0      0 qemu-system-i38
        16530     1.4  232684   15356    5052   53752       0  9256      0      0 qemu-system-i38
        22063     1.4  238828   14768    5052   53752       0  9428      0      0 qemu-system-i38
        26488     1.4  230636   15724    5052   57848       0  9368      0      0 qemu-system-i38
        31518     1.4  251116   14640    5052   53752       0  9448      0      0 qemu-system-i38
         7006     1.0   65384   17384    8424   10332       0 10120     13      0 forkexecd
         7296     1.0  259372   41276    9720  197440       0 13248      0      0 xcp-rrdd-iostat
        16753     1.0       0       0       0       0       0     0      0      0 vif67.0-q0-deal
        22834     1.0  207084   14428    5052   53752       0  9248      0      0 qemu-system-i38
            7     0.5       0       0       0       0       0     0      0      0 rcu_sched
           48     0.5       0       0       0       0       0     0      0      0 ksoftirqd/8
           73     0.5       0       0       0       0       0     0      0      0 ksoftirqd/13
    

  • XCP-ng Team

    There is an issue with DR backup AND your setup. We don't have any other reports of similar problems. Also, DR only asks XS/XCP to export the VM as an XVA; we don't ask for anything else.



  • @olivierlambert just for the test:

    tapdisk high CPU usage and slow speed during DR backup:

    tdf        0%    0.0    0.0|>                                                |
    tdg        0%    0.0    0.0|>                                                |
    tdo        0%    0.0    0.0|>                                                |
    tda        0%    0.0    0.0|>                                                |
    tdh       95%  258.1    0.0|RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR >
    tdi        0%    0.0    0.0|                                                 >
    Totals Read-MB/s=258.1    Writes-MB/s=0.1      Transfers/sec=6199.8
     Top Processes Procs=0 mode=3 (1=Basic, 3=Perf 4=Size 5=I/O)
      PID       %CPU    Size     Res    Res     Res     Res    Shared    Faults   Command
                 Used      KB     Set    Text    Data     Lib    KB     Min   Maj
        29822    95.4   46892   10908     228    7756       0  3012      0      0 tapdisk
         7346    42.9 5604708  361204   24020 5507644       0 22940      2      0 xapi
          105     4.8       0       0       0       0       0     0      0      0 kswapd0
        27740     4.3   20340    7008     152    6652       0  2364    374      0 nmon
         2916     2.4 1304628  156508    1768 1255364       0  9808      0      0 ovs-vswitchd
        27402     2.4  215276   15592    5052   53752       0  9284      0      0 qemu-system-i38
    

    I have no idea how to deal with this issue. Also, as you can see from the CR backup, my setup is quite good; there is no issue with the hardware or the storage layer.


  • XCP-ng Team

    Yes, the problem is with the XVA import or export (or both at the same time). CR doesn't use XVA format.



  • @olivierlambert said in Extremely slow backup speed = about a few MB/s:

    Yes

    OK. Any suggestions on how to troubleshoot this issue?


  • XCP-ng Team

    I have no idea without digging more remotely.



  • Nope, it's not good in CR: the task in Xen Orchestra finished much sooner than the reads/writes on the host ended.

    DR test results

     28_Venus (XCP01) 
     Snapshot 
    Start: Oct 8, 2019, 01:11:40 PM
    End: Oct 8, 2019, 01:11:43 PM
     x1-ext-sr02 (14.41 TiB free) - XCP01 
     transfer 
    Start: Oct 8, 2019, 01:12:01 PM
    End: Oct 8, 2019, 01:46:13 PM
    Duration: 34 minutes
    Size: 7.02 GiB
    Speed: 3.51 MiB/s
    Start: Oct 8, 2019, 01:12:01 PM
    End: Oct 8, 2019, 01:46:19 PM
    Duration: 34 minutes
    Start: Oct 8, 2019, 01:11:40 PM
    End: Oct 8, 2019, 01:46:29 PM
    Duration: 35 minutes
    

    CR test results:

     28_Venus (XCP01) 
     Snapshot 
    Start: Oct 8, 2019, 01:00:55 PM
    End: Oct 8, 2019, 01:00:58 PM
     x1-ext-sr02 (14.41 TiB free) - XCP01 
     transfer 
    Start: Oct 8, 2019, 01:00:58 PM
    End: Oct 8, 2019, 01:07:40 PM
    Duration: 7 minutes
    Size: 2.49 GiB
    Speed: 6.35 MiB/s
    Start: Oct 8, 2019, 01:00:58 PM
    End: Oct 8, 2019, 01:07:40 PM
    Duration: 7 minutes
    Start: Oct 8, 2019, 01:00:55 PM
    End: Oct 8, 2019, 01:07:40 PM
    Duration: 7 minutes
    Type: full
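The speeds in the two reports above appear to be simply the reported size divided by the transfer duration. A minimal Python sketch of that arithmetic follows, using the sizes and timestamps from the logs (converted to 24-hour form). It assumes this is how the figure is derived, which the thread does not confirm, and the helper names `to_seconds` and `mib_per_second` are made up for illustration.

```python
def to_seconds(clock):
    """Convert a 24-hour 'HH:MM:SS' string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def mib_per_second(size_gib, start, end):
    """Average throughput in MiB/s for a same-day transfer."""
    duration = to_seconds(end) - to_seconds(start)
    if duration <= 0:
        raise ValueError("end must be later than start on the same day")
    return size_gib * 1024 / duration


# Values copied from the DR and CR transfer entries in this post.
dr_speed = mib_per_second(7.02, "13:12:01", "13:46:13")  # 2052 s
cr_speed = mib_per_second(2.49, "13:00:58", "13:07:40")  # 402 s

if __name__ == "__main__":
    print(f"DR: {dr_speed:.2f} MiB/s (reported 3.51 MiB/s)")
    print(f"CR: {cr_speed:.2f} MiB/s (reported 6.35 MiB/s)")
```

Both results land within rounding distance of the reported values. If the assumption holds, the reported speed is an average over the whole task and depends on what is counted as "Size", so it need not match the instantaneous disk rates seen in nmon.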
    

  • XCP-ng Team

    I don't understand what you mean.



  • It would be interesting if you could try the DR copy to another remote machine, just to see if tapdisk is as busy.
    I noted that it shows a rather high percentage in the Busy column, around 90%, which would indicate that there's a bottleneck somewhere.

    Did you try any other backup type, to a remote NFS share for example?

