Extremely slow backup speed = only a few MB/s



  • Oh I see, sorry for reading a bit too fast @akurzawa
    I don't use Disaster Recovery myself; we use Continuous Replication because we want it to be incremental.

    Is tapdisk reaching 100% on both hosts or only on the sender?



  • @olivierlambert I meant that in XO, in the Tasks pane, the CR backup task finished after, for example, 5 minutes, but I was observing nmon on the server, and the read/write activity initiated by the CR task only finished 5 minutes after XO reported the task as done. It looks like some background VDI task on the server, such as coalescing.
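
    For reference, one way to check for that after XO reports the task done (a sketch; the log path is the standard dom0 one, so adjust if yours differs):

        # watch the storage manager log for coalesce / garbage-collection activity
        tail -f /var/log/SMlog | grep -iE 'coalesc|gc'

        # check whether tapdisk is still busy after the task has "finished" in XO
        top -b -n 1 | grep tapdisk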



  • @nikade The same situation - the second server is the same Dell R510 with the same configuration. That was the main idea: two identical servers, with a DR copy for production security.



  • What is strange is that the DR backup is in progress (66%), BUT there is NO activity on the target server - DR test from xcp01 to xcp02.

    Is this by design? What exactly happens during a DR copy? Some preparation on the source server first, and then the file is sent to the target server?

    25db63fa-9e46-461b-bc8b-3dabcd84d333-image.png
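
    One way to see what the source host is actually doing at that point (a sketch; run in dom0 on the source) is to list the active XAPI tasks and their progress:

        # list running tasks with name, progress and status
        xe task-list params=uuid,name-label,progress,status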



  • To be honest, there is a lot of "magic" happening during the handling of disks. I've noticed as well that there might be activity on the source server before the destination server even reacts.

    I guess you'd have to read the code to give a definite answer to that question - although it wouldn't surprise me if there is some snapshotting and such going on on the source server.

    Are you using 7200 rpm disks by any chance? The improvement going to 10k or even 15k SAS drives is incredible, and I know that XS/XCP is really picky when it comes to I/O.
    Even our machines at work that have PERC H710P controllers with 1GB cache and SSD drives seem slow at times...



  • @nikade This is not a performance problem - all VMs that run on the host perform excellently in terms of disk speed. I've got many Samba servers and every one of them pushes 113 MB/s up/down the network (a simple network with a gigabit switch). A plain vm-export also works excellently - it saturates the RAID10 at about 210 MB/s. I suspect an issue with the NICs and TCP offload, but I don't have enough CentOS/XenServer knowledge to chase that down. Disks: WD80EFAX in RAID10 and RAID5.
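
    For reference, checking and temporarily disabling offloads in dom0 would look something like this (the interface name eth0 is an assumption, and not every driver exposes every feature):

        # show the current offload settings
        ethtool -k eth0

        # temporarily turn the common offloads off; switch them back "on" to revert
        ethtool -K eth0 tso off gso off gro off lro off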



  • @nikade said in Extremely slow backup speed = only a few MB/s:

    Is tapdisk reaching 100% on both hosts or only on the sender?

    On the source server:
    xcp01 is the source, xcp02 is the target, but again, when I was doing a DR test within a single server, tapdisk was also at 100%.
    https://xcp-ng.org/forum/assets/uploads/files/1570616741985-25db63fa-9e46-461b-bc8b-3dabcd84d333-image.png

    This is the overall performance of the local storage RAID10 - can you spot anything unusual?
    07dc2263-e201-4fc4-816c-85db4d756790-image.png
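
    For a rough baseline of the same RAID10 outside of any backup job, a simple direct-I/O read test against the block device could be compared with those numbers (the device name /dev/sda is just an assumption here):

        # sequential read test bypassing the page cache (read-only, device name assumed)
        dd if=/dev/sda of=/dev/null bs=1M count=4096 iflag=direct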



  • @akurzawa Have you tried testing with the pre-built appliance instead of your "from sources" VM to see if it exhibits the same symptoms?



  • @Danp Yes, I've got a spare host, installed XOA on it, and got the same results.



  • @akurzawa I think the stats look fine, to be honest. Maybe there is something with the TCP offloading, but I have no experience with that.
    Please share if you manage to find anything that speeds up the performance.



  • @nikade @akurzawa
    A good way to test whether it is a TCP offload issue versus a DR/CR or storage bottleneck is to run iperf between both XCP-ng nodes. This places the load almost entirely on the network cards and nothing else; if it is an offload issue, you will notice a considerable increase in CPU utilization when iperf starts saturating the network links.

    iperf docs: https://iperf.fr/iperf-doc.php
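
    A minimal run would look something like this, assuming iperf is installed on both hosts (the hostname xcp02 stands in for the receiver's address):

        # on the receiving host (xcp02)
        iperf -s

        # on the sending host (xcp01): 30 second test, report every 5 seconds,
        # while watching CPU usage in another session (e.g. with top)
        iperf -c xcp02 -t 30 -i 5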

    Also I don't think you mentioned it but what model are the network adapters on the XCP-ng hosts?



  • Broadcom Limited NetXtreme II BCM5716 Gigabit Ethernet - integrated in the Dell R510 - also used for management
    Intel Corporation 82576 Gigabit Network Connection - for VMs (BOND)

    How do you install iperf on the host?



  • @akurzawa
    General instructions for additional packages are available here: https://github.com/xcp-ng/xcp/wiki/Additional-packages

    In your case, this section (https://github.com/xcp-ng/xcp/wiki/Additional-packages#from-xcp-ng-repositories) should suffice, but I strongly recommend reading through the wiki and understanding it, especially if you decide to install more packages.
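
    In practice, assuming iperf is available from the repositories that section enables (the exact repository setup varies by XCP-ng version), the install itself is just:

        # install iperf in dom0 (check the wiki page above for which repos to enable)
        yum install iperf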

    Another thing to note is that there have been A LOT of issues with Broadcom NICs reported around the forum, so this may be another case of that. That said, another test option in addition to iperf is to move the management interface to the Intel NIC and see if the issue persists.
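
    A rough sketch of that move with xe (the Intel PIF must already have an IP configuration, the management session will drop briefly, and the UUID below is a placeholder):

        # find the PIF that sits on the Intel NIC for this host
        xe pif-list host-name-label=xcp01 params=uuid,device,IP

        # point the management interface at that PIF
        xe host-management-reconfigure pif-uuid=<uuid-of-intel-pif>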



  • I presume these values represent data being transferred through XO(A)? Can you tell whether those values are at a sufficient level - for example, by comparing the megaIOPS with an XOA in your infrastructure?

    cb840a93-c2ab-4baf-9f80-5db49e71c9b7-image.png


  • Admin

    This is an average over time (by week/month). It's NOT at all a metric that can be used to compare against something else, because it's just an average over a long period, not just during a transfer. It's 122 milli IOPS on average, 24/7, over a month.

    By the way, because it seems you like units a lot 😉 here is a recap on SI prefixes: https://en.wikipedia.org/wiki/Metric_prefix (m = milli, 10^-3)
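
    To put that in perspective: 122 mIOPS is 0.122 operations per second, so roughly 0.122 × 86,400 ≈ 10,500 I/O operations per day averaged over the whole period - which is why it says nothing about the throughput during an actual transfer.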



  • I like to laugh too, but this is not funny:
    37d6488e-3dbe-45c9-8fdf-fe68a5624563-image.png


  • Admin

    The thing is: we can't reproduce it, and so far nobody could, so it's hard to assist from there 🤷

    I would advise switching to CR so you can get rid of this issue, which is somehow related to XVA generation on the fly.



  • Update:

    Even though plain export and copy of the VM work WELL on the infamous "Broadcom" NIC, I've changed the management interface to a 4-port Intel card, and wow, the problem remains: tapdisk at 100%, DR copy speed about 4-5 MB/s (or MiB/s)

    (nmon per-CPU utilization section truncated)

    Top Processes  Procs=0 mode=3 (1=Basic, 3=Perf 4=Size 5=I/O)
      PID       %CPU    Size     Res    Res     Res     Res    Shared    Faults   Command
                 Used      KB     Set    Text    Data     Lib    KB     Min   Maj
       17460    96.1   46716   10520     228    7580       0  2800      0      0 tapdisk
        7346    44.7 6084304  882236   24020 5987240       0 13468      2      0 xapi
         105     5.1       0       0       0       0       0     0      0      0 kswapd0
       18487     2.4   20192    6648     152    6504       0  2172    192      0 nmon
        1950     2.0   70872    8612     728    6616       0  3492      0      0 oxenstored
        7006     1.7   63156   12276    8424    8104       0  4952     37      0 forkexecd
       22063     1.7  251116   14680    5052   53752       0  9340      0      0 qemu-system-i38
        2916     1.5 1304628  157712    1768 1255364       0  9808      0      0 ovs-vswitchd
        4093     1.5  250092   15476    5052   53752       0  9144      0      0 qemu-system-i38

  • Admin

    I'd like to help but without a way to reproduce the issue, I don't know what to do. Here is my result on a DR without compression on the same SR, exactly the way you are testing it:

    drtest.png

    edit: and on the network speed side:

    drtestspeed.png



  • @olivierlambert said in Extremely slow backup speed = only a few MB/s:

    I would advise to switch to CR so you can get rid of this issue, which is somehow related to XVA generation on the fly.

    DR is "better" for me, becouse this is FULL export of the VM. I've tested in in action - dead host, I could start a DR vm in a snap. Another thing - DR copy is consistent - It's exact copy of VM in one piece - so it always will be better in terms of security compatiblity and recovery.

    I keep the DR copies of VMs for about two monts back - on four different serves.
    In CR copy that would required - AFAIK - the snapshot on each VM to be kept for two monts back - for mi this is somewhat insecure. Also how to deal with rorating backup storage and CR COPY? That will not work.

