XCP-ng
    How to improve backup performance, where's my bottleneck

Xen Orchestra
olivierlambert (Vates 🪐 Co-Founder CEO):

That's because of how streams work: back pressure is applied. I don't think we keep any buffer (or only a very tiny one) at the stream level.

      I'll ask @julien-f about this.
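
Roughly, the mechanism looks like this (a minimal sketch of back pressure in Node streams in general, not XO's actual code): pipeline() stops pulling from the source as soon as the slow sink's small internal buffer (highWaterMark) is full, so almost nothing piles up in memory.

    // Minimal sketch of Node stream back pressure (not XO's actual code).
    // pipeline() pauses the source whenever the slow sink's internal buffer
    // (highWaterMark) is full, so only a tiny amount of data sits in memory.
    import { Readable, Writable, pipeline } from 'node:stream';

    const source = Readable.from(
      (function* () {
        for (let i = 0; i < 1000; i++) {
          yield Buffer.alloc(64 * 1024); // pretend these are 64 KiB chunks of a disk export
        }
      })()
    );

    const slowSink = new Writable({
      highWaterMark: 1024 * 1024, // at most ~1 MiB can pile up here
      write(chunk, _encoding, callback) {
        setTimeout(callback, 10); // simulate a slow remote; calling back releases the back pressure
      },
    });

    pipeline(source, slowSink, err => {
      if (err) console.error('stream failed', err);
      else console.log('done');
    });

In other words, when the remote can't keep up, the read from the host is supposed to slow down rather than gigabytes accumulating in RAM.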

markhewitt1978 (in reply to @olivierlambert):

        @olivierlambert said in How to improve backup performance, where's my bottleneck:

That's because of how streams work: back pressure is applied. I don't think we keep any buffer (or only a very tiny one) at the stream level.

        I'll ask @julien-f about this.

Yes indeed, I've noticed that over a long time, and the amount buffered varies with the amount of memory XOA is given. With the default of 2GB it buffers very little; I've given it 32GB, so it stores quite a bit.

olivierlambert:

That's a good question. I have only a very partial understanding of how Node streams work at a low level. This is more a question for Node experts 😛

markhewitt1978:

[screenshot]

The main question is: if the disks can take 300 MByte/s in bursts, could they sustain that all the time? Perhaps not. But nikade's post above makes me wonder whether there is some 1 Gbit (with overheads) limit in Node itself.

            It's most likely disk speed limited, but interesting to explore all the same.
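
One quick way to rule out a limit in Node itself (just a rough in-memory sketch, nothing to do with XO's actual code) is to pump data through a stream into a do-nothing sink and see what the pure stream plumbing can do; on any recent machine it should come out far above 1 Gbit/s.

    // Rough in-memory throughput test of a Node stream pipeline,
    // to see whether the stream machinery itself caps anywhere near 1 Gbit/s.
    import { Readable, Writable, pipeline } from 'node:stream';

    const CHUNK = Buffer.alloc(1024 * 1024); // 1 MiB per chunk
    const CHUNKS = 10 * 1024;                // 10 GiB in total

    let written = 0;
    const start = process.hrtime.bigint();

    const source = Readable.from(
      (function* () {
        for (let i = 0; i < CHUNKS; i++) yield CHUNK;
      })()
    );

    const sink = new Writable({
      write(chunk, _encoding, callback) {
        written += chunk.length;
        callback(); // accept immediately: no disk, no network involved
      },
    });

    pipeline(source, sink, err => {
      if (err) throw err;
      const seconds = Number(process.hrtime.bigint() - start) / 1e9;
      console.log(`${(written / 1e6 / seconds).toFixed(0)} MB/s through the pipeline`);
    });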

olivierlambert:

I don't think it's a Node limit; my bet is that the content requires a lot of random seeks everywhere, which limits the storage speed. It would be interesting to run the same backup to a very fast NVMe drive, just for the benchmark.

markhewitt1978:

As it happens I do have an expiring server I can use for testing with NVMe. So XOA stays where it is, and I set up a new CentOS 7 VM on this host to act as a remote. It's using a standard XCP-ng VDI, not a passthrough disk like the other one.

                An fio test gives: write: IOPS=135k, BW=526MiB/s (551MB/s)(1950MiB/3711msec)

                Now keep in mind my current backups are still running through XOA - it's a production backup.

However, I kick off a backup of a different subset of servers from those the current live backup is running on. It hits the NVMe NFS server and writes to disk at... 30 MB/s.

The most interesting aspect is the incoming bandwidth to XOA. You'd think that if it were a back pressure issue caused by not being able to write to the remote's disk fast enough, the incoming bandwidth would increase, since it's now sending out to two hosts. In fact the opposite seems to happen, and the bandwidth being sent to my 'live' backup decreases in proportion to the bandwidth now being sent to the NVMe remote.

[screenshot]

                I'm not sure what this means but it is interesting.

olivierlambert:

Indeed, it's interesting. Did you do the fio randwrite test with 4k?

markhewitt1978:

I used the command line you sent me:

                    fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite --ramp_time=4
                    test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
                    fio-3.7
                    Starting 1 process
                    test: No I/O performed by libaio, perhaps try --debug=io option for details?4s]
                    
                    test: (groupid=0, jobs=1): err= 0: pid=2290: Thu Jan 30 10:24:50 2020
                      write: IOPS=148k, BW=576MiB/s (604MB/s)(1905MiB/3306msec)
                       bw (  KiB/s): min=554144, max=599104, per=99.60%, avg=587724.00, stdev=16710.46, samples=6
                       iops        : min=138536, max=149776, avg=146931.00, stdev=4177.61, samples=6
                      cpu          : usr=11.56%, sys=42.24%, ctx=30262, majf=0, minf=23
                      IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=215.0%
                         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
                         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
                         issued rwts: total=0,487662,0,0 short=0,0,0,0 dropped=0,0,0,0
                         latency   : target=0, window=0, percentile=100.00%, depth=64
                    
                    Run status group 0 (all jobs):
                      WRITE: bw=576MiB/s (604MB/s), 576MiB/s-576MiB/s (604MB/s-604MB/s), io=1905MiB (1998MB), run=3306-3306msec
                    
                    Disk stats (read/write):
                      xvdb: ios=0/1037381, merge=0/2159, ticks=0/437297, in_queue=437329, util=98.68%
                    
olivierlambert:

                      Great, so indeed, this storage should be able to deal with a lot of random write IOPS 🙂

markhewitt1978:

                        I think the next experiment would be to use the upcoming backup proxies to remove XOA from the equation and allow for multiple locations feeding into the NFS remote.

olivierlambert:

Yes, but to compare apples to apples, you'll need to avoid using "bigger" network bandwidth (combining the proxies' bandwidth). You already have 10G everywhere, right?

markhewitt1978 (in reply to @olivierlambert):

                            @olivierlambert said in How to improve backup performance, where's my bottleneck:

Yes, but to compare apples to apples, you'll need to avoid using "bigger" network bandwidth (combining the proxies' bandwidth). You already have 10G everywhere, right?

No. The backup server has 10G, but the servers it's backing up have a mixture of 1G and 2G.

At the moment things are acceptable, and disk performance is going to be a limitation anyway; this is now purely from personal interest!

olivierlambert:

And it's great that you're interested in this, because it helps push things forward 🙂 Eager to get the proxies into beta.

snigy:

The server has a 10G DAC to a dedicated 10G switch, and the disk it's writing to is an iSCSI target on a NAS with a single 10GbE NIC. Network traffic is low, under 30 Mb/s.
[screenshots]
The first peak is a memory read.
