XCP-ng

    How to improve backup performance, where's my bottleneck

    Xen Orchestra
      markhewitt1978 last edited by

      [screenshot attachment: b5cf96e8-5965-4828-a808-9ee3d0586d3c-image.png]

      The main question being: if the disks can take 300 MByte/s in bursts, could they maintain that all the time? Perhaps not. It's just that nikade's post above makes me wonder if there is some 1 Gbit (with overheads) limit in Node itself.

      It's most likely disk speed limited, but interesting to explore all the same.
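
      One rough way to check for a Node-level ceiling (a sketch only, not something run in this thread, assuming dd and node are available on the XOA VM) would be to measure how fast a bare Node.js stream can ingest data, independent of any disk:

      # Hypothetical check: feed zeros into a trivial Node stream consumer and print the
      # ingest rate. A figure near ~110 MiB/s would hint at a ~1 Gbit ceiling in the
      # stream path itself; a much higher figure would point back at disks or network.
      dd if=/dev/zero bs=1M count=4096 | node -e '
        let total = 0;
        const start = Date.now();
        process.stdin.on("data", (chunk) => { total += chunk.length; });
        process.stdin.on("end", () => {
          const secs = (Date.now() - start) / 1000;
          console.error(`${(total / 1048576 / secs).toFixed(0)} MiB/s through the Node stream`);
        });
      '

      This only measures the raw stream path, not the export/transform work XO does on top of it, but it would at least rule a hard Node limit in or out.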

        olivierlambert Vates 🪐 Founder & CEO 🦸 last edited by

        I don't think it's a Node limit; my bet is that the content requires a lot of random seeks everywhere, which limits the storage speed. It would be interesting to do the same backup to a very fast NVMe drive, just for the benchmark.

          markhewitt1978 last edited by

          As it happens, I do have an expiring server I can use for testing with NVMe. So XOA stays where it is, and I set up a new CentOS 7 VM on this host to act as a remote. It's using a standard XCP-ng VDI, not a passthrough disk like the other one.

          An fio test gives: write: IOPS=135k, BW=526MiB/s (551MB/s)(1950MiB/3711msec)

          Now keep in mind my current backups are still running through XOA - it's a production backup.

          However, I kick off a backup of a different subset of servers from the ones the current live backup is running against. It hits the NVMe NFS server and writes to disk at... 30 MB/s :Z

          The most interesting aspect is the incoming bandwidth to XOA. You'd think that if it were a backpressure issue, due to not being able to write to the remote's disk fast enough, the incoming bandwidth would increase as it's now sending data out to two hosts. In fact the opposite seems to happen, and the bandwidth being sent to my 'live' backup decreases in proportion to the bandwidth now being sent to the NVMe remote.

          [screenshot attachment: 2317e848-5b08-41b2-80ef-f37806e3334a-image.png]

          I'm not sure what this means but it is interesting.
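
          A follow-up check that might separate the NFS path from the backup job itself (a sketch with a made-up mount point, not something run in this thread): write a large file straight onto the mounted remote from the XOA VM and compare the rate with the ~30 MB/s seen during the backup.

          # /mnt/nvme-remote is a hypothetical placeholder for wherever the NFS remote is
          # mounted on the XOA VM; conv=fdatasync forces a flush so the figure is not just
          # client-side page cache.
          dd if=/dev/zero of=/mnt/nvme-remote/ddtest.bin bs=1M count=4096 conv=fdatasync
          rm /mnt/nvme-remote/ddtest.bin

          If plain dd over the same mount runs much faster than 30 MB/s, the slowdown is more likely on the read/transform side in XOA than in the NFS write path.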

            olivierlambert Vates 🪐 Founder & CEO 🦸 last edited by

            Indeed, it's interesting. Did you do the fio randwrite test with 4k?

              markhewitt1978 last edited by

              I used the command line you sent me:

              fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite --ramp_time=4
              test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
              fio-3.7
              Starting 1 process
              test: No I/O performed by libaio, perhaps try --debug=io option for details?
              
              test: (groupid=0, jobs=1): err= 0: pid=2290: Thu Jan 30 10:24:50 2020
                write: IOPS=148k, BW=576MiB/s (604MB/s)(1905MiB/3306msec)
                 bw (  KiB/s): min=554144, max=599104, per=99.60%, avg=587724.00, stdev=16710.46, samples=6
                 iops        : min=138536, max=149776, avg=146931.00, stdev=4177.61, samples=6
                cpu          : usr=11.56%, sys=42.24%, ctx=30262, majf=0, minf=23
                IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=215.0%
                   submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
                   complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
                   issued rwts: total=0,487662,0,0 short=0,0,0,0 dropped=0,0,0,0
                   latency   : target=0, window=0, percentile=100.00%, depth=64
              
              Run status group 0 (all jobs):
                WRITE: bw=576MiB/s (604MB/s), 576MiB/s-576MiB/s (604MB/s-604MB/s), io=1905MiB (1998MB), run=3306-3306msec
              
              Disk stats (read/write):
                xvdb: ios=0/1037381, merge=0/2159, ticks=0/437297, in_queue=437329, util=98.68%
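
              Since a backup stream is mostly large sequential writes rather than 4k random I/O, a complementary run with 1M blocks on the same remote (a sketch, not part of the test actually run here) might map more closely onto the backup workload:

              # Hypothetical sequential variant of the same test: 1M blocks, plain
              # sequential writes, same file size and ioengine as above.
              fio --name=seqtest --filename=test --ioengine=libaio --direct=1 \
                  --bs=1M --iodepth=16 --size=4G --readwrite=write --ramp_time=4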
              
                olivierlambert Vates 🪐 Founder & CEO 🦸 last edited by

                Great, so indeed, this storage should be able to deal with a lot of random write IOPS 🙂

                  markhewitt1978 last edited by

                  I think the next experiment would be to use the upcoming backup proxies to remove XOA from the equation and allow for multiple locations feeding into the NFS remote.

                    olivierlambert Vates 🪐 Founder & CEO 🦸 last edited by

                    Yes, but to compare apples to apples, you'll need to avoid using "bigger" network bandwidth (combining the proxies' bandwidth). You already have 10G everywhere, right?

                      markhewitt1978 @olivierlambert last edited by

                      @olivierlambert said in How to improve backup performance, where's my bottleneck:

                      Yes, but to compare apples to apples, you'll need to avoid using "bigger" network bandwidth (combining the proxies' bandwidth). You already have 10G everywhere, right?

                      No. The backup server has 10G, but the servers it's backing up have a mixture of 1G and 2G.

                      At the moment things are acceptable, and disk performance is going to be a limitation anyway; this is now purely out of personal interest!

                        olivierlambert Vates 🪐 Founder & CEO 🦸 last edited by

                        And it's great that you're interested in this, because it helps push things forward 🙂 Eager to have the proxies arrive in beta.

                          snigy last edited by snigy

                          The server has a 10G DAC to a dedicated 10G switch, and the disk it's writing to is a NAS iSCSI target with a single 10GbE NIC. Network traffic is low, under 30 Mb/s.
                          [screenshot attachments: Screenshot from 2020-02-01 16-26-10.png, pdd.png]
                          The first peak is memory read.
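
                          To rule the network path in or out (a rough sketch, assuming iperf3 can run on or near the NAS; the address is a placeholder), a quick end-to-end throughput test between the backup host and the iSCSI target would show whether the 10G link actually delivers far more than the <30 Mb/s observed:

                          # On (or near) the NAS end, hypothetically:
                          iperf3 -s
                          # From the backup host: 10 seconds, 4 parallel streams
                          iperf3 -c <nas-ip> -t 10 -P 4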
