How to improve backup performance, where's my bottleneck

olivierlambert

I don't think it's a Node limit. My bet is that the content requires a lot of random seeks everywhere, which limits the storage speed. It would be interesting to run the same backup on a very fast NVMe drive, just for the benchmark.

markhewitt1978

As it happens I do have an expiring server with NVMe that I can use for testing. So XOA stays where it is and I set up a new CentOS 7 VM on this host to act as a remote. It's using a standard xcp-ng VDI, not a passthrough disk like the other one.

An fio test gives: write: IOPS=135k, BW=526MiB/s (551MB/s)(1950MiB/3711msec)

Now keep in mind my current backups are still running through XOA - it's a production backup.

However, I kick off a backup from a different subset of servers than the one the current live backup is running from. It hits the NVMe NFS server and writes to disk at ... 30MBps :Z

The most interesting aspect is the incoming bandwidth to XOA. You'd think that if it were a backpressure issue, caused by not being able to write to the remote's disk fast enough, the incoming bandwidth would increase now that XOA is sending data out to two hosts. In fact the opposite seems to happen, and the bandwidth being sent to my 'live' backup decreases in proportion to the bandwidth now being sent to the NVMe remote.

I'm not sure what this means but it is interesting.
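
One way to reason about this observation is a toy proportional-share model, sketched below in Python. It is only an illustration: the function name `split_bandwidth` and every number in it are hypothetical, not measurements from this setup. It contrasts two cases: a slow remote being the limit (total throughput should rise when a fast second remote is added) versus something upstream of the remotes being the limit (total stays flat and the existing stream shrinks).

```python
def split_bandwidth(upstream_cap, remote_caps):
    """Return per-remote throughput in MB/s under a proportional-share model.

    upstream_cap: total rate the source side can deliver (hypothetical).
    remote_caps:  rate each remote could absorb on its own (hypothetical).
    """
    demand = sum(remote_caps)
    if demand <= upstream_cap:
        # The remotes are the limit: each one runs at its own cap.
        return list(remote_caps)
    # The upstream side is the limit: scale every stream down equally.
    scale = upstream_cap / demand
    return [cap * scale for cap in remote_caps]


# Case A (hypothetical): slow remote is the bottleneck, upstream has headroom.
print(split_bandwidth(200, [30]))        # one slow remote
print(split_bandwidth(200, [30, 500]))   # adding a fast remote raises the total

# Case B (hypothetical): upstream is the bottleneck.
print(split_bandwidth(30, [100]))        # one remote, capped upstream
print(split_bandwidth(30, [100, 100]))   # total unchanged, live stream shrinks
```

Case B is the pattern that resembles what is described above, which would point away from the remote's disk. The model does not explain an actual drop in total incoming bandwidth, so it should be read as a rough intuition only.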

olivierlambert

Indeed, it's interesting. You did the fio randwrite test with 4k?

markhewitt1978

I used the command line you sent to me

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite --ramp_time=4
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.7
Starting 1 process
test: No I/O performed by libaio, perhaps try --debug=io option for details?

test: (groupid=0, jobs=1): err= 0: pid=2290: Thu Jan 30 10:24:50 2020
  write: IOPS=148k, BW=576MiB/s (604MB/s)(1905MiB/3306msec)
   bw (  KiB/s): min=554144, max=599104, per=99.60%, avg=587724.00, stdev=16710.46, samples=6
   iops        : min=138536, max=149776, avg=146931.00, stdev=4177.61, samples=6
  cpu          : usr=11.56%, sys=42.24%, ctx=30262, majf=0, minf=23
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=215.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,487662,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=576MiB/s (604MB/s), 576MiB/s-576MiB/s (604MB/s-604MB/s), io=1905MiB (1998MB), run=3306-3306msec

Disk stats (read/write):
  xvdb: ios=0/1037381, merge=0/2159, ticks=0/437297, in_queue=437329, util=98.68%
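
As a sanity check on the output above, the reported bandwidth can be recomputed from the number of issued writes, the 4k block size and the runtime. The short Python sketch below does that arithmetic. The helper name `fio_throughput` is made up for illustration, and the 30 MB/s figure is the backup write rate mentioned earlier in the thread.

```python
BLOCK_SIZE = 4096        # bytes, from --bs=4k
WRITES_ISSUED = 487662   # from "issued rwts: total=0,487662,0,0"
RUNTIME_S = 3.306        # from "run=3306-3306msec"
BACKUP_MB_S = 30         # backup write rate reported earlier in the thread


def fio_throughput(ios, block_size, runtime_s):
    """Derive IOPS and bandwidth from the raw I/O count, block size and runtime."""
    bytes_per_s = ios * block_size / runtime_s
    return {
        "iops": ios / runtime_s,
        "mib_s": bytes_per_s / 2**20,  # binary units, as in "576MiB/s"
        "mb_s": bytes_per_s / 1e6,     # decimal units, as in "604MB/s"
    }


result = fio_throughput(WRITES_ISSUED, BLOCK_SIZE, RUNTIME_S)
print(f"IOPS:  {result['iops']:.0f}")
print(f"MiB/s: {result['mib_s']:.1f}")
print(f"MB/s:  {result['mb_s']:.1f}")
print(f"fio vs backup: {result['mb_s'] / BACKUP_MB_S:.0f}x")
```

The figures agree with fio's own summary, and they put the 4k random-write rate of this disk at roughly 20 times the rate seen during the backup.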

olivierlambert

Great, so indeed, this storage should be able to deal with a lot of random write IOPS

markhewitt1978

I think the next experiment would be to use the upcoming backup proxies to remove XOA from the equation and allow for multiple locations feeding into the NFS remote.

olivierlambert

Yes, but to compare apples to apples, you'll need to avoid using "bigger" network bandwidth (combining proxies bandwidth). You have already 10G everywhere, right?

markhewitt1978

@olivierlambert said in How to improve backup performance, where's my bottleneck:

Yes, but to compare apples to apples, you'll need to avoid using "bigger" network bandwidth (combining proxies bandwidth). You have already 10G everywhere, right?

No. The backup server has 10G but the servers it's backing up have a mixture of 1G and 2G.

At the moment things are acceptable, and disk performance is going to be a limitation anyway, this is now from a personal interest side!
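
For reference, the nominal capacity of those links can be compared with the roughly 30 MB/s write rate reported earlier in the thread. The Python sketch below does the unit conversion. It ignores protocol overhead, and the helper name `link_capacity_mb_s` is made up for illustration.

```python
def link_capacity_mb_s(gbit_per_s):
    """Convert a nominal link speed in Gbit/s to MB/s (decimal units)."""
    return gbit_per_s * 1000 / 8


OBSERVED_MB_S = 30  # backup write rate reported earlier in the thread

for gbit in (1, 2, 10):
    capacity = link_capacity_mb_s(gbit)
    share = OBSERVED_MB_S / capacity
    print(f"{gbit:>2}G link: {capacity:.0f} MB/s nominal, "
          f"{OBSERVED_MB_S} MB/s is {share:.0%} of it")
```

Even a single 1G link offers about 125 MB/s before overhead, so on these numbers alone a 30 MB/s stream would not saturate any one of the links.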

olivierlambert

And it's great that you are interested in that, because it helps to push things forward. Eager to have proxies coming in beta.

snigy

The server has a 10G DAC to a dedicated 10G switch. The disk it's writing to is a NAS iSCSI target with a single 10GbE NIC. Network traffic is low, under 30Mb/s.
[Image: Screenshot from 2020-02-01 16-26-10.png]

The first peak is a memory read.