Question on NBD backups
-
While testing NBD over the last few days, I saw this note: "Remember that it works ONLY with a remote (Backup Repository) configured with 'multiple data blocks'." I then saw this note:
"Store backup as multiple data blocks instead of a whole VHD file. (creates 500-1000 files per backed up GB but allows faster merge)." I set one up and backed up a small VM and, sure enough, a 5 GB VM produced 15,587 files. I can only imagine a large VM disk would generate a massive number of files.
Questions:
Is there anything of concern with respect to generating all these files on the NFS server vs just a couple of large files?
-
One area said "your remote must be able to handle parallel access (up to 16 write processes per backup)". How does one know whether a given remote will handle that?
-
Also, do you still need to edit your XO config.toml file (near the end of it) with:

[backups]
useNbd = true

-
Does anyone use this vs the other methods of backup?
Thanks!
-
-
Hi. The only input I have is around your first question.
When anything stores many small files (especially thousands of files per GB of data), you get filesystem overhead and 'wasted space' on the storage device (the backup target). A lot of additional capacity can be consumed, and if you're already near capacity, that overhead can scale into a real problem.
As a simplified example using round numbers (see the quick sketch after this list):
- on a filesystem with 4K block size, a 1-byte file (e.g. a TXT file with the letter "a" in it) will consume 4K of disk capacity.
- If you scale this up to a thousand files, you are consuming 4,000,000 bytes of disk capacity, with only 1,000 bytes of data.
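If you want to put numbers on it yourself, here's a minimal Python sketch of that block-rounding math (the 4 KiB block size and the 1-byte files are just the assumptions from the example above, not anything specific to NBD or your filesystem):

```python
import math

def allocated_bytes(file_size: int, block_size: int = 4096) -> int:
    # A file consumes whole filesystem blocks, so its on-disk
    # footprint is its size rounded up to the next block boundary.
    return math.ceil(file_size / block_size) * block_size

# 1,000 one-byte files on a filesystem with 4 KiB blocks:
n_files, file_size = 1_000, 1
used = n_files * allocated_bytes(file_size)
print(f"{n_files * file_size:,} bytes of data occupy {used:,} bytes on disk")
# -> 1,000 bytes of data occupy 4,096,000 bytes on disk (~4 MB)
```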
Also, if you are using any other apps/utils that scan, monitor, or sync at the filesystem level (for example a sync tool, anti-malware, or checksum verification), they will need to process many thousands of files instead of just a hundred or so, and that per-file overhead adds latency.
Again, it depends on scale, so another round-number example (also sketched in code after this list):
- assume an app/util needs 200 milliseconds to open and close each file per operation
- if you have 100 files, you have 20 seconds of 'wait time'.
- If you have 1,000,000 files, you are looking at about 55 hours of 'wait time'.
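The same arithmetic as a quick Python sketch (the 200 ms per-file cost is just the assumed figure from above):

```python
# Per-file processing overhead scales linearly with file count.
per_file_seconds = 0.200  # assumed cost to open/close one file

for n_files in (100, 1_000_000):
    wait = n_files * per_file_seconds
    print(f"{n_files:>9,} files -> {wait:,.0f} s (~{wait / 3600:.1f} h)")
# ->       100 files -> 20 s (~0.0 h)
# -> 1,000,000 files -> 200,000 s (~55.6 h)
```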
Not a very realistic example, but just something to be aware of when you explode data into many, many small files.
-
@TS79
That makes sense - thanks!