Delta backup stuck on "Clean VM Directory" for a long time

norpan

Hi,

I have a problem with delta backups, which take a long time to finish.
The remote is a local Minio instance, using buckets with a retention policy set to get immutable backups.
It had been working fine for about 4 months, but after that it got bad.

First of all, I'm not sure it's related to the long backup times, but I get some of these in the logs. My guess is that, with retention enabled, Minio keeps empty directories/prefixes until the deleted files are older than the retention period:

no alias references VHD

Transfer and Merge seem fine, and I think "Clean VM Directory" is the culprit; it can go on for hours.
At first I thought the long backup times could be related to Minio performance, but I can't see any S3 requests during that time.
During my troubleshooting I started to think that the backup process is idle or waiting and continues after a timeout, but I don't know how to debug this further.
A few timeouts have been logged:

            "message": "Connection timed out after 600000 ms",
            "stack": "TimeoutError: Connection timed out after 600000 ms\n    at ClientRequest.<anonymous> (/opt/xen-orchestra/node_modules/@aws-sdk/node-http-handler/node_modules/@smithy/node-http-handler/dist-cjs/set-socket-timeout.js:7:30)\n    at Object.onceWrapper (node:events:632:28)\n    at ClientRequest.emit (node:events:518:28)\n    at ClientRequest.patchedEmit [as emit] (/opt/xen-orchestra/@xen-orchestra/log/configure.js:52:17)\n    at TLSSocket.emitRequestTimeout (node:_http_client:863:9)\n    at Object.onceWrapper (node:events:632:28)\n    at TLSSocket.emit (node:events:530:35)\n    at TLSSocket.patchedEmit [as emit] (/opt/xen-orchestra/@xen-orchestra/log/configure.js:52:17)\n    at Socket._onTimeout (node:net:604:8)\n    at listOnTimeout (node:internal/timers:588:17)"
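
For reference, 600000 ms is 10 minutes. A minimal sketch, assuming the log entry is at hand as a string, of pulling the timeout out of such a message and converting it; `timeout_minutes` is a hypothetical helper and not part of Xen Orchestra:

```python
import re
from typing import Optional


def timeout_minutes(message: str) -> Optional[float]:
    """Return the timeout from a 'timed out after N ms' message, in minutes.

    Returns None when the message does not contain such a timeout.
    """
    match = re.search(r"timed out after (\d+) ms", message)
    if match is None:
        return None
    # milliseconds -> seconds -> minutes
    return int(match.group(1)) / 1000 / 60


print(timeout_minutes("Connection timed out after 600000 ms"))  # 10.0
```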

I have adjusted Concurrency both up and down. With a lower value the timeouts seem to be avoided, but even without these errors the backup times are just as long.

Some assistance would be much appreciated.

olivierlambert

Hi,

Sadly, even if Minio is probably better than others (e.g. BackBlaze), third-party S3 implementations can suffer from various issues. There's a reason why we only provide official pro support on AWS.

Would it be possible to test another provider to double check it's not XO related?

norpan

One thing I've noticed is that when I run a trace in Minio I see no S3 requests for a while, and then suddenly a couple of s3.list_objects_v2 calls show up with status 499 and a duration of just about 10 minutes. But this produces no timeout in the backup log.
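
One way to check whether listing is slow independently of XO would be to time a paginated ListObjectsV2 walk over a prefix. A minimal sketch, assuming a boto3-style client: `time_listing` and `list_page` are hypothetical names, and the endpoint, bucket and prefix in the commented wiring are placeholders, not values from this setup:

```python
import time
from typing import Callable, List, Optional, Tuple


def time_listing(
    list_page: Callable[[Optional[str]], dict],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[int, List[float]]:
    """Walk every page of a ListObjectsV2-style listing, timing each request.

    list_page(token) must return a dict shaped like an S3 ListObjectsV2
    response (Contents, IsTruncated, NextContinuationToken).
    Returns (total number of keys, list of per-request durations in seconds).
    """
    durations: List[float] = []
    total_keys = 0
    token: Optional[str] = None
    while True:
        start = clock()
        page = list_page(token)
        durations.append(clock() - start)
        total_keys += len(page.get("Contents", []))
        if not page.get("IsTruncated"):
            break
        token = page["NextContinuationToken"]
    return total_keys, durations


# Possible wiring with boto3 (assumed to be installed; all values are placeholders):
#
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="https://minio.example.local:9000")
#
#   def list_page(token):
#       kwargs = {"Bucket": "my-backup-bucket", "Prefix": "some/prefix/"}
#       if token:
#           kwargs["ContinuationToken"] = token
#       return s3.list_objects_v2(**kwargs)
#
#   keys, durations = time_listing(list_page)
#   print(keys, max(durations))
```

If single requests here also take minutes, that would point at the storage side rather than at the backup job.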

Are you thinking about testing a public provider such as AWS?
I'm not sure we are able to do that. One thing I'm about to do is test a newer version of Minio in Docker on another server running Ubuntu 24.04.
As of now it's an older version running natively on an OmniOS server on top of ZFS storage.

Is there a way to get more verbose logging from the backup job?

olivierlambert

Adding @florent in the loop

florent

@olivierlambert we have a WIP for S3 to improve the listing performance. It will take a few days before we have a testable fix.

olivierlambert

Fine by me!

norpan

That sounds interesting. The backups are now on a new remote on a new Minio instance, so the issue is handled for now. I guess we will see over time whether the backup times increase.