S3 Chunk Size
-
Hello,
I added an S3-compatible cloud storage service as a remote backup target in Xen Orchestra. There are no problems: everything works as expected, performance is good, and I can create backups and restore them.
However, the service provider's documentation states: "You must set the minimum Chunk Size for backups to 4MB". How can I find out what this value is on my system, and if I need to change it, how can I do so?
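For reference, the "chunk size" an S3 provider documents usually corresponds to the multipart upload part size. The sketch below only shows where that knob would live in the AWS SDK v3 that Xen Orchestra uses for S3 remotes; the endpoint, bucket, credentials and values are placeholders, and this is not XO's own code or settings.

// Minimal multipart upload with an explicit part ("chunk") size, using the
// AWS SDK v3 Upload helper. All names and values below are placeholders.
import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";
import { createReadStream } from "node:fs";

const client = new S3Client({
  endpoint: "https://s3.example.org", // placeholder S3-compatible endpoint
  region: "us-east-1",
  forcePathStyle: true,
  credentials: { accessKeyId: "AKIA...", secretAccessKey: "..." },
});

const upload = new Upload({
  client,
  params: {
    Bucket: "xo-backups",
    Key: "test/dummy.bin",
    Body: createReadStream("./dummy.bin"),
  },
  // partSize is the multipart "chunk" size. AWS S3 itself rejects non-final
  // parts smaller than 5 MiB, so a larger value is used here.
  partSize: 8 * 1024 * 1024,
  queueSize: 4, // number of parts uploaded in parallel
});

await upload.done();

Whether a Ceph-based service enforces the same 5 MiB lower bound depends on its configuration; the 4 MB figure above is the provider's own requirement.
-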
@rizaemet-0 This is not configurable.
Before qcow2, the S3 blocks were aligned with the VHD blocks and then compressed, so changing this would have been a lot of work.
Now, with qcow2 support almost reaching stable, we have some of the tools needed to resize the block size of the file (since qcow2 is based on 64 KB blocks), so it's not impossible, but it's still quite complex to do reliably.
Could you name the provider?
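A quick back-of-the-envelope on how those sizes relate, assuming qcow2's default 64 KiB cluster size and the 4 MB figure from the provider's documentation:

// Hypothetical arithmetic only: one 4 MiB chunk would aggregate 64 qcow2 clusters.
const QCOW2_CLUSTER = 64 * 1024;        // 64 KiB, the qcow2 block size mentioned above
const TARGET_CHUNK  = 4 * 1024 * 1024;  // the provider's 4 MB minimum, read as 4 MiB
console.log(TARGET_CHUNK / QCOW2_CLUSTER); // 64 clusters per chunk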
-
@florent We are a university in Türkiye. The S3 service is provided to us by the National Academic Network and Information Center (ULAKBİM), an official institution of Türkiye.
-
Do you know what software they are using? Ceph? Minio? Garage?
-
@olivierlambert Ceph. They provide a few configuration examples; that's how I learned it's Ceph.
Edit: When I asked an AI some questions, it said something like: "If the chunk size is too small, the risk of a 502 increases." Seeing this, I ran a few tests. A backup of a virtual machine with 80 GB of disk space (backup size: 70 GB) went through without any problems. However, a backup of a virtual machine with 16 GB of disk space (backup size: 3 GB) failed. The 502 error seems to have occurred during the clean-vm phase of the backup. Even so, the backup appears to have been created, and it worked when I restored it. I had been backing up virtual machines with large disks and had never encountered this error before.
This section in the log exists both before the snapshot and after the export:

...
{
  "id": "1768672615028",
  "message": "clean-vm",
  "start": 1768672615028,
  "status": "failure",
  "end": 1768673350279,
  "result": {
    "$metadata": {
      "httpStatusCode": 502,
      "clockSkewCorrected": true,
      "attempts": 3,
      "totalRetryDelay": 112
    },
    "message": "Expected closing tag 'hr' (opened in line 9, col 1) instead of closing tag 'body'.:11:1 Deserialization error: to see the raw response, inspect the hidden field {error}.$response on this object.",
    "name": "Error",
    "stack": "Error: Expected closing tag 'hr' (opened in line 9, col 1) instead of closing tag 'body'.:11:1 Deserialization error: to see the raw response, inspect the hidden field {error}.$response on this object.
        at st.parse (/opt/xo/xo-builds/xen-orchestra-202601171930/node_modules/fast-xml-parser/lib/fxp.cjs:1:20727)
        at parseXML (/opt/xo/xo-builds/xen-orchestra-202601171930/node_modules/@aws-sdk/xml-builder/dist-cjs/xml-parser.js:17:19)
        at /opt/xo/xo-builds/xen-orchestra-202601171930/node_modules/@aws-sdk/core/dist-cjs/submodules/protocols/index.js:1454:52
        at process.processTicksAndRejections (node:internal/process/task_queues:103:5)
        at async parseXmlErrorBody (/opt/xo/xo-builds/xen-orchestra-202601171930/node_modules/@aws-sdk/core/dist-cjs/submodules/protocols/index.js:1475:17)
        at async de_CommandError (/opt/xo/xo-builds/xen-orchestra-202601171930/node_modules/@aws-sdk/client-s3/dist-cjs/index.js:5154:11)
        at async /opt/xo/xo-builds/xen-orchestra-202601171930/node_modules/@smithy/middleware-serde/dist-cjs/index.js:8:24
        at async /opt/xo/xo-builds/xen-orchestra-202601171930/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/index.js:488:18
        at async /opt/xo/xo-builds/xen-orchestra-202601171930/node_modules/@smithy/middleware-retry/dist-cjs/index.js:254:46
        at async /opt/xo/xo-builds/xen-orchestra-202601171930/node_modules/@aws-sdk/middleware-flexible-checksums/dist-cjs/index.js:318:18"
  }
}
...
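Reading that message: the XML parser is choking on 'hr' and 'body' tags, which suggests the gateway answered with an HTML 502 error page rather than S3's XML error format, so the SDK could not parse it. A minimal sketch of how to look at the raw response the message points to, assuming the same @aws-sdk/client-s3 that appears in the stack; endpoint and bucket are placeholders:

import { S3Client, ListObjectsV2Command } from "@aws-sdk/client-s3";

const client = new S3Client({
  endpoint: "https://s3.example.org", // placeholder S3-compatible endpoint
  region: "us-east-1",
  forcePathStyle: true,
});

try {
  // Any listing call against the bucket; placeholder bucket name.
  await client.send(new ListObjectsV2Command({ Bucket: "xo-backups" }));
} catch (error) {
  // If the gateway returns an HTML error page instead of XML, the SDK still
  // exposes the status code and the raw, unparsed response on the error object.
  console.log(error.$metadata?.httpStatusCode); // e.g. 502
  console.log(error.$response); // the hidden field the log message refers to
}
-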
@rizaemet-0 The clean-vm step is the most demanding part of the backup job (mostly listing, moving, and deleting blocks).
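Not XO's actual implementation, just a rough sketch of the kind of bulk listing such a cleanup pass implies against an S3 remote (endpoint, bucket and prefix are hypothetical):

import { S3Client, paginateListObjectsV2 } from "@aws-sdk/client-s3";

const client = new S3Client({
  endpoint: "https://s3.example.org", // placeholder endpoint
  region: "us-east-1",
  forcePathStyle: true,
});

// Walk every object under a backup prefix, page by page; each page is a separate
// request to the gateway, which is where an overloaded S3 service tends to 502.
let objects = 0;
for await (const page of paginateListObjectsV2(
  { client },
  { Bucket: "xo-backups", Prefix: "xo-vm-backups/" } // hypothetical layout
)) {
  objects += page.Contents?.length ?? 0;
}
console.log(`${objects} objects under the prefix`);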
-
Yeah, it's not surprising. Ceph's S3 implementation is known to be "average" and failure-prone in those cases.