Title: Slow Backup Performance with Commvault Agentless Backup on XCP-ng Cluster
-
Hello,
We are currently experiencing very slow backup speeds when using Commvault's agentless VM backup in the following environment:
Environment:
Commvault Server: Windows Server 2019
Hypervisor: 3-node XCP-ng cluster
Shared Storage: NetApp iSCSI SR
Backup Solution: Commvault (v11.36.49)
Backup Method: Agentless VM backup
Backup Target Storage: NetApp NFS volume
Issue:
When backing up virtual machines via agentless mode, the transfer speed is extremely slow (just a few MB per second), resulting in backup jobs taking several hours to complete.
No significant load on the VMs being backed up
No bottlenecks observed in Commvault job logs
All basic network (1 Gbps) and disk performance checks look normal; example checks are below the questions.
Questions:
Are there any known limitations or bottlenecks for Commvault agentless backups on XCP-ng environments?
Which parts of the backup flow (e.g., VM snapshot, disk read, CBT parsing) typically cause performance issues in agentless mode?
Any recommended best practices or settings to improve agentless VM backup speeds in this setup?
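For reference, the kind of quick checks meant above, as a rough sketch (device names and IP addresses are placeholders, and iperf3 is not installed by default on dom0 or the Commvault server):
# raw read speed of the SR-backing device from dom0 (replace /dev/sdX with the actual device)
dd if=/dev/sdX of=/dev/null bs=1M count=2048 iflag=direct
# network throughput between the Commvault server and the XCP-ng host
iperf3 -s                    # on the XCP-ng host
iperf3 -c <xcp-ng-host-ip>   # on the Commvault server
-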
Hi,
I think the best first approach is to ask Commvault. It might also be interesting to compare with XO, to check whether the backup speeds are in the same ballpark.
-
We are currently experiencing very slow backup speeds when using Commvault's agentless VM backup in the following environment:
Environment:
Commvault Server: Windows Server 2019
Hypervisor: 3-node XCP-ng cluster
Shared Storage: NetApp iSCSI SR
Backup Solution: Commvault (v11.36.49)
Backup Method: Agentless VM backup
Backup Target Storage: NetApp NFS volume
Issue: VM Backup – Unmount Process Taking 10+ Minutes. Is This Normal?
We're currently running backups for several VMs in our XCP-ng environment using Commvault (a third-party backup solution).
During the backup process, we've noticed that the unmount step consistently takes around 10 minutes to complete for certain VMs.
Questions:
- Is this a normal behavior during the backup process?
- Are there any known causes (e.g., high I/O, slow SR response, snapshot handling issues) that could lead to this delay?
- Any recommendations to reduce this time?
- Has anyone else experienced long unmount durations like this?
There are no obvious errors in the logs, and backups do complete successfully – just very slowly at the unmount stage.
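In case it's relevant, this is what we're planning to check next (a rough sketch; paths are the XCP-ng defaults): whether a VHD coalesce / garbage-collection job is still running on the SR right after the snapshot is deleted, since that can keep the snapshot VDI busy for a while:
# storage-manager log: look for coalesce activity around the unmount window
grep -i coalesce /var/log/SMlog | tail -n 20
# any pending XAPI tasks (e.g. a VDI destroy still in progress)
xe task-list
# the SM garbage-collection/coalesce worker, if it is currently running
ps aux | grep [c]leanup.py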
Any insight would be greatly appreciated.
-
Before answering your question, it would be good to have something to compare against. Can you try with XO and see if it's similar? It's important to understand whether the problem is related to XCP-ng or to your backup solution. Also, why not ask Commvault directly too?
-
It is a configuration issue related to Commvault, and I have opened a case with Commvault, but it seems that the API signal is not being received from XCP-ng.
-
Keep us posted
-
4124 1c60 04/29 16:10:49 221 CXenInfo::UnmountVM() - Have existing AddDisksMutex [0xd5c] -> ummount request
4124 1c60 04/29 16:10:49 221 CXenInfo::UnmountVM() - VDI UUID to delete: uuid -> Snapshot deletion request
4124 1c60 04/29 16:21:01 221 CXenInfo::UnmountVM() - Deleted VDI [uuid]. -> Snapshot deletion completed
Commvault's answer:
Commvault sends the API requests for creating and deleting snapshots to the hypervisor.
Once the backup is done, Commvault sends the request to delete and then unmount the snapshot.
In this case, what I observed is that once the backup is done and the delete request has been sent from Commvault, the deletion and unmount process takes time on the hypervisor.
That is why I suggested you also check from the XEN end, to narrow down why the unmount process is taking time.
Hope this clears your query. Do let me know if you have further queries; if not, kindly confirm so we can proceed with closure of the case.
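To check from the XCP-ng side as suggested, a rough manual test from dom0 could look like this (a sketch; the VM UUID is a placeholder, and the exact parameter names can be confirmed with xe help snapshot-uninstall):
# take a snapshot of the VM and time it
time xe vm-snapshot uuid=<vm-uuid> new-name-label=manual-test-snap
# delete that snapshot (UUID returned by the command above) and time it,
# to see whether the ~10 minute delay also happens outside Commvault
time xe snapshot-uninstall uuid=<snapshot-uuid> force=true
-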
It can take time to clear (especially if there's load on the SR); we have the same issue with XO. Our solution (in XO) was to improve the retry mechanism to use more retries over a longer period of time.
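Roughly the idea, expressed as a shell sketch (not the actual XO code; the UUID and timings are placeholders), is to keep retrying the destroy for much longer instead of giving up after the first attempts:
# retry the VDI destroy for up to ~20 minutes instead of failing fast
for i in $(seq 1 20); do
  xe vdi-destroy uuid="$SNAPSHOT_VDI_UUID" && break
  echo "attempt $i: VDI still busy (coalesce/GC probably running), retrying in 60s"
  sleep 60
done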
-
@yeopil21 Run top, xentop, and iostat to see if dom0 and/or the storage device might be a bottleneck. The configuration of your storage device can also be a big factor, and in some cases various performance tweaks are possible. The specific configuration would be helpful: connectivity (NFS, Fibre Channel, iSCSI), number, size, and speed of the disks, RAID configuration, total number of VMs resident on the device, number of independent SRs, provisioning (thin or thick), and network speed and settings (if not Fibre Channel).
Storage optimization is a bit of an art and in many cases can be the limiting factor, but as stated, so can a lack of dom0 resources. Also, what clock speed and how many CPUs do your hosts have?
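For example, on each host during a backup window (iostat requires the sysstat package in dom0):
top            # overall dom0 load and memory pressure
xentop         # per-domain CPU usage; watch dom0 and the VM being backed up
iostat -xm 5   # extended per-device stats every 5 seconds: utilization, await, MB/s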