Too many snapshots

Pilow

@McHenry I don't think having more than 3 snapshots triggers an error; I just tested it on one VM.

It is not recommended for production VMs, but for a CR destination it's OK (as you would need to start a copy anyway).

Your problem, failing CR jobs, is probably due to garbage collection not finishing within the one-hour window when the chain is long.

McHenry

@Pilow

CR jobs are not failing; XO just reports too many snapshots under:
Dashboard >> Health

All good if I can just ignore this warning, but I thought it best to check in case it was an issue.

I got the value of 3 from here.
https://docs.xen-orchestra.com/manage_infrastructure#too-many-snapshots

Pilow

@McHenry could you post a screenshot of the Health page, where we could see the chain length?

henri9813

Hello,

I am also seeing this behavior, which has been "new" for the past few weeks.

Previously, when a backup started:

it took a snapshot (if there was a previous one, it deleted it).
it uploaded the snapshot as a backup.
it coalesced the backup on the remote.
end of the game.

Now the old snapshots are no longer deleted, which can easily lead to a full disk.

Even with a retention of 1, the problem is present.

I observe this only in Backup jobs, not DR/CR jobs.

I just updated my XO to the latest version; I will see if the issue is fixed.
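
To make the difference between the two behaviors concrete, here is a minimal Python sketch of the rotation described above. It is a toy model, not XO code: `run_backup` and the `delete_previous` flag are hypothetical names, and the upload and coalesce steps are only noted in comments.

```python
def run_backup(snapshots, new_name, delete_previous=True):
    """Simulate the local snapshot list of one VM after a backup run.

    Hypothetical model for illustration only, not Xen Orchestra code.
    """
    # 1. Take a new snapshot.
    result = snapshots + [new_name]
    # 2. Delete the snapshot left by the previous run (the expected behavior).
    if delete_previous:
        result = [new_name]
    # 3. Upload and 4. coalesce happen on the remote; they do not change
    #    the local snapshot list, so they are not modelled here.
    return result


expected, observed = [], []
for run in ("run-1", "run-2", "run-3"):
    expected = run_backup(expected, run, delete_previous=True)
    observed = run_backup(observed, run, delete_previous=False)

print(len(expected))  # 1
print(len(observed))  # 3
```

With the cleanup step the local count stays at 1; without it, it grows by one per run, which is what eventually fills the disk.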

McHenry

@Pilow

The number of snapshots shows 16, which makes sense as I have two backup schedules, one with a retention of 15 and one with a retention of 1. The daily backup with a retention of 1 resets the chain, as it is a full backup.
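
As a quick sanity check of that number, a sketch in Python. The schedule names are made up, the threshold of 3 is the value I read from the docs link above, and it assumes each schedule keeps one snapshot per retained point, as the 16 suggests.

```python
# Retention per schedule; the key names are placeholders.
retention = {"retention-15-schedule": 15, "daily-full": 1}

# Assumption: one snapshot is kept per retained point of each schedule.
expected_snapshots = sum(retention.values())

WARNING_THRESHOLD = 3  # assumption: value read from the XO docs

print(expected_snapshots)                      # 16
print(expected_snapshots > WARNING_THRESHOLD)  # True
```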

henri9813

Hello @McHenry,

Yes and no: once the snapshot has been exported, the previous one must be cleaned up locally.

Best regards,

McHenry

@henri9813

Thanks.
The old snapshots are being removed as the total never increases beyond 16, so when a new snapshot is added, the old one is removed.

Pilow

@McHenry said:

Thanks.
The old snapshots are being removed as the total never increases beyond 16, so when a new snapshot is added, the old one is removed.

Immediately removed, yes, but then garbage collection takes place.
And perhaps with 19x16 GCs to process, it can't be done in one hour, and then the next CR is launched, and so on.
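
A back-of-the-envelope version of that hypothesis, in Python. Reading "19x16" as 19 chains of 16 snapshots is an assumption, the per-coalesce duration is a made-up number, and a strictly sequential GC is assumed; plug in what you actually measure.

```python
CHAINS = 19                    # assumption: "19" read as number of chains
SNAPSHOTS_PER_CHAIN = 16
SECONDS_PER_COALESCE = 15      # hypothetical; measure on your own SR
CR_INTERVAL_SECONDS = 60 * 60  # the one-hour window between CR runs


def gc_fits(chains, per_chain, seconds_each, interval):
    """Return (total_seconds, fits) for a strictly sequential GC."""
    total = chains * per_chain * seconds_each
    return total, total <= interval


total, fits = gc_fits(
    CHAINS, SNAPSHOTS_PER_CHAIN, SECONDS_PER_COALESCE, CR_INTERVAL_SECONDS
)
print(total, fits)  # 4560 False
```

If the measured total stays under the interval, this explanation can be ruled out.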

McHenry

@Pilow

I did check this and it definitely completes within the hour.

I am testing a lesser value for CR retention to see if this resolves it.

poddingue

If the lower retention value gets things stable, that probably confirms Pilow's hypothesis. If it doesn't help, that's the signal that something heavier is going on, and a @Team-XO-Backend ping would make sense. Would you mind dropping the result back here either way? Helps the next person hitting the same wall.

florent

@poddingue this is something that was hidden with the previous system (same disk chains, but not shown as snapshots).

@julienXOvates are you OK with excluding VMs tagged as replication from this check?

julienXOvates

@florent I think we discussed this and we thought it's not meaningful to have CR snapshots counted in the Dashboard (otherwise, we should maybe display standalone snapshots and snapshots from backups separately in 2 columns in XO6).
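
For illustration, the exclusion discussed here could look roughly like this in Python. This is not the code from XO: the `Vm` shape, the tag name `replicated` and the threshold are all placeholders.

```python
from dataclasses import dataclass, field

REPLICATION_TAG = "replicated"  # hypothetical tag name
THRESHOLD = 3                   # hypothetical warning threshold


@dataclass
class Vm:
    name: str
    snapshot_count: int
    tags: set = field(default_factory=set)


def too_many_snapshots(vms, threshold=THRESHOLD):
    """Names of VMs over the threshold, skipping replication targets."""
    return [
        vm.name
        for vm in vms
        if REPLICATION_TAG not in vm.tags and vm.snapshot_count > threshold
    ]


vms = [
    Vm("prod-db", 5),
    Vm("prod-web", 2),
    Vm("cr-copy-of-prod-db", 16, {REPLICATION_TAG}),
]
print(too_many_snapshots(vms))  # ['prod-db']
```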

florent

@julienXOvates PR is here https://github.com/vatesfr/xen-orchestra/pull/9868

julienXOvates

@henri9813 said:

Hello,

I am also seeing this behavior, which has been "new" for the past few weeks.

Previously, when a backup started:

it took a snapshot (if there was a previous one, it deleted it).

it uploaded the snapshot as a backup.

it coalesced the backup on the remote.

end of the game.

Now the old snapshots are no longer deleted, which can easily lead to a full disk.

Even with a retention of 1, the problem is present.

I observe this only in Backup jobs, not DR/CR jobs.

I just updated my XO to the latest version; I will see if the issue is fixed.

Hi @henri9813 ,
The issue should be resolved in 6.5.x. Can you confirm on your side?
Thanks!