Automatic backup verification
-
During a live vlog with @lawrencesystems, the topic of automatic backup verification came up. For example, Veeam, StorageCraft, etc. allow booting backups and taking screenshots (usually with the network disabled) to ensure the backups actually boot.
It was an interesting topic, and I'd like to see whether this is something that would be beneficial to XO. Leaving this here as per @olivierlambert to open up the discussion.
-
Sure, so the hard part is: how to automatically test them?
The easiest path, to me, is to detect that the tools have started.
-
That would be a simple method. Boot the system with a host only / isolated network, confirm tools start, append that to the backup log, done.
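
A minimal sketch of that check, assuming some way to ask whether the guest tools have reported in. The `tools_detected` callable is a hypothetical stand-in for whatever XO uses to detect tools, and the log is a plain list of strings for illustration only:

```python
import time
from typing import Callable, List


def wait_for_tools(
    tools_detected: Callable[[], bool],  # hypothetical: True once the guest tools answer
    timeout_s: float = 300.0,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the tools are detected or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if tools_detected():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


def append_verification(log_lines: List[str], vm_name: str, ok: bool) -> None:
    """Append the restore test outcome to an existing backup log."""
    status = "tools detected" if ok else "tools NOT detected before timeout"
    log_lines.append(f"restore test [{vm_name}]: {status}")
```

The `sleep` and `clock` parameters are only there so the loop can be exercised without really waiting.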
-
Should it be integrated into the backup job? Or should we have a separate scheduler, so backups can be tested at a time other than right after a backup finishes?
So right now, trying to imagine this:
- enabling backup restore on a backup job (or in a dedicated scheduler for it)
- selecting the main network for the restore (ideally, a network where no conflict with production VMs can happen). Should we scrub the network interfaces automatically in all VMs to avoid any IP conflict?
- we could be sure the system boots (if the tools answer, XO can detect this). However, we can't "validate" data, e.g. on an extra disk in the VM. I don't see any easy way to do that, except having an agent in the VM report that what you need is there.
- then, when we get the result, we can put it in a dedicated report, or in the backup job report
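
To make the last point concrete, here is a rough sketch of what a restore test result and a plain-text report could look like. The `RestoreTestResult` and `render_report` names and the report layout are made up for illustration; this is not how XO's backup reports are built today:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RestoreTestResult:
    vm_name: str
    booted: bool          # the restored VM reached the running state
    tools_detected: bool  # the guest tools answered after boot

    @property
    def passed(self) -> bool:
        return self.booted and self.tools_detected


def render_report(results: List[RestoreTestResult]) -> str:
    """Build one line per tested VM, followed by a summary line."""
    lines = []
    for r in results:
        status = "OK" if r.passed else "FAILED"
        lines.append(
            f"{r.vm_name}: {status} (booted={r.booted}, tools={r.tools_detected})"
        )
    failed = sum(1 for r in results if not r.passed)
    lines.append(f"{len(results) - failed} passed, {failed} failed")
    return "\n".join(lines)
```

The same text could be sent on its own or appended to the existing backup job report.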
-
@olivierlambert I don't think it has to do anything as advanced as checking other parts of the system. Tools such as Datto's backup use "Screenshot Backup Verification" (https://www.datto.com/technologies/screenshot-verification), which shows that the VM boots but does not verify any application status inside the VM. I think checking the tools and starting the system with an isolated network interface, to avoid IP conflicts or the server reaching out to anything, would be enough. I like the idea of this being part of a backup job. For example, I currently have delta jobs running during the week and a separate full backup job that takes the offline snapshot on weekends, and that is where I would also like a "Restore Test" type of feature.
-
So, a kind of new option in the backup job, with a field like "Check backup every XX", opening new fields:
- where to restore those test backups (SR and network)
- checking if tools are up after booting them
- then removing them
- sending a dedicated report on the restore test
Am I missing anything else?
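
As a rough sketch of the fields listed above, assuming the option is stored next to the backup job settings. All names here (`RestoreTestSettings`, `target_sr`, `target_network`, and so on) are hypothetical, not existing XO settings:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class RestoreTestSettings:
    check_every: timedelta          # the "Check backup every XX" field
    target_sr: str                  # SR where the test restore is written
    target_network: str             # network used for the restored VM
    tools_timeout: timedelta = timedelta(minutes=5)  # how long to wait for tools
    remove_after_test: bool = True  # delete the restored VM afterwards
    send_report: bool = True        # send a dedicated restore test report


def is_test_due(
    settings: RestoreTestSettings,
    last_test: Optional[datetime],
    now: datetime,
) -> bool:
    """A test is due if none has run yet or the interval has elapsed."""
    if last_test is None:
        return True
    return now - last_test >= settings.check_every
```

For example, with `check_every=timedelta(days=7)` a job last tested on Monday would be tested again on the first run on or after the following Monday.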
-
@olivierlambert That sounds like enough to me, maybe @sheridancompute or @tekwendell have a few other thoughts on it.
-
Adding @marcungeschikts to the loop so we can think about creating an issue/card with the detailed spec (with @julien-f's opinion on how this could fit into our current backup code)
-
Sounds right to me, that's pretty much how StorageCraft does it: boot to the login prompt, take a screenshot and email it.
We've used StorageCraft for backup for years, and this was one of the original winning features for us.
https://support.arcserve.com/s/topic/0TO36000000Ln5gGAC/advanced-verification?language=en_US
-
@olivierlambert on it
-
Yes, it should be safe enough to try importing the backup, then starting the VM without network (i.e. without VIFs) and seeing if it stays up for a few minutes.
I think we could start by adding a manual feature, then see in a second step how we could automate it.
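
A minimal sketch of that manual flow, under stated assumptions: `import_without_vifs` is a hypothetical callable that imports the backup with no VIFs and returns an object exposing `start`, `is_running` and `destroy`. None of these are real XO or XAPI calls:

```python
import time
from typing import Callable, Protocol


class RestoredVm(Protocol):
    """Hypothetical handle on the VM imported from the backup."""

    def start(self) -> None: ...
    def is_running(self) -> bool: ...
    def destroy(self) -> None: ...


def stays_up(
    vm: RestoredVm,
    duration_s: float = 180.0,
    poll_interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True only if the VM is running at every poll during the window."""
    deadline = clock() + duration_s
    while clock() < deadline:
        if not vm.is_running():
            return False
        sleep(poll_interval_s)
    return vm.is_running()


def manual_restore_test(
    import_without_vifs: Callable[[], RestoredVm],
    **stay_up_options,
) -> bool:
    """Import the backup, boot it, watch it for a while, then always clean up."""
    vm = import_without_vifs()
    try:
        vm.start()
        return stays_up(vm, **stay_up_options)
    finally:
        vm.destroy()  # remove the test VM whether the check passed or not
```

The `finally` block is the design choice worth noting: a failed or crashing test VM should not be left behind on the SR.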