Hi
First, thank you for such an amazing product.
We connect to 10+ remote sites, and I have a copy of a backup script running to pull down a nightly copy of the metadata.
Because the internet connection at some of the remote sites is poor, the IPsec tunnels sometimes drop momentarily.
XOA is not robust or aggressive at reconnecting these pools when they drop.
This breaks the backups whenever they run while a pool is disconnected.
We see this error when a pool is disconnected:
Error: no such object abcdxxx
I would like to see two things, and they may need to be two separate feature requests.
First: be more aggressive about reconnecting a disconnected pool.
My suggestion would be to have a standard interval at which XOA tries to reconnect the pool.
For example: if the pool heartbeat is missed for 30 seconds, attempt to reconnect; if that fails, wait 1 minute and try again, then keep retrying every minute thereafter.
I think this would resolve most of the issues I see.
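To make the timing concrete, here is a rough sketch of the schedule I have in mind. It is not XOA code; the function and constant names are made up for illustration, and times are plain seconds on whatever clock the caller uses.

```python
from typing import Optional

# Values taken from the suggestion above; both names are hypothetical.
HEARTBEAT_TIMEOUT_S = 30  # first attempt once the heartbeat is 30 s overdue
RETRY_INTERVAL_S = 60     # then one attempt per minute


def next_reconnect_attempt(last_heartbeat: float,
                           last_attempt: Optional[float]) -> float:
    """Return the earliest time at which a reconnect should be tried.

    last_heartbeat: time the pool was last seen alive.
    last_attempt:   time of the previous failed reconnect since that
                    heartbeat, or None if none has been made yet.
    """
    first_attempt = last_heartbeat + HEARTBEAT_TIMEOUT_S
    if last_attempt is None:
        return first_attempt
    return max(first_attempt, last_attempt + RETRY_INTERVAL_S)


def should_reconnect(now: float,
                     last_heartbeat: float,
                     last_attempt: Optional[float]) -> bool:
    """True when a reconnect attempt is due at time `now`."""
    return now >= next_reconnect_attempt(last_heartbeat, last_attempt)
```

With a last heartbeat at t=0, this gives attempts at 30 s, 90 s, 150 s and so on until the pool answers again.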
Second: I would like to see a check when backing up a pool that attempts to reconnect it before running the backup.
Something along the lines of:
Finished backing up Pool 1. Check whether Pool 2's heartbeat was seen within the last 30 seconds. If not, attempt to reconnect. If the reconnect fails, skip the pool and notify; otherwise back up as normal, and so on for each remaining pool.
Skipping the pool and continuing makes the backups more useful, while the reconnect would make them more robust.
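In rough code, the backup loop I am picturing looks something like the sketch below. Again, this is only an illustration: `heartbeat_age`, `reconnect`, `backup` and `notify` are placeholder callables I invented, not existing XOA functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class BackupReport:
    backed_up: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)


def backup_pools(
    pools: Iterable[str],
    heartbeat_age: Callable[[str], float],  # seconds since last heartbeat
    reconnect: Callable[[str], bool],       # True if the reconnect worked
    backup: Callable[[str], None],
    notify: Callable[[str], None],
    max_heartbeat_age: float = 30.0,
) -> BackupReport:
    """Back up each pool in turn, skipping pools that cannot be reached."""
    report = BackupReport()
    for pool in pools:
        stale = heartbeat_age(pool) >= max_heartbeat_age
        # Only try to reconnect when the heartbeat is stale.
        if stale and not reconnect(pool):
            notify(f"Skipped {pool}: pool is disconnected and reconnect failed")
            report.skipped.append(pool)
            continue  # carry on with the remaining pools
        backup(pool)
        report.backed_up.append(pool)
    return report
```

The point is that one unreachable pool ends up in `skipped` with a notification instead of failing the whole run.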
I would also love an API hook / message / etc that I could use to tell my firewall to reconnect the tunnel.
As a sidebar, we have a customer that has two sites, and we run the backups between them. This fails if the connection to the remote site is down. I would like to see the backup continue locally until the remote site is back up.