I can answer the question: not on a different pool, but for the whole pool with a shared SR, yes (a broken VHD will break coalesce on the entire SR it resides on, not on the others).
There are no shared SRs in my test environment, except for a CIFS ISO SR.
This is a thumbnail sketch:
Pool X:
- Host A
  - Local SR A1
    - Running VMs:
      - VM 1 (part of CR job)
      - VM 2 (part of CR job)
      - VM 3
    - Halted VMs:
      - Several
- Host B
  - Local SR B1
    - Running VMs:
      - VM 4 (part of CR job)
      - VM 5
      - VM 6
    - Halted VMs:
      - Several
  - Local SR B2
    - Destination SR for CR job
    - No other VMs, halted or running

Pool Y:
- Host C (single-host pool)
  - Local SR C1
    - Running VMs:
      - VM 7 (part of CR job) (also, instance of XO managing the CR job)
    - Halted VMs:
      - Several

There are other pools/hosts, but they're not implicated in any of this.
All of the unhealthy VDIs are on local SR B2, the destination for the CR job. How can an issue with coalescing a VDI on local SR A1 cause that? How can a VM's VDI on pool Y, host C, local SR C1, replicated to pool X, host B, local SR B2, be affected by a coalesce issue with a VDI on pool X, host A, local SR A1?
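To make the question concrete, here is a toy Python model of the layout sketched above. It is only a sketch: it takes the quoted claim at face value (a broken VHD blocks coalesce on the SR it resides on, not on the others), and the names `SR_LOCATION`, `blocked_srs` and `can_explain` are made up for illustration. None of this is XAPI or XO code.

```python
# Toy model of the topology sketched above; not XAPI/XO code.
# Assumption (taken from the quoted claim): a broken VHD blocks
# coalesce only on the SR it resides on, not on any other SR.

# SR name -> (pool, host)
SR_LOCATION = {
    "A1": ("X", "A"),
    "B1": ("X", "B"),
    "B2": ("X", "B"),  # destination SR for the CR job
    "C1": ("Y", "C"),
}


def blocked_srs(broken_vdi_sr: str) -> set[str]:
    """Return the SRs whose coalesce would be blocked by a broken VHD."""
    if broken_vdi_sr not in SR_LOCATION:
        raise KeyError(f"unknown SR: {broken_vdi_sr}")
    # Under the stated assumption, the blast radius is the SR itself.
    return {broken_vdi_sr}


def can_explain(broken_vdi_sr: str, unhealthy_sr: str) -> bool:
    """Could a broken VHD on one SR explain unhealthy VDIs on another?"""
    return unhealthy_sr in blocked_srs(broken_vdi_sr)


if __name__ == "__main__":
    print(can_explain("A1", "B2"))  # False
    print(can_explain("B2", "B2"))  # True
```

Under that assumption, a coalesce problem on A1 cannot account for unhealthy VDIs on B2; only a problem on B2 itself could.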
Regarding shared SRs, I'm somewhat gobsmacked by your assertion that a bad VDI can basically break an entire shared SR. Brittle doesn't quite capture it. I honestly don't think I could recommend XCP-ng to anyone if that were really true. At least for now, I can say that assertion is demonstrably false when it comes to local SRs. As I've mentioned previously, I can create and destroy snapshots on any VM/host/SR in my test environment, and they coalesce quickly and without a fuss, >>including<< snapshots of the VMs that are hitting the exceptions detailed above.
By the way, the CR ran again this evening. Four more unhealthy VDIs.
Tomorrow I will purge all CR VMs and snapshots, and start over. The first incremental will be Saturday night. We'll see.
I've also spun up a new single-host pool, a new XO instance, and a new CR job to see if it does the same thing, or if it behaves as everyone seems to say it should. I'm more interested in learning why my test environment >>doesn't<<.