The sr.scan-driven SMlog growth angle that gumbo2k surfaced is a real lead; there's some context in the storage-related log files reference, but the docs don't go as far as "here's how to throttle it safely on a pool where the underlying disks should spin down."
Soft ping to @Team-Storage and @Team-Hypervisor-Kernel: could one of you weigh in on whether other-config:auto-scan=false on the SR is the supported way to reduce scan pressure, or if there's a better lever? I don't want to send anyone down a path that breaks an SR. Apologies if this has already been answered somewhere I haven't seen.
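For anyone following along, here is a minimal sketch of what inspecting and toggling that key looks like with the xe CLI. It assumes auto-scan turns out to be the right lever, which is exactly what still needs confirming above. The SR_UUID placeholder and the run helper are mine, for illustration only, and the script only prints the commands unless APPLY=1 is set.

```shell
#!/usr/bin/env bash
# Sketch only: prints the xe commands by default; set APPLY=1 to really run them.
# SR_UUID is a placeholder (assumption); look up the real value with `xe sr-list`.
SR_UUID="${SR_UUID:-REPLACE-WITH-SR-UUID}"

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Show the SR's current other-config map before changing anything.
run xe sr-param-get uuid="$SR_UUID" param-name=other-config

# Disable automatic scanning on this SR (the setting being asked about).
run xe sr-param-set uuid="$SR_UUID" other-config:auto-scan=false

# To revert, set the key back to true:
# run xe sr-param-set uuid="$SR_UUID" other-config:auto-scan=true
```

Checking other-config first makes it easy to put things back the way they were if the change doesn't help.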
If the lower retention value gets things stable, that probably confirms Pilow's hypothesis. If it doesn't help, that's the signal that something heavier is going on, and a @Team-XO-Backend ping would make sense. Would you mind dropping the result back here either way? Helps the next person hitting the same wall.
@Mathieu-L
The linstor n l output was included in my original post.
All nodes were updated to May 2026 Security and Maintenance Updates for XCP-ng 8.3 LTS, and all nodes were restarted.
May 2026 Updates #2 for XCP-ng 8.3 LTS was released, and a couple of days later I installed it on all hosts. No host was restarted at that point.
The issue appeared when xen04 was later restarted.
I had used systemctl restart linstor-controller here (https://xcp-ng.org/forum/post/105309) to restart the controller.