Long time lurker, first time commenter.
Just wanted to add my 2c to this conversation, in case it helps.
We are running 37 XCP-ng servers, most on Xen 8.0, a mixture of Dell R630/640 and some odd Supermicro servers, and we have been experiencing this issue on some of them, where DOM0 runs out of memory (free -m shows very little RAM left, like in the first post).
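For anyone who wants to catch this before DOM0 gets into trouble, here is a rough sketch of the kind of check we could run from cron. It reads the same numbers that free -m reports from /proc/meminfo. The function names and the 256 MB threshold are my own placeholders, not anything XCP-ng ships, and I have kept the syntax plain so it should run on both Python 2.7 and Python 3.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo style text into a dict of field name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_mb(fields):
    """Estimate the memory still usable in dom0, in MB."""
    if "MemAvailable" in fields:
        kb = fields["MemAvailable"]
    else:
        # Older kernels do not report MemAvailable, so approximate it.
        kb = (fields.get("MemFree", 0)
              + fields.get("Buffers", 0)
              + fields.get("Cached", 0))
    return kb // 1024


def is_low(fields, threshold_mb=256):
    """True when available memory is below the (hypothetical) threshold."""
    return available_mb(fields) < threshold_mb


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as handle:
        info = parse_meminfo(handle.read())
    state = "LOW" if is_low(info) else "ok"
    print("dom0 available memory: %d MB (%s)" % (available_mb(info), state))
```

Logging that one line every few minutes would at least show whether the memory drains slowly over the 200 days or drops suddenly.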
We see a performance impact on DOM0 (the VMs still run): just trying to SSH to DOM0 or use xsconsole is slow. Eventually DOM0's network fails, and nothing we try restarts the networking. When DOM0's network fails, all the VMs also lose network connectivity. The only resolution is to manually stop each VM via the command line and then reboot the Xen host.
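Stopping the VMs one by one gets tedious, so here is a minimal sketch of how the commands could be generated. It assumes you have already captured the comma-separated UUID list from something like `xe vm-list resident-on=<host-uuid> is-control-domain=false power-state=running params=uuid --minimal`. The helper names are hypothetical, and the script only prints the `xe vm-shutdown` commands rather than running them, so you can review them first.

```python
def parse_uuid_list(minimal_output):
    """Split the comma-separated output of `xe ... --minimal` into UUIDs."""
    return [item.strip() for item in minimal_output.split(",") if item.strip()]


def shutdown_commands(uuids, force=False):
    """Build one `xe vm-shutdown` argument list per VM UUID."""
    commands = []
    for uuid in uuids:
        command = ["xe", "vm-shutdown", "uuid=" + uuid]
        if force:
            # A forced shutdown does not wait for the guest OS to stop cleanly.
            command.append("force=true")
        commands.append(command)
    return commands


if __name__ == "__main__":
    # Placeholder UUIDs for illustration only; paste in the real list.
    example = "11111111-aaaa-bbbb-cccc-000000000001,11111111-aaaa-bbbb-cccc-000000000002\n"
    for command in shutdown_commands(parse_uuid_list(example)):
        print(" ".join(command))
```

Bear in mind that when DOM0 is this short of memory, xe itself can be slow to respond, so each shutdown may take a while.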
With one exception, all the Xen servers that have experienced this issue generally have an uptime of at least 200 days, but the thing I find interesting is that the servers with issues also have a Kubernetes data node on them.
I assume something that Kubernetes does is causing the issue. The boxes that do not have Kubernetes on them (with one exception) have never had this issue.
I have a spare Dell R640 that I'm currently doing some testing on. The plan is to create lots of VMs and put a heap of CPU/memory/IO load on it to see if I can replicate the issue, and then try the alternative kernel to see if that makes any difference.
I'll provide feedback on what I find.