CPU Stats bottoming out to Zero every five minutes
-
I am running four instances of XCP-ng 8.3 with the latest maintenance patches, installed over the weekend. All four servers are identical Dell R720s, but two of the boxes carry a little more load than the other two. The stats for the two more heavily loaded boxes now show CPU usage dropping to zero every five minutes or thereabouts. The server logs are totally empty and there is no slowness felt during these "outages". Is anybody else seeing this, or does anyone know what is going on? So far it appears to be a display problem only.
Running XO Community:
- Xen Orchestra, commit 9a833
- Master, commit a82af
-
Hi,
First time I'm seeing this. To remove XO from the equation, I would check the RRD values directly on the hosts. Pinging @andriy.sultanov to assist here.
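For anyone who wants to check the raw RRD values, here is a minimal Python sketch. It assumes you have already saved the XML returned by XAPI's `rrd_updates` HTTP handler (something like `https://<host>/rrd_updates?start=<epoch>&host=true`; treat the exact query parameters as an assumption to verify against your host). The function name `find_cpu_zero_samples` and the sample data are hypothetical, made up for illustration only. It lists the timestamps at which a host CPU series reads zero, so they can be compared against the dips shown in XO.

```python
import xml.etree.ElementTree as ET


def find_cpu_zero_samples(xport_xml, threshold=0.0):
    """Return (timestamp, legend_entry) pairs where a host CPU series is <= threshold.

    Hypothetical helper: expects the <xport> XML layout with a <meta><legend>
    list of column names and <data><row> entries holding <t> and <v> values.
    """
    root = ET.fromstring(xport_xml)
    legend = [entry.text for entry in root.findall("./meta/legend/entry")]

    # Keep only host-level CPU columns, e.g. "AVERAGE:host:<uuid>:cpu0".
    cpu_cols = [
        i for i, name in enumerate(legend)
        if ":host:" in name and name.rsplit(":", 1)[-1].startswith("cpu")
    ]

    hits = []
    for row in root.findall("./data/row"):
        ts = int(row.findtext("t"))
        values = [v.text for v in row.findall("v")]
        for i in cpu_cols:
            # NaN samples never satisfy the comparison, so gaps are skipped.
            if float(values[i]) <= threshold:
                hits.append((ts, legend[i]))
    return hits


# Made-up sample data, only to show the expected shape of the input.
SAMPLE = (
    "<xport><meta><legend>"
    "<entry>AVERAGE:host:abc:cpu0</entry>"
    "<entry>AVERAGE:host:abc:memory_free_kib</entry>"
    "</legend></meta><data>"
    "<row><t>1010</t><v>0.25</v><v>1024</v></row>"
    "<row><t>1005</t><v>0.0</v><v>1024</v></row>"
    "<row><t>1000</t><v>0.31</v><v>1024</v></row>"
    "</data></xport>"
)

for ts, name in find_cpu_zero_samples(SAMPLE):
    print(ts, name)
```

If the zeros show up in the raw data as well, the problem is on the host side (RRD) rather than in XO's graphing.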
-
It doesn't seem to be entirely 5 minute based, looking at the second server I see a different interval. When I get a minute I'll pull some raw RRD data for you.
-
It would seem that I am seeing this effect on all four of my servers; on the less busy ones the usage was simply too low to notice the drops before. Here is my 3rd busiest server:
-
@DKirk Very odd. Maybe an electrical power issue? Do you see this if you run xentop on each host and, more importantly, do the drops happen at the same time on all your servers?
Any chance they are overheating and pausing briefly?
-
@tjkreidl No chance of an electrical issue, as all four servers have dual power supplies fed by two independent UPSs. This oddity only began after the updates were applied three days ago. The drops do not happen with the same frequency or timing across servers. The A/C server room sits at 70 degrees, and all four servers are racked together with great air flow. Again, this just started after the last updates.
-
I also see it in my lab
-
@olivierlambert I think this has already been fixed upstream (https://github.com/xapi-project/xen-api/pull/6458) - I will backport it for the release after the LTS and see if it fixes the issue for people in this thread.
-
@andriy.sultanov - that sure sounds like it, thank you, let me know if I can help in any way. These are production servers but I can't imagine RRD causing any issues for them.
-
@DKirk That all makes sense, thanks for clarifying. Looks like there are further comments below that seem to pinpoint where the issue may lie. The key point you make is that this only started happening "after the last updates"!