PCPU assumed down

shaddow

I am having the same crash on two servers. The crashes start at 9:30 pm and stop by 03:00 am.
The servers are HP ProLiant DL380 G6.
The first server started doing this about a month ago, almost every night, so I moved all the VMs from it to the second server. The second server was stable for a week, then last night it showed the same problem as the first.

WARN Got zeros for pr_status note 16 - PCPU assumed down
WARN Got zeros for xen_crash_core note 16 - PCPU assumed down

I have backed up all the files in the crash folder and can add them if needed.
Any help would be great, or if someone else has posted about this, a link to their post please.

Thanks

shaddow

For those who might need more, I have the dump files at

http://support.haddow.au/xcpng/

There are both zip files and the whole folders.

If you need more info then please ask.

Andrew

@shaddow This error was seen in Xen 7.5 and was claimed to be fixed in 7.6, but it still happened. It seemed to be related to a network driver. I did not see a resolution to the issue other than changing network cards.

What is the system doing at that time? Backups?

Maybe try disabling HyperThreading? Do you have any other NICs in the server?

citrix post
mailing list post

Maybe it's a regression in old code on an old server.

shaddow

Andrew

The first server (which is off for now) had Hyperthreading off, and it was still crashing whenever it wanted to.

The current one was installed with what I thought was the current version. The only things that have not changed are the VMs, including Xen-O.

"What is the system doing at that time? Backups?" That's the crazy thing: backups don't start till after midnight, but for some reason the crashes mostly start at 9:30 pm.

As for the NICs, the ones I am using are onboard, but I think I might be able to install others. I would just have to reconfigure the whole server, which I think I can do. The fun part will be getting the cards. (They are only 1Gb NICs.)

Thanks for your help.

Andrew

@shaddow Strange.... I have a G5 that I use for light testing and it's fine (old and slow). I don't have any G6/G7 servers in use. I have several G8 systems that work well with 8.2.1 and 8.3. The only time I had an HP crash was when there was a bad CPU. An actual bad CPU chip. I replaced it and things were fine.

I'll see if I have a G6 that still works that I can rebuild for testing...

shaddow

@Andrew
At the moment I cannot find why the crashes happen or what could be running at 21:30, but I have a few thoughts on it, including possible outside hack attempts, as I am on a fixed IPv4 and IPv6 address and the modem is in bridged mode to one NIC.
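One way to check whether the crashes really cluster around 21:30 is to bucket the crash times by hour of day. The snippet below is only a rough sketch: the function name `crashes_by_hour` and the sample timestamps are made up for illustration, and it assumes the crash times have already been collected by hand as ISO 8601 strings (for example from the timestamps of the saved crash folders).

```python
from collections import Counter
from datetime import datetime


def crashes_by_hour(timestamps):
    """Count crash events per hour of day from ISO 8601 timestamp strings."""
    hours = Counter()
    for ts in timestamps:
        # datetime.fromisoformat parses strings like "2023-05-01T21:31:07"
        hours[datetime.fromisoformat(ts).hour] += 1
    # Return a plain dict ordered by hour so the clustering is easy to read
    return dict(sorted(hours.items()))


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real crash dumps
    sample = [
        "2023-05-01T21:31:07",
        "2023-05-01T23:02:44",
        "2023-05-02T02:15:10",
        "2023-05-02T21:29:58",
    ]
    for hour, count in crashes_by_hour(sample).items():
        print(f"{hour:02d}:00  {count}")
```

If most of the counts land in the same hour across several nights, that points toward something scheduled (inside a VM or on the network) rather than random hardware faults.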

The VMs I have running are pfSense 2.6, Plex (Debian 10), email/DNS/web (FreeBSD 12), UniFi (Debian 10), Asterisk (Debian 10), and XenOrchestra 7.4 CE (Debian 10).
Not running, but can be spun up: WinXP64 and XenOrchestra (paid version).

I have a replacement NIC in the mail and hope it will fix the problem. I am wondering whether I also need to disable the onboard NICs to stop the crash, even if I force everything to use the new card.