Server Locks Up Periodically with ASRock X570D4I-2T AMD Ryzen 9 3900X and Intel X550-AT2
-
Well, unfortunately I got nothin... Extremely weird indeed
-
Yeah wish I had a better response here but this is indeed odd.
Do you by chance have a PCIe ethernet card you can swap in for connectivity (and just not use the X550 ports), to test whether the X550 is causing the crashes?
It's a longshot though if I'm honest.
-
IMHO, memtest failures point to a hardware issue, but which component? In general, I remove or disable devices one by one until it runs without any errors.
-
@olivierlambert Yeah. @R2rho I agree with this; it's strange to see memtest errors at all.
It may be another component causing the failures though, and not the RAM itself. Possibly the board or the memory controller on the CPU.
You don't by chance have another AM4 CPU you can swap in, do you?
-
Yeah, a defective CPU can do this, or bent pins on the motherboard too.
-
@olivierlambert Yup, I've had exactly that a few times, usually on used boards.
@R2rho If possible, however annoying, I would also take the CPU out and use a flashlight to check for bent pins on the motherboard.
-
Thank you guys for the feedback. Strangely enough, I have two of these exact same servers, as I was attempting to configure them as a pool. I installed XCP-NG on them separately and am having the exact same issue on both servers. They just lock up and stop responding. It could be a hardware issue, especially since I did see the memtest failures, but it seems weird if it's happening on both. I initially thought it was a RAM incompatibility issue because I added RAM to these after they arrived and then saw all of these issues. But I've since removed the additional RAM and gone back to the original configuration, and I'm still having the issues.
I'm probably not going to remove the CPU because I will most likely return these, but I am going to install Ubuntu and see if they continue to be problematic. If Ubuntu doesn't have any issues, then I think there's some underlying incompatibility with this ASRock Rack board that needs further diagnosis and evaluation. Either way, I'll probably go with something else.
-
@planedrop @olivierlambert @probain So I installed Ubuntu 22.04 on these last night and came back to the same frozen lockup I was having with XCP-NG. It looks like I somehow received two identical servers from OnLogic that were both faulty to some degree, so this is definitely not an issue with XCP-NG. Thank you for your help. I will be processing a return on these servers and going with a different product altogether.
-
@R2rho
Faulty gear always sucks. But who would've guessed that two separate systems would produce the same problems? That is highly unlikely, but never impossible. Good luck with the RMA.
-
@R2rho Yeah that is really surprising.
I suppose it could be some kind of wider hardware incompatibility or something, but still crazy either way.
Glad you got that somewhat sorted out though.
-
Thanks a lot for the feedback. Shit happens. We usually take hardware for granted, and we shouldn't.
-
@R2rho We have built dozens of ASRock Rack mainboard- and barebone-based systems over the past few years, starting with the X470D4U, which worked really great. Since the X570D4, it started to get messy. The B650D4U is also affected. We had random periodic reboots and freezes, mostly after some weeks or months of uptime.
Interestingly, we have identical systems with an uptime of over a year. I would say about 60% of the systems were affected.
BIOS version and attached hardware did not really matter.
I once contacted ASRock support, but they did not know of a general problem; instead they suggested checking other components (which we also did).
We went the RMA route, and some of the replacement mainboards we got back from RMA were also faulty.
But: The most recent mainboard returning from RMA seems to work... so maybe you're lucky.
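As a rough back-of-the-envelope illustration (not a measurement): taking the "about 60%" figure above at face value, and assuming each board fails independently of the others (a big assumption, since boards from the same batch may well share a defect), getting two bad units at once is not that surprising. The function name below is made up for this sketch.

```python
def prob_all_affected(rate: float, units: int) -> float:
    """Probability that every one of `units` boards is affected,
    assuming each is affected independently with probability `rate`."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    if units < 0:
        raise ValueError("units must be non-negative")
    return rate ** units


# 0.6 is the rough "about 60%" estimate from this post, not a vendor statistic.
print(f"{prob_all_affected(0.6, 2):.2f}")  # prints 0.36
```

So under those assumptions, roughly one in three pairs of boards would both be affected.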
-
@dave That's pretty brutal, honestly. I'm thinking about just calling it a day and moving away from ASRock servers entirely. I'm looking to set XCP-NG up on some IoT/Edge servers in short-depth racks in a factory environment, so I really liked the form factor of these from OnLogic. But I've had the worst experience, and seeing your feedback definitely makes me want to go in a different direction. I'm looking at some short-depth servers from Supermicro geared specifically for IoT/Edge that I think will work out much better.
-
@R2rho Yeah, there are Supermicro systems with AM5 which can handle a decent amount of load, such as those based on the H13SAE-MF, for example:
https://www.supermicro.com/de/products/system/mainstream/1u/as-1015a-mt
(with less depth) They seem to be stable, but we have a small issue regarding onboard graphics at the moment: