XCP-ng

    Server Locks Up Periodically with ASRock X570D4I-2T AMD Ryzen 9 3900X and Intel X550-AT2

    21 Posts 5 Posters 1.0k Views 4 Watching
    • planedrop (Top contributor)

      Yeah, I wish I had a better response here, but this is indeed odd.

      Do you by chance have a PCIe Ethernet card you could swap in for connectivity (leaving the X550 ports unused), just to test whether the X550 is causing the crashes?

      It's a long shot, though, if I'm honest.

      • olivierlambert (Vates 🪐 Co-Founder CEO)

        IMHO, memtest failures point to a hardware issue, but which component? In general, I remove or disable devices one by one until it runs without any errors.

        • planedrop (Top contributor) @olivierlambert

          @olivierlambert Yeah, @R2rho, I agree with this; it's strange to see memtest errors at all.

          It may be another component causing the failures, though, rather than the RAM itself: possibly the board or the memory controller on the CPU.

          You don't by chance have another AM4 CPU you could swap in, do you?

          • olivierlambert (Vates 🪐 Co-Founder CEO)

            Yeah, a defective CPU can do this, or bent pins on the motherboard too.

            • planedrop (Top contributor) @olivierlambert

              @olivierlambert Yup, I've had exactly that a few times, usually on used boards.

              @R2rho If possible, however annoying, I would also take the CPU out and check the motherboard for bent pins with a flashlight.

              • R2rho

                Thank you guys for the feedback. Strangely enough, I have two of these exact same servers, as I was attempting to configure them as a pool. I installed XCP-ng on each of them separately and am having the exact same issue on both: they just lock up and stop responding. It could be a hardware issue, especially since I did see the memtest failures, but it seems weird that it's happening on both. I initially thought it was a RAM incompatibility, because I added RAM to these after they arrived and then saw all of these issues. But I've since removed the additional RAM and gone back to the original configuration, and I'm still having the issues.

                I'm probably not going to remove the CPU, because I will most likely return these, but I am going to install Ubuntu and see if they continue to be problematic. If they run without issues under Ubuntu, then I think there's some underlying incompatibility with this ASRock Rack board that probably needs further diagnosis. Either way, I'll probably go with something else.

                • R2rho

                  @planedrop @olivierlambert @probain So I installed Ubuntu 22.04 on these last night and came back to the same frozen lockup I was having with XCP-ng, so it looks like I somehow received two identical servers from OnLogic that were both faulty to some degree. So this is definitely not an XCP-ng issue in this case. Thank you for your help; I will be processing a return on these servers and going with a different product altogether.

                  • probain @R2rho

                    @R2rho Faulty gear always sucks. But who would've guessed that two separate systems would produce the same problems? That is highly unlikely, but never impossible.

                    Good luck with the RMA.

                    • planedrop (Top contributor) @R2rho

                      @R2rho Yeah that is really surprising.

                      I suppose it could be some kind of wider hardware incompatibility, but it's still crazy either way.

                      Glad you got that somewhat sorted out though.

                      • olivierlambert (Vates 🪐 Co-Founder CEO)

                        Thanks a lot for the feedback. Shit happens: we usually take hardware for granted, but it isn't a given 😞

                        • dave @R2rho

                          @R2rho We have built dozens of systems based on ASRock Rack mainboards and barebones over the past few years, starting with the X470D4U, which worked really well. Since the X570D4, things started to get messy; the B650D4U is also affected. We had random periodic reboots and freezes, mostly after some weeks or months of uptime.

                          Interestingly, we have identical systems with an uptime of over a year. I would say about 60% of the systems were affected.

                          BIOS version and attached hardware did not really matter.

                          I once contacted ASRock support, but they did not know of a general problem; instead, they suggested checking other components (which we also did).

                          We went the RMA route, and even some of the replacement mainboards we received through RMA were faulty.

                          But the most recent mainboard returned from RMA seems to work... so maybe you'll be lucky 🙂

                          • R2rho @dave

                            @dave That's pretty brutal, honestly. I'm thinking about just calling it a day and moving away from ASRock servers entirely. I'm looking to set up XCP-ng on some IoT/edge servers in short-depth racks in a factory environment, so I really liked the form factor of these from OnLogic, but I've had the worst experience, and seeing your feedback definitely makes me want to go in a different direction. I'm looking at some short-depth Supermicro servers geared specifically for IoT/edge that I think will work out much better.

                            • dave @R2rho

                              @R2rho Yeah, there are Supermicro AM5 systems that can handle a decent amount of load, such as those based on the H13SAE-MF, for example:

                              https://www.supermicro.com/de/products/system/mainstream/1u/as-1015a-mt
                              (with less depth)

                              They seem to be stable, but we have a small issue with the onboard graphics at the moment:

                              https://xcp-ng.org/forum/topic/9976/black-screen-after-install-on-supermicro-h13sae-mf-with-ryzen-9950x/3?_=1734419502978
