XCP-ng

    HPC with 2x64core (256 threads) possible with XCP-ng?

    • olivierlambert (Vates 🪐 Co-Founder, CEO):

      If you can, please provide feedback 🙂 We'll be happy to learn if there's any problem!

      • Forza:

        Of course 🙂

        Does anyone else have any experience with HPC on this scale with XCP-ng/Xen/Xenserver?

        In the testing we did on the EPYC, we saw that the best performance is gained when only physical cores are allocated to the VM: giving 24 cores to the VM was faster than giving 48 virtual threads. I suspect the bottleneck is RAM bandwidth. The simulation uses about 100-200GB of RAM (in these tests). I am not sure how it would scale with a dual CPU (and thus a NUMA situation).

        We did the tests on an older dual-Xeon workstation (not virtualised) with 512GB RAM. The software seems to detect hyperthreading and only uses half of the available threads. This detection did not happen when we ran it in a VM, which might explain the results.
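        For comparison, the topology the guest OS actually sees can be inspected with standard Linux tools (nothing here is XCP-ng specific; the commands simply report what the kernel detected):

        ```shell
        # Show how many logical CPUs are visible and whether SMT siblings are exposed
        lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket|NUMA node\(s\))'
        nproc    # total logical CPUs the OS will schedule on
        ```

        If "Thread(s) per core" reports 1 inside the VM, the guest has no way to tell which vCPUs share a physical core, which would explain why the software's hyperthreading detection only works on bare metal.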

        • olivierlambert:

          For HPC, you might want to use CPU pinning or similar techniques. The flexibility of virtualization may not be needed when squeezing out maximum performance is the key parameter.
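          As a minimal sketch of what pinning looks like on a Xen host (the VM name, UUID, and CPU ranges below are placeholders for illustration): `xl vcpu-pin` changes placement at runtime, while the `xe` parameter persists across reboots.

          ```shell
          # Runtime: pin every vCPU of the domain to host CPUs 0-23 (e.g. one NUMA node)
          xl vcpu-pin my-hpc-vm all 0-23

          # Persistent: set a pinning mask on the VM object (applied at next boot)
          xe vm-param-set uuid=<vm-uuid> VCPUs-params:mask=0,1,2,3,4,5,6,7

          # Verify the current placement
          xl vcpu-list my-hpc-vm
          ```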

          • Forza @olivierlambert:

            The question relates to effective use of expensive hardware. Virtualizing it enables more uses when simulations aren't running, by allowing other VMs on that host.

            But do we know whether more than 64 threads per VM are possible with XCP-ng?

            • olivierlambert:

              The VM will run, yes. It's not very well-explored territory, which is why I'm asking for feedback 🙂

              • tuxen (Top contributor) @Forza:

                @Forza Take a look:

                https://xcp-ng.org/forum/post/49400

                At the time of that topic, I remember asking a coworker to boot a CentOS 7.9 VM with more than 64 vCPUs on a 48C/96T Xeon server. The VM started normally, but it didn't recognize the vCPUs beyond 64.

                I've not tested the VM param platform:acpi=0 as a possible solution, nor its trade-offs. In the past, some old RHEL 5.x VMs without ACPI support would simply power off (like pulling the power cord) instead of performing a clean shutdown on a vm-shutdown command.
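                For reference, toggling that parameter is a one-liner (the UUID is a placeholder; untested here, and with the hard power-off caveat above):

                ```shell
                # Disable ACPI for the guest - may lift the 64-vCPU limit, but guests
                # without ACPI may hard power-off instead of shutting down cleanly
                xe vm-param-set uuid=<vm-uuid> platform:acpi=0
                xe vm-param-list uuid=<vm-uuid> | grep platform   # confirm the setting
                ```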

                Regarding that CFD software, does it support a worker/farm design? vGPU offload? I'm not an HPC expert, but considering the EPYC MCM architecture, instead of one big VM, spreading the workload across many workers pinned to each CCD (or each NUMA node on an NPS4 config) may be interesting.

                Before buying those monsters, I would ask AMD to deploy a PoC using the target server model. For such demands, it's very important to do some sort of certification/validation.

                • Forza @tuxen:

                  @tuxen thank you. Those are very valuable thoughts. There is a remote render mode that can be used to render on a farm of nodes; the problem is building models that scale well in such a configuration. This is why we started with a Xeon workstation many years ago, but I do agree it might be worth looking at this option again! The cost of render licensing is also higher than that of the hardware, which is another factor. Maybe it's possible to rent some cloud hardware and test.

                  • mersper:

                    Hi @Forza

                    I run a few Linux HPC clusters; one particular platform runs on XCP-ng VMs using AMD EPYCs. We see a maximum of 64 vCPUs recognised by the VM on XCP-ng. You can assign more, but they are not visible from the VM OS. There also seems to be a total RAM limit per VM, which is 512GB if memory serves (no pun intended 🙂).
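                    One way to confirm that ceiling from inside the guest is to compare what the hypervisor assigned with what the kernel actually brought online (plain Linux, nothing XCP-ng specific):

                    ```shell
                    nproc                                   # logical CPUs the scheduler will use
                    cat /sys/devices/system/cpu/online      # range of CPUs the kernel brought up
                    ```

                    If the hypervisor assigned more vCPUs than the online range shows, the extras were never brought up by the guest kernel.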

                    With regard to your other question - 7773X vs 7204 - on Linux I would suspect that most codes would run the same binaries on both, but you may see a performance hit if an optimized binary was compiled against one CPU and run on the other. Of course, there could just as easily be many other reasons for performance differences across these boxes.

                    • Forza @mersper:

                      @mersper What kind of VMs do you use for this, and how do you think the performance scales up to 64 cores?

                      • mersper @Forza:

                        @Forza Virtualization is PVHVM, running Rocky Linux 8. The VM root disk is local to the host on RAID10 SAS drives, the CPUs are dual-socket AMD EPYC 7552 (48C/96T), with 512GB physical RAM. And no GPU.

                        Flexibility is prioritised over performance on this cluster - it's used for undergraduate teaching and projects. We don't do cpu-pinning for instance.

                        We typically run bioinformatics and molecular dynamics codes. Looking at MD codes (high CPU, low RAM, low IO), they scale as expected up to 64 cores - I'm pretty happy with the performance. That said, I haven't compared directly with bare metal.

                        • Forza @mersper:

                          @mersper Thank you. I will re-think the setup; having 256 threads in one VM is perhaps not possible. I have scheduled a meeting with the software manufacturer to talk about network rendering etc. It might be better to have several VMs with pinned CPUs and run render jobs across them. I'll update on the progress 😃
