XCP-ng

    Can't boot a VM with 1TB memory / 128 CPUs

      fva-anssi

      I'm running XCP-ng 8.3 on a Dell PowerEdge R7625 host with AMD EPYC processors. It has 1.5 TiB of memory and 128 cores (256 threads), and I'm trying to boot a domU with 128 vCPUs and 1 TiB of memory.

      The domU is created and started, and I can see 100% CPU usage on CPU0, but the console displays "Guest has not initialized the display (yet)."

      Virtualization mode : Hardware virtualization (HVM)
      CPU limits 128/128
      Topology: 2 sockets with 64 cores per socket
      Memory limits (min/max) Static: 1 GiB/1 TiB Dynamic: 1 TiB/1 TiB
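For readers who prefer the command line, the settings listed above could be expressed roughly as follows. This is a minimal sketch that only builds `xe` command strings and does not run them. The `<vm-uuid>` placeholder and the helper name `xe_commands` are hypothetical, and the multiple-of-cores-per-socket check is an assumed sanity check, not something stated in the post.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4


def xe_commands(vm_uuid, vcpus, cores_per_socket, static_min, static_max):
    """Build (but do not run) xe command lines for the CPU and memory settings."""
    # Assumed sanity check: the vCPU count should divide evenly into sockets.
    if vcpus % cores_per_socket != 0:
        raise ValueError("vCPU count must be a multiple of cores per socket")
    return [
        # Set VCPUs-max first so VCPUs-at-startup never exceeds it.
        f"xe vm-param-set uuid={vm_uuid} VCPUs-max={vcpus}",
        f"xe vm-param-set uuid={vm_uuid} VCPUs-at-startup={vcpus}",
        f"xe vm-param-set uuid={vm_uuid} platform:cores-per-socket={cores_per_socket}",
        # Dynamic range pinned to the static maximum, as in the post (1 TiB/1 TiB).
        f"xe vm-memory-limits-set uuid={vm_uuid} static-min={static_min} "
        f"static-max={static_max} dynamic-min={static_max} dynamic-max={static_max}",
    ]


# 128 vCPUs as 2 sockets x 64 cores, static 1 GiB/1 TiB, values in bytes.
cmds = xe_commands("<vm-uuid>", 128, 64, 1 * GIB, 1 * TIB)
for cmd in cmds:
    print(cmd)
```

The commands would be run on the host with the VM halted; the real UUID has to be substituted for the placeholder.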

        olivierlambert Vates 🪐 Co-Founder CEO

        Hi,

        What kind of guest are you running: Linux or Windows?

          fva-anssi @olivierlambert

          @olivierlambert Debian 12 template with PXE boot, local storage
          Host has 2 x AMD EPYC 9534 64-Core Processor / 1.5 TiB of memory (24 x 64 GiB)

          The issue comes from the number of vCPUs, not the memory. I've done more testing: I created several VMs from XOA 5.107.1 with the Debian Bookworm 12 template, PXE boot, and local storage (EXT4).

          "no boot" means the console shows "Guest has not initialized the display"
          "boot" means the TianoCore logo appears, followed by PXE boot

          64x / 1 TiB => boot => upgrade to 128x => no boot
          128x / 1TiB => no boot => downgrade to 64x => no boot
          96x / 256 GiB => boot => upgrade to 102x => no boot => downgrade to 96x => no boot
          97x / 256 GiB => no boot
          99x / 512 GiB => no boot

          There appears to be a limit of 96 vCPUs. I've tried different CPU topologies without any success.
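The 96-vCPU threshold can be read straight off the first-boot results listed above. The sketch below only tabulates those results for freshly created VMs (the upgrade and downgrade attempts are left out); the function name `boot_threshold` is made up for illustration.

```python
# (vCPUs, memory, booted) for freshly created VMs, copied from the list above.
fresh_runs = [
    (64, "1 TiB", True),
    (128, "1 TiB", False),
    (96, "256 GiB", True),
    (97, "256 GiB", False),
    (99, "512 GiB", False),
]


def boot_threshold(runs):
    """Return (highest vCPU count that booted, lowest vCPU count that failed)."""
    booted = [vcpus for vcpus, _, ok in runs if ok]
    failed = [vcpus for vcpus, _, ok in runs if not ok]
    return max(booted), min(failed)


highest_ok, lowest_fail = boot_threshold(fresh_runs)
print(f"highest booting: {highest_ok} vCPUs, lowest failing: {lowest_fail} vCPUs")
```

Memory size varies between the runs, yet the split between booting and failing falls cleanly between 96 and 97 vCPUs, which supports the reading that the vCPU count is the deciding factor.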

            olivierlambert Vates 🪐 Co-Founder CEO

            Yeah, that makes sense: beyond 64 there are potential topology issues. But I remember we tested internally up to 128 and it worked, probably on a single socket though.

            The official limit for strict security support is 32 though: https://docs.xcp-ng.org/installation/requirements/#xcp-ng-83-lts-1

            I'm pretty sure we managed to get 128 working. Still, there's a lot of topology work to be done in Xen upstream to go beyond that. The really complex limit should be 256, so fixing anything below 256 is probably more bug fixing than a huge rework.

              olivierlambert Vates 🪐 Co-Founder CEO

              I remember we tested 128 vCPUs and it worked (Ubuntu VM, but Debian should be similar).

              [Image: htop18vcpus.png]

              It was on a 2CRSi machine (1 socket however).

                olivierlambert Vates 🪐 Co-Founder CEO

                Can you try the guest in BIOS mode in case that could change anything?

                  fva-anssi @olivierlambert

                  @olivierlambert It works in BIOS mode. The system is a Dell PowerEdge R7625 with AMD EPYC processors.

                  I can't go beyond 128 vCPUs, though; trying gives: xenopsd internal error: Xenctrl.Error("22: Invalid argument")
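The switch to BIOS mode can also be made from the CLI. As a hedged sketch, the helper below builds the corresponding `xe` command string without running it. `firmware_command` and the `<vm-uuid>` placeholder are hypothetical, and changing firmware on a guest that already has an installed OS may leave it unbootable, so this is best tried on a fresh VM such as the PXE-booted ones in this thread.

```python
def firmware_command(vm_uuid, mode):
    """Build (but do not run) the xe command that selects the guest firmware."""
    if mode not in ("bios", "uefi"):
        raise ValueError("mode must be 'bios' or 'uefi'")
    return f"xe vm-param-set uuid={vm_uuid} HVM-boot-params:firmware={mode}"


bios_cmd = firmware_command("<vm-uuid>", "bios")
print(bios_cmd)
```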

                    olivierlambert Vates 🪐 Co-Founder CEO

                    Okay, so there are several things:

                    1. The UEFI limit of 96 cores is hard-coded here: https://github.com/xcp-ng-rpms/edk2/blob/5af940036ace555e5315a78a301912d294277ec0/SPECS/edk2.spec#L138 (to be investigated)
                    2. BIOS has no such limitation, but then you hit the next barrier at 128 cores, which is due to something else (up to 255 vCPUs)
                    3. Then the "final" boss is beyond that
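The limits discussed in this thread can be summarised in a small lookup. This is only a sketch of what was observed on one XCP-ng 8.3 host plus the figures quoted above (96 for UEFI, 128 for BIOS, 32 for strict security support); the names are made up for illustration and none of this is an official rule.

```python
# Figures taken from this thread, not from a specification.
UEFI_MAX_VCPUS = 96            # hard-coded in the edk2 build, per the link above
BIOS_MAX_VCPUS = 128           # beyond this, xenopsd returned "22: Invalid argument"
SECURITY_SUPPORTED_VCPUS = 32  # official limit for strict security support


def check_vcpus(firmware, vcpus):
    """Return (expected_to_boot, within_security_support) for a guest."""
    limits = {"uefi": UEFI_MAX_VCPUS, "bios": BIOS_MAX_VCPUS}
    if firmware not in limits:
        raise ValueError("firmware must be 'uefi' or 'bios'")
    return vcpus <= limits[firmware], vcpus <= SECURITY_SUPPORTED_VCPUS


for firmware, vcpus in [("uefi", 96), ("uefi", 97), ("bios", 128), ("bios", 129)]:
    boots, supported = check_vcpus(firmware, vcpus)
    print(f"{firmware} {vcpus} vCPUs: boots={boots} security-supported={supported}")
```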
                      fva-anssi @olivierlambert

                      @olivierlambert Thanks a lot, the limitations of 128 cores and BIOS are fine for my usage.

                        olivierlambert Vates 🪐 Co-Founder CEO

                        Thanks. Feel free to reach out to us if you need more details on point 2 and our progress; we can prioritize it if your org needs it.

                          fva-anssi @olivierlambert

                          @olivierlambert It's for a rare use case, and we can wait for a proper fix.

                            DustinB

                            I can't even imagine how long a memtest would take on a box with this much RAM. Ha

                              olivierlambert Vates 🪐 Co-Founder CEO

                              We are testing some machines with 6TiB RAM 😆

                                DustinB @olivierlambert

                                @olivierlambert said in Can't boot a VM with 1TB memory / 128 CPUs:

                                We are testing some machines with 6TiB RAM 😆

                                My god
