XCP-ng

    Can't boot a VM with 1TB memory / 128 CPUs

      olivierlambert Vates 🪐 Co-Founder CEO

      Hi,

      What kind of guest do you use? Linux or Windows?

        fva-anssi @olivierlambert

        @olivierlambert Debian 12 template with PXE boot, local storage
        Host has 2 x AMD EPYC 9534 64-Core Processor / 1.5 TiB of memory (24 x 64 GiB)

        The issue comes from the number of vCPUs, not the memory. I did more testing: I created several VMs from XOA 5.107.1 using the Debian Bookworm 12 template, with PXE boot and local storage (EXT4).

        "no boot" means the console shows "Guest has not initialized the display"
        "boot" means the TianoCore logo appears, followed by PXE boot

        64x / 1 TiB => boot => upgrade to 128x => no boot
        128x / 1 TiB => no boot => downgrade to 64x => no boot
        96x / 256 GiB => boot => upgrade to 102x => no boot => downgrade to 96x => no boot
        97x / 256 GiB => no boot
        99x / 512 GiB => no boot

        There is a limit of 96 vCPUs. I've tried different CPU topologies without any success.
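The results above can be restated as data to make the threshold explicit. This is only a sketch: the `Trial` type, its `fresh` flag (VM created with that vCPU count rather than resized afterwards), and `boot_threshold` are names made up for illustration, and the rows are just the tests listed in this post.

```python
from typing import List, NamedTuple, Tuple


class Trial(NamedTuple):
    vcpus: int
    memory_gib: int
    fresh: bool   # True: VM created with this vCPU count; False: resized afterwards
    booted: bool


# Transcription of the tests reported in this post (UEFI guests).
TRIALS = [
    Trial(64, 1024, True, True),
    Trial(128, 1024, False, False),  # upgraded from 64
    Trial(128, 1024, True, False),
    Trial(64, 1024, False, False),   # downgraded from 128
    Trial(96, 256, True, True),
    Trial(102, 256, False, False),   # upgraded from 96
    Trial(96, 256, False, False),    # downgraded from 102
    Trial(97, 256, True, False),
    Trial(99, 512, True, False),
]


def boot_threshold(trials: List[Trial]) -> Tuple[int, int]:
    """Return (highest vCPU count that booted, lowest that failed).

    Only freshly created VMs are considered, because resized VMs
    failed to boot even after going back to a working vCPU count.
    """
    fresh = [t for t in trials if t.fresh]
    highest_ok = max(t.vcpus for t in fresh if t.booted)
    lowest_fail = min(t.vcpus for t in fresh if not t.booted)
    return highest_ok, lowest_fail


print(boot_threshold(TRIALS))
```

Note that the 1 TiB VM boots at 64 vCPUs while the 256 GiB VM fails at 97, which is why memory size is ruled out as the cause.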

          olivierlambert Vates 🪐 Co-Founder CEO

          Yeah, that makes sense: beyond 64 there are potential topology issues. But I remember we tested internally up to 128 and it worked, though probably on a single socket.

          The official limit for strict security support is 32 though: https://docs.xcp-ng.org/installation/requirements/#xcp-ng-83-lts-1

          I'm pretty sure we managed to get 128 working. But still, there's a lot of topology work to be done in Xen upstream to go beyond that. The really complex limit should be 256, so fixing anything below 256 is probably more a matter of bug fixing than a huge rework.

            olivierlambert Vates 🪐 Co-Founder CEO

            I remember we tested 128 vCPUs and it worked (Ubuntu VM, but Debian should be similar).

            htop18vcpus.png

            It was on a 2CRSi machine (1 socket however).

              olivierlambert Vates 🪐 Co-Founder CEO

              Can you try the guest in BIOS mode in case that could change anything?
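For reference, one possible way to make that switch from the host's command line is with `xe vm-param-set`. The sketch below only builds the command lines and does not run them. The helper name `bios_vcpu_commands` and the UUID in the example are hypothetical, and it assumes the VM is halted and that the `HVM-boot-params:firmware`, `VCPUs-max` and `VCPUs-at-startup` parameters behave as in the standard `xe` CLI; check against your own host before relying on it.

```python
from typing import List


def bios_vcpu_commands(vm_uuid: str, vcpus: int) -> List[List[str]]:
    """Build xe commands that switch a halted VM to BIOS firmware
    and set its vCPU count. Nothing is executed here."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    target = f"uuid={vm_uuid}"
    return [
        # Assumption: firmware is selected through HVM-boot-params (bios or uefi).
        ["xe", "vm-param-set", target, "HVM-boot-params:firmware=bios"],
        # VCPUs-max is raised first so VCPUs-at-startup never exceeds it.
        ["xe", "vm-param-set", target, f"VCPUs-max={vcpus}"],
        ["xe", "vm-param-set", target, f"VCPUs-at-startup={vcpus}"],
    ]


# Hypothetical UUID, for illustration only.
for cmd in bios_vcpu_commands("00000000-0000-0000-0000-000000000000", 128):
    print(" ".join(cmd))
```

Building the argument lists separately from running them makes it easy to review the commands first, then pass each list to `subprocess.run` on the host if they look right.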

                fva-anssi @olivierlambert

                @olivierlambert It works in BIOS mode. The system is a Dell PowerEdge R7625 with AMD EPYC.

                I can't go beyond 128 cores: xenopsd internal error: Xenctrl.Error("22: Invalid argument")

                  olivierlambert Vates 🪐 Co-Founder CEO

                  Okay, so there are 3 things:

                  1. The UEFI limit of 96 cores is hard-coded here: https://github.com/xcp-ng-rpms/edk2/blob/5af940036ace555e5315a78a301912d294277ec0/SPECS/edk2.spec#L138 (to be investigated)
                  2. BIOS doesn't have that limitation, but then you hit the next barrier, 128 cores, which is due to something else (up to 255 vCPUs)
                  3. Then the "final boss" is going beyond that
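To summarize the limits discussed in this thread in one place, here is a small sketch. The function name `assess_vcpu_request` and the wording of its results are invented for illustration. The numbers (96 for UEFI, 128 for BIOS, 32 for strict security support) are the ones reported in this thread, not an official compatibility matrix.

```python
# Limits as observed or quoted in this thread; they may change with later releases.
OBSERVED_VCPU_LIMITS = {"uefi": 96, "bios": 128}
SECURITY_SUPPORTED_VCPUS = 32


def assess_vcpu_request(firmware: str, vcpus: int) -> str:
    """Classify a vCPU count against the limits reported in this thread."""
    firmware = firmware.lower()
    if firmware not in OBSERVED_VCPU_LIMITS:
        raise ValueError(f"unknown firmware: {firmware!r}")
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    limit = OBSERVED_VCPU_LIMITS[firmware]
    if vcpus > limit:
        return f"fails: above the {limit} vCPUs observed to work with {firmware.upper()}"
    if vcpus > SECURITY_SUPPORTED_VCPUS:
        return "works: but above the 32 vCPU limit for strict security support"
    return "works: within the security-supported limit"


for fw, n in [("uefi", 96), ("uefi", 97), ("bios", 128), ("bios", 129), ("bios", 16)]:
    print(fw, n, "->", assess_vcpu_request(fw, n))
```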
                    fva-anssi @olivierlambert

                    @olivierlambert Thanks a lot, the limitations of 128 cores and BIOS are fine for my usage.

                      olivierlambert Vates 🪐 Co-Founder CEO

                      Thanks. Feel free to reach out if you need more details on point 2 and our progress; we can prioritize it if your org needs it.

                        fva-anssi @olivierlambert

                        @olivierlambert It's for a rare use case and we can wait for a proper fix.

                          DustinB

                          I can't even imagine how long a memtest would take on a box with this much RAM. Ha

                            olivierlambert Vates 🪐 Co-Founder CEO

                            We are testing some machines with 6TiB RAM 😆

                              DustinB @olivierlambert

                              @olivierlambert said in Can't boot a VM with 1TB memory / 128 CPUs:

                              We are testing some machines with 6TiB RAM 😆

                              My god
