XCP-ng

    Can't boot a VM with 1TB memory / 128 CPUs

      fva-anssi @olivierlambert

      @olivierlambert Debian 12 template with PXE boot, local storage
      Host has 2 x AMD EPYC 9534 64-Core Processor / 1.5 TiB of memory (24 x 64 GiB)

      The issue comes from the number of CPUs, not the memory. I've done more testing: I created several VMs from XOA 5.107.1 with the Debian Bookworm 12 template, PXE boot, and local storage (EXT4).

      "no boot" means the console shows "Guest has not initialized the display"
      "boot" means the TianoCore logo appears, followed by PXE boot

      64x / 1 TiB => boot => upgrade to 128x => no boot
      128x / 1 TiB => no boot => downgrade to 64x => no boot
      96x / 256 GiB => boot => upgrade to 102x => no boot => downgrade to 96x => no boot
      97x / 256 GiB => no boot
      99x / 512 GiB => no boot

      There seems to be a limit of 96 cores. I've tried different CPU topologies without any success.
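
For anyone who wants to tabulate the results above, here is a minimal Python sketch. It only encodes the initial state of each VM as listed in this post (the upgrade/downgrade steps are left out, since a VM that failed once kept failing after a downgrade). The names `OBSERVATIONS` and `boot_threshold` are made up for illustration.

```python
# Initial state of each test VM, copied from the list above:
# (vCPUs, memory, booted?). All VMs were UEFI guests.
OBSERVATIONS = [
    (64, "1 TiB", True),
    (128, "1 TiB", False),
    (96, "256 GiB", True),
    (97, "256 GiB", False),
    (99, "512 GiB", False),
]


def boot_threshold(observations):
    """Return (largest vCPU count that booted, smallest that did not)."""
    booted = [vcpus for vcpus, _mem, ok in observations if ok]
    failed = [vcpus for vcpus, _mem, ok in observations if not ok]
    return max(booted), min(failed)


highest_ok, lowest_fail = boot_threshold(OBSERVATIONS)
print(f"boots up to {highest_ok} vCPUs, fails from {lowest_fail} vCPUs")
```

The 64x / 1 TiB VM boots while the 97x / 256 GiB one does not, which is why memory size does not look like the cause.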

        olivierlambert Vates 🪐 Co-Founder CEO

        Yeah, that makes sense: beyond 64 there are potential topology issues. But I remember we tested internally up to 128 and it worked, probably on a single socket though.

        The official limit for strict security support is 32 though: https://docs.xcp-ng.org/installation/requirements/#xcp-ng-83-lts-1

        I'm pretty sure we managed to get 128 working. But still, there's a lot of topology work to be done in Xen upstream to go beyond. The real complex limit should be 256. So fixing <256 is probably more bug fixing than huge rework.

          olivierlambert Vates 🪐 Co-Founder CEO

          I remember we tested 128 vCPUs and it worked (Ubuntu VM, but Debian should be similar).

          [Screenshot: htop18vcpus.png]

          It was on a 2CRSi machine (1 socket however).

            olivierlambert Vates 🪐 Co-Founder CEO

            Can you try the guest in BIOS mode in case that could change anything?
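
In case it helps, here is a rough Python sketch that only builds (and does not run) the `xe` command lines one could use on the host to switch a halted VM to BIOS firmware and set its vCPU count. It assumes the usual `HVM-boot-params:firmware`, `VCPUs-max` and `VCPUs-at-startup` VM parameters; the helper name `bios_mode_commands` and the UUID below are placeholders, not something from this thread.

```python
def bios_mode_commands(vm_uuid, vcpus):
    """Build xe command lines (as argv lists) for a halted VM.

    Nothing is executed here. VCPUs-max is set before VCPUs-at-startup
    so the startup count never exceeds the maximum when increasing.
    """
    target = f"uuid={vm_uuid}"
    return [
        ["xe", "vm-param-set", target, "HVM-boot-params:firmware=bios"],
        ["xe", "vm-param-set", target, f"VCPUs-max={vcpus}"],
        ["xe", "vm-param-set", target, f"VCPUs-at-startup={vcpus}"],
    ]


# Placeholder UUID, for illustration only.
commands = bios_mode_commands("00000000-0000-0000-0000-000000000000", 128)
for argv in commands:
    print(" ".join(argv))
```

Switching firmware on a guest that is already installed can leave it unbootable, so this is best tried on a fresh or PXE-booted VM.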

              fva-anssi @olivierlambert

              @olivierlambert It works in BIOS mode. The system is a Dell PowerEdge R7625 with AMD EPYC.

              I can't go beyond 128 cores: xenopsd internal error: Xenctrl.Error("22: Invalid argument")

                olivierlambert Vates 🪐 Co-Founder CEO

                Okay, so there are a few things:

                1. The UEFI 96-core limitation is hard-coded here: https://github.com/xcp-ng-rpms/edk2/blob/5af940036ace555e5315a78a301912d294277ec0/SPECS/edk2.spec#L138 (to be investigated)
                2. BIOS has no such limitation, but then you hit the next barrier, 128 cores, which is due to something else (up to 255 vCPUs)
                3. Then the "final" boss is beyond that
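
The limits listed above can be summed up in a small Python sketch. The numbers are only what was observed in this thread (96 for UEFI guests, 128 for BIOS guests), not official limits, and `OBSERVED_LIMITS` and `within_observed_limit` are hypothetical names.

```python
# vCPU counts that booted in this thread, per guest firmware.
# These are observations from this topic, not documented limits.
OBSERVED_LIMITS = {"uefi": 96, "bios": 128}


def within_observed_limit(firmware, vcpus):
    """Return True if the vCPU count is at or below the observed limit."""
    return vcpus <= OBSERVED_LIMITS[firmware.lower()]


for firmware, vcpus in [("uefi", 96), ("uefi", 97), ("bios", 128), ("bios", 129)]:
    status = "ok" if within_observed_limit(firmware, vcpus) else "over the limit"
    print(f"{firmware} / {vcpus} vCPUs: {status}")
```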
                  fva-anssi @olivierlambert

                  @olivierlambert Thanks a lot, the limitations of 128 cores and BIOS are fine for my usage.

                    olivierlambert Vates 🪐 Co-Founder CEO

                    Thanks, feel free to reach out if you need more details on point 2 and our progress; we can prioritize it if your org needs it.

                      fva-anssi @olivierlambert

                      @olivierlambert It's for a rare use case and we can wait for a proper fix.

                        DustinB

                        I can't even imagine how long a memtest would take on a box with this much RAM. Ha

                          olivierlambert Vates 🪐 Co-Founder CEO

                          We are testing some machines with 6TiB RAM 😆

                            DustinB @olivierlambert

                            @olivierlambert said in Can't boot a VM with 1TB memory / 128 CPUs:

                            We are testing some machines with 6TiB RAM 😆

                            My god
