XCP-ng

    VMs with around 24 GB+ of memory crash on migration.

    • Kevin87

      Hi all,

      When I migrate a VM from node to node using shared storage, and the VM has more than 24 GB of memory, it always crashes.
      The crash triggers a kdump, which writes a vmcore-dmesg file.

      These are the lines:
      vmcore-dmesg.txt

      Does anyone know why this gets triggered? The VMs use static memory with identical min/max limits, so nothing is trying to lower the memory before the migration.
      Also, the VM is not really busy, and it is transferred over a 10 Gbit network in less than a minute.

      Kind regards,

      Kevin

      • Danp Pro Support Team

        Hi Kevin,

        Are you running Xen or XCP-ng? Which OS is being used in the VM?

        Dan

        • Kevin87

          Hi Danp,

          I am using XCP-ng 8.2 with the latest updates.
          The VM OS is in most cases CentOS 7.9 with guest tools installed.

          • olivierlambert Vates 🪐 Co-Founder CEO

            xenwatch: page allocation failure: order:5, mode:0xc0d0

            Not a good sign 😕

            @andSmv can you take a look when you can?
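For readers unfamiliar with the quoted warning: in the Linux page allocator, `order:N` means a request for 2^N physically contiguous pages. The sketch below parses the quoted line and converts the order into a size. It assumes 4 KiB base pages (typical for an x86-64 guest, not confirmed in this thread), and the helper name `parse_alloc_failure` is purely illustrative. The `mode` value is kept as a raw number because the meaning of the flag bits depends on the kernel version.

```python
import re

PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB base pages (typical x86-64 guest)


def parse_alloc_failure(line: str) -> dict:
    """Extract task name, order and raw gfp mode from a page allocation failure line."""
    match = re.search(
        r"(?P<task>\S+): page allocation failure: "
        r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)",
        line,
    )
    if match is None:
        raise ValueError("not a page allocation failure line")
    order = int(match.group("order"))
    pages = 1 << order  # an order-N request asks for 2**N contiguous pages
    return {
        "task": match.group("task"),
        "order": order,
        "pages": pages,
        "bytes": pages * PAGE_SIZE_BYTES,
        "mode": int(match.group("mode"), 16),  # left undecoded on purpose
    }


info = parse_alloc_failure("xenwatch: page allocation failure: order:5, mode:0xc0d0")
print(f"{info['task']} asked for {info['pages']} contiguous pages "
      f"({info['bytes'] // 1024} KiB)")
# prints: xenwatch asked for 32 contiguous pages (128 KiB)
```

Under that page-size assumption, the failing request was for 128 KiB of contiguous memory, which is the kind of allocation that can fail on a fragmented system even when plenty of memory is free in total.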

            • andSmv Vates 🪐 XCP-ng Team Xen Guru

              Hmm, there are two problems here (a page allocation failure warning and a NULL pointer BUG), both in the context of the xenwatch kernel thread, and both happen while configuring Xen network frontend/backend communication.

              Normally this isn't related to the memory footprint of the VM, but rather to the Xen frontend/backend xenbus communication framework. Do the bugs disappear when you reduce the VM's memory size while keeping all other parameters and the environment the same?

              • Kevin87

                I have not tested what happens if I reduce the memory of that VM, because the VM needs that amount of memory. I do know that we have around 190 VMs and it only happens with VMs that have a lot of memory.

                • andSmv Vates 🪐 XCP-ng Team Xen Guru

                  Obviously, it can't be excluded that the issue is related to the memory footprint. Moreover, the first warning "complains" about a memory allocation failure. (I suppose the "receiver" node has enough memory to host the VM.)

                  Normally Xen has no limitation on live migration of a 24 GB VM, so it's difficult to say what the issue is here. But clearly there's a possibility that this is a bug in Xen/toolstack... Memory fragmentation on the "receiver" node can be an issue too.

                  You can probably run some different configurations to try to pinpoint this issue.
                  For a start, maybe try to migrate a VM when no other VMs are running on the "receiver" node. Also try to migrate a VM with no network connections (as the issue seems to be related to network backend status changes).
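One way to look at the fragmentation hypothesis is to read `/proc/buddyinfo` on whichever Linux system logged the allocation failure. Each row lists, per zone, the number of free blocks at order 0, 1, 2 and so on. The sketch below counts free blocks at or above a given order. The helper name and the sample numbers are made up for illustration and are not taken from this thread; whether the relevant system is the guest or the host is left open here.

```python
def free_blocks_at_or_above(buddyinfo_text: str, min_order: int) -> dict:
    """Count free blocks of at least `min_order` per (node, zone) in /proc/buddyinfo text."""
    result = {}
    for line in buddyinfo_text.splitlines():
        parts = line.split()
        # Expected shape: "Node 0, zone Normal c0 c1 c2 ..." (cN = free blocks of order N)
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = parts[1].rstrip(",")
        zone = parts[3]
        counts = [int(value) for value in parts[4:]]
        result[(node, zone)] = sum(counts[min_order:])
    return result


# Illustrative sample only; on a real system read the text with
# open("/proc/buddyinfo").read() instead.
SAMPLE = """\
Node 0, zone      DMA      1      1      1      0      2      1      1      0      1      1      3
Node 0, zone    DMA32    120     80     40     20     10      0      0      0      0      0      0
Node 0, zone   Normal   5000   2100    600     90      4      0      0      0      0      0      0
"""

summary = free_blocks_at_or_above(SAMPLE, min_order=5)
for (node, zone), blocks in summary.items():
    print(f"node {node} zone {zone}: {blocks} free blocks of order >= 5")
```

In the made-up sample, the DMA32 and Normal zones have many free small blocks but none of order 5 or higher, which is the pattern that would make an order-5 request fail despite free memory.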

                  • Kevin87

                    Dear,

                    There is indeed enough memory on the receiving node. Our nodes have 1 TB of memory each and are currently loaded with around 500 GB each. I'll try to reproduce it with a cloned production server, so I can reproduce it a few times with and without network and to an empty receiving node. I'll keep you updated.
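The reproduction plan can be written down as a small test matrix so that every combination gets the same number of runs. This is only a sketch: the labels and the repeat count are assumptions chosen for illustration, and the migration itself is not automated here.

```python
from itertools import product

# Hypothetical labels for the two variables suggested in the thread.
RECEIVER_STATES = ["receiver running other VMs", "receiver empty"]
NETWORK_STATES = ["network attached", "network detached"]
REPEATS = 3  # assumption: a few runs per combination


def build_test_matrix(repeats: int = REPEATS) -> list:
    """Return one entry per (receiver state, network state, run number)."""
    return [
        {"receiver": receiver, "network": network, "run": run}
        for receiver, network in product(RECEIVER_STATES, NETWORK_STATES)
        for run in range(1, repeats + 1)
    ]


matrix = build_test_matrix()
for case in matrix:
    print(f"{case['receiver']:28} | {case['network']:16} | run {case['run']}")
```

Recording the crash/no-crash result next to each row makes it easier to see whether the failure follows the network state, the receiver load, or neither.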
