XCP-ng

    XCP-ng 8.3 with VM crashing

      AlbertK @andyhhp

      @andyhhp
      This is the param list of one of the VMs that is rebooting by itself.

      uuid ( RO)                                  : 1199c4b4-6072-7086-7286-7d7d1cad2c33
                                  name-label ( RW): K8s-node1
                            name-description ( RW):
                                user-version ( RW): 1
                               is-a-template ( RW): false
                         is-default-template ( RW): false
                               is-a-snapshot ( RO): false
                                 snapshot-of ( RO): <not in database>
                                   snapshots ( RO):
                               snapshot-time ( RO): 19700101T00:00:00Z
                               snapshot-info ( RO):
                                      parent ( RO): <not in database>
                                    children ( RO):
                           is-control-domain ( RO): false
                                 power-state ( RO): running
                               memory-actual ( RO): 4297039872
                               memory-target ( RO): 4294967296
                             memory-overhead ( RO): 39845888
                           memory-static-max ( RW): 4294967296
                          memory-dynamic-max ( RW): 4294967296
                          memory-dynamic-min ( RW): 4294967296
                           memory-static-min ( RW): 1073741824
                            suspend-VDI-uuid ( RW): <not in database>
                             suspend-SR-uuid ( RW): <not in database>
                                VCPUs-params (MRW):
                                   VCPUs-max ( RW): 4
                            VCPUs-at-startup ( RW): 4
                      actions-after-shutdown ( RW): Destroy
                    actions-after-softreboot ( RW): Soft reboot
                        actions-after-reboot ( RW): Restart
                         actions-after-crash ( RW): Restart
                               console-uuids (SRO): 7c1c7058-8b18-06ca-60f5-9cbfedec2d11
                                         hvm ( RO): true
                                    platform (MRW): timeoffset: 0; nic_type: e1000; device-model: qemu-upstream-uefi; secureboot: false; vga: std; videoram: 8; viridian: false; device_id: 0001; nx: true; acpi: 1; apic: true; pae: true; hpet: true
                          allowed-operations (SRO): metadata_export; changing_VCPUs_live; changing_dynamic_range; migrate_send; pool_migrate; suspend; hard_reboot; hard_shutdown; clean_reboot; clean_shutdown; pause; checkpoint; snapshot
                          current-operations (SRO):
                          blocked-operations (MRW):
                         allowed-VBD-devices (SRO): 1; 2; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19; 20; 21; 22; 23; 24; 25; 26; 27; 28; 29; 30; 31; 32; 33; 34; 35; 36; 37; 38; 39; 40; 41; 42; 43; 44; 45; 46; 47; 48; 49; 50; 51; 52; 53; 54; 55; 56; 57; 58; 59; 60; 61; 62; 63; 64; 65; 66; 67; 68; 69; 70; 71; 72; 73; 74; 75; 76; 77; 78; 79; 80; 81; 82; 83; 84; 85; 86; 87; 88; 89; 90; 91; 92; 93; 94; 95; 96; 97; 98; 99; 100; 101; 102; 103; 104; 105; 106; 107; 108; 109; 110; 111; 112; 113; 114; 115; 116; 117; 118; 119; 120; 121; 122; 123; 124; 125; 126; 127; 128; 129; 130; 131; 132; 133; 134; 135; 136; 137; 138; 139; 140; 141; 142; 143; 144; 145; 146; 147; 148; 149; 150; 151; 152; 153; 154; 155; 156; 157; 158; 159; 160; 161; 162; 163; 164; 165; 166; 167; 168; 169; 170; 171; 172; 173; 174; 175; 176; 177; 178; 179; 180; 181; 182; 183; 184; 185; 186; 187; 188; 189; 190; 191; 192; 193; 194; 195; 196; 197; 198; 199; 200; 201; 202; 203; 204; 205; 206; 207; 208; 209; 210; 211; 212; 213; 214; 215; 216; 217; 218; 219; 220; 221; 222; 223; 224; 225; 226; 227; 228; 229; 230; 231; 232; 233; 234; 235; 236; 237; 238; 239; 240; 241; 242; 243; 244; 245; 246; 247; 248; 249; 250; 251; 252; 253; 254
                         allowed-VIF-devices (SRO): 1; 2; 3; 4; 5; 6
                              possible-hosts ( RO): f8cc6a6c-8ff4-4e3b-9f92-b5f62bef04ed
                                 domain-type ( RW): hvm
                         current-domain-type ( RO): hvm
                             HVM-boot-policy ( RW): BIOS order
                             HVM-boot-params (MRW): order: cdn; firmware: uefi
                       HVM-shadow-multiplier ( RW): 1.000
                                   PV-kernel ( RW):
                                  PV-ramdisk ( RW):
                                     PV-args ( RW):
                              PV-legacy-args ( RW):
                               PV-bootloader ( RW):
                          PV-bootloader-args ( RW):
                         last-boot-CPU-flags ( RO): vendor: AuthenticAMD; features: 178bfbff-f6f83203-2e500800-040001f3-0000000f-219c01a9-00400004-00000000-00101005-00000000-00000000-10000044-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000
                            last-boot-record ( RO): '{"platformdata":{"timeoffset":"0","featureset":"178bfbff-f6f83203-2e500800-040001f3-0000000f-219c01a9-00400004-00000000-00101005-00000000-00000000-10000044-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000","usb":"true","usb_tablet":"true","device-model":"qemu-upstream-uefi","secureboot":"false","vga":"std","videoram":"8","viridian":"false","device_id":"0001","nx":"true","acpi":"1","apic":"true","pae":"true","hpet":"true"},"xen_platform":[1,2],"pv_drivers_detected":true,"pci_power_mgmt":false,"pci_msitranslate":true,"qemu_vifs":[],"qemu_vbds":[],"suspend_memory_bytes":2149556224,"original_profile":"Qemu_upstream_uefi","profile":"Qemu_upstream_uefi","nested_virt":false,"nomigrate":false,"domain_config":["X86",{"misc_flags":[],"emulation_flags":["X86_EMU_LAPIC","X86_EMU_HPET","X86_EMU_PM","X86_EMU_RTC","X86_EMU_IOAPIC","X86_EMU_PIC","X86_EMU_VGA","X86_EMU_IOMMU","X86_EMU_PIT","X86_EMU_USE_PIRQ"]}],"last_start_time":1730178556.316762,"ty":["HVM",{"firmware":["Uefi",{"backend":"xapidb","on_boot":"Persist"}],"qemu_stubdom":false,"qemu_disk_cmdline":false,"boot_order":"cdn","pci_passthrough":false,"pci_emulations":[],"serial":"pty","acpi":true,"video":"Standard_VGA","video_mib":8,"timeoffset":"0","shadow_multiplier":1.0,"hap":true}],"build_info":{"has_hard_affinity":false,"priv":["BuildHVM",{"video_mib":8,"shadow_multiplier":1.0}],"vcpus":2,"kernel":"/usr/libexec/xen/boot/hvmloader","memory_target":2097152,"memory_max":2097152},"version":2}'
                                 resident-on ( RO): f8cc6a6c-8ff4-4e3b-9f92-b5f62bef04ed
                                    affinity ( RW): <not in database>
                                other-config (MRW): auto_poweron: true; xo:1199c4b4: {"creation":{"date":"2024-10-28T05:11:47.183Z","template":"df1a0e64-3799-482b-aa9f-1ed713c7dac5","user":"98707372-26e6-4877-8a14-85064b5f853a"}}; base_template_name: Ubuntu Jammy Jellyfish 22.04; import_task: OpaqueRef:807d2f23-5607-4fc9-2e3f-a3e9f055e800; mac_seed: 38c38661-6a24-4b1b-63e2-86c3ff2035d3; linux_template: true; install-methods: cdrom,nfs,http,ftp
                                      dom-id ( RO): 2
                             recommendations ( RO): <restrictions><restriction field="memory-static-max" max="1649267441664"/><restriction field="vcpus-max" max="64"/><restriction field="has-vendor-device" value="false"/><restriction field="allow-gpu-passthrough" value="1"/><restriction field="allow-vgpu" value="1"/><restriction field="allow-network-sriov" value="1"/><restriction field="supports-bios" value="yes"/><restriction field="supports-uefi" value="yes"/><restriction field="supports-secure-boot" value="yes"/><restriction max="255" property="number-of-vbds"/><restriction max="7" property="number-of-vifs"/></restrictions>
                               xenstore-data (MRW): vm-data/mmio-hole-size: 268435456; vm-data:
                  ha-always-run ( RW) [DEPRECATED]: false
                         ha-restart-priority ( RW):
                                       blobs ( RO):
                                  start-time ( RO): 20250320T19:19:07Z
                                install-time ( RO): 20241028T05:11:47Z
                                VCPUs-number ( RO): 4
                           VCPUs-utilisation (MRO): 0: 0.115; 1: 0.106; 2: 0.111; 3: 0.107
                                  os-version (MRO): name: Ubuntu 24.04; uname: 6.8.0-54-generic; distro: Ubuntu
                          PV-drivers-version (MRO): major: 1; minor: 0; micro: 0; build: proto-0.4.0
          PV-drivers-up-to-date ( RO) [DEPRECATED]: true
                                      memory (MRO):
                                       disks (MRO):
                                        VBDs (SRO): f0602bf4-1f5f-12f1-957b-f6c99669d98c; de3047f9-b097-e345-3a8a-77094f5f8de7
                                    networks (MRO): 0/ip: 192.168.8.86; 0/ipv4/0: 192.168.8.86; 0/ipv6/0: fe80::dc3b:cff:fef0:d3ed
                         PV-drivers-detected ( RO): true
                                       other (MRO): platform-feature-xs_reset_watches: 1; platform-feature-multiprocessor-suspend: 1; has-vendor-device: 0; feature-vcpu-hotplug: 1; feature-suspend: 1; feature-reboot: 1; feature-poweroff: 1; feature-balloon: 1
                                        live ( RO): true
                  guest-metrics-last-updated ( RO): 20250320T19:19:18Z
                         can-use-hotplug-vbd ( RO): unspecified
                         can-use-hotplug-vif ( RO): unspecified
                    cooperative ( RO) [DEPRECATED]: true
                                        tags (SRW):
                                   appliance ( RW): <not in database>
                                      groups ( RW):
                           snapshot-schedule ( RW): <not in database>
                            is-vmss-snapshot ( RO): false
                                 start-delay ( RW): 0
                              shutdown-delay ( RW): 0
                                       order ( RW): 0
                                     version ( RO): 0
                               generation-id ( RO):
                   hardware-platform-version ( RO): 0
                           has-vendor-device ( RW): false
                             requires-reboot ( RO): false
                             reference-label ( RO): ubuntu-22.04
                                bios-strings (MRO): bios-vendor: Xen; bios-version: ; system-manufacturer: Xen; system-product-name: HVM domU; system-version: ; system-serial-number: ; baseboard-manufacturer: ; baseboard-product-name: ; baseboard-version: ; baseboard-serial-number: ; baseboard-asset-tag: ; baseboard-location-in-chassis: ; enclosure-asset-tag: ; hp-rombios: ; oem-1: Xen; oem-2: MS_VM_CERT/SHA1/bdbeb6e0a816d43fa6d3fe8aaef04c2bad9d3e3d
                           pending-guidances ( RO):
                                       vtpms ( RO):
               pending-guidances-recommended ( RO):
                      pending-guidances-full ( RO):
      
        andyhhp Xen Guru @AlbertK

        @AlbertK Thanks. There's no nested-virt configured there.

        I have to admit this is looking more and more like a buggy CPU. Memory corruption is a possibility, but this is a clearly corrupt field in the middle of otherwise sane-looking fields in the VMCB.

        Do you have any other identical systems? Can you swap this CPU out for another one to see what happens?
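
For anyone wanting to repeat the nested-virt check on their own VM, here is a minimal sketch. It assumes the relevant flag is the `nested_virt` key inside the `last-boot-record` JSON shown in the param list above; the helper names are made up for illustration, and the embedded text is an abbreviated copy of two lines from that output.

```python
import json
import re

# Abbreviated copy of two lines from the `xe vm-param-list` output posted above.
# The last-boot-record JSON is trimmed to a few keys for readability.
PARAM_LIST = """\
                         actions-after-crash ( RW): Restart
                            last-boot-record ( RO): '{"nested_virt":false,"nomigrate":false,"profile":"Qemu_upstream_uefi","version":2}'
"""

# Matches "name ( RW): value" lines, including modes such as MRW or SRO
# and the optional [DEPRECATED] marker.
LINE_RE = re.compile(
    r"^\s*(?P<key>[\w-]+)\s*\(\s*(?P<mode>[A-Z]+)\)"
    r"(?:\s*\[DEPRECATED\])?\s*:\s?(?P<value>.*)$"
)


def parse_params(text):
    """Turn param-list text into a {name: value} dict (hypothetical helper)."""
    params = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            params[match.group("key")] = match.group("value").strip()
    return params


def nested_virt_enabled(params):
    """Read the nested_virt flag from the single-quoted JSON boot record."""
    raw = params["last-boot-record"].strip("'")
    return bool(json.loads(raw).get("nested_virt", False))


params = parse_params(PARAM_LIST)
print("nested virt enabled:", nested_virt_enabled(params))
```

Feeding it the full output of `xe vm-param-list uuid=<vm-uuid>` instead of the abbreviated sample should work the same way, as long as the record stays on one line.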

          AlbertK @andyhhp

          @andyhhp Unfortunately no, I do not have another machine to test the CPU in. I have ordered another set of 2x16GB of RAM to test whether it is a RAM issue.

          Will report back.

            joebeasley @AlbertK

            @AlbertK I had a similar issue where the whole server would just reboot randomly. It turned out to be a BIOS option called "cstates", which has something to do with processor power saving. I disabled every setting that mentioned cstates and have not had the reboot problem since.

              AlbertK @joebeasley

              @joebeasley In my case it is more that one or more VMs reboot on their own, and sometimes one VM becomes inaccessible (no SSH and no console from XO; XO shows CPU at 99% with no network or disk activity, and the VM needs a forced reboot). A few hours after that, the host itself reboots. This is now happening every day.

              I am seeing a lot of this in the host dmesg.

              [105679.203854] vif vif-6-0 vif6.0: Guest Rx stalled
              [105689.395996] vif vif-6-0 vif6.0: Guest Rx ready
              [105707.532509] vif vif-6-0 vif6.0: Guest Rx stalled
              [105717.555832] vif vif-6-0 vif6.0: Guest Rx ready
              [105744.154415] vif vif-6-0 vif6.0: Guest Rx stalled
              [105754.163666] vif vif-6-0 vif6.0: Guest Rx ready
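
To put numbers on those messages, a small sketch like the following pairs each "Guest Rx stalled" line with the next "Guest Rx ready" line for the same vif and reports how long the stall lasted. The function name is hypothetical, and the input is just the six lines pasted above.

```python
import re

# The dmesg excerpt from the post above, copied verbatim.
DMESG = """\
[105679.203854] vif vif-6-0 vif6.0: Guest Rx stalled
[105689.395996] vif vif-6-0 vif6.0: Guest Rx ready
[105707.532509] vif vif-6-0 vif6.0: Guest Rx stalled
[105717.555832] vif vif-6-0 vif6.0: Guest Rx ready
[105744.154415] vif vif-6-0 vif6.0: Guest Rx stalled
[105754.163666] vif vif-6-0 vif6.0: Guest Rx ready
"""

# Captures the timestamp, the interface name (e.g. vif6.0) and the new state.
EVENT_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+vif\s+\S+\s+(\S+):\s+Guest Rx (stalled|ready)$"
)


def stall_durations(text):
    """Return (vif, seconds) for every stalled -> ready pair, in log order."""
    started = {}  # vif name -> timestamp of the pending "stalled" event
    stalls = []
    for line in text.splitlines():
        match = EVENT_RE.match(line.strip())
        if not match:
            continue
        timestamp = float(match.group(1))
        vif, state = match.group(2), match.group(3)
        if state == "stalled":
            started[vif] = timestamp
        elif vif in started:
            stalls.append((vif, round(timestamp - started.pop(vif), 3)))
    return stalls


for vif, seconds in stall_durations(DMESG):
    print(f"{vif}: stalled for {seconds} s")
```

On this excerpt each stall on vif6.0 lasts roughly 10 seconds. Running the same function over a longer `dmesg` capture would show whether that pattern holds and whether other vifs are affected.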
              
                AlbertK @AlbertK

                I have installed a fresh set of RAM and the system still crashes randomly. Some of the crashes leave a crash log, but others do not.

                This happens daily: either a VM reboots or the host does. In the crash logs I notice a consistent pattern of SVM errors on CPU8, and once on CPU11.

                I then tried removing CPU8 and CPU11 from the CPU pool. There have been no VM or host reboots for the last 7 days. Any ideas why?

                xl cpupool-cpu-remove 8,11
                
                  andyhhp Xen Guru @AlbertK

                  As I said before, this is looking like a buggy CPU, and you've proved it, given a week with no incident if CPU8 is excluded.

                    olivierlambert Vates 🪐 Co-Founder CEO

                    Clearly, there are one or two damaged cores. Likely a faulty CPU, I'm afraid 😞

                      Riven

                      If you are not getting crashes on cores 0-5 (assuming they are in use by your VMs), then it's unlikely to be a physical problem.

                      The Ryzen 3600 is only a 6-core CPU; "cores" 8 & 11 are the SMT (hyperthreaded) versions of cores 2 & 5.

                      You could also try turning SMT off.

                        AlbertK @Riven

                        @Riven,

                        What I am not sure about is how Xen numbers the CPUs: are all the real cores listed first, followed by the SMT/hyperthread siblings, or do they alternate, i.e. real core, then its hyperthread sibling?
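
One way to settle this on a given host, rather than guessing, is to read the `cpu_topology` table printed by `xl info -n` and group the CPU numbers by core. The sketch below assumes that table has the `cpu: core socket node` column layout. The sample rows are made up to exercise the parser and are not a claim about how this host, or Xen in general, lays out its CPUs. The function name is hypothetical.

```python
from collections import defaultdict

# Made-up sample in the assumed `xl info -n` cpu_topology layout.
# Replace it with the real output from the host in question.
SAMPLE = """\
cpu_topology           :
cpu:    core    socket     node
  0:       0        0        0
  1:       0        0        0
  2:       1        0        0
  3:       1        0        0
"""


def siblings_by_core(text):
    """Map (socket, core) -> list of Xen CPU numbers sharing that core."""
    groups = defaultdict(list)
    in_table = False
    for line in text.splitlines():
        fields = line.replace(":", " ").split()
        if fields[:2] == ["cpu", "core"]:
            in_table = True  # header row found, data rows follow
            continue
        if not in_table:
            continue
        if len(fields) < 3 or not all(f.isdigit() for f in fields[:3]):
            break  # first non-numeric row ends the table
        cpu, core, socket = (int(f) for f in fields[:3])
        groups[(socket, core)].append(cpu)
    return dict(groups)


for (socket, core), cpus in sorted(siblings_by_core(SAMPLE).items()):
    print(f"socket {socket} core {core}: CPUs {cpus}")
```

If CPU8 and CPU11 turn out to share cores with other CPU numbers in the real output, those other numbers are the SMT siblings to look for in the crash logs.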
