XCP-ng

    XCP-ng 8.3 with VM crashing

    Hardware
    • andyhhp Xen Guru @AlbertK

      @AlbertK None of those commands are relevant in a Xen system. You want xe vm-param-list uuid=$VM
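For anyone who wants to pick fields out of that output in a script, here is a minimal sketch of a parser for the "name ( ACCESS): value" lines that xe prints. It assumes the layout seen in the reply below: the access code in parentheses starts with M for map fields ("key: value; key: value") and S for set fields ("a; b"), and a field may carry a trailing [DEPRECATED] tag. The helper names are hypothetical and not part of any XCP-ng tooling.

```python
import re

# One line of `xe vm-param-list` output, e.g.
#   "memory-static-max ( RW): 4294967296"
#   "ha-always-run ( RW) [DEPRECATED]: false"
# Access codes seen in the thread: " RO", " RW", "MRO", "MRW", "SRO", "SRW".
PARAM_LINE_RE = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9-]+) \((?P<access>[ MS]R[OW])\)"
    r"(?: \[DEPRECATED\])?\s*:\s?(?P<value>.*)$"
)


def parse_value(access, raw):
    """Turn a raw value into a str, a dict (map field) or a list (set field)."""
    raw = raw.strip()
    if access.startswith("M"):
        result = {}
        for item in filter(None, raw.split("; ")):
            key, sep, val = item.partition(": ")
            # A trailing entry such as "vm-data:" has no ": " separator.
            result[key if sep else key.rstrip(":")] = val
        return result
    if access.startswith("S"):
        return [v for v in raw.split("; ") if v]
    return raw


def parse_param_list(text):
    """Parse the whole output of `xe vm-param-list` into a dict."""
    params = {}
    for line in text.splitlines():
        m = PARAM_LINE_RE.match(line)
        if m:
            params[m["name"]] = parse_value(m["access"], m["value"])
    return params


# A few lines copied from the K8s-node1 output in this thread.
sample = """\
                      uuid ( RO)             : 1199c4b4-6072-7086-7286-7d7d1cad2c33
         memory-static-max ( RW): 4294967296
                  platform (MRW): timeoffset: 0; nic_type: e1000; viridian: false
ha-always-run ( RW) [DEPRECATED]: false
                      VBDs (SRO): f0602bf4-1f5f-12f1-957b-f6c99669d98c; de3047f9-b097-e345-3a8a-77094f5f8de7
"""
params = parse_param_list(sample)
print(params["memory-static-max"], params["platform"]["nic_type"])
```

Map and set fields come back as dicts and lists, so a check such as params["platform"].get("viridian") works directly on the parsed result.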

      • AlbertK @andyhhp

        @andyhhp This is the param list of one of the VMs that keeps rebooting itself.

                                          uuid ( RO): 1199c4b4-6072-7086-7286-7d7d1cad2c33
                                    name-label ( RW): K8s-node1
                              name-description ( RW):
                                  user-version ( RW): 1
                                 is-a-template ( RW): false
                           is-default-template ( RW): false
                                 is-a-snapshot ( RO): false
                                   snapshot-of ( RO): <not in database>
                                     snapshots ( RO):
                                 snapshot-time ( RO): 19700101T00:00:00Z
                                 snapshot-info ( RO):
                                        parent ( RO): <not in database>
                                      children ( RO):
                             is-control-domain ( RO): false
                                   power-state ( RO): running
                                 memory-actual ( RO): 4297039872
                                 memory-target ( RO): 4294967296
                               memory-overhead ( RO): 39845888
                             memory-static-max ( RW): 4294967296
                            memory-dynamic-max ( RW): 4294967296
                            memory-dynamic-min ( RW): 4294967296
                             memory-static-min ( RW): 1073741824
                              suspend-VDI-uuid ( RW): <not in database>
                               suspend-SR-uuid ( RW): <not in database>
                                  VCPUs-params (MRW):
                                     VCPUs-max ( RW): 4
                              VCPUs-at-startup ( RW): 4
                        actions-after-shutdown ( RW): Destroy
                      actions-after-softreboot ( RW): Soft reboot
                          actions-after-reboot ( RW): Restart
                           actions-after-crash ( RW): Restart
                                 console-uuids (SRO): 7c1c7058-8b18-06ca-60f5-9cbfedec2d11
                                           hvm ( RO): true
                                      platform (MRW): timeoffset: 0; nic_type: e1000; device-model: qemu-upstream-uefi; secureboot: false; vga: std; videoram: 8; viridian: false; device_id: 0001; nx: true; acpi: 1; apic: true; pae: true; hpet: true
                            allowed-operations (SRO): metadata_export; changing_VCPUs_live; changing_dynamic_range; migrate_send; pool_migrate; suspend; hard_reboot; hard_shutdown; clean_reboot; clean_shutdown; pause; checkpoint; snapshot
                            current-operations (SRO):
                            blocked-operations (MRW):
                           allowed-VBD-devices (SRO): 1; 2; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19; 20; 21; 22; 23; 24; 25; 26; 27; 28; 29; 30; 31; 32; 33; 34; 35; 36; 37; 38; 39; 40; 41; 42; 43; 44; 45; 46; 47; 48; 49; 50; 51; 52; 53; 54; 55; 56; 57; 58; 59; 60; 61; 62; 63; 64; 65; 66; 67; 68; 69; 70; 71; 72; 73; 74; 75; 76; 77; 78; 79; 80; 81; 82; 83; 84; 85; 86; 87; 88; 89; 90; 91; 92; 93; 94; 95; 96; 97; 98; 99; 100; 101; 102; 103; 104; 105; 106; 107; 108; 109; 110; 111; 112; 113; 114; 115; 116; 117; 118; 119; 120; 121; 122; 123; 124; 125; 126; 127; 128; 129; 130; 131; 132; 133; 134; 135; 136; 137; 138; 139; 140; 141; 142; 143; 144; 145; 146; 147; 148; 149; 150; 151; 152; 153; 154; 155; 156; 157; 158; 159; 160; 161; 162; 163; 164; 165; 166; 167; 168; 169; 170; 171; 172; 173; 174; 175; 176; 177; 178; 179; 180; 181; 182; 183; 184; 185; 186; 187; 188; 189; 190; 191; 192; 193; 194; 195; 196; 197; 198; 199; 200; 201; 202; 203; 204; 205; 206; 207; 208; 209; 210; 211; 212; 213; 214; 215; 216; 217; 218; 219; 220; 221; 222; 223; 224; 225; 226; 227; 228; 229; 230; 231; 232; 233; 234; 235; 236; 237; 238; 239; 240; 241; 242; 243; 244; 245; 246; 247; 248; 249; 250; 251; 252; 253; 254
                           allowed-VIF-devices (SRO): 1; 2; 3; 4; 5; 6
                                possible-hosts ( RO): f8cc6a6c-8ff4-4e3b-9f92-b5f62bef04ed
                                   domain-type ( RW): hvm
                           current-domain-type ( RO): hvm
                               HVM-boot-policy ( RW): BIOS order
                               HVM-boot-params (MRW): order: cdn; firmware: uefi
                         HVM-shadow-multiplier ( RW): 1.000
                                     PV-kernel ( RW):
                                    PV-ramdisk ( RW):
                                       PV-args ( RW):
                                PV-legacy-args ( RW):
                                 PV-bootloader ( RW):
                            PV-bootloader-args ( RW):
                           last-boot-CPU-flags ( RO): vendor: AuthenticAMD; features: 178bfbff-f6f83203-2e500800-040001f3-0000000f-219c01a9-00400004-00000000-00101005-00000000-00000000-10000044-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000
                              last-boot-record ( RO): '{"platformdata":{"timeoffset":"0","featureset":"178bfbff-f6f83203-2e500800-040001f3-0000000f-219c01a9-00400004-00000000-00101005-00000000-00000000-10000044-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000","usb":"true","usb_tablet":"true","device-model":"qemu-upstream-uefi","secureboot":"false","vga":"std","videoram":"8","viridian":"false","device_id":"0001","nx":"true","acpi":"1","apic":"true","pae":"true","hpet":"true"},"xen_platform":[1,2],"pv_drivers_detected":true,"pci_power_mgmt":false,"pci_msitranslate":true,"qemu_vifs":[],"qemu_vbds":[],"suspend_memory_bytes":2149556224,"original_profile":"Qemu_upstream_uefi","profile":"Qemu_upstream_uefi","nested_virt":false,"nomigrate":false,"domain_config":["X86",{"misc_flags":[],"emulation_flags":["X86_EMU_LAPIC","X86_EMU_HPET","X86_EMU_PM","X86_EMU_RTC","X86_EMU_IOAPIC","X86_EMU_PIC","X86_EMU_VGA","X86_EMU_IOMMU","X86_EMU_PIT","X86_EMU_USE_PIRQ"]}],"last_start_time":1730178556.316762,"ty":["HVM",{"firmware":["Uefi",{"backend":"xapidb","on_boot":"Persist"}],"qemu_stubdom":false,"qemu_disk_cmdline":false,"boot_order":"cdn","pci_passthrough":false,"pci_emulations":[],"serial":"pty","acpi":true,"video":"Standard_VGA","video_mib":8,"timeoffset":"0","shadow_multiplier":1.0,"hap":true}],"build_info":{"has_hard_affinity":false,"priv":["BuildHVM",{"video_mib":8,"shadow_multiplier":1.0}],"vcpus":2,"kernel":"/usr/libexec/xen/boot/hvmloader","memory_target":2097152,"memory_max":2097152},"version":2}'
                                   resident-on ( RO): f8cc6a6c-8ff4-4e3b-9f92-b5f62bef04ed
                                      affinity ( RW): <not in database>
                                  other-config (MRW): auto_poweron: true; xo:1199c4b4: {"creation":{"date":"2024-10-28T05:11:47.183Z","template":"df1a0e64-3799-482b-aa9f-1ed713c7dac5","user":"98707372-26e6-4877-8a14-85064b5f853a"}}; base_template_name: Ubuntu Jammy Jellyfish 22.04; import_task: OpaqueRef:807d2f23-5607-4fc9-2e3f-a3e9f055e800; mac_seed: 38c38661-6a24-4b1b-63e2-86c3ff2035d3; linux_template: true; install-methods: cdrom,nfs,http,ftp
                                        dom-id ( RO): 2
                               recommendations ( RO): <restrictions><restriction field="memory-static-max" max="1649267441664"/><restriction field="vcpus-max" max="64"/><restriction field="has-vendor-device" value="false"/><restriction field="allow-gpu-passthrough" value="1"/><restriction field="allow-vgpu" value="1"/><restriction field="allow-network-sriov" value="1"/><restriction field="supports-bios" value="yes"/><restriction field="supports-uefi" value="yes"/><restriction field="supports-secure-boot" value="yes"/><restriction max="255" property="number-of-vbds"/><restriction max="7" property="number-of-vifs"/></restrictions>
                                 xenstore-data (MRW): vm-data/mmio-hole-size: 268435456; vm-data:
                    ha-always-run ( RW) [DEPRECATED]: false
                           ha-restart-priority ( RW):
                                         blobs ( RO):
                                    start-time ( RO): 20250320T19:19:07Z
                                  install-time ( RO): 20241028T05:11:47Z
                                  VCPUs-number ( RO): 4
                             VCPUs-utilisation (MRO): 0: 0.115; 1: 0.106; 2: 0.111; 3: 0.107
                                    os-version (MRO): name: Ubuntu 24.04; uname: 6.8.0-54-generic; distro: Ubuntu
                            PV-drivers-version (MRO): major: 1; minor: 0; micro: 0; build: proto-0.4.0
            PV-drivers-up-to-date ( RO) [DEPRECATED]: true
                                        memory (MRO):
                                         disks (MRO):
                                          VBDs (SRO): f0602bf4-1f5f-12f1-957b-f6c99669d98c; de3047f9-b097-e345-3a8a-77094f5f8de7
                                      networks (MRO): 0/ip: 192.168.8.86; 0/ipv4/0: 192.168.8.86; 0/ipv6/0: fe80::dc3b:cff:fef0:d3ed
                           PV-drivers-detected ( RO): true
                                         other (MRO): platform-feature-xs_reset_watches: 1; platform-feature-multiprocessor-suspend: 1; has-vendor-device: 0; feature-vcpu-hotplug: 1; feature-suspend: 1; feature-reboot: 1; feature-poweroff: 1; feature-balloon: 1
                                          live ( RO): true
                    guest-metrics-last-updated ( RO): 20250320T19:19:18Z
                           can-use-hotplug-vbd ( RO): unspecified
                           can-use-hotplug-vif ( RO): unspecified
                      cooperative ( RO) [DEPRECATED]: true
                                          tags (SRW):
                                     appliance ( RW): <not in database>
                                        groups ( RW):
                             snapshot-schedule ( RW): <not in database>
                              is-vmss-snapshot ( RO): false
                                   start-delay ( RW): 0
                                shutdown-delay ( RW): 0
                                         order ( RW): 0
                                       version ( RO): 0
                                 generation-id ( RO):
                     hardware-platform-version ( RO): 0
                             has-vendor-device ( RW): false
                               requires-reboot ( RO): false
                               reference-label ( RO): ubuntu-22.04
                                  bios-strings (MRO): bios-vendor: Xen; bios-version: ; system-manufacturer: Xen; system-product-name: HVM domU; system-version: ; system-serial-number: ; baseboard-manufacturer: ; baseboard-product-name: ; baseboard-version: ; baseboard-serial-number: ; baseboard-asset-tag: ; baseboard-location-in-chassis: ; enclosure-asset-tag: ; hp-rombios: ; oem-1: Xen; oem-2: MS_VM_CERT/SHA1/bdbeb6e0a816d43fa6d3fe8aaef04c2bad9d3e3d
                             pending-guidances ( RO):
                                         vtpms ( RO):
                 pending-guidances-recommended ( RO):
                        pending-guidances-full ( RO):
        
        • andyhhp Xen Guru @AlbertK

          @AlbertK Thanks. There's no nested-virt configured there.

          I have to admit this is looking more and more like a buggy CPU. Memory corruption is a possibility, but this is a clearly corrupt field in the middle of otherwise sane-looking fields in the VMCB.

          Do you have any other identical systems? Can you swap this CPU out for another one to see what happens?
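The "no nested-virt configured" observation can be double-checked from two fields of the param list above: last-boot-record is a JSON document (wrapped in single quotes) that carries a nested_virt flag, and platform is the map where nested-virtualization settings would appear. A minimal sketch, assuming the raw field values are copied from xe vm-param-list output; the helper name is hypothetical, and treating any platform key that contains "nested" as a nested-virt setting is a heuristic of this sketch, not an exhaustive check.

```python
import json


def nested_virt_status(last_boot_record, platform):
    """Report nested-virt hints from two raw `xe vm-param-list` values.

    last_boot_record: the raw value, i.e. JSON wrapped in single quotes.
    platform: the raw map value, "key: value; key: value; ...".
    """
    record = json.loads(last_boot_record.strip().strip("'"))
    platform_map = dict(
        item.split(": ", 1) for item in platform.split("; ") if ": " in item
    )
    # Heuristic: any platform key mentioning "nested" counts as a setting.
    nested_keys = {k: v for k, v in platform_map.items() if "nested" in k}
    return record.get("nested_virt"), nested_keys


# Trimmed values from the K8s-node1 output in this thread.
boot = """'{"nested_virt":false,"nomigrate":false,"version":2}'"""
platform = "timeoffset: 0; nic_type: e1000; viridian: false; hpet: true"
print(nested_virt_status(boot, platform))  # → (False, {})
```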

          • AlbertK @andyhhp

            @andyhhp Unfortunately no, I do not have another machine to test the CPU in. I have ordered another set of 2x16GB RAM to test whether it is a RAM issue.

            Will report back.

            • joebeasley @AlbertK

              @AlbertK I had a similar issue where the whole server would just reboot randomly. It turned out to be a BIOS option called "C-states", which controls processor power saving (CPU idle states). I disabled every C-state option and have not had the reboot problems since.

              • AlbertK @joebeasley

                @joebeasley Mine is different: one or more VMs reboot on their own, and sometimes a VM becomes inaccessible (I cannot SSH in or open its console from XO; XO shows 99% CPU with no network or disk activity, and the VM has to be force-rebooted). A few hours after that, the host itself reboots. This is now happening every day.

                I am also seeing a lot of this in the host dmesg:

                [105679.203854] vif vif-6-0 vif6.0: Guest Rx stalled
                [105689.395996] vif vif-6-0 vif6.0: Guest Rx ready
                [105707.532509] vif vif-6-0 vif6.0: Guest Rx stalled
                [105717.555832] vif vif-6-0 vif6.0: Guest Rx ready
                [105744.154415] vif vif-6-0 vif6.0: Guest Rx stalled
                [105754.163666] vif vif-6-0 vif6.0: Guest Rx ready
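Those dmesg lines come in stalled/ready pairs for vif6.0 (by Xen's vifDOMID.DEVID naming, the first virtual NIC of domain 6). A minimal sketch that pairs them up and measures how long each stall lasted, assuming the exact line format shown above; the helper name is hypothetical.

```python
import re

# Matches host dmesg lines such as:
#   [105679.203854] vif vif-6-0 vif6.0: Guest Rx stalled
VIF_LINE_RE = re.compile(
    r"^\s*\[\s*(\d+\.\d+)\] vif \S+ ([^\s:]+): Guest Rx (stalled|ready)"
)


def rx_stall_durations(dmesg_text):
    """Pair each 'Guest Rx stalled' with the next 'Guest Rx ready' per vif."""
    stalled_at = {}  # vif name -> timestamp of the currently open stall
    durations = []   # (vif name, seconds stalled)
    for line in dmesg_text.splitlines():
        m = VIF_LINE_RE.match(line)
        if not m:
            continue
        ts, vif, state = float(m.group(1)), m.group(2), m.group(3)
        if state == "stalled":
            stalled_at[vif] = ts
        elif vif in stalled_at:  # a "ready" with no open stall is ignored
            durations.append((vif, round(ts - stalled_at.pop(vif), 3)))
    return durations


log = """\
[105679.203854] vif vif-6-0 vif6.0: Guest Rx stalled
[105689.395996] vif vif-6-0 vif6.0: Guest Rx ready
[105707.532509] vif vif-6-0 vif6.0: Guest Rx stalled
[105717.555832] vif vif-6-0 vif6.0: Guest Rx ready
[105744.154415] vif vif-6-0 vif6.0: Guest Rx stalled
[105754.163666] vif vif-6-0 vif6.0: Guest Rx ready
"""
for vif, seconds in rx_stall_durations(log):
    print(vif, seconds)
```

On the six lines shown, each stall lasts about 10 seconds (10.192 s, 10.023 s and 10.009 s).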
                
                • AlbertK @AlbertK

                  I have installed a fresh set of RAM and the system still crashes randomly. Some of the crashes leave a crash log, but some do not.

                  This happens daily: either a VM reboots or the host does. In the crashes that do leave a log, there is a consistent pattern of SVM errors on CPU8, and once on CPU11.

                  I then removed CPU8 and CPU11 from the CPU pool, and there has been no VM or host reboot for the last 7 days. Any ideas why?

                  xl cpupool-cpu-remove Pool-0 8,11
                  
                  • andyhhp Xen Guru @AlbertK

                    As I said before, this looks like a buggy CPU, and you've proved it: a week with no incidents once CPU8 is excluded.

                    • olivierlambert Vates 🪐 Co-Founder CEO

                      Clearly, there are one or two damaged cores. It's likely a faulty CPU, I'm afraid 😞

                      • Riven

                        If you are not getting crashes on cores 0-5 (assuming they are in use by your VMs), then it's unlikely to be a physical problem.

                        The Ryzen 3600 is only a 6-core CPU; "cores" 8 and 11 are the SMT (Hyper-Threading) siblings of cores 2 and 5.

                        You could also try turning SMT off.

                        • AlbertK @Riven

                          @Riven What I am not sure about is how Xen numbers the CPUs: does it list all the physical cores first, followed by their SMT/Hyper-Threading siblings, or does it alternate, i.e. physical core, sibling, physical core, sibling?
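Riven's mapping of CPUs 8 and 11 to cores 2 and 5 assumes the "cores first" scheme. A minimal sketch comparing the two numbering schemes asked about here, for a 6-core/12-thread part like the Ryzen 3600. The helper name and layout labels are hypothetical, and which scheme a given host actually uses should be read from the cpu_topology table that "xl info -n" prints (core, socket and node per CPU) rather than assumed.

```python
def physical_core(cpu, cores=6, layout="interleaved"):
    """Map a logical CPU number to a physical core index (hypothetical helper).

    layout="cores_first": CPUs 0..cores-1 are the first thread of each core
        and CPUs cores..2*cores-1 are their SMT siblings (Riven's assumption).
    layout="interleaved": SMT siblings are adjacent, so CPUs 2n and 2n+1
        share core n.
    """
    if not 0 <= cpu < 2 * cores:
        raise ValueError(f"CPU {cpu} out of range for {cores} cores with SMT")
    if layout == "cores_first":
        return cpu % cores
    if layout == "interleaved":
        return cpu // 2
    raise ValueError(f"unknown layout: {layout}")


for cpu in (8, 11):
    print(cpu,
          physical_core(cpu, layout="cores_first"),
          physical_core(cpu, layout="interleaved"))
```

The two layouts agree that CPU 11 sits on core 5 but disagree about CPU 8 (core 2 when physical cores come first, core 4 when siblings are interleaved); in neither layout do CPUs 8 and 11 share a physical core.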
