XCP-ng

    XCP-ng 8.3 with VM crashing

      AlbertK @andyhhp

      It is a default install of Ubuntu. I ran the commands below and both came back negative.

      lsmod | grep kvm
      
      egrep -c '(vmx|svm)'  /proc/cpuinfo
      

      0

        andyhhp Xen Guru @AlbertK

        @AlbertK None of those commands are relevant in a Xen system. You want xe vm-param-list uuid=$VM
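A minimal sketch of how that check could be run on the host. It assumes the VM is looked up by its name-label (K8s-node1 in this thread) and that the `last-boot-record` field contains a `"nested_virt":` entry, as it does in the output posted in this topic. The helper name `nested_virt_from_params` is hypothetical.

```shell
# Hypothetical helper: reads `xe vm-param-list` output on stdin and prints
# the nested_virt value recorded in last-boot-record (for example "false").
nested_virt_from_params() {
    grep -o '"nested_virt":[a-z]*' | head -n 1 | cut -d: -f2
}

# Usage on the host (the name-label is an assumption taken from the thread):
#   VM=$(xe vm-list name-label=K8s-node1 params=uuid --minimal)
#   xe vm-param-list uuid="$VM" | nested_virt_from_params
```

The helper prints nothing if the field is absent, so an empty result means "not recorded" rather than "disabled".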

          AlbertK @andyhhp

          @andyhhp
          This is the param list of one of the VMs that is rebooting by itself.

          uuid ( RO)                                  : 1199c4b4-6072-7086-7286-7d7d1cad2c33
                                      name-label ( RW): K8s-node1
                                name-description ( RW):
                                    user-version ( RW): 1
                                   is-a-template ( RW): false
                             is-default-template ( RW): false
                                   is-a-snapshot ( RO): false
                                     snapshot-of ( RO): <not in database>
                                       snapshots ( RO):
                                   snapshot-time ( RO): 19700101T00:00:00Z
                                   snapshot-info ( RO):
                                          parent ( RO): <not in database>
                                        children ( RO):
                               is-control-domain ( RO): false
                                     power-state ( RO): running
                                   memory-actual ( RO): 4297039872
                                   memory-target ( RO): 4294967296
                                 memory-overhead ( RO): 39845888
                               memory-static-max ( RW): 4294967296
                              memory-dynamic-max ( RW): 4294967296
                              memory-dynamic-min ( RW): 4294967296
                               memory-static-min ( RW): 1073741824
                                suspend-VDI-uuid ( RW): <not in database>
                                 suspend-SR-uuid ( RW): <not in database>
                                    VCPUs-params (MRW):
                                       VCPUs-max ( RW): 4
                                VCPUs-at-startup ( RW): 4
                          actions-after-shutdown ( RW): Destroy
                        actions-after-softreboot ( RW): Soft reboot
                            actions-after-reboot ( RW): Restart
                             actions-after-crash ( RW): Restart
                                   console-uuids (SRO): 7c1c7058-8b18-06ca-60f5-9cbfedec2d11
                                             hvm ( RO): true
                                        platform (MRW): timeoffset: 0; nic_type: e1000; device-model: qemu-upstream-uefi; secureboot: false; vga: std; videoram: 8; viridian: false; device_id: 0001; nx: true; acpi: 1; apic: true; pae: true; hpet: true
                              allowed-operations (SRO): metadata_export; changing_VCPUs_live; changing_dynamic_range; migrate_send; pool_migrate; suspend; hard_reboot; hard_shutdown; clean_reboot; clean_shutdown; pause; checkpoint; snapshot
                              current-operations (SRO):
                              blocked-operations (MRW):
                             allowed-VBD-devices (SRO): 1; 2; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19; 20; 21; 22; 23; 24; 25; 26; 27; 28; 29; 30; 31; 32; 33; 34; 35; 36; 37; 38; 39; 40; 41; 42; 43; 44; 45; 46; 47; 48; 49; 50; 51; 52; 53; 54; 55; 56; 57; 58; 59; 60; 61; 62; 63; 64; 65; 66; 67; 68; 69; 70; 71; 72; 73; 74; 75; 76; 77; 78; 79; 80; 81; 82; 83; 84; 85; 86; 87; 88; 89; 90; 91; 92; 93; 94; 95; 96; 97; 98; 99; 100; 101; 102; 103; 104; 105; 106; 107; 108; 109; 110; 111; 112; 113; 114; 115; 116; 117; 118; 119; 120; 121; 122; 123; 124; 125; 126; 127; 128; 129; 130; 131; 132; 133; 134; 135; 136; 137; 138; 139; 140; 141; 142; 143; 144; 145; 146; 147; 148; 149; 150; 151; 152; 153; 154; 155; 156; 157; 158; 159; 160; 161; 162; 163; 164; 165; 166; 167; 168; 169; 170; 171; 172; 173; 174; 175; 176; 177; 178; 179; 180; 181; 182; 183; 184; 185; 186; 187; 188; 189; 190; 191; 192; 193; 194; 195; 196; 197; 198; 199; 200; 201; 202; 203; 204; 205; 206; 207; 208; 209; 210; 211; 212; 213; 214; 215; 216; 217; 218; 219; 220; 221; 222; 223; 224; 225; 226; 227; 228; 229; 230; 231; 232; 233; 234; 235; 236; 237; 238; 239; 240; 241; 242; 243; 244; 245; 246; 247; 248; 249; 250; 251; 252; 253; 254
                             allowed-VIF-devices (SRO): 1; 2; 3; 4; 5; 6
                                  possible-hosts ( RO): f8cc6a6c-8ff4-4e3b-9f92-b5f62bef04ed
                                     domain-type ( RW): hvm
                             current-domain-type ( RO): hvm
                                 HVM-boot-policy ( RW): BIOS order
                                 HVM-boot-params (MRW): order: cdn; firmware: uefi
                           HVM-shadow-multiplier ( RW): 1.000
                                       PV-kernel ( RW):
                                      PV-ramdisk ( RW):
                                         PV-args ( RW):
                                  PV-legacy-args ( RW):
                                   PV-bootloader ( RW):
                              PV-bootloader-args ( RW):
                             last-boot-CPU-flags ( RO): vendor: AuthenticAMD; features: 178bfbff-f6f83203-2e500800-040001f3-0000000f-219c01a9-00400004-00000000-00101005-00000000-00000000-10000044-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000
                                last-boot-record ( RO): '{"platformdata":{"timeoffset":"0","featureset":"178bfbff-f6f83203-2e500800-040001f3-0000000f-219c01a9-00400004-00000000-00101005-00000000-00000000-10000044-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000","usb":"true","usb_tablet":"true","device-model":"qemu-upstream-uefi","secureboot":"false","vga":"std","videoram":"8","viridian":"false","device_id":"0001","nx":"true","acpi":"1","apic":"true","pae":"true","hpet":"true"},"xen_platform":[1,2],"pv_drivers_detected":true,"pci_power_mgmt":false,"pci_msitranslate":true,"qemu_vifs":[],"qemu_vbds":[],"suspend_memory_bytes":2149556224,"original_profile":"Qemu_upstream_uefi","profile":"Qemu_upstream_uefi","nested_virt":false,"nomigrate":false,"domain_config":["X86",{"misc_flags":[],"emulation_flags":["X86_EMU_LAPIC","X86_EMU_HPET","X86_EMU_PM","X86_EMU_RTC","X86_EMU_IOAPIC","X86_EMU_PIC","X86_EMU_VGA","X86_EMU_IOMMU","X86_EMU_PIT","X86_EMU_USE_PIRQ"]}],"last_start_time":1730178556.316762,"ty":["HVM",{"firmware":["Uefi",{"backend":"xapidb","on_boot":"Persist"}],"qemu_stubdom":false,"qemu_disk_cmdline":false,"boot_order":"cdn","pci_passthrough":false,"pci_emulations":[],"serial":"pty","acpi":true,"video":"Standard_VGA","video_mib":8,"timeoffset":"0","shadow_multiplier":1.0,"hap":true}],"build_info":{"has_hard_affinity":false,"priv":["BuildHVM",{"video_mib":8,"shadow_multiplier":1.0}],"vcpus":2,"kernel":"/usr/libexec/xen/boot/hvmloader","memory_target":2097152,"memory_max":2097152},"version":2}'
                                     resident-on ( RO): f8cc6a6c-8ff4-4e3b-9f92-b5f62bef04ed
                                        affinity ( RW): <not in database>
                                    other-config (MRW): auto_poweron: true; xo:1199c4b4: {"creation":{"date":"2024-10-28T05:11:47.183Z","template":"df1a0e64-3799-482b-aa9f-1ed713c7dac5","user":"98707372-26e6-4877-8a14-85064b5f853a"}}; base_template_name: Ubuntu Jammy Jellyfish 22.04; import_task: OpaqueRef:807d2f23-5607-4fc9-2e3f-a3e9f055e800; mac_seed: 38c38661-6a24-4b1b-63e2-86c3ff2035d3; linux_template: true; install-methods: cdrom,nfs,http,ftp
                                          dom-id ( RO): 2
                                 recommendations ( RO): <restrictions><restriction field="memory-static-max" max="1649267441664"/><restriction field="vcpus-max" max="64"/><restriction field="has-vendor-device" value="false"/><restriction field="allow-gpu-passthrough" value="1"/><restriction field="allow-vgpu" value="1"/><restriction field="allow-network-sriov" value="1"/><restriction field="supports-bios" value="yes"/><restriction field="supports-uefi" value="yes"/><restriction field="supports-secure-boot" value="yes"/><restriction max="255" property="number-of-vbds"/><restriction max="7" property="number-of-vifs"/></restrictions>
                                   xenstore-data (MRW): vm-data/mmio-hole-size: 268435456; vm-data:
                      ha-always-run ( RW) [DEPRECATED]: false
                             ha-restart-priority ( RW):
                                           blobs ( RO):
                                      start-time ( RO): 20250320T19:19:07Z
                                    install-time ( RO): 20241028T05:11:47Z
                                    VCPUs-number ( RO): 4
                               VCPUs-utilisation (MRO): 0: 0.115; 1: 0.106; 2: 0.111; 3: 0.107
                                      os-version (MRO): name: Ubuntu 24.04; uname: 6.8.0-54-generic; distro: Ubuntu
                              PV-drivers-version (MRO): major: 1; minor: 0; micro: 0; build: proto-0.4.0
              PV-drivers-up-to-date ( RO) [DEPRECATED]: true
                                          memory (MRO):
                                           disks (MRO):
                                            VBDs (SRO): f0602bf4-1f5f-12f1-957b-f6c99669d98c; de3047f9-b097-e345-3a8a-77094f5f8de7
                                        networks (MRO): 0/ip: 192.168.8.86; 0/ipv4/0: 192.168.8.86; 0/ipv6/0: fe80::dc3b:cff:fef0:d3ed
                             PV-drivers-detected ( RO): true
                                           other (MRO): platform-feature-xs_reset_watches: 1; platform-feature-multiprocessor-suspend: 1; has-vendor-device: 0; feature-vcpu-hotplug: 1; feature-suspend: 1; feature-reboot: 1; feature-poweroff: 1; feature-balloon: 1
                                            live ( RO): true
                      guest-metrics-last-updated ( RO): 20250320T19:19:18Z
                             can-use-hotplug-vbd ( RO): unspecified
                             can-use-hotplug-vif ( RO): unspecified
                        cooperative ( RO) [DEPRECATED]: true
                                            tags (SRW):
                                       appliance ( RW): <not in database>
                                          groups ( RW):
                               snapshot-schedule ( RW): <not in database>
                                is-vmss-snapshot ( RO): false
                                     start-delay ( RW): 0
                                  shutdown-delay ( RW): 0
                                           order ( RW): 0
                                         version ( RO): 0
                                   generation-id ( RO):
                       hardware-platform-version ( RO): 0
                               has-vendor-device ( RW): false
                                 requires-reboot ( RO): false
                                 reference-label ( RO): ubuntu-22.04
                                    bios-strings (MRO): bios-vendor: Xen; bios-version: ; system-manufacturer: Xen; system-product-name: HVM domU; system-version: ; system-serial-number: ; baseboard-manufacturer: ; baseboard-product-name: ; baseboard-version: ; baseboard-serial-number: ; baseboard-asset-tag: ; baseboard-location-in-chassis: ; enclosure-asset-tag: ; hp-rombios: ; oem-1: Xen; oem-2: MS_VM_CERT/SHA1/bdbeb6e0a816d43fa6d3fe8aaef04c2bad9d3e3d
                               pending-guidances ( RO):
                                           vtpms ( RO):
                   pending-guidances-recommended ( RO):
                          pending-guidances-full ( RO):
          
            andyhhp Xen Guru @AlbertK

            @AlbertK Thanks. There's no nested-virt configured there.

            I have to admit this is looking more and more like a buggy CPU. Memory corruption is a possibility, but this is a clearly corrupt field in the middle of otherwise sane-looking fields in the VMCB.

            Do you have any other identical systems? Can you swap this CPU out for another one to see what happens?

              AlbertK @andyhhp

              @andyhhp Unfortunately no, I do not have another machine to test the CPU in. I have ordered another set of 2x16GB RAM to test whether it is a RAM issue.

              Will report back.

                joebeasley @AlbertK

                @AlbertK I had a similar issue where the whole server would just reboot randomly. It turned out to be a BIOS option called "cstates", which has something to do with processor power saving. I disabled every mention of cstates and have not had the reboot problems since.

                  AlbertK @joebeasley

                  @joebeasley Mine is more that one or more VMs reboot by themselves, and sometimes a VM becomes inaccessible (no SSH, no console from XO; XO shows CPU at 99% with no network or disk activity) and needs a forced reboot. A few hours after that, the host reboots. This is happening every day now.

                  I am seeing a lot of this in the host dmesg.

                  [105679.203854] vif vif-6-0 vif6.0: Guest Rx stalled
                  [105689.395996] vif vif-6-0 vif6.0: Guest Rx ready
                  [105707.532509] vif vif-6-0 vif6.0: Guest Rx stalled
                  [105717.555832] vif vif-6-0 vif6.0: Guest Rx ready
                  [105744.154415] vif vif-6-0 vif6.0: Guest Rx stalled
                  [105754.163666] vif vif-6-0 vif6.0: Guest Rx ready
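To get a feel for how often this happens and on which interface, the stall messages can be tallied per vif. This is only a sketch: the helper name `count_rx_stalls` is hypothetical, and it assumes the message format shown above. In Xen's netback naming, `vifX.Y` is device Y of the domain with ID X, so `vif6.0` belongs to whichever guest currently has domain ID 6.

```shell
# Hypothetical helper: count "Guest Rx stalled" events per vif.
# Reads dmesg text on stdin and prints "<vif> <count>" lines.
count_rx_stalls() {
    awk '/Guest Rx stalled/ && match($0, /vif[0-9]+\.[0-9]+:/) {
             # Strip the trailing colon from the matched interface name.
             n[substr($0, RSTART, RLENGTH - 1)]++
         }
         END { for (v in n) print v, n[v] }' | sort
}

# Usage on the host:
#   dmesg | count_rx_stalls
```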
                  
                    AlbertK @AlbertK

                    I have installed a fresh set of RAM and the system still crashes randomly. Some of the crashes leave a crash log, but others do not.

                    This happens daily: either a VM or the host reboots. In the crash logs I notice a consistent pattern of SVM errors on CPU8, and once on CPU11.

                    I then removed CPU8 and CPU11 from the CPU pool. There have been no VM or host reboots for the last 7 days. Any ideas why?

                    xl cpupool-cpu-remove 8,11
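For anyone repeating this: according to the xl man page, `cpupool-cpu-remove` takes the pool name before the CPU list. The sketch below assumes the default pool name `Pool-0`, which should be confirmed with `xl cpupool-list` first. The wrapper name `remove_cpus` and the `DRY_RUN` switch are hypothetical, added only so the command can be previewed before it is run.

```shell
# Sketch: remove CPUs from a Xen cpupool.
#   $1 = pool name (e.g. Pool-0, assumed default; check `xl cpupool-list`)
#   $2 = CPU list  (e.g. 8,11)
# With DRY_RUN=1 the command is printed instead of executed.
remove_cpus() {
    pool=$1
    cpus=$2
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "xl cpupool-cpu-remove $pool $cpus"
    else
        xl cpupool-cpu-remove "$pool" "$cpus"
    fi
}

# Usage on the host:
#   xl cpupool-list -c            # show which CPUs each pool currently holds
#   DRY_RUN=1 remove_cpus Pool-0 8,11
#   remove_cpus Pool-0 8,11
```

As far as I know, a change made this way is a runtime setting and would need to be reapplied after a host reboot.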
                    
                      andyhhp Xen Guru @AlbertK

                      As I said before, this is looking like a buggy CPU, and you've proved it, given a week with no incident if CPU8 is excluded.

                        olivierlambert Vates 🪐 Co-Founder CEO

                        Clearly, there are one or two damaged cores. Likely a faulty CPU, I'm afraid 😞

                          Riven

                          If you are not getting crashes on cores 0-5 (assuming they are in use by your VMs), then it's unlikely to be a physical problem.

                          The Ryzen 3600 is only a 6-core CPU; "cores" 8 & 11 are the SMT (hyperthreaded) siblings of cores 2 & 5.

                          You could also try turning SMT off.

                            AlbertK @Riven

                            @Riven,

                            What I am not sure about is how Xen numbers the CPUs: is it all the real cores first, followed by the SMT/hyperthread siblings? Or does it alternate, i.e. real core, hyperthread sibling?
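One way to check this on the host, rather than guess, is to ask Xen for its CPU topology: both `xl info -n` and `xenpm get-cpu-topology` list which core and socket each CPU number belongs to. The sketch below assumes the `xenpm get-cpu-topology` rows have the form `CPU<n> <core> <socket> <node>`; the helper name `smt_siblings` is hypothetical.

```shell
# Hypothetical helper: given a CPU number, print every CPU that shares its
# (socket, core) pair, i.e. the CPU itself plus its SMT sibling(s).
# Reads rows of the assumed form "CPU<n> <core> <socket> <node>" on stdin.
smt_siblings() {
    awk -v want="CPU$1" '
        /^CPU[0-9]+/ { key[$1] = $3 ":" $2; order[++n] = $1 }
        END {
            for (i = 1; i <= n; i++)
                if (key[order[i]] == key[want]) print order[i]
        }'
}

# Usage on the host:
#   xenpm get-cpu-topology | smt_siblings 8
```

If CPU8 and CPU11 turn out to share cores with CPUs that are still in the pool, that would say something about whether the fault follows a physical core or a single thread.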
