XCP-ng 8.3 with VM crashing
-
It is a default install of Ubuntu. I tried the commands below and both came back negative.
lsmod | grep kvm
egrep -c '(vmx|svm)' /proc/cpuinfo
0
-
@AlbertK None of those commands are relevant in a Xen system. You want
xe vm-param-list uuid=$VM
-
@andyhhp
This is the param list of one of the VMs that is self-rebooting.
uuid ( RO): 1199c4b4-6072-7086-7286-7d7d1cad2c33
name-label ( RW): K8s-node1
name-description ( RW):
user-version ( RW): 1
is-a-template ( RW): false
is-default-template ( RW): false
is-a-snapshot ( RO): false
snapshot-of ( RO): <not in database>
snapshots ( RO):
snapshot-time ( RO): 19700101T00:00:00Z
snapshot-info ( RO):
parent ( RO): <not in database>
children ( RO):
is-control-domain ( RO): false
power-state ( RO): running
memory-actual ( RO): 4297039872
memory-target ( RO): 4294967296
memory-overhead ( RO): 39845888
memory-static-max ( RW): 4294967296
memory-dynamic-max ( RW): 4294967296
memory-dynamic-min ( RW): 4294967296
memory-static-min ( RW): 1073741824
suspend-VDI-uuid ( RW): <not in database>
suspend-SR-uuid ( RW): <not in database>
VCPUs-params (MRW):
VCPUs-max ( RW): 4
VCPUs-at-startup ( RW): 4
actions-after-shutdown ( RW): Destroy
actions-after-softreboot ( RW): Soft reboot
actions-after-reboot ( RW): Restart
actions-after-crash ( RW): Restart
console-uuids (SRO): 7c1c7058-8b18-06ca-60f5-9cbfedec2d11
hvm ( RO): true
platform (MRW): timeoffset: 0; nic_type: e1000; device-model: qemu-upstream-uefi; secureboot: false; vga: std; videoram: 8; viridian: false; device_id: 0001; nx: true; acpi: 1; apic: true; pae: true; hpet: true
allowed-operations (SRO): metadata_export; changing_VCPUs_live; changing_dynamic_range; migrate_send; pool_migrate; suspend; hard_reboot; hard_shutdown; clean_reboot; clean_shutdown; pause; checkpoint; snapshot
current-operations (SRO):
blocked-operations (MRW):
allowed-VBD-devices (SRO): 1; 2; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19; 20; 21; 22; 23; 24; 25; 26; 27; 28; 29; 30; 31; 32; 33; 34; 35; 36; 37; 38; 39; 40; 41; 42; 43; 44; 45; 46; 47; 48; 49; 50; 51; 52; 53; 54; 55; 56; 57; 58; 59; 60; 61; 62; 63; 64; 65; 66; 67; 68; 69; 70; 71; 72; 73; 74; 75; 76; 77; 78; 79; 80; 81; 82; 83; 84; 85; 86; 87; 88; 89; 90; 91; 92; 93; 94; 95; 96; 97; 98; 99; 100; 101; 102; 103; 104; 105; 106; 107; 108; 109; 110; 111; 112; 113; 114; 115; 116; 117; 118; 119; 120; 121; 122; 123; 124; 125; 126; 127; 128; 129; 130; 131; 132; 133; 134; 135; 136; 137; 138; 139; 140; 141; 142; 143; 144; 145; 146; 147; 148; 149; 150; 151; 152; 153; 154; 155; 156; 157; 158; 159; 160; 161; 162; 163; 164; 165; 166; 167; 168; 169; 170; 171; 172; 173; 174; 175; 176; 177; 178; 179; 180; 181; 182; 183; 184; 185; 186; 187; 188; 189; 190; 191; 192; 193; 194; 195; 196; 197; 198; 199; 200; 201; 202; 203; 204; 205; 206; 207; 208; 209; 210; 211; 212; 213; 214; 215; 216; 217; 218; 219; 220; 221; 222; 223; 224; 225; 226; 227; 228; 229; 230; 231; 232; 233; 234; 235; 236; 237; 238; 239; 240; 241; 242; 243; 244; 245; 246; 247; 248; 249; 250; 251; 252; 253; 254
allowed-VIF-devices (SRO): 1; 2; 3; 4; 5; 6
possible-hosts ( RO): f8cc6a6c-8ff4-4e3b-9f92-b5f62bef04ed
domain-type ( RW): hvm
current-domain-type ( RO): hvm
HVM-boot-policy ( RW): BIOS order
HVM-boot-params (MRW): order: cdn; firmware: uefi
HVM-shadow-multiplier ( RW): 1.000
PV-kernel ( RW):
PV-ramdisk ( RW):
PV-args ( RW):
PV-legacy-args ( RW):
PV-bootloader ( RW):
PV-bootloader-args ( RW):
last-boot-CPU-flags ( RO): vendor: AuthenticAMD; features: 178bfbff-f6f83203-2e500800-040001f3-0000000f-219c01a9-00400004-00000000-00101005-00000000-00000000-10000044-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000
last-boot-record ( RO): '{"platformdata":{"timeoffset":"0","featureset":"178bfbff-f6f83203-2e500800-040001f3-0000000f-219c01a9-00400004-00000000-00101005-00000000-00000000-10000044-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000","usb":"true","usb_tablet":"true","device-model":"qemu-upstream-uefi","secureboot":"false","vga":"std","videoram":"8","viridian":"false","device_id":"0001","nx":"true","acpi":"1","apic":"true","pae":"true","hpet":"true"},"xen_platform":[1,2],"pv_drivers_detected":true,"pci_power_mgmt":false,"pci_msitranslate":true,"qemu_vifs":[],"qemu_vbds":[],"suspend_memory_bytes":2149556224,"original_profile":"Qemu_upstream_uefi","profile":"Qemu_upstream_uefi","nested_virt":false,"nomigrate":false,"domain_config":["X86",{"misc_flags":[],"emulation_flags":["X86_EMU_LAPIC","X86_EMU_HPET","X86_EMU_PM","X86_EMU_RTC","X86_EMU_IOAPIC","X86_EMU_PIC","X86_EMU_VGA","X86_EMU_IOMMU","X86_EMU_PIT","X86_EMU_USE_PIRQ"]}],"last_start_time":1730178556.316762,"ty":["HVM",{"firmware":["Uefi",{"backend":"xapidb","on_boot":"Persist"}],"qemu_stubdom":false,"qemu_disk_cmdline":false,"boot_order":"cdn","pci_passthrough":false,"pci_emulations":[],"serial":"pty","acpi":true,"video":"Standard_VGA","video_mib":8,"timeoffset":"0","shadow_multiplier":1.0,"hap":true}],"build_info":{"has_hard_affinity":false,"priv":["BuildHVM",{"video_mib":8,"shadow_multiplier":1.0}],"vcpus":2,"kernel":"/usr/libexec/xen/boot/hvmloader","memory_target":2097152,"memory_max":2097152},"version":2}'
resident-on ( RO): f8cc6a6c-8ff4-4e3b-9f92-b5f62bef04ed
affinity ( RW): <not in database>
other-config (MRW): auto_poweron: true; xo:1199c4b4: {"creation":{"date":"2024-10-28T05:11:47.183Z","template":"df1a0e64-3799-482b-aa9f-1ed713c7dac5","user":"98707372-26e6-4877-8a14-85064b5f853a"}}; base_template_name: Ubuntu Jammy Jellyfish 22.04; import_task: OpaqueRef:807d2f23-5607-4fc9-2e3f-a3e9f055e800; mac_seed: 38c38661-6a24-4b1b-63e2-86c3ff2035d3; linux_template: true; install-methods: cdrom,nfs,http,ftp
dom-id ( RO): 2
recommendations ( RO): <restrictions><restriction field="memory-static-max" max="1649267441664"/><restriction field="vcpus-max" max="64"/><restriction field="has-vendor-device" value="false"/><restriction field="allow-gpu-passthrough" value="1"/><restriction field="allow-vgpu" value="1"/><restriction field="allow-network-sriov" value="1"/><restriction field="supports-bios" value="yes"/><restriction field="supports-uefi" value="yes"/><restriction field="supports-secure-boot" value="yes"/><restriction max="255" property="number-of-vbds"/><restriction max="7" property="number-of-vifs"/></restrictions>
xenstore-data (MRW): vm-data/mmio-hole-size: 268435456; vm-data:
ha-always-run ( RW) [DEPRECATED]: false
ha-restart-priority ( RW):
blobs ( RO):
start-time ( RO): 20250320T19:19:07Z
install-time ( RO): 20241028T05:11:47Z
VCPUs-number ( RO): 4
VCPUs-utilisation (MRO): 0: 0.115; 1: 0.106; 2: 0.111; 3: 0.107
os-version (MRO): name: Ubuntu 24.04; uname: 6.8.0-54-generic; distro: Ubuntu
PV-drivers-version (MRO): major: 1; minor: 0; micro: 0; build: proto-0.4.0
PV-drivers-up-to-date ( RO) [DEPRECATED]: true
memory (MRO):
disks (MRO):
VBDs (SRO): f0602bf4-1f5f-12f1-957b-f6c99669d98c; de3047f9-b097-e345-3a8a-77094f5f8de7
networks (MRO): 0/ip: 192.168.8.86; 0/ipv4/0: 192.168.8.86; 0/ipv6/0: fe80::dc3b:cff:fef0:d3ed
PV-drivers-detected ( RO): true
other (MRO): platform-feature-xs_reset_watches: 1; platform-feature-multiprocessor-suspend: 1; has-vendor-device: 0; feature-vcpu-hotplug: 1; feature-suspend: 1; feature-reboot: 1; feature-poweroff: 1; feature-balloon: 1
live ( RO): true
guest-metrics-last-updated ( RO): 20250320T19:19:18Z
can-use-hotplug-vbd ( RO): unspecified
can-use-hotplug-vif ( RO): unspecified
cooperative ( RO) [DEPRECATED]: true
tags (SRW):
appliance ( RW): <not in database>
groups ( RW):
snapshot-schedule ( RW): <not in database>
is-vmss-snapshot ( RO): false
start-delay ( RW): 0
shutdown-delay ( RW): 0
order ( RW): 0
version ( RO): 0
generation-id ( RO):
hardware-platform-version ( RO): 0
has-vendor-device ( RW): false
requires-reboot ( RO): false
reference-label ( RO): ubuntu-22.04
bios-strings (MRO): bios-vendor: Xen; bios-version: ; system-manufacturer: Xen; system-product-name: HVM domU; system-version: ; system-serial-number: ; baseboard-manufacturer: ; baseboard-product-name: ; baseboard-version: ; baseboard-serial-number: ; baseboard-asset-tag: ; baseboard-location-in-chassis: ; enclosure-asset-tag: ; hp-rombios: ; oem-1: Xen; oem-2: MS_VM_CERT/SHA1/bdbeb6e0a816d43fa6d3fe8aaef04c2bad9d3e3d
pending-guidances ( RO):
vtpms ( RO):
pending-guidances-recommended ( RO):
pending-guidances-full ( RO):
-
@AlbertK Thanks. There's no nested-virt configured there.
I have to admit this is looking more and more like a buggy CPU. Memory corruption is a possibility, but this is a clearly corrupt field in the middle of otherwise sane-looking fields in the VMCB.
Do you have any other identical systems? Can you swap this CPU out for another one to see what happens?
-
@andyhhp Unfortunately no, I do not have another machine to test the CPU in. I have ordered another set of 2x16GB of RAM to test whether it is a RAM issue.
Will report back.
-
@AlbertK I had a similar issue where the whole server would just reboot randomly. It turned out to be an option in the BIOS called "cstates". It has something to do with processor power saving. I disabled every setting that mentioned cstates and have not had the reboot problems since.
-
@joebeasley Mine is more a case of one or more VMs rebooting on their own, and sometimes one VM becoming inaccessible (cannot SSH in or open the console from XO; CPU at 99%, no network or disk activity as seen in XO) and needing a forced reboot. A few hours after that, the host reboots. This is happening every day now.
I am seeing a lot of this in the host dmesg.
[105679.203854] vif vif-6-0 vif6.0: Guest Rx stalled
[105689.395996] vif vif-6-0 vif6.0: Guest Rx ready
[105707.532509] vif vif-6-0 vif6.0: Guest Rx stalled
[105717.555832] vif vif-6-0 vif6.0: Guest Rx ready
[105744.154415] vif vif-6-0 vif6.0: Guest Rx stalled
[105754.163666] vif vif-6-0 vif6.0: Guest Rx ready
-
I have installed a fresh set of RAM and the system still crashes randomly. Some of the crashes leave a crash log and some do not.
This happens on a daily basis: either a VM reboots or the host does. I notice that in the crash logs there is a consistent pattern of an SVM error on CPU8, and once on CPU11.
I then removed CPU8 and CPU11 from the CPU pool. There has been no VM or host reboot for the last 7 days. Any ideas why?
xl cpupool-cpu-remove 8,11
-
As I said before, this is looking like a buggy CPU, and you've proved it, given a week with no incident if CPU8 is excluded.
-
Clearly, there are one or two damaged cores. Likely a faulty CPU, I'm afraid.

-
If you are not getting crashes on cores 0-5 (assuming they are in use by your VMs) then it's unlikely to be a physical problem.
The Ryzen 3600 is only a 6-core CPU; "cores" 8 & 11 are the SMT (Hyperthreaded) versions of cores 2 & 5.
You could also try turning SMT off.
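For reference, a minimal sketch of one way to turn SMT off without going into the BIOS. It assumes the XCP-ng host ships the xen-cmdline helper at /opt/xensource/libexec/xen-cmdline and that the Xen boot option smt=0 is the desired setting. The function name disable_smt is made up for illustration, and the change only takes effect after a host reboot.

```shell
# disable_smt [HELPER_PATH]
# Adds smt=0 to the Xen boot command line via the xen-cmdline helper.
# The default helper path is an assumption about a stock XCP-ng dom0.
disable_smt() {
    helper="${1:-/opt/xensource/libexec/xen-cmdline}"
    if [ ! -x "$helper" ]; then
        echo "xen-cmdline helper not found at $helper" >&2
        return 1
    fi
    # --set-xen writes a hypervisor option into the boot configuration
    "$helper" --set-xen "smt=0" && echo "smt=0 set; reboot the host to apply"
}
```

Run disable_smt in dom0 as root, then reboot. Running the same helper with --delete-xen smt should undo it.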
-
What I am not sure about is how Xen arranges the CPUs: is it all the real cores first, followed by the SMT/Hyper-Thread cores, or does it alternate, i.e. real core, Hyper-Thread core?
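The numbering can be checked on the host instead of guessed. A rough sketch, assuming xenpm get-cpu-topology prints a header row followed by rows of the form "CPU<N> core socket node" (xl info -n should show similar information under cpu_topology). The helper name siblings_of is made up for illustration; it lists every CPU number that shares a core and socket with the one you ask about.

```shell
# siblings_of CPU_NUMBER
# Reads topology rows ("CPU<N> core socket node") on stdin and prints the
# numbers of all CPUs on the same core and socket as CPU_NUMBER, one per line.
# The input format is an assumption about xenpm's output.
siblings_of() {
    awk -v target="CPU$1" '
        /^CPU[0-9]+/ { core[$1] = $2; sock[$1] = $3 }   # header row is skipped
        END {
            if (!(target in core)) exit 1               # unknown CPU: print nothing
            for (c in core)
                if (core[c] == core[target] && sock[c] == sock[target])
                    print substr(c, 4)                  # strip the "CPU" prefix
        }' | sort -n
}
```

Usage on the host would be: xenpm get-cpu-topology | siblings_of 8. If that prints 8 and 9, the threads of a core are numbered next to each other; if it prints 2 and 8, the real cores come first and the SMT threads follow.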