@dinhngtu we couldn't delay any longer and had to deploy the cluster, so we swapped the drives for some CD6-R instead. We'll most likely re-explore this later, since we have 3 other nodes to deploy soon-ish, and I'll get some logs then.

Posts
-
RE: Kioxia CM7 PCIe pass-through crash
-
RE: Native Ceph RBD SM driver for XCP-ng
@olivierlambert You basically replied right after I noticed that and deleted my message...
-
RE: Kioxia CM7 PCIe pass-through crash
@dinhngtu The only output in
hypervisor.log
file is what I sent earlier. Here is
daemon.log
:Oct 10 16:30:01 lab-pprod-xen03 systemd[1]: Started Session c18 of user root. Oct 10 16:30:01 lab-pprod-xen03 systemd[1]: Starting Session c18 of user root. Oct 10 16:30:01 lab-pprod-xen03 systemd[1]: Started Session c19 of user root. Oct 10 16:30:01 lab-pprod-xen03 systemd[1]: Starting Session c19 of user root. Oct 10 16:30:13 lab-pprod-xen03 tapdisk[20267]: received 'sring disconnect' message (uuid = 0) Oct 10 16:30:13 lab-pprod-xen03 tapdisk[20267]: disconnecting domid=6, devid=768 Oct 10 16:30:13 lab-pprod-xen03 tapdisk[20267]: sending 'sring disconnect rsp' message (uuid = 0) Oct 10 16:30:13 lab-pprod-xen03 systemd[1]: Stopping transient unit for varstored-6... Oct 10 16:30:13 lab-pprod-xen03 systemd[1]: Stopped transient unit for varstored-6. Oct 10 16:30:13 lab-pprod-xen03 qemu-dm-6[20398]: qemu-dm-6: terminating on signal 15 from pid 2169 (/usr/sbin/xenopsd-xc) Oct 10 16:30:14 lab-pprod-xen03 /opt/xensource/libexec/xcp-clipboardd[20392]: poll failed because revents=0x11 (qemu socket) Oct 10 16:30:14 lab-pprod-xen03 ovs-ofctl: ovs|00001|ofp_port|WARN|Negative value -1 is not a valid port number. Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: received 'close' message (uuid = 0) Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: nbd: NBD server pause(0x198d410) Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: nbd: NBD server pause(0x198d610) Oct 10 16:30:14 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port vif6.0 Oct 10 16:30:14 lab-pprod-xen03 ovs-ofctl: ovs|00001|ofp_port|WARN|Negative value -1 is not a valid port number. Oct 10 16:30:14 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port vif6.0 Oct 10 16:30:14 lab-pprod-xen03 ovs-ofctl: ovs|00001|ofp_port|WARN|Negative value -1 is not a valid port number. 
Oct 10 16:30:14 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port vif6.1 Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: nbd: NBD server free(0x198d410) Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: nbd: NBD server free(0x198d610) Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: gaps written/skipped: 444/0 Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: /var/run/sr-mount/5bd2dee1-cfb7-be70-0326-3f9070c4ca2d/721646c6-7a3f-4909-bde8-70dac75f5361.vhd: b: 25600, a: 2686, f: 2658, n: 11023552 Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: closed image /var/run/sr-mount/5bd2dee1-cfb7-be70-0326-3f9070c4ca2d/721646c6-7a3f-4909-bde8-70dac75f5361.vhd (0 users, state: 0x00000000, ty$ Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: sending 'close response' message (uuid = 0) Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: received 'detach' message (uuid = 0) Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: sending 'detach response' message (uuid = 0) Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: tapdisk-log: closing after 0 errors Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: tapdisk-syslog: 32 messages, 2739 bytes, xmits: 33, failed: 0, dropped: 0 Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: tapdisk-control: draining 1 connections Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: tapdisk-control: done Oct 10 16:30:16 lab-pprod-xen03 tapback[20277]: backend.c:1246 domain removed, exit Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Command line: -controloutfd 8 -controlinfd 9 -mode hvm_build -image /usr/libexec/xen/boot/hvmloader -domid 7 -store_port 5 -store_d$ Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Domain Properties: Type HVM, hap 1 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Determined the following parameters from xenstore: Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: vcpu/number:4 vcpu/weight:256 vcpu/cap:0 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: nx: 1, pae 1, cores-per-socket 0, x86-fip-width 0, nested 0 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: apic: 1 acpi: 1 acpi_s4: 0 acpi_s3: 0 tsc_mode: 0 hpet: 1 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: nomigrate 0, timeoffset 0 mmio_hole_size 0 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: viridian: 0, time_ref_count: 0, reference_tsc: 0 hcall_remote_tlb_flush: 0 apic_assist: 0 crash_ctl: 0 stimer: 0 hcall_ipi: 0 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: vcpu/0/affinity:1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111$ Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: vcpu/1/affinity:1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111$ Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: vcpu/2/affinity:1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111$ Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: vcpu/3/affinity:1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111$ Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_allocate: cmdline="", features="" Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_kernel_file: filename="/usr/libexec/xen/boot/hvmloader" Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: 
domainbuilder: detail: xc_dom_malloc_filemap : 631 kB Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_module_file: filename="/usr/share/ipxe/ipxe.bin" Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_malloc_filemap : 132 kB Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_boot_xen_init: ver 4.17, caps xen-3.0-x86_64 hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_parse_image: called Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_find_loader: trying multiboot-binary loader ... Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: loader probe failed Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_find_loader: trying HVM-generic loader ... Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: loader probe OK Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: xc: detail: ELF: phdr: paddr=0x100000 memsz=0x57e24 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: xc: detail: ELF: memory: 0x100000 -> 0x157e24 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_compat_check: supported guest type: xen-3.0-x86_64 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_compat_check: supported guest type: hvm-3.0-x86_32 <= matches Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_compat_check: supported guest type: hvm-3.0-x86_32p Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_compat_check: supported guest type: hvm-3.0-x86_64 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting RMRRs for device '0000:f1:00.0' Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting total MMIO space occupied for device '0000:f1:00.0' Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting RMRRs for device '0000:f3:00.0' Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting total MMIO space occupied for device '0000:f3:00.0' Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting RMRRs for device '0000:21:00.0' Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting total MMIO space occupied for device '0000:f3:00.0' Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting RMRRs for device '0000:21:00.0' Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting total MMIO space occupied for device '0000:21:00.0' Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting RMRRs for device '0000:64:00.0' Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting total MMIO space occupied for device '0000:64:00.0' Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting RMRRs for device '0000:63:00.0' Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting total MMIO space occupied for device '0000:63:00.0' Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting RMRRs for device '0000:23:00.0' Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting total MMIO space occupied for device '0000:23:00.0' Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting RMRRs for device '0000:22:00.0' Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting total MMIO space occupied for device '0000:22:00.0' Oct 10 16:30:16 
lab-pprod-xen03 xenguest-7-build[47808]: Calculated provisional MMIO hole size as 0x20000000 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Loaded OVMF from /usr/share/edk2/OVMF-release.fd Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_mem_init: mem 8184 MB, pages 0x1ff800 pages, 4k each Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_mem_init: 0x1ff800 pages Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_boot_mem_init: called Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: range: start=0x0 end=0xe0000000 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: range: start=0x100000000 end=0x21f800000 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: xc: detail: PHYSICAL MEMORY ALLOCATION: Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: xc: detail: 4KB PAGES: 0x0000000000000200 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: xc: detail: 2MB PAGES: 0x00000000000003fb Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: xc: detail: 1GB PAGES: 0x0000000000000006 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Final lower MMIO hole size is 0x20000000 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_build_image: called Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_pfn_to_ptr_retcount: domU mapping: pfn 0x100+0x58 at 0x7f6d1170f000 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_alloc_segment: kernel : 0x100000 -> 0x157e24 (pfn 0x100 + 0x58 pages) Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: xc: detail: ELF: phdr 0 at 0x7f6d0fb1e000 -> 0x7f6d0fb6f200 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_pfn_to_ptr_retcount: domU mapping: pfn 0x158+0x200 at 0x7f6d0f976000 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_alloc_segment: System Firmware module : 0x158000 -> 0x358000 (pfn 0x158 + 0x200 pages) Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_pfn_to_ptr_retcount: domU mapping: pfn 0x358+0x22 at 0x7f6d116ed000 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_alloc_segment: module0 : 0x358000 -> 0x379200 (pfn 0x358 + 0x22 pages) Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_pfn_to_ptr_retcount: domU mapping: pfn 0x37a+0x1 at 0x7f6d118cd000 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_alloc_segment: HVM start info : 0x37a000 -> 0x37a878 (pfn 0x37a + 0x1 pages) Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_build_image : virt_alloc_end : 0x37b000 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_build_image : virt_pgtab_end : 0x0 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_boot_image: called Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: domain builder memory footprint Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: allocated Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: malloc : 18525 bytes Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: anon mmap : 0 bytes Oct 10 
16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: mapped Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: file mmap : 764 kB Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: domU mmap : 2540 kB Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: Adding module 0 guest_addr 358000 len 135680 Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: vcpu_hvm: called Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_set_gnttab_entry: d7 gnt[0] -> d0 0xfefff Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_set_gnttab_entry: d7 gnt[1] -> d0 0xfeffc Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Parsing '178bfbff-f6fa3203-2e500800-040001f3-0000000f-219c07a9-0040060c-00000000-311ed005-00000010-00000000-18000064-00000000-00000$ Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_release: called Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Writing to control: 'result:1044476 1044479#012' Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_release: called Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Writing to control: 'result:1044476 1044479#012' Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: All done Oct 10 16:30:17 lab-pprod-xen03 ovs-vsctl: ovs|00001|db_ctl_base|ERR|no row "vif7.0" in table Interface Oct 10 16:30:17 lab-pprod-xen03 ovs-vsctl: ovs|00001|db_ctl_base|ERR|no row "vif7.1" in table Interface Oct 10 16:30:17 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port vif7.0 Oct 10 16:30:17 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port vif7.1 Oct 10 16:30:17 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 add-port xapi0 vif7.0 -- set interface vif7.0 "external-ids:\"xs-vm-uuid\"=\"ab6fa81f-59d2-$ Oct 10 16:30:17 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 add-port xapi3 vif7.1 -- set interface vif7.1 "external-ids:\"xs-vm-uuid\"=\"ab6fa81f-59d2-$ Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: tapdisk-control: init, 10 x 4k buffers Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: I/O queue driver: lio Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: I/O queue driver: lio Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: tapdisk-log: started, level 0 Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: Tapdisk running, control on /var/run/blktap-control/ctl48097 Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: nbd: Set up local unix domain socket on path '/var/run/blktap-control/nbdclient48097' Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: received 'attach' message (uuid = 0) Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: sending 'attach response' message (uuid = 0) Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: received 'open' message (uuid = 0) Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: /var/run/sr-mount/5bd2dee1-cfb7-be70-0326-3f9070c4ca2d/721646c6-7a3f-4909-bde8-70dac75f5361.vhd version: tap 0x00010003, b: 25600, a: 2686, $ Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: opened image /var/run/sr-mount/5bd2dee1-cfb7-be70-0326-3f9070c4ca2d/721646c6-7a3f-4909-bde8-70dac75f5361.vhd (1 users, state: 0x00000001, ty$ Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: VBD CHAIN: Oct 10 16:30:18 
lab-pprod-xen03 tapdisk[48097]: /var/run/sr-mount/5bd2dee1-cfb7-be70-0326-3f9070c4ca2d/721646c6-7a3f-4909-bde8-70dac75f5361.vhd: type:vhd(4) storage:ext(2) Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: bdev: capacity=104857600 sector_size=512/512 flags=0 Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: nbd: Set up local unix domain socket on path '/var/run/blktap-control/nbdserver48097.0' Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: nbd: registering for unix_listening_fd Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: nbd: Successfully started NBD server on /var/run/blktap-control/nbd-old48097.0 Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: nbd: Set up local unix domain socket on path '/var/run/blktap-control/nbdserver-new48097.0' Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: nbd: registering for unix_listening_fd Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: nbd: Successfully started NBD server on /var/run/blktap-control/nbd48097.0 Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: sending 'open response' message (uuid = 0) Oct 10 16:30:18 lab-pprod-xen03 tapback[48107]: tapback.c:445 slave tapback daemon started, only serving domain 7 Oct 10 16:30:18 lab-pprod-xen03 tapback[48107]: backend.c:406 768 physical_device_changed Oct 10 16:30:18 lab-pprod-xen03 tapback[48107]: backend.c:406 768 physical_device_changed Oct 10 16:30:18 lab-pprod-xen03 tapback[48107]: backend.c:492 768 found tapdisk[48097], for 254:0 Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: received 'disk info' message (uuid = 0) Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: VBD 0 got disk info: sectors=104857600 sector size=512, info=0 Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: sending 'disk info rsp' message (uuid = 0) Oct 10 16:30:18 lab-pprod-xen03 systemd[1]: Started transient unit for varstored-7. Oct 10 16:30:18 lab-pprod-xen03 systemd[1]: Starting transient unit for varstored-7... 
Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --domain = '7' Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --chroot = '/var/run/xen/varstored-root-7' Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --depriv = '(null)' Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --uid = '65542' Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --gid = '998' Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --backend = 'xapidb' Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --arg = 'socket:/xapi-depriv-socket' Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --pidfile = '/var/run/xen/varstored-7.pid' Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --arg = 'uuid:ab6fa81f-59d2-8bb1-fdf8-35969838ec7a' Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --arg = 'save:/efi-vars-save.dat' Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: varstored_initialize: 4 vCPU(s) Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --arg = 'save:/efi-vars-save.dat' Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: varstored_initialize: 4 vCPU(s) Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: varstored_initialize: ioservid = 0 Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: varstored_initialize: iopage = 0x7f5b175d1000 Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: varstored_initialize: VCPU0: 7 -> 356 Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: varstored_initialize: VCPU1: 8 -> 357 Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: varstored_initialize: VCPU2: 9 -> 358 Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: varstored_initialize: VCPU3: 10 -> 359 Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: load_one_auth_data: Auth file '/var/lib/varstored/dbx.auth' is missing! Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: load_one_auth_data: Auth file '/var/lib/varstored/db.auth' is missing! Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: load_one_auth_data: Auth file '/var/lib/varstored/KEK.auth' is missing! 
Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: initialize_settings: Secure boot enable: false Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: initialize_settings: Authenticated variables: enforcing Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: IO request not ready Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: message repeated 3 times: [ IO request not ready] Oct 10 16:30:18 lab-pprod-xen03 forkexecd: [ info||0 ||forkexecd] qemu-dm-7[48182]: Arguments: 7 --syslog -std-vga -videoram 8 -vnc unix:/var/run/xen/vnc-7,lock-key-sync=off -acpi -priv -m$ Oct 10 16:30:18 lab-pprod-xen03 forkexecd: [ info||0 ||forkexecd] qemu-dm-7[48182]: Exec: /usr/lib64/xen/bin/qemu-system-i386 qemu-dm-7 -machine pc-i440fx-2.10,accel=xen,max-ram-below-4g=3$ Oct 10 16:30:18 lab-pprod-xen03 qemu-dm-7[48225]: Moving to cgroup slice 'vm.slice' Oct 10 16:30:18 lab-pprod-xen03 qemu-dm-7[48225]: core dump limit: 67108864 Oct 10 16:30:18 lab-pprod-xen03 qemu-dm-7[48225]: char device redirected to /dev/pts/2 (label serial0) Oct 10 16:30:18 lab-pprod-xen03 ovs-vsctl: ovs|00001|db_ctl_base|ERR|no row "tap7.0" in table Interface Oct 10 16:30:18 lab-pprod-xen03 ovs-vsctl: ovs|00001|db_ctl_base|ERR|no row "tap7.1" in table Interface Oct 10 16:30:18 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port tap7.0 Oct 10 16:30:18 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port tap7.1 Oct 10 16:30:18 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 add-port xapi0 tap7.0 -- set interface tap7.0 "external-ids:\"xs-vm-uuid\"=\"ab6fa81f-59d2-$ Oct 10 16:30:18 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 add-port xapi3 tap7.1 -- set interface tap7.1 "external-ids:\"xs-vm-uuid\"=\"ab6fa81f-59d2-$ Oct 10 16:30:19 lab-pprod-xen03 qemu-dm-7[48225]: [00:08.0] xen_pt_region_update: Error: create new mem mapping failed! (err: 1) Oct 10 16:30:33 lab-pprod-xen03 qemu-dm-7[48225]: Detected Xen version 4.17 Oct 10 16:30:34 lab-pprod-xen03 qemu-dm-7[48225]: [00:08.0] xen_pt_region_update: Error: remove old mem mapping failed! (err: 1) Oct 10 16:30:35 lab-pprod-xen03 qemu-dm-7[48225]: [00:08.0] xen_pt_region_update: Error: create new mem mapping failed! (err: 1) Oct 10 16:30:35 lab-pprod-xen03 qemu-dm-7[48225]: [00:08.0] xen_pt_region_update: Error: remove old mem mapping failed! (err: 1) Oct 10 16:30:36 lab-pprod-xen03 qemu-dm-7[48225]: [00:08.0] xen_pt_region_update: Error: create new mem mapping failed! 
(err: 1) Oct 10 16:30:37 lab-pprod-xen03 tapdisk[48097]: received 'sring connect' message (uuid = 0) Oct 10 16:30:37 lab-pprod-xen03 tapdisk[48097]: connecting VBD 0 domid=7, devid=768, pool (null), evt 16, poll duration 1000, poll idle threshold 50 Oct 10 16:30:37 lab-pprod-xen03 tapdisk[48097]: ring 0xbed010 connected Oct 10 16:30:37 lab-pprod-xen03 tapdisk[48097]: sending 'sring connect rsp' message (uuid = 0) Oct 10 16:30:37 lab-pprod-xen03 qemu-dm-7[48225]: XenPvBlk: New disk with 104857600 sectors of 512 bytes Oct 10 16:30:38 lab-pprod-xen03 qemu-dm-7[48225]: About to call StartImage (0xDEC16D18) Oct 10 16:30:40 lab-pprod-xen03 qemu-dm-7[48225]: ExitBootServices -> (0xDEC16D18, 0xD9D) Oct 10 16:30:40 lab-pprod-xen03 tapdisk[48097]: received 'sring disconnect' message (uuid = 0) Oct 10 16:30:40 lab-pprod-xen03 tapdisk[48097]: disconnecting domid=7, devid=768 Oct 10 16:30:40 lab-pprod-xen03 tapdisk[48097]: sending 'sring disconnect rsp' message (uuid = 0) Oct 10 16:30:40 lab-pprod-xen03 qemu-dm-7[48225]: ExitBootServices <- (Success) Oct 10 16:30:41 lab-pprod-xen03 qemu-dm-7[48225]: SetVirtualAddressMap -> (0x4B0, 0x30, 0x1) Oct 10 16:30:41 lab-pprod-xen03 qemu-dm-7[48225]: SetVirtualAddressMap <- (Success) Oct 10 16:30:41 lab-pprod-xen03 ovs-ofctl: ovs|00001|ofp_port|WARN|Negative value -1 is not a valid port number. Oct 10 16:30:41 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port tap7.1 Oct 10 16:30:41 lab-pprod-xen03 ovs-ofctl: ovs|00001|ofp_port|WARN|Negative value -1 is not a valid port number. Oct 10 16:30:41 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port tap7.0 Oct 10 16:30:41 lab-pprod-xen03 qemu-dm-7[48225]: [00:08.0] xen_pt_region_update: Error: remove old mem mapping failed! (err: 1) Oct 10 16:30:41 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port tap7.0 Oct 10 16:30:41 lab-pprod-xen03 qemu-dm-7[48225]: [00:08.0] xen_pt_region_update: Error: remove old mem mapping failed! (err: 1) Oct 10 16:30:41 lab-pprod-xen03 qemu-dm-7[48225]: [00:08.0] xen_pt_region_update: Error: create new mem mapping failed! (err: 1) Oct 10 16:30:43 lab-pprod-xen03 tapback[48107]: frontend.c:216 768 front-end supports persistent grants but we don't Oct 10 16:30:43 lab-pprod-xen03 tapdisk[48097]: received 'sring connect' message (uuid = 0) Oct 10 16:30:43 lab-pprod-xen03 tapdisk[48097]: connecting VBD 0 domid=7, devid=768, pool (null), evt 49, poll duration 1000, poll idle threshold 50 Oct 10 16:30:43 lab-pprod-xen03 tapdisk[48097]: ring 0xbee810 connected Oct 10 16:30:43 lab-pprod-xen03 tapdisk[48097]: sending 'sring connect rsp' message (uuid = 0) Oct 10 16:30:47 lab-pprod-xen03 systemd[1]: Stopping transient unit for varstored-7... Oct 10 16:30:47 lab-pprod-xen03 systemd[1]: Stopped transient unit for varstored-7. 
Oct 10 16:30:47 lab-pprod-xen03 qemu-dm-7[48225]: qemu-dm-7: terminating on signal 15 from pid 2169 (/usr/sbin/xenopsd-xc) Oct 10 16:30:47 lab-pprod-xen03 /opt/xensource/libexec/xcp-clipboardd[48221]: poll failed because revents=0x11 (qemu socket) Oct 10 16:30:47 lab-pprod-xen03 tapdisk[48097]: received 'sring disconnect' message (uuid = 0) Oct 10 16:30:47 lab-pprod-xen03 tapdisk[48097]: disconnecting domid=7, devid=768 Oct 10 16:30:47 lab-pprod-xen03 tapdisk[48097]: sending 'sring disconnect rsp' message (uuid = 0) Oct 10 16:30:47 lab-pprod-xen03 tapdisk[48097]: received 'close' message (uuid = 0) Oct 10 16:30:47 lab-pprod-xen03 tapdisk[48097]: nbd: NBD server pause(0xbfe410) Oct 10 16:30:47 lab-pprod-xen03 tapdisk[48097]: nbd: NBD server pause(0xbfe610) Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: nbd: NBD server free(0xbfe410) Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: nbd: NBD server free(0xbfe610) Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: gaps written/skipped: 2/0 Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: /var/run/sr-mount/5bd2dee1-cfb7-be70-0326-3f9070c4ca2d/721646c6-7a3f-4909-bde8-70dac75f5361.vhd: b: 25600, a: 2686, f: 2658, n: 11023552 Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: closed image /var/run/sr-mount/5bd2dee1-cfb7-be70-0326-3f9070c4ca2d/721646c6-7a3f-4909-bde8-70dac75f5361.vhd (0 users, state: 0x00000000, ty$ Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: sending 'close response' message (uuid = 0) Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: received 'detach' message (uuid = 0) Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: sending 'detach response' message (uuid = 0) Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: tapdisk-log: closing after 0 errors Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: tapdisk-syslog: 32 messages, 2735 bytes, xmits: 33, failed: 0, dropped: 0 Oct 10 16:30:48 lab-pprod-xen03 ovs-ofctl: ovs|00001|ofp_port|WARN|Negative value -1 is not a valid port number. Oct 10 16:30:48 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port vif7.1 Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: tapdisk-control: draining 1 connections Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: tapdisk-control: done Oct 10 16:30:48 lab-pprod-xen03 ovs-ofctl: ovs|00001|ofp_port|WARN|Negative value -1 is not a valid port number. Oct 10 16:30:48 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port vif7.0 Oct 10 16:30:49 lab-pprod-xen03 tapback[48107]: backend.c:1246 domain removed, exit
@olivierlambert SKU is KCMYXRUG15T3
-
RE: Kioxia CM7 PCIe pass-through crash
I've checked and I do not have any patches available for the host:
[16:06 lab-pprod-xen03 ~]# yum update
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Excluding mirror: updates.xcp-ng.org
 * xcp-ng-base: mirrors.xcp-ng.org
Excluding mirror: updates.xcp-ng.org
 * xcp-ng-updates: mirrors.xcp-ng.org
No packages marked for update
[16:06 lab-pprod-xen03 ~]# yum upgrade
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Excluding mirror: updates.xcp-ng.org
 * xcp-ng-base: mirrors.xcp-ng.org
Excluding mirror: updates.xcp-ng.org
 * xcp-ng-updates: mirrors.xcp-ng.org
No packages marked for update
[16:06 lab-pprod-xen03 ~]#
I've run the commands, but there don't seem to be any additional logs. Should I be looking elsewhere?
[2025-10-10 16:28:15] (XEN) [ 4959.062050] 'G' pressed -> guest log level adjustments enabled
[2025-10-10 16:28:16] (XEN) [ 4960.384372] '+' pressed -> guest log level: Errors (rate limited Errors and warnings)
[2025-10-10 16:30:47] (XEN) [ 5110.535350] domain_crash called from svm_vmexit_handler+0x129f/0x1480
[2025-10-10 16:30:47] (XEN) [ 5110.535351] Domain 7 (vcpu#3) crashed on cpu#3:
[2025-10-10 16:30:47] (XEN) [ 5110.535354] ----[ Xen-4.17.5-15 x86_64 debug=n Not tainted ]----
[2025-10-10 16:30:47] (XEN) [ 5110.535354] CPU: 3
[2025-10-10 16:30:47] (XEN) [ 5110.535355] RIP: 0010:[<ffffffffa6d1b5d0>]
[2025-10-10 16:30:47] (XEN) [ 5110.535356] RFLAGS: 0000000000000286 CONTEXT: hvm guest (d7v3)
[2025-10-10 16:30:47] (XEN) [ 5110.535358] rax: ffff9e0280081200 rbx: ffff9e028007fba4 rcx: 0000000000000000
[2025-10-10 16:30:47] (XEN) [ 5110.535359] rdx: 00000000fee97000 rsi: 0000000000000000 rdi: 0000000000000000
[2025-10-10 16:30:47] (XEN) [ 5110.535359] rbp: ffff8dc094984780 rsp: ffff9e028007fb58 r8: 0000000000000000
[2025-10-10 16:30:47] (XEN) [ 5110.535360] r9: 0000000000000000 r10: ffff9e028007fb18 r11: 0000000000000000
[2025-10-10 16:30:47] (XEN) [ 5110.535361] r12: 0000000000000197 r13: ffff8dc040bd40c8 r14: 0000000000000011
[2025-10-10 16:30:47] (XEN) [ 5110.535361] r15: 0000000000000001 cr0: 0000000080050033 cr4: 0000000000770ef0
[2025-10-10 16:30:47] (XEN) [ 5110.535362] cr3: 000000010200c006 cr2: 0000000000000000
[2025-10-10 16:30:47] (XEN) [ 5110.535362] fsb: 0000000000000000 gsb: ffff8dc157580000 gss: 0000000000000000
[2025-10-10 16:30:47] (XEN) [ 5110.535363] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0018 cs: 0010
-
RE: Kioxia CM7 PCIe pass-through crash
@TeddyAstie It's a newly deployed Rocky Linux 9.6 with all the latest updates applied to it.
Nested virtualization is disabled.
-
RE: Coral TPU PCI Passthrough
@andSmv Any news on a test build with the patch? I'm wondering if this issue is related and would love to be able to test.
-
Kioxia CM7 PCIe pass-through crash
I'm having a weird issue with PCIe pass-through for our KIOXIA CM7 drives. We have a bunch of KIOXIA CX6 drives and those pass through without any issue.
The first thing I noticed is that the device ID isn't being properly recognized: instead of showing the CM7 name, lspci only displays the device as
Device 0013
. I've tried raising the IRQ limit as explained in the docs.
/opt/xensource/libexec/xen-cmdline --set-xen "extra_guest_irqs=128"
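For reference, this is roughly how I double-checked that the setting took and what numeric IDs the drive actually reports (assuming the stock xen-cmdline and pciutils tooling in dom0; the PCI address is just the one from my lspci output below):
/opt/xensource/libexec/xen-cmdline --get-xen extra_guest_irqs    # should print the value set above
lspci -nn -s e1:00.0    # shows the raw vendor:device IDs even when the name is only "Device 0013"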
Here are some logs from hypervisor.log when the VM crashes during boot.
[2025-10-08 19:15:52] (XEN) [ 93.430177] d[IDLE]v14: Unsupported MSI delivery mode 7 for Dom1
[2025-10-08 19:15:52] (XEN) [ 93.436563] d[IDLE]v14: Unsupported MSI delivery mode 7 for Dom1
[2025-10-08 19:15:52] (XEN) [ 93.439733] d1v0: Unsupported MSI delivery mode 7 for Dom1
[2025-10-08 19:15:52] (XEN) [ 93.448323] d1v0: Unsupported MSI delivery mode 7 for Dom1
[2025-10-08 19:15:52] (XEN) [ 93.448801] d1v0: Unsupported MSI delivery mode 7 for Dom1
[2025-10-08 19:15:52] (XEN) [ 93.457235] d1v0: Unsupported MSI delivery mode 7 for Dom1
[2025-10-08 19:27:23] (XEN) [ 784.468669] domain_crash called from svm_vmexit_handler+0x129f/0x1480
[2025-10-08 19:27:23] (XEN) [ 784.468671] Domain 3 (vcpu#1) crashed on cpu#1:
[2025-10-08 19:27:23] (XEN) [ 784.468673] ----[ Xen-4.17.5-15 x86_64 debug=n Not tainted ]----
[2025-10-08 19:27:23] (XEN) [ 784.468674] CPU: 1
[2025-10-08 19:27:23] (XEN) [ 784.468674] RIP: 0010:[<ffffffffa751b5d0>]
[2025-10-08 19:27:23] (XEN) [ 784.468675] RFLAGS: 0000000000000286 CONTEXT: hvm guest (d3v1)
[2025-10-08 19:27:23] (XEN) [ 784.468676] rax: ffffb90bc0071200 rbx: ffffb90bc007fba4 rcx: 0000000000000000
[2025-10-08 19:27:23] (XEN) [ 784.468677] rdx: 00000000fee97000 rsi: 0000000000000000 rdi: 0000000000000000
[2025-10-08 19:27:23] (XEN) [ 784.468678] rbp: ffff944cba5c9a80 rsp: ffffb90bc007fb58 r8: 0000000000000000
[2025-10-08 19:27:23] (XEN) [ 784.468678] r9: 0000000000000000 r10: ffffb90bc007fb18 r11: 0000000000000000
[2025-10-08 19:27:23] (XEN) [ 784.468679] r12: 0000000000000197 r13: ffff944c80c830c8 r14: 0000000000000011
[2025-10-08 19:27:23] (XEN) [ 784.468679] r15: 0000000000000001 cr0: 0000000080050033 cr4: 0000000000770ef0
[2025-10-08 19:27:23] (XEN) [ 784.468680] cr3: 00000001105dc006 cr2: 0000000000000000
[2025-10-08 19:27:23] (XEN) [ 784.468680] fsb: 0000000000000000 gsb: ffff944d97480000 gss: 0000000000000000
[2025-10-08 19:27:23] (XEN) [ 784.468681] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0018 cs: 0010
The unsupported MSI delivery mode messages seem to come from our CX6 drives, but those appear to be working fine.
Here is the lspci output for one of the drives:
lspci -s e1:00.0 -vv e1:00.0 Non-Volatile memory controller: KIOXIA Corporation Device 0013 (rev 01) (prog-if 02 [NVM Express]) Subsystem: KIOXIA Corporation Device 0043 Physical Slot: 0-2 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 182 Region 0: Memory at f2810000 (64-bit, non-prefetchable) [size=64K] Expansion ROM at f2800000 [disabled] [size=64K] Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [70] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported- RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset- MaxPayload 512 bytes, MaxReadReq 512 bytes DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- LnkCap: Port #0, Speed unknown, Width x4, ASPM not supported, Exit Latency L0s <2us, L1 <64us ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 16GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled LnkCtl2: Target Link Speed: Unknown, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+ EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest- Capabilities: [b0] MSI-X: Enable- Count=129 Masked- Vector table: BAR=0 offset=00005200 PBA: BAR=0 offset=0000d600 Capabilities: [d0] Vital Product Data Product Name: KIOXIA ESSD Read-only fields: [PN] Part number: KIOXIA KCMYXRUG15T3 [EC] Engineering changes: 0001 [SN] Serial number: 3DH0A00A0LP1 [MN] Manufacture ID: 31 45 30 46 [RV] Reserved: checksum good, 26 byte(s) reserved End Capabilities: [100 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO+ CmpltAbrt- UnxCmplt+ RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr- AERCap: First Error Pointer: 00, GenCap+ CGenEn+ ChkCap+ ChkEn+ Capabilities: [148 v1] Device Serial Number 8c-e3-8e-e3-00-32-1f-01 Capabilities: [168 v1] Alternative Routing-ID Interpretation (ARI) ARICap: MFVC- ACS-, Next Function: 0 ARICtl: MFVC- ACS-, Function Group: 0 Capabilities: [178 v1] #19 Capabilities: [198 v1] #26 Capabilities: [1c0 v1] #27 Capabilities: [1e8 v1] #2a Capabilities: [210 v1] Single Root I/O Virtualization (SR-IOV) IOVCap: Migration-, Interrupt Message Number: 000 IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+ IOVSta: Migration- Initial VFs: 32, Total VFs: 32, Number of VFs: 0, Function Dependency Link: 00 VF offset: 1, stride: 1, Device ID: 0013 Supported 
Page Size: 00000553, System Page Size: 00000001 Region 0: Memory at 00000000f2600000 (64-bit, non-prefetchable) VF Migration: offset: 00000000, BIR: 0 Kernel driver in use: pciback Kernel modules: nvme
The only thing I haven't tried yet is enabling SR-IOV, but I don't think that would really change anything.
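If I do end up testing SR-IOV, my understanding is that it would go through the standard sysfs interface, something like the sketch below (not something I've actually run on this box, the VF count is arbitrary, and with the device bound to pciback it may not apply as-is):
echo 4 > /sys/bus/pci/devices/0000:e1:00.0/sriov_numvfs    # create 4 VFs; echo 0 removes them again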
Thanks in advance for anyone chiming in!
-
RE: Icon appears in the XOA interface
@olivierlambert Thanks for the suggestion! The problem is I have no idea where I would even start building it for Slackware. I'll see if I can figure it out, but based on my research so far, I'm not sure I'll be able to.
-
RE: Icon appears in the XOA interface
@eurodrigolira Sorry to revive this topic, but do you have any pointers on how to build a Slackware package for xe-guest-utilities? I'm trying to add the VM guest tools to UnRAID and I'm not having much luck.
-
RE: Three-node Networking for XOSTOR
@T3CCH What you might be looking for: https://xcp-ng.org/docs/networking.html#full-mesh-network
-
RE: XOSTOR hyperconvergence preview
@ronan-a Thanks a lot for that procedure.
I ended up needing to do a bit more, since "evacuate" failed for some reason. I deleted the node and then manually recreated my resources using:
linstor resource create --auto-place +1 <resource_name>
That didn't work at first because the new node didn't have a storage pool configured, which required this command (NOTE: this is only valid if your SR was set up as thin):
linstor storage-pool create lvmthin <node_name> xcp-sr-linstor_group_thin_device linstor_group/thin_device
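Before recreating the resources, this is more or less how I sanity-checked that the node and its storage pool were visible (standard LINSTOR client commands, names as in my setup):
linstor node list    # the new node should show up as Online
linstor storage-pool list    # xcp-sr-linstor_group_thin_device should now exist on the new node
linstor resource list    # confirms where each resource ended up after auto-place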
Also, worth noting that before re-creating the resources, you might want to manually clean up any lingering logical volumes that were left behind when evacuate failed.
Find volumes with:
lvdisplay
and then delete them with:
lvremove <LV Path>
example:
lvremove /dev/linstor_group/xcp-persistent-database_00000
-
RE: XOSTOR hyperconvergence preview
@ronan-a Do you know of a way to update a node name in Linstor? I've looked through their documentation and the CLI commands but couldn't find a way.
-
RE: XOSTOR hyperconvergence preview
@ronan-a I will be testing my theory a bit later today, but I believe it might be a hostname mismatch between the node name Linstor expects and what Dom0 is set to now. We had updated the hostname of the node before the cluster was spun up, but I think the previous name was still active when the Linstor SR was created.
This means that the node name doesn't match here:
https://github.com/xcp-ng/sm/blob/e951676098c80e6da6de4d4653f496b15f5a8cb9/drivers/linstorvolumemanager.py#L2641C21-L2641C41
I will try to revert the hostname and see if it fixes everything.
Edit: Just tested it. I reverted the hostname to the default one, which matches what's in Linstor, and it works again. So it seems like changing a hostname after the cluster is provisioned is a no-no.
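For anyone hitting the same thing, this is roughly how the mismatch shows up (nothing exotic, just comparing what Linstor has registered against what Dom0 and XAPI report):
linstor node list    # node names as Linstor expects them
hostname    # what Dom0 currently reports
xe host-list params=uuid,hostname,name-label    # what XAPI thinks the hostname is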
-
RE: XOSTOR hyperconvergence preview
@ronan-a said in XOSTOR hyperconvergence preview:
drbdsetup events2
Host1:
[09:49 xcp-ng-labs-host01 ~]# systemctl status linstor-controller ● linstor-controller.service - drbd-reactor controlled linstor-controller Loaded: loaded (/usr/lib/systemd/system/linstor-controller.service; disabled; vendor preset: disabled) Drop-In: /run/systemd/system/linstor-controller.service.d └─reactor.conf Active: active (running) since Thu 2024-05-02 13:24:32 PDT; 20h ago Main PID: 21340 (java) CGroup: /system.slice/linstor-controller.service └─21340 /usr/lib/jvm/jre-11/bin/java -Xms32M -classpath /usr/share/linstor-server/lib/conf:/usr/share/linstor-server/lib/* com.linbit.linstor.core.Controller --logs=/var/log/linstor-controller --config-directory=/etc/linstor [09:49 xcp-ng-labs-host01 ~]# systemctl status linstor-satellite ● linstor-satellite.service - LINSTOR Satellite Service Loaded: loaded (/usr/lib/systemd/system/linstor-satellite.service; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/linstor-satellite.service.d └─override.conf Active: active (running) since Wed 2024-05-01 16:04:05 PDT; 1 day 17h ago Main PID: 1947 (java) CGroup: /system.slice/linstor-satellite.service ├─1947 /usr/lib/jvm/jre-11/bin/java -Xms32M -classpath /usr/share/linstor-server/lib/conf:/usr/share/linstor-server/lib/* com.linbit.linstor.core.Satellite --logs=/var/log/linstor-satellite --config-directory=/etc/linstor ├─2109 drbdsetup events2 all └─2347 /usr/sbin/dmeventd [09:49 xcp-ng-labs-host01 ~]# systemctl status drbd-reactor ● drbd-reactor.service - DRBD-Reactor Service Loaded: loaded (/usr/lib/systemd/system/drbd-reactor.service; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/drbd-reactor.service.d └─override.conf Active: active (running) since Wed 2024-05-01 16:04:11 PDT; 1 day 17h ago Docs: man:drbd-reactor man:drbd-reactorctl man:drbd-reactor.toml Main PID: 1950 (drbd-reactor) CGroup: /system.slice/drbd-reactor.service ├─1950 /usr/sbin/drbd-reactor └─1976 drbdsetup events2 --full --poll [09:49 xcp-ng-labs-host01 ~]# mountpoint /var/lib/linstor /var/lib/linstor is a mountpoint [09:49 xcp-ng-labs-host01 ~]# drbdsetup events2 exists resource name:xcp-persistent-database role:Primary suspended:no force-io-failures:no may_promote:no promotion_score:10103 exists connection name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 connection:Connected role:Secondary exists connection name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 connection:Connected role:Secondary exists device name:xcp-persistent-database volume:0 minor:1000 backing_dev:/dev/linstor_group/xcp-persistent-database_00000 disk:UpToDate client:no quorum:yes exists peer-device name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no exists path name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 local:ipv4:10.100.0.200:7000 peer:ipv4:10.100.0.202:7000 established:yes exists peer-device name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no exists path name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 local:ipv4:10.100.0.200:7000 peer:ipv4:10.100.0.201:7000 established:yes exists resource name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 role:Secondary suspended:no force-io-failures:no may_promote:no promotion_score:10103 exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 
conn-name:xcp-ng-labs-host03 connection:Connected role:Secondary exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 connection:Connected role:Primary exists device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 volume:0 minor:1001 backing_dev:/dev/linstor_group/xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0_00000 disk:UpToDate client:no quorum:yes exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 local:ipv4:10.100.0.200:7001 peer:ipv4:10.100.0.202:7001 established:yes exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 local:ipv4:10.100.0.200:7001 peer:ipv4:10.100.0.201:7001 established:yes exists -
Host2:
[09:51 xcp-ng-labs-host02 ~]# systemctl status linstor-controller ● linstor-controller.service - drbd-reactor controlled linstor-controller Loaded: loaded (/usr/lib/systemd/system/linstor-controller.service; disabled; vendor preset: disabled) Drop-In: /run/systemd/system/linstor-controller.service.d └─reactor.conf Active: inactive (dead) [09:51 xcp-ng-labs-host02 ~]# systemctl status linstor-satellite ● linstor-satellite.service - LINSTOR Satellite Service Loaded: loaded (/usr/lib/systemd/system/linstor-satellite.service; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/linstor-satellite.service.d └─override.conf Active: active (running) since Thu 2024-05-02 10:26:59 PDT; 23h ago Main PID: 1990 (java) CGroup: /system.slice/linstor-satellite.service ├─1990 /usr/lib/jvm/jre-11/bin/java -Xms32M -classpath /usr/share/linstor-server/lib/conf:/usr/share/linstor-server/lib/* com.linbit.linstor.core.Satellite --logs=/var/log/linstor-satellite --config-directory=/etc/linstor ├─2128 drbdsetup events2 all └─2552 /usr/sbin/dmeventd [09:51 xcp-ng-labs-host02 ~]# systemctl status drbd-reactor ● drbd-reactor.service - DRBD-Reactor Service Loaded: loaded (/usr/lib/systemd/system/drbd-reactor.service; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/drbd-reactor.service.d └─override.conf Active: active (running) since Thu 2024-05-02 10:27:07 PDT; 23h ago Docs: man:drbd-reactor man:drbd-reactorctl man:drbd-reactor.toml Main PID: 1989 (drbd-reactor) CGroup: /system.slice/drbd-reactor.service ├─1989 /usr/sbin/drbd-reactor └─2035 drbdsetup events2 --full --poll [09:51 xcp-ng-labs-host02 ~]# mountpoint /var/lib/linstor /var/lib/linstor is not a mountpoint [09:51 xcp-ng-labs-host02 ~]# drbdsetup events2 exists resource name:xcp-persistent-database role:Secondary suspended:no force-io-failures:no may_promote:no promotion_score:10103 exists connection name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 connection:Connected role:Primary exists connection name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 connection:Connected role:Secondary exists device name:xcp-persistent-database volume:0 minor:1000 backing_dev:/dev/linstor_group/xcp-persistent-database_00000 disk:UpToDate client:no quorum:yes exists peer-device name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no exists path name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 local:ipv4:10.100.0.201:7000 peer:ipv4:10.100.0.200:7000 established:yes exists peer-device name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no exists path name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 local:ipv4:10.100.0.201:7000 peer:ipv4:10.100.0.202:7000 established:yes exists resource name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 role:Primary suspended:no force-io-failures:no may_promote:no promotion_score:10103 exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 connection:Connected role:Secondary exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 connection:Connected role:Secondary exists device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 volume:0 minor:1001 
backing_dev:/dev/linstor_group/xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0_00000 disk:UpToDate client:no quorum:yes exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 local:ipv4:10.100.0.201:7001 peer:ipv4:10.100.0.200:7001 established:yes exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 local:ipv4:10.100.0.201:7001 peer:ipv4:10.100.0.202:7001 established:yes exists -
Host3:
[09:51 xcp-ng-labs-host03 ~]# systemctl status linstor-controller ● linstor-controller.service - drbd-reactor controlled linstor-controller Loaded: loaded (/usr/lib/systemd/system/linstor-controller.service; disabled; vendor preset: disabled) Drop-In: /run/systemd/system/linstor-controller.service.d └─reactor.conf Active: inactive (dead) [09:52 xcp-ng-labs-host03 ~]# systemctl status linstor-satellite ● linstor-satellite.service - LINSTOR Satellite Service Loaded: loaded (/usr/lib/systemd/system/linstor-satellite.service; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/linstor-satellite.service.d └─override.conf Active: active (running) since Thu 2024-05-02 10:10:16 PDT; 23h ago Main PID: 1937 (java) CGroup: /system.slice/linstor-satellite.service ├─1937 /usr/lib/jvm/jre-11/bin/java -Xms32M -classpath /usr/share/linstor-server/lib/conf:/usr/share/linstor-server/lib/* com.linbit.linstor.core.Satellite --logs=/var/log/linstor-satellite --config-directory=/etc/linstor ├─2151 drbdsetup events2 all └─2435 /usr/sbin/dmeventd [09:52 xcp-ng-labs-host03 ~]# systemctl status drbd-reactor ● drbd-reactor.service - DRBD-Reactor Service Loaded: loaded (/usr/lib/systemd/system/drbd-reactor.service; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/drbd-reactor.service.d └─override.conf Active: active (running) since Thu 2024-05-02 10:10:26 PDT; 23h ago Docs: man:drbd-reactor man:drbd-reactorctl man:drbd-reactor.toml Main PID: 1939 (drbd-reactor) CGroup: /system.slice/drbd-reactor.service ├─1939 /usr/sbin/drbd-reactor └─1981 drbdsetup events2 --full --poll [09:52 xcp-ng-labs-host03 ~]# mountpoint /var/lib/linstor /var/lib/linstor is not a mountpoint [09:52 xcp-ng-labs-host03 ~]# drbdsetup events2 exists resource name:xcp-persistent-database role:Secondary suspended:no force-io-failures:no may_promote:no promotion_score:10103 exists connection name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 connection:Connected role:Primary exists connection name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 connection:Connected role:Secondary exists device name:xcp-persistent-database volume:0 minor:1000 backing_dev:/dev/linstor_group/xcp-persistent-database_00000 disk:UpToDate client:no quorum:yes exists peer-device name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no exists path name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 local:ipv4:10.100.0.202:7000 peer:ipv4:10.100.0.200:7000 established:yes exists peer-device name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no exists path name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 local:ipv4:10.100.0.202:7000 peer:ipv4:10.100.0.201:7000 established:yes exists resource name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 role:Secondary suspended:no force-io-failures:no may_promote:no promotion_score:10103 exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 connection:Connected role:Secondary exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 connection:Connected role:Primary exists device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 volume:0 minor:1001 
backing_dev:/dev/linstor_group/xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0_00000 disk:UpToDate client:no quorum:yes exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 local:ipv4:10.100.0.202:7001 peer:ipv4:10.100.0.200:7001 established:yes exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 local:ipv4:10.100.0.202:7001 peer:ipv4:10.100.0.201:7001 established:yes exists -
Will be sending the debug file as a DM.
Edit: Just as a sanity check, I tried to reboot the master instead of just restarting the toolstack, and the linstor SR seems to be working as expected again. The XOSTOR tab in XOA now populates (it just errored out before) and the SR scan now goes through.
Edit 2: I was able to move a VDI, but then the exact same error started happening again. No idea why.
-
RE: XOSTOR hyperconvergence preview
@ronan-a Since XOSTOR is supposed to be stable now, I figured I would try it out with a new setup of 3 newly installed 8.2 nodes.
I used the CLI to deploy it. It all went well, and the SR was quickly ready. I was even able to migrate a disk to the Linstor SR and boot the VM. However, after rebooting the master, the SR doesn't seem to allow any disk migration anymore, and manual scans are failing. I've tried fully unmounting/remounting the SR and restarting the toolstack, but nothing seems to help. The disk that was on Linstor is still accessible and the VM is able to boot.
Here is the error I'm getting:
sr.scan { "id": "e1a9bf4d-26ad-3ef6-b4a5-db98d012e0d9" } { "code": "SR_BACKEND_FAILURE_47", "params": [ "", "The SR is not available [opterr=Database is not mounted]", "" ], "task": { "uuid": "a467bd90-8d47-09cc-b8ac-afa35056ff25", "name_label": "Async.SR.scan", "name_description": "", "allowed_operations": [], "current_operations": {}, "created": "20240502T21:40:00Z", "finished": "20240502T21:40:01Z", "status": "failure", "resident_on": "OpaqueRef:b3e2f390-f45f-4614-a150-1eee53f204e1", "progress": 1, "type": "<none/>", "result": "", "error_info": [ "SR_BACKEND_FAILURE_47", "", "The SR is not available [opterr=Database is not mounted]", "" ], "other_config": {}, "subtask_of": "OpaqueRef:NULL", "subtasks": [], "backtrace": "(((process xapi)(filename lib/backtrace.ml)(line 210))((process xapi)(filename ocaml/xapi/storage_access.ml)(line 32))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 35))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 131))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/xapi/rbac.ml)(line 205))((process xapi)(filename ocaml/xapi/server_helpers.ml)(line 95)))" }, "message": "SR_BACKEND_FAILURE_47(, The SR is not available [opterr=Database is not mounted], )", "name": "XapiError", "stack": "XapiError: SR_BACKEND_FAILURE_47(, The SR is not available [opterr=Database is not mounted], ) at Function.wrap (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/_XapiError.mjs:16:12) at default (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/_getTaskResult.mjs:11:29) at Xapi._addRecordToCache (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/index.mjs:1029:24) at file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/index.mjs:1063:14 at Array.forEach (<anonymous>) at Xapi._processEvents (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/index.mjs:1053:12) at Xapi._watchEvents (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/index.mjs:1226:14)" }
I quickly glanced over the source code and the SM logs to see if I could identify what was going on but it doesn't seem to be a simple thing.
Logs from SM:
May 2 13:22:02 xcp-ng-labs-host01 SM: [19242] LinstorSR.scan for e1a9bf4d-26ad-3ef6-b4a5-db98d012e0d9
May 2 13:22:02 xcp-ng-labs-host01 SM: [19242] Raising exception [47, The SR is not available [opterr=Database is not mounted]]
May 2 13:22:02 xcp-ng-labs-host01 SM: [19242] lock: released /var/lock/sm/e1a9bf4d-26ad-3ef6-b4a5-db98d012e0d9/sr
May 2 13:22:02 xcp-ng-labs-host01 SM: [19242] ***** generic exception: sr_scan: EXCEPTION <class 'SR.SROSError'>, The SR is not available [opterr=Database is not mounted]
May 2 13:22:02 xcp-ng-labs-host01 SM: [19242] File "/opt/xensource/sm/SRCommand.py", line 110, in run
May 2 13:22:02 xcp-ng-labs-host01 SM: [19242] return self._run_locked(sr)
May 2 13:22:02 xcp-ng-labs-host01 SM: [19242] File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
May 2 13:22:02 xcp-ng-labs-host01 SM: [19242] rv = self._run(sr, target)
May 2 13:22:02 xcp-ng-labs-host01 SM: [19242] File "/opt/xensource/sm/SRCommand.py", line 364, in _run
May 2 13:22:02 xcp-ng-labs-host01 SM: [19242] return sr.scan(self.params['sr_uuid'])
May 2 13:22:02 xcp-ng-labs-host01 SM: [19242] File "/opt/xensource/sm/LinstorSR", line 536, in wrap
May 2 13:22:02 xcp-ng-labs-host01 SM: [19242] return load(self, *args, **kwargs)
May 2 13:22:02 xcp-ng-labs-host01 SM: [19242] File "/opt/xensource/sm/LinstorSR", line 521, in load
May 2 13:22:02 xcp-ng-labs-host01 SM: [19242] return wrapped_method(self, *args, **kwargs)
May 2 13:22:02 xcp-ng-labs-host01 SM: [19242] File "/opt/xensource/sm/LinstorSR", line 381, in wrapped_method
May 2 13:22:02 xcp-ng-labs-host01 SM: [19242] return method(self, *args, **kwargs)
May 2 13:22:02 xcp-ng-labs-host01 SM: [19242] File "/opt/xensource/sm/LinstorSR", line 777, in scan
May 2 13:22:02 xcp-ng-labs-host01 SM: [19242] opterr='Database is not mounted'
May 2 13:22:02 xcp-ng-labs-host01 SM: [19242]
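In case it's useful, here's roughly what I checked on each host to figure out where the Linstor database is supposed to be mounted and which node is currently running the controller (resource and mount point names are the ones from my setup):
mountpoint /var/lib/linstor    # should be a mountpoint only on the node running the controller
systemctl status linstor-controller    # drbd-reactor keeps it active on a single host
drbdsetup status xcp-persistent-database    # shows which node is Primary for the database volume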
-
RE: XOSTOR hyperconvergence preview
@ronan-a said in XOSTOR hyperconvergence preview:
@Maelstrom96 We must update our documentation for that. This will probably require executing commands manually during an upgrade.
Any news on that? We're still pretty much blocked until that's figured out.
Also, any news on when it will be officially released?
-
RE: XOSTOR hyperconvergence preview
@Maelstrom96 said in XOSTOR hyperconvergence preview:
Is there a procedure on how we can update our current 8.2 XCP-ng cluster to 8.3? My understanding is that if I update the host using the ISO, it will effectively wipe all changes that were made to DOM0, including the linstor/sm-linstor packages.
Any input on this @ronan-a?
-
RE: XOSTOR hyperconvergence preview
Is there a procedure on how we can update our current 8.2 XCP-ng cluster to 8.3? My understanding is that if I update the host using the ISO, it will effectively wipe all changes that were made to DOM0, including the linstor/sm-linstor packages.
-
RE: XOSTOR hyperconvergence preview
@gb-123 said in XOSTOR hyperconvergence preview:
VMs would be using LUKS encryption.
So if only VDI is replicated and hypothetically, if I loose the master node or any other node actually having the VM, then I will have to create the VM again using the replicated disk? Or would it be something like DRBD where there are actually 2 VMs running in Active/Passive mode and there is an automatic switchover ? Or would it be that One VM is running and the second gets automatically started when 1st is down ?
Sorry for the noob questions. I just wanted to be sure of the implementation.
The VM metadata is stored at the pool level, meaning you wouldn't have to re-create the VM if the host currently running it fails. However, memory isn't replicated across the cluster, except during a live migration, which temporarily copies the VM's memory to the new host so it can be moved.
DRBD only replicates the VDI, in other words the disk data, across the active Linstor members. If the VM is stopped or terminated because of a host failure, you should be able to start it back up on another host in your pool, but by default this requires manual intervention, and you'll have to enter your encryption password since it will be a cold boot.
If you want the VM to automatically restart in case of failure, you can use the HA feature of XCP-ng. This wouldn't solve the issue of having to enter your encryption password since, as explained earlier, the memory isn't replicated and the VM would cold boot from the replicated VDI. Also, keep in mind that enabling HA adds maintenance complexity and might not be worth it.
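If you do go the HA route, enabling it is roughly along these lines (hedged sketch with placeholder UUIDs; the heartbeat SR just needs to be a shared SR, the XOSTOR SR included):
xe pool-ha-enable heartbeat-sr-uuids=<shared-sr-uuid>
xe vm-param-set uuid=<vm-uuid> ha-restart-priority=restart order=0    # mark the VM as protected
xe pool-ha-disable    # to turn HA back off later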