XCP-ng

    Maelstrom96's Posts

    • RE: Kioxia CM7 PCIe pass-through crash

      @dinhngtu We couldn't delay any longer and had to deploy the cluster, so we swapped the drives for some CD6-Rs instead. We'll most likely re-explore this later as we have 3 other nodes to deploy soon-ish, and I'll get some logs then.

      posted in Compute
      Maelstrom96
    • RE: Native Ceph RBD SM driver for XCP-ng

      @olivierlambert You basically replied right after I noticed that and deleted my message... 🙃

      posted in Development
      Maelstrom96
    • RE: Kioxia CM7 PCIe pass-through crash

      @dinhngtu The only output in hypervisor.log file is what I sent earlier.

      Here is daemon.log:

      Oct 10 16:30:01 lab-pprod-xen03 systemd[1]: Started Session c18 of user root.
      Oct 10 16:30:01 lab-pprod-xen03 systemd[1]: Starting Session c18 of user root.
      Oct 10 16:30:01 lab-pprod-xen03 systemd[1]: Started Session c19 of user root.
      Oct 10 16:30:01 lab-pprod-xen03 systemd[1]: Starting Session c19 of user root.
      Oct 10 16:30:13 lab-pprod-xen03 tapdisk[20267]: received 'sring disconnect' message (uuid = 0)
      Oct 10 16:30:13 lab-pprod-xen03 tapdisk[20267]: disconnecting domid=6, devid=768
      Oct 10 16:30:13 lab-pprod-xen03 tapdisk[20267]: sending 'sring disconnect rsp' message (uuid = 0)
      Oct 10 16:30:13 lab-pprod-xen03 systemd[1]: Stopping transient unit for varstored-6...
      Oct 10 16:30:13 lab-pprod-xen03 systemd[1]: Stopped transient unit for varstored-6.
      Oct 10 16:30:13 lab-pprod-xen03 qemu-dm-6[20398]: qemu-dm-6: terminating on signal 15 from pid 2169 (/usr/sbin/xenopsd-xc)
      Oct 10 16:30:14 lab-pprod-xen03 /opt/xensource/libexec/xcp-clipboardd[20392]: poll failed because revents=0x11 (qemu socket)
      Oct 10 16:30:14 lab-pprod-xen03 ovs-ofctl: ovs|00001|ofp_port|WARN|Negative value -1 is not a valid port number.
      Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: received 'close' message (uuid = 0)
      Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: nbd: NBD server pause(0x198d410)
      Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: nbd: NBD server pause(0x198d610)
      Oct 10 16:30:14 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port vif6.0
      Oct 10 16:30:14 lab-pprod-xen03 ovs-ofctl: ovs|00001|ofp_port|WARN|Negative value -1 is not a valid port number.
      Oct 10 16:30:14 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port vif6.0
      Oct 10 16:30:14 lab-pprod-xen03 ovs-ofctl: ovs|00001|ofp_port|WARN|Negative value -1 is not a valid port number.
      Oct 10 16:30:14 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port vif6.1
      Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: nbd: NBD server free(0x198d410)
      Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: nbd: NBD server free(0x198d610)
      Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: gaps written/skipped: 444/0
      Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: /var/run/sr-mount/5bd2dee1-cfb7-be70-0326-3f9070c4ca2d/721646c6-7a3f-4909-bde8-70dac75f5361.vhd: b: 25600, a: 2686, f: 2658, n: 11023552
      Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: closed image /var/run/sr-mount/5bd2dee1-cfb7-be70-0326-3f9070c4ca2d/721646c6-7a3f-4909-bde8-70dac75f5361.vhd (0 users, state: 0x00000000, ty$
      Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: sending 'close response' message (uuid = 0)
      Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: received 'detach' message (uuid = 0)
      Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: sending 'detach response' message (uuid = 0)
      Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: tapdisk-log: closing after 0 errors
      Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: tapdisk-syslog: 32 messages, 2739 bytes, xmits: 33, failed: 0, dropped: 0
      Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: tapdisk-control: draining 1 connections
      Oct 10 16:30:14 lab-pprod-xen03 tapdisk[20267]: tapdisk-control: done
      Oct 10 16:30:16 lab-pprod-xen03 tapback[20277]: backend.c:1246 domain removed, exit
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Command line: -controloutfd 8 -controlinfd 9 -mode hvm_build -image /usr/libexec/xen/boot/hvmloader -domid 7 -store_port 5 -store_d$
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Domain Properties: Type HVM, hap 1
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Determined the following parameters from xenstore:
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: vcpu/number:4 vcpu/weight:256 vcpu/cap:0
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: nx: 1, pae 1, cores-per-socket 0, x86-fip-width 0, nested 0
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: apic: 1 acpi: 1 acpi_s4: 0 acpi_s3: 0 tsc_mode: 0 hpet: 1
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: nomigrate 0, timeoffset 0 mmio_hole_size 0
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: viridian: 0, time_ref_count: 0, reference_tsc: 0 hcall_remote_tlb_flush: 0 apic_assist: 0 crash_ctl: 0 stimer: 0 hcall_ipi: 0
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: vcpu/0/affinity:1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111$
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: vcpu/1/affinity:1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111$
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: vcpu/2/affinity:1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111$
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: vcpu/3/affinity:1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111$
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_allocate: cmdline="", features=""
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_kernel_file: filename="/usr/libexec/xen/boot/hvmloader"
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_malloc_filemap    : 631 kB
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_module_file: filename="/usr/share/ipxe/ipxe.bin"
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_malloc_filemap    : 132 kB
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_boot_xen_init: ver 4.17, caps xen-3.0-x86_64 hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_parse_image: called
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_find_loader: trying multiboot-binary loader ...
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: loader probe failed
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_find_loader: trying HVM-generic loader ...
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: loader probe OK
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: xc: detail: ELF: phdr: paddr=0x100000 memsz=0x57e24
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: xc: detail: ELF: memory: 0x100000 -> 0x157e24
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_compat_check: supported guest type: xen-3.0-x86_64
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_compat_check: supported guest type: hvm-3.0-x86_32 <= matches
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_compat_check: supported guest type: hvm-3.0-x86_32p
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_compat_check: supported guest type: hvm-3.0-x86_64
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting RMRRs for device '0000:f1:00.0'
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting total MMIO space occupied for device '0000:f1:00.0'
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting RMRRs for device '0000:f3:00.0'
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting total MMIO space occupied for device '0000:f3:00.0'
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting RMRRs for device '0000:21:00.0'
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting total MMIO space occupied for device '0000:f3:00.0'
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting RMRRs for device '0000:21:00.0'
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting total MMIO space occupied for device '0000:21:00.0'
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting RMRRs for device '0000:64:00.0'
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting total MMIO space occupied for device '0000:64:00.0'
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting RMRRs for device '0000:63:00.0'
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting total MMIO space occupied for device '0000:63:00.0'
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting RMRRs for device '0000:23:00.0'
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting total MMIO space occupied for device '0000:23:00.0'
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting RMRRs for device '0000:22:00.0'
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Getting total MMIO space occupied for device '0000:22:00.0'
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Calculated provisional MMIO hole size as 0x20000000
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Loaded OVMF from /usr/share/edk2/OVMF-release.fd
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_mem_init: mem 8184 MB, pages 0x1ff800 pages, 4k each
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_mem_init: 0x1ff800 pages
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_boot_mem_init: called
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: range: start=0x0 end=0xe0000000
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: range: start=0x100000000 end=0x21f800000
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: xc: detail: PHYSICAL MEMORY ALLOCATION:
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: xc: detail:   4KB PAGES: 0x0000000000000200
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: xc: detail:   2MB PAGES: 0x00000000000003fb
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: xc: detail:   1GB PAGES: 0x0000000000000006
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Final lower MMIO hole size is 0x20000000
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_build_image: called
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_pfn_to_ptr_retcount: domU mapping: pfn 0x100+0x58 at 0x7f6d1170f000
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_alloc_segment:   kernel       : 0x100000 -> 0x157e24  (pfn 0x100 + 0x58 pages)
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: xc: detail: ELF: phdr 0 at 0x7f6d0fb1e000 -> 0x7f6d0fb6f200
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_pfn_to_ptr_retcount: domU mapping: pfn 0x158+0x200 at 0x7f6d0f976000
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_alloc_segment:   System Firmware module : 0x158000 -> 0x358000  (pfn 0x158 + 0x200 pages)
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_pfn_to_ptr_retcount: domU mapping: pfn 0x358+0x22 at 0x7f6d116ed000
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_alloc_segment:   module0      : 0x358000 -> 0x379200  (pfn 0x358 + 0x22 pages)
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_pfn_to_ptr_retcount: domU mapping: pfn 0x37a+0x1 at 0x7f6d118cd000
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_alloc_segment:   HVM start info : 0x37a000 -> 0x37a878  (pfn 0x37a + 0x1 pages)
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_build_image  : virt_alloc_end : 0x37b000
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_build_image  : virt_pgtab_end : 0x0
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_boot_image: called
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: domain builder memory footprint
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail:    allocated
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail:       malloc             : 18525 bytes
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail:       anon mmap          : 0 bytes
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail:    mapped
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail:       file mmap          : 764 kB
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail:       domU mmap          : 2540 kB
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: Adding module 0 guest_addr 358000 len 135680
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: vcpu_hvm: called
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_set_gnttab_entry: d7 gnt[0] -> d0 0xfefff
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_set_gnttab_entry: d7 gnt[1] -> d0 0xfeffc
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Parsing '178bfbff-f6fa3203-2e500800-040001f3-0000000f-219c07a9-0040060c-00000000-311ed005-00000010-00000000-18000064-00000000-00000$
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_release: called
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Writing to control: 'result:1044476 1044479#012'
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: domainbuilder: detail: xc_dom_release: called
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: Writing to control: 'result:1044476 1044479#012'
      Oct 10 16:30:16 lab-pprod-xen03 xenguest-7-build[47808]: All done
      Oct 10 16:30:17 lab-pprod-xen03 ovs-vsctl: ovs|00001|db_ctl_base|ERR|no row "vif7.0" in table Interface
      Oct 10 16:30:17 lab-pprod-xen03 ovs-vsctl: ovs|00001|db_ctl_base|ERR|no row "vif7.1" in table Interface
      Oct 10 16:30:17 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port vif7.0
      Oct 10 16:30:17 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port vif7.1
      Oct 10 16:30:17 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 add-port xapi0 vif7.0 -- set interface vif7.0 "external-ids:\"xs-vm-uuid\"=\"ab6fa81f-59d2-$
      Oct 10 16:30:17 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 add-port xapi3 vif7.1 -- set interface vif7.1 "external-ids:\"xs-vm-uuid\"=\"ab6fa81f-59d2-$
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: tapdisk-control: init, 10 x 4k buffers
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: I/O queue driver: lio
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: I/O queue driver: lio
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: tapdisk-log: started, level 0
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: Tapdisk running, control on /var/run/blktap-control/ctl48097
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: nbd: Set up local unix domain socket on path '/var/run/blktap-control/nbdclient48097'
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: received 'attach' message (uuid = 0)
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: sending 'attach response' message (uuid = 0)
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: received 'open' message (uuid = 0)
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: /var/run/sr-mount/5bd2dee1-cfb7-be70-0326-3f9070c4ca2d/721646c6-7a3f-4909-bde8-70dac75f5361.vhd version: tap 0x00010003, b: 25600, a: 2686, $
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: opened image /var/run/sr-mount/5bd2dee1-cfb7-be70-0326-3f9070c4ca2d/721646c6-7a3f-4909-bde8-70dac75f5361.vhd (1 users, state: 0x00000001, ty$
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: VBD CHAIN:
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: /var/run/sr-mount/5bd2dee1-cfb7-be70-0326-3f9070c4ca2d/721646c6-7a3f-4909-bde8-70dac75f5361.vhd: type:vhd(4) storage:ext(2)
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: bdev: capacity=104857600 sector_size=512/512 flags=0
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: nbd: Set up local unix domain socket on path '/var/run/blktap-control/nbdserver48097.0'
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: nbd: registering for unix_listening_fd
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: nbd: Successfully started NBD server on /var/run/blktap-control/nbd-old48097.0
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: nbd: Set up local unix domain socket on path '/var/run/blktap-control/nbdserver-new48097.0'
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: nbd: registering for unix_listening_fd
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: nbd: Successfully started NBD server on /var/run/blktap-control/nbd48097.0
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: sending 'open response' message (uuid = 0)
      Oct 10 16:30:18 lab-pprod-xen03 tapback[48107]: tapback.c:445 slave tapback daemon started, only serving domain 7
      Oct 10 16:30:18 lab-pprod-xen03 tapback[48107]: backend.c:406 768 physical_device_changed
      Oct 10 16:30:18 lab-pprod-xen03 tapback[48107]: backend.c:406 768 physical_device_changed
      Oct 10 16:30:18 lab-pprod-xen03 tapback[48107]: backend.c:492 768 found tapdisk[48097], for 254:0
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: received 'disk info' message (uuid = 0)
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: VBD 0 got disk info: sectors=104857600 sector size=512, info=0
      Oct 10 16:30:18 lab-pprod-xen03 tapdisk[48097]: sending 'disk info rsp' message (uuid = 0)
      Oct 10 16:30:18 lab-pprod-xen03 systemd[1]: Started transient unit for varstored-7.
      Oct 10 16:30:18 lab-pprod-xen03 systemd[1]: Starting transient unit for varstored-7...
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --domain = '7'
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --chroot = '/var/run/xen/varstored-root-7'
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --depriv = '(null)'
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --uid = '65542'
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --gid = '998'
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --backend = 'xapidb'
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --arg = 'socket:/xapi-depriv-socket'
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --pidfile = '/var/run/xen/varstored-7.pid'
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --arg = 'uuid:ab6fa81f-59d2-8bb1-fdf8-35969838ec7a'
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --arg = 'save:/efi-vars-save.dat'
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: varstored_initialize: 4 vCPU(s)
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: main: --arg = 'save:/efi-vars-save.dat'
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: varstored_initialize: 4 vCPU(s)
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: varstored_initialize: ioservid = 0
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: varstored_initialize: iopage = 0x7f5b175d1000
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: varstored_initialize: VCPU0: 7 -> 356
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: varstored_initialize: VCPU1: 8 -> 357
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: varstored_initialize: VCPU2: 9 -> 358
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: varstored_initialize: VCPU3: 10 -> 359
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: load_one_auth_data: Auth file '/var/lib/varstored/dbx.auth' is missing!
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: load_one_auth_data: Auth file '/var/lib/varstored/db.auth' is missing!
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: load_one_auth_data: Auth file '/var/lib/varstored/KEK.auth' is missing!
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: initialize_settings: Secure boot enable: false
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: initialize_settings: Authenticated variables: enforcing
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: IO request not ready
      Oct 10 16:30:18 lab-pprod-xen03 varstored-7[48166]: message repeated 3 times: [ IO request not ready]
      Oct 10 16:30:18 lab-pprod-xen03 forkexecd: [ info||0 ||forkexecd] qemu-dm-7[48182]: Arguments: 7 --syslog -std-vga -videoram 8 -vnc unix:/var/run/xen/vnc-7,lock-key-sync=off -acpi -priv -m$
      Oct 10 16:30:18 lab-pprod-xen03 forkexecd: [ info||0 ||forkexecd] qemu-dm-7[48182]: Exec: /usr/lib64/xen/bin/qemu-system-i386 qemu-dm-7 -machine pc-i440fx-2.10,accel=xen,max-ram-below-4g=3$
      Oct 10 16:30:18 lab-pprod-xen03 qemu-dm-7[48225]: Moving to cgroup slice 'vm.slice'
      Oct 10 16:30:18 lab-pprod-xen03 qemu-dm-7[48225]: core dump limit: 67108864
      Oct 10 16:30:18 lab-pprod-xen03 qemu-dm-7[48225]: char device redirected to /dev/pts/2 (label serial0)
      Oct 10 16:30:18 lab-pprod-xen03 ovs-vsctl: ovs|00001|db_ctl_base|ERR|no row "tap7.0" in table Interface
      Oct 10 16:30:18 lab-pprod-xen03 ovs-vsctl: ovs|00001|db_ctl_base|ERR|no row "tap7.1" in table Interface
      Oct 10 16:30:18 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port tap7.0
      Oct 10 16:30:18 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port tap7.1
      Oct 10 16:30:18 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 add-port xapi0 tap7.0 -- set interface tap7.0 "external-ids:\"xs-vm-uuid\"=\"ab6fa81f-59d2-$
      Oct 10 16:30:18 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 add-port xapi3 tap7.1 -- set interface tap7.1 "external-ids:\"xs-vm-uuid\"=\"ab6fa81f-59d2-$
      Oct 10 16:30:19 lab-pprod-xen03 qemu-dm-7[48225]: [00:08.0] xen_pt_region_update: Error: create new mem mapping failed! (err: 1)
      Oct 10 16:30:33 lab-pprod-xen03 qemu-dm-7[48225]: Detected Xen version 4.17
      Oct 10 16:30:34 lab-pprod-xen03 qemu-dm-7[48225]: [00:08.0] xen_pt_region_update: Error: remove old mem mapping failed! (err: 1)
      Oct 10 16:30:35 lab-pprod-xen03 qemu-dm-7[48225]: [00:08.0] xen_pt_region_update: Error: create new mem mapping failed! (err: 1)
      Oct 10 16:30:35 lab-pprod-xen03 qemu-dm-7[48225]: [00:08.0] xen_pt_region_update: Error: remove old mem mapping failed! (err: 1)
      Oct 10 16:30:36 lab-pprod-xen03 qemu-dm-7[48225]: [00:08.0] xen_pt_region_update: Error: create new mem mapping failed! (err: 1)
      Oct 10 16:30:37 lab-pprod-xen03 tapdisk[48097]: received 'sring connect' message (uuid = 0)
      Oct 10 16:30:37 lab-pprod-xen03 tapdisk[48097]: connecting VBD 0 domid=7, devid=768, pool (null), evt 16, poll duration 1000, poll idle threshold 50
      Oct 10 16:30:37 lab-pprod-xen03 tapdisk[48097]: ring 0xbed010 connected
      Oct 10 16:30:37 lab-pprod-xen03 tapdisk[48097]: sending 'sring connect rsp' message (uuid = 0)
      Oct 10 16:30:37 lab-pprod-xen03 qemu-dm-7[48225]: XenPvBlk: New disk with 104857600 sectors of 512 bytes
      Oct 10 16:30:38 lab-pprod-xen03 qemu-dm-7[48225]: About to call StartImage (0xDEC16D18)
      Oct 10 16:30:40 lab-pprod-xen03 qemu-dm-7[48225]: ExitBootServices -> (0xDEC16D18, 0xD9D)
      Oct 10 16:30:40 lab-pprod-xen03 tapdisk[48097]: received 'sring disconnect' message (uuid = 0)
      Oct 10 16:30:40 lab-pprod-xen03 tapdisk[48097]: disconnecting domid=7, devid=768
      Oct 10 16:30:40 lab-pprod-xen03 tapdisk[48097]: sending 'sring disconnect rsp' message (uuid = 0)
      Oct 10 16:30:40 lab-pprod-xen03 qemu-dm-7[48225]: ExitBootServices <- (Success)
      Oct 10 16:30:41 lab-pprod-xen03 qemu-dm-7[48225]: SetVirtualAddressMap -> (0x4B0, 0x30, 0x1)
      Oct 10 16:30:41 lab-pprod-xen03 qemu-dm-7[48225]: SetVirtualAddressMap <- (Success)
      Oct 10 16:30:41 lab-pprod-xen03 ovs-ofctl: ovs|00001|ofp_port|WARN|Negative value -1 is not a valid port number.
      Oct 10 16:30:41 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port tap7.1
      Oct 10 16:30:41 lab-pprod-xen03 ovs-ofctl: ovs|00001|ofp_port|WARN|Negative value -1 is not a valid port number.
      Oct 10 16:30:41 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port tap7.0
      Oct 10 16:30:41 lab-pprod-xen03 qemu-dm-7[48225]: [00:08.0] xen_pt_region_update: Error: remove old mem mapping failed! (err: 1)
      Oct 10 16:30:41 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port tap7.0
      Oct 10 16:30:41 lab-pprod-xen03 qemu-dm-7[48225]: [00:08.0] xen_pt_region_update: Error: remove old mem mapping failed! (err: 1)
      Oct 10 16:30:41 lab-pprod-xen03 qemu-dm-7[48225]: [00:08.0] xen_pt_region_update: Error: create new mem mapping failed! (err: 1)
      Oct 10 16:30:43 lab-pprod-xen03 tapback[48107]: frontend.c:216 768 front-end supports persistent grants but we don't
      Oct 10 16:30:43 lab-pprod-xen03 tapdisk[48097]: received 'sring connect' message (uuid = 0)
      Oct 10 16:30:43 lab-pprod-xen03 tapdisk[48097]: connecting VBD 0 domid=7, devid=768, pool (null), evt 49, poll duration 1000, poll idle threshold 50
      Oct 10 16:30:43 lab-pprod-xen03 tapdisk[48097]: ring 0xbee810 connected
      Oct 10 16:30:43 lab-pprod-xen03 tapdisk[48097]: sending 'sring connect rsp' message (uuid = 0)
      Oct 10 16:30:47 lab-pprod-xen03 systemd[1]: Stopping transient unit for varstored-7...
      Oct 10 16:30:47 lab-pprod-xen03 systemd[1]: Stopped transient unit for varstored-7.
      Oct 10 16:30:47 lab-pprod-xen03 qemu-dm-7[48225]: qemu-dm-7: terminating on signal 15 from pid 2169 (/usr/sbin/xenopsd-xc)
      Oct 10 16:30:47 lab-pprod-xen03 /opt/xensource/libexec/xcp-clipboardd[48221]: poll failed because revents=0x11 (qemu socket)
      Oct 10 16:30:47 lab-pprod-xen03 tapdisk[48097]: received 'sring disconnect' message (uuid = 0)
      Oct 10 16:30:47 lab-pprod-xen03 tapdisk[48097]: disconnecting domid=7, devid=768
      Oct 10 16:30:47 lab-pprod-xen03 tapdisk[48097]: sending 'sring disconnect rsp' message (uuid = 0)
      Oct 10 16:30:47 lab-pprod-xen03 tapdisk[48097]: received 'close' message (uuid = 0)
      Oct 10 16:30:47 lab-pprod-xen03 tapdisk[48097]: nbd: NBD server pause(0xbfe410)
      Oct 10 16:30:47 lab-pprod-xen03 tapdisk[48097]: nbd: NBD server pause(0xbfe610)
      Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: nbd: NBD server free(0xbfe410)
      Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: nbd: NBD server free(0xbfe610)
      Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: gaps written/skipped: 2/0
      Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: /var/run/sr-mount/5bd2dee1-cfb7-be70-0326-3f9070c4ca2d/721646c6-7a3f-4909-bde8-70dac75f5361.vhd: b: 25600, a: 2686, f: 2658, n: 11023552
      Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: closed image /var/run/sr-mount/5bd2dee1-cfb7-be70-0326-3f9070c4ca2d/721646c6-7a3f-4909-bde8-70dac75f5361.vhd (0 users, state: 0x00000000, ty$
      Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: sending 'close response' message (uuid = 0)
      Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: received 'detach' message (uuid = 0)
      Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: sending 'detach response' message (uuid = 0)
      Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: tapdisk-log: closing after 0 errors
      Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: tapdisk-syslog: 32 messages, 2735 bytes, xmits: 33, failed: 0, dropped: 0
      Oct 10 16:30:48 lab-pprod-xen03 ovs-ofctl: ovs|00001|ofp_port|WARN|Negative value -1 is not a valid port number.
      Oct 10 16:30:48 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port vif7.1
      Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: tapdisk-control: draining 1 connections
      Oct 10 16:30:48 lab-pprod-xen03 tapdisk[48097]: tapdisk-control: done
      Oct 10 16:30:48 lab-pprod-xen03 ovs-ofctl: ovs|00001|ofp_port|WARN|Negative value -1 is not a valid port number.
      Oct 10 16:30:48 lab-pprod-xen03 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port vif7.0
      Oct 10 16:30:49 lab-pprod-xen03 tapback[48107]: backend.c:1246 domain removed, exit
      

      @olivierlambert SKU is KCMYXRUG15T3

      posted in Compute
      Maelstrom96
    • RE: Kioxia CM7 PCIe pass-through crash

      @TeddyAstie

      I've checked and I do not have any patches available for the host:

      [16:06 lab-pprod-xen03 ~]# yum update
      Loaded plugins: fastestmirror
      Loading mirror speeds from cached hostfile
      Excluding mirror: updates.xcp-ng.org
       * xcp-ng-base: mirrors.xcp-ng.org
      Excluding mirror: updates.xcp-ng.org
       * xcp-ng-updates: mirrors.xcp-ng.org
      No packages marked for update
      [16:06 lab-pprod-xen03 ~]# yum upgrade
      Loaded plugins: fastestmirror
      Loading mirror speeds from cached hostfile
      Excluding mirror: updates.xcp-ng.org
       * xcp-ng-base: mirrors.xcp-ng.org
      Excluding mirror: updates.xcp-ng.org
       * xcp-ng-updates: mirrors.xcp-ng.org
      No packages marked for update
      [16:06 lab-pprod-xen03 ~]#
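
      To double-check the exact hypervisor build against what the crash log reports (Xen 4.17.5-15), a quick query like this should work (package name assumed from a stock XCP-ng install):

      rpm -q xen-hypervisor   # exact Xen package build installed in dom0
      cat /etc/os-release     # XCP-ng release the host is running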
      

      I've run the commands, but there don't seem to be any additional logs. Should I be looking elsewhere?

      [2025-10-10 16:28:15] (XEN) [ 4959.062050] 'G' pressed -> guest log level adjustments enabled
      [2025-10-10 16:28:16] (XEN) [ 4960.384372] '+' pressed -> guest log level: Errors (rate limited Errors and warnings)
      [2025-10-10 16:30:47] (XEN) [ 5110.535350] domain_crash called from svm_vmexit_handler+0x129f/0x1480
      [2025-10-10 16:30:47] (XEN) [ 5110.535351] Domain 7 (vcpu#3) crashed on cpu#3:
      [2025-10-10 16:30:47] (XEN) [ 5110.535354] ----[ Xen-4.17.5-15  x86_64  debug=n  Not tainted ]----
      [2025-10-10 16:30:47] (XEN) [ 5110.535354] CPU:    3
      [2025-10-10 16:30:47] (XEN) [ 5110.535355] RIP:    0010:[<ffffffffa6d1b5d0>]
      [2025-10-10 16:30:47] (XEN) [ 5110.535356] RFLAGS: 0000000000000286   CONTEXT: hvm guest (d7v3)
      [2025-10-10 16:30:47] (XEN) [ 5110.535358] rax: ffff9e0280081200   rbx: ffff9e028007fba4   rcx: 0000000000000000
      [2025-10-10 16:30:47] (XEN) [ 5110.535359] rdx: 00000000fee97000   rsi: 0000000000000000   rdi: 0000000000000000
      [2025-10-10 16:30:47] (XEN) [ 5110.535359] rbp: ffff8dc094984780   rsp: ffff9e028007fb58   r8:  0000000000000000
      [2025-10-10 16:30:47] (XEN) [ 5110.535360] r9:  0000000000000000   r10: ffff9e028007fb18   r11: 0000000000000000
      [2025-10-10 16:30:47] (XEN) [ 5110.535361] r12: 0000000000000197   r13: ffff8dc040bd40c8   r14: 0000000000000011
      [2025-10-10 16:30:47] (XEN) [ 5110.535361] r15: 0000000000000001   cr0: 0000000080050033   cr4: 0000000000770ef0
      [2025-10-10 16:30:47] (XEN) [ 5110.535362] cr3: 000000010200c006   cr2: 0000000000000000
      [2025-10-10 16:30:47] (XEN) [ 5110.535362] fsb: 0000000000000000   gsb: ffff8dc157580000   gss: 0000000000000000
      [2025-10-10 16:30:47] (XEN) [ 5110.535363] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0018   cs: 0010
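
      For reference, the 'G'/'+' key presses shown above can be sent from dom0 through the standard xl debug-key interface; a minimal sketch, assuming stock xl tooling:

      xl debug-keys G       # enable guest log level adjustments (the "'G' pressed" line above)
      xl debug-keys +       # raise the guest log level one step (the "'+' pressed" line above)
      xl dmesg | tail -n 50 # read back the hypervisor console ring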
      
      
      
      posted in Compute
      Maelstrom96
    • RE: Kioxia CM7 PCIe pass-through crash

      @TeddyAstie It's a newly deployed Rocky Linux 9.6 with all the latest updates applied to it.

      Nested virtualization is disabled.

      posted in Compute
      Maelstrom96
    • RE: Coral TPU PCI Passthrough

      @andSmv Any news on a test build with the patch? I'm wondering if this issue is related and would love to be able to test.

      posted in Compute
      Maelstrom96
    • Kioxia CM7 PCIe pass-through crash

      I'm having a weird issue with PCIe pass-through for our KIOXIA CM7 drives. We have a bunch of KIOXIA CX6 drives and those are passed through without any issue.

      The first thing I noticed is that the device ID is not being properly recognized: instead of showing the CM7 name, it only displays the device as Device 0013.

      I've tried raising the IRQ limit as explained in the doc:

      /opt/xensource/libexec/xen-cmdline --set-xen "extra_guest_irqs=128"
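
      Since xen-cmdline only edits the boot entry, the new value is picked up after a host reboot; a quick way to confirm it took effect (standard xl tooling):

      xl info | grep xen_commandline   # the Xen command line currently in use should now include extra_guest_irqs=128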

      Here are some logs from hypervisor.log when the VM crashes during boot.

      [2025-10-08 19:15:52] (XEN) [   93.430177] d[IDLE]v14: Unsupported MSI delivery mode 7 for Dom1
      [2025-10-08 19:15:52] (XEN) [   93.436563] d[IDLE]v14: Unsupported MSI delivery mode 7 for Dom1
      [2025-10-08 19:15:52] (XEN) [   93.439733] d1v0: Unsupported MSI delivery mode 7 for Dom1
      [2025-10-08 19:15:52] (XEN) [   93.448323] d1v0: Unsupported MSI delivery mode 7 for Dom1
      [2025-10-08 19:15:52] (XEN) [   93.448801] d1v0: Unsupported MSI delivery mode 7 for Dom1
      [2025-10-08 19:15:52] (XEN) [   93.457235] d1v0: Unsupported MSI delivery mode 7 for Dom1
      [2025-10-08 19:27:23] (XEN) [  784.468669] domain_crash called from svm_vmexit_handler+0x129f/0x1480
      [2025-10-08 19:27:23] (XEN) [  784.468671] Domain 3 (vcpu#1) crashed on cpu#1:
      [2025-10-08 19:27:23] (XEN) [  784.468673] ----[ Xen-4.17.5-15  x86_64  debug=n  Not tainted ]----
      [2025-10-08 19:27:23] (XEN) [  784.468674] CPU:    1
      [2025-10-08 19:27:23] (XEN) [  784.468674] RIP:    0010:[<ffffffffa751b5d0>]
      [2025-10-08 19:27:23] (XEN) [  784.468675] RFLAGS: 0000000000000286   CONTEXT: hvm guest (d3v1)
      [2025-10-08 19:27:23] (XEN) [  784.468676] rax: ffffb90bc0071200   rbx: ffffb90bc007fba4   rcx: 0000000000000000
      [2025-10-08 19:27:23] (XEN) [  784.468677] rdx: 00000000fee97000   rsi: 0000000000000000   rdi: 0000000000000000
      [2025-10-08 19:27:23] (XEN) [  784.468678] rbp: ffff944cba5c9a80   rsp: ffffb90bc007fb58   r8:  0000000000000000
      [2025-10-08 19:27:23] (XEN) [  784.468678] r9:  0000000000000000   r10: ffffb90bc007fb18   r11: 0000000000000000
      [2025-10-08 19:27:23] (XEN) [  784.468679] r12: 0000000000000197   r13: ffff944c80c830c8   r14: 0000000000000011
      [2025-10-08 19:27:23] (XEN) [  784.468679] r15: 0000000000000001   cr0: 0000000080050033   cr4: 0000000000770ef0
      [2025-10-08 19:27:23] (XEN) [  784.468680] cr3: 00000001105dc006   cr2: 0000000000000000
      [2025-10-08 19:27:23] (XEN) [  784.468680] fsb: 0000000000000000   gsb: ffff944d97480000   gss: 0000000000000000
      [2025-10-08 19:27:23] (XEN) [  784.468681] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0018   cs: 0010
      

      The "Unsupported MSI delivery mode" messages seem to come from our CX6 drives, but those appear to be working fine.

      Here is the lspci output for one of the drives:

      lspci -s e1:00.0 -vv
      e1:00.0 Non-Volatile memory controller: KIOXIA Corporation Device 0013 (rev 01) (prog-if 02 [NVM Express])
              Subsystem: KIOXIA Corporation Device 0043
              Physical Slot: 0-2
              Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
              Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
              Latency: 0, Cache Line Size: 64 bytes
              Interrupt: pin A routed to IRQ 182
              Region 0: Memory at f2810000 (64-bit, non-prefetchable) [size=64K]
              Expansion ROM at f2800000 [disabled] [size=64K]
              Capabilities: [40] Power Management version 3
                      Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                      Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
              Capabilities: [70] Express (v2) Endpoint, MSI 00
                      DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                              ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
                      DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported-
                              RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                              MaxPayload 512 bytes, MaxReadReq 512 bytes
                      DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                      LnkCap: Port #0, Speed unknown, Width x4, ASPM not supported, Exit Latency L0s <2us, L1 <64us
                              ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                      LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                              ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                      LnkSta: Speed 16GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                      DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
                      DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                      LnkCtl2: Target Link Speed: Unknown, EnterCompliance- SpeedDis-
                               Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                               Compliance De-emphasis: -6dB
                      LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
                               EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
              Capabilities: [b0] MSI-X: Enable- Count=129 Masked-
                      Vector table: BAR=0 offset=00005200
                      PBA: BAR=0 offset=0000d600
              Capabilities: [d0] Vital Product Data
                      Product Name: KIOXIA ESSD
                      Read-only fields:
                              [PN] Part number: KIOXIA KCMYXRUG15T3
                              [EC] Engineering changes: 0001
                              [SN] Serial number: 3DH0A00A0LP1
                              [MN] Manufacture ID: 31 45 30 46
                              [RV] Reserved: checksum good, 26 byte(s) reserved
                      End
              Capabilities: [100 v2] Advanced Error Reporting
                      UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                      UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                      UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO+ CmpltAbrt- UnxCmplt+ RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
                      CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                      CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                      AERCap: First Error Pointer: 00, GenCap+ CGenEn+ ChkCap+ ChkEn+
              Capabilities: [148 v1] Device Serial Number 8c-e3-8e-e3-00-32-1f-01
              Capabilities: [168 v1] Alternative Routing-ID Interpretation (ARI)
                      ARICap: MFVC- ACS-, Next Function: 0
                      ARICtl: MFVC- ACS-, Function Group: 0
              Capabilities: [178 v1] #19
              Capabilities: [198 v1] #26
              Capabilities: [1c0 v1] #27
              Capabilities: [1e8 v1] #2a
              Capabilities: [210 v1] Single Root I/O Virtualization (SR-IOV)
                      IOVCap: Migration-, Interrupt Message Number: 000
                      IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
                      IOVSta: Migration-
                      Initial VFs: 32, Total VFs: 32, Number of VFs: 0, Function Dependency Link: 00
                      VF offset: 1, stride: 1, Device ID: 0013
                      Supported Page Size: 00000553, System Page Size: 00000001
                      Region 0: Memory at 00000000f2600000 (64-bit, non-prefetchable)
                      VF Migration: offset: 00000000, BIR: 0
              Kernel driver in use: pciback
              Kernel modules: nvme
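
      As the output shows, the device is already bound to pciback in dom0; a quick sanity check that Xen sees it as assignable (standard xl tooling):

      xl pci-assignable-list   # should list 0000:e1:00.0 among the devices available for pass-through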
      

      The only thing I haven't tried yet is enabling SR-IOV, but I don't think that would really change anything.

      Thanks in advance for anyone chiming in!

      posted in Compute
      Maelstrom96
    • RE: Icon appears in the XOA interface

      @olivierlambert Thanks for the suggestion! The problem is I have no idea where I would start to build it for Slackware. I'll see if I can figure it out, but based on my research so far, I'm not quite sure I'll be able to.

      posted in Management
      Maelstrom96
    • RE: Icon appears in the XOA interface

      @eurodrigolira Sorry to revive this topic, but do you have any pointers on how to build a Slackware package for xe-guest-utilities? I'm trying to add the VM guest tools to UnRAID and I'm not having much luck.

      posted in Management
      Maelstrom96
    • RE: Three-node Networking for XOSTOR

      @T3CCH What you might be looking for: https://xcp-ng.org/docs/networking.html#full-mesh-network

      posted in XOSTOR
      Maelstrom96
    • RE: XOSTOR hyperconvergence preview

      @ronan-a Thanks a lot for that procedure.

      I ended up needing to do a little bit more since, for some reason, "evacuate" failed. I deleted the node and then manually recreated my resources using:

      linstor resource create --auto-place +1 <resource_name>
      

      That didn't work at first because the new node didn't have a storage pool configured, which required this command (NOTE: this is only valid if your SR was set up as thin):

      linstor storage-pool create lvmthin <node_name> xcp-sr-linstor_group_thin_device linstor_group/thin_device
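
      You can confirm the new pool shows up (and matches the other nodes) with the standard CLI listing:

      linstor storage-pool list   # the new node should now report xcp-sr-linstor_group_thin_device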
      

      Also, it's worth noting that before actually re-creating the resources, you might want to manually clean up the lingering logical volumes that weren't removed when evacuate failed.

      Find volumes with:

      lvdisplay
      

      and then delete them with:

      lvremove <LV Path>
      

      example:

      lvremove /dev/linstor_group/xcp-persistent-database_00000
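
      A slightly quicker way to spot leftovers, assuming the default linstor_group volume group used above:

      lvs linstor_group   # lists every LV still present in the LINSTOR volume group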
      
      posted in XOSTOR
      Maelstrom96
    • RE: XOSTOR hyperconvergence preview

      @ronan-a Do you know of a way to update a node name in Linstor? I've tried to look in their documentation and checked through CLI commands but couldn't find a way.

      posted in XOSTOR
      Maelstrom96
    • RE: XOSTOR hyperconvergence preview

      @ronan-a I will be testing my theory a little bit later today, but I believe it might be a hostname mismatch between the node name linstor expects and what it's set to now on Dom0. We had the hostname of the node updated before the cluster was spun up, but I think it still had the previous name active when the linstor SR was created.

      This means that the node name doesn't match here:
      https://github.com/xcp-ng/sm/blob/e951676098c80e6da6de4d4653f496b15f5a8cb9/drivers/linstorvolumemanager.py#L2641C21-L2641C41

      I will try to revert the hostname and see if it fixes everything.

      Edit: Just tested it and reverted the hostname to the default one, which matches what's in linstor, and it works again. So it seems like changing a hostname after the cluster is provisioned is a no-no.
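
      A quick way to compare the two names if anyone hits the same thing (standard tools, nothing XOSTOR-specific assumed):

      hostname            # what Dom0 currently reports
      linstor node list   # the node names the LINSTOR controller expects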

      posted in XOSTOR
      Maelstrom96
    • RE: XOSTOR hyperconvergence preview

      @ronan-a said in XOSTOR hyperconvergence preview:

      drbdsetup events2

      Host1:

      [09:49 xcp-ng-labs-host01 ~]# systemctl status linstor-controller
      ● linstor-controller.service - drbd-reactor controlled linstor-controller
         Loaded: loaded (/usr/lib/systemd/system/linstor-controller.service; disabled; vendor preset: disabled)
        Drop-In: /run/systemd/system/linstor-controller.service.d
                 └─reactor.conf
         Active: active (running) since Thu 2024-05-02 13:24:32 PDT; 20h ago
       Main PID: 21340 (java)
         CGroup: /system.slice/linstor-controller.service
                 └─21340 /usr/lib/jvm/jre-11/bin/java -Xms32M -classpath /usr/share/linstor-server/lib/conf:/usr/share/linstor-server/lib/* com.linbit.linstor.core.Controller --logs=/var/log/linstor-controller --config-directory=/etc/linstor
      [09:49 xcp-ng-labs-host01 ~]# systemctl status linstor-satellite
      ● linstor-satellite.service - LINSTOR Satellite Service
         Loaded: loaded (/usr/lib/systemd/system/linstor-satellite.service; enabled; vendor preset: disabled)
        Drop-In: /etc/systemd/system/linstor-satellite.service.d
                 └─override.conf
         Active: active (running) since Wed 2024-05-01 16:04:05 PDT; 1 day 17h ago
       Main PID: 1947 (java)
         CGroup: /system.slice/linstor-satellite.service
                 ├─1947 /usr/lib/jvm/jre-11/bin/java -Xms32M -classpath /usr/share/linstor-server/lib/conf:/usr/share/linstor-server/lib/* com.linbit.linstor.core.Satellite --logs=/var/log/linstor-satellite --config-directory=/etc/linstor
                 ├─2109 drbdsetup events2 all
                 └─2347 /usr/sbin/dmeventd
      [09:49 xcp-ng-labs-host01 ~]# systemctl status drbd-reactor
      ● drbd-reactor.service - DRBD-Reactor Service
         Loaded: loaded (/usr/lib/systemd/system/drbd-reactor.service; enabled; vendor preset: disabled)
        Drop-In: /etc/systemd/system/drbd-reactor.service.d
                 └─override.conf
         Active: active (running) since Wed 2024-05-01 16:04:11 PDT; 1 day 17h ago
           Docs: man:drbd-reactor
                 man:drbd-reactorctl
                 man:drbd-reactor.toml
       Main PID: 1950 (drbd-reactor)
         CGroup: /system.slice/drbd-reactor.service
                 ├─1950 /usr/sbin/drbd-reactor
                 └─1976 drbdsetup events2 --full --poll
      [09:49 xcp-ng-labs-host01 ~]# mountpoint /var/lib/linstor
      /var/lib/linstor is a mountpoint
      [09:49 xcp-ng-labs-host01 ~]# drbdsetup events2
      exists resource name:xcp-persistent-database role:Primary suspended:no force-io-failures:no may_promote:no promotion_score:10103
      exists connection name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 connection:Connected role:Secondary
      exists connection name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 connection:Connected role:Secondary
      exists device name:xcp-persistent-database volume:0 minor:1000 backing_dev:/dev/linstor_group/xcp-persistent-database_00000 disk:UpToDate client:no quorum:yes
      exists peer-device name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 local:ipv4:10.100.0.200:7000 peer:ipv4:10.100.0.202:7000 established:yes
      exists peer-device name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 local:ipv4:10.100.0.200:7000 peer:ipv4:10.100.0.201:7000 established:yes
      exists resource name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 role:Secondary suspended:no force-io-failures:no may_promote:no promotion_score:10103
      exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 connection:Connected role:Secondary
      exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 connection:Connected role:Primary
      exists device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 volume:0 minor:1001 backing_dev:/dev/linstor_group/xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0_00000 disk:UpToDate client:no quorum:yes
      exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 local:ipv4:10.100.0.200:7001 peer:ipv4:10.100.0.202:7001 established:yes
      exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 local:ipv4:10.100.0.200:7001 peer:ipv4:10.100.0.201:7001 established:yes
      exists -
      

      Host2:

      [09:51 xcp-ng-labs-host02 ~]# systemctl status linstor-controller
      ● linstor-controller.service - drbd-reactor controlled linstor-controller
         Loaded: loaded (/usr/lib/systemd/system/linstor-controller.service; disabled; vendor preset: disabled)
        Drop-In: /run/systemd/system/linstor-controller.service.d
                 └─reactor.conf
         Active: inactive (dead)
      [09:51 xcp-ng-labs-host02 ~]# systemctl status linstor-satellite
      ● linstor-satellite.service - LINSTOR Satellite Service
         Loaded: loaded (/usr/lib/systemd/system/linstor-satellite.service; enabled; vendor preset: disabled)
        Drop-In: /etc/systemd/system/linstor-satellite.service.d
                 └─override.conf
         Active: active (running) since Thu 2024-05-02 10:26:59 PDT; 23h ago
       Main PID: 1990 (java)
         CGroup: /system.slice/linstor-satellite.service
                 ├─1990 /usr/lib/jvm/jre-11/bin/java -Xms32M -classpath /usr/share/linstor-server/lib/conf:/usr/share/linstor-server/lib/* com.linbit.linstor.core.Satellite --logs=/var/log/linstor-satellite --config-directory=/etc/linstor
                 ├─2128 drbdsetup events2 all
                 └─2552 /usr/sbin/dmeventd
      [09:51 xcp-ng-labs-host02 ~]# systemctl status drbd-reactor
      ● drbd-reactor.service - DRBD-Reactor Service
         Loaded: loaded (/usr/lib/systemd/system/drbd-reactor.service; enabled; vendor preset: disabled)
        Drop-In: /etc/systemd/system/drbd-reactor.service.d
                 └─override.conf
         Active: active (running) since Thu 2024-05-02 10:27:07 PDT; 23h ago
           Docs: man:drbd-reactor
                 man:drbd-reactorctl
                 man:drbd-reactor.toml
       Main PID: 1989 (drbd-reactor)
         CGroup: /system.slice/drbd-reactor.service
                 ├─1989 /usr/sbin/drbd-reactor
                 └─2035 drbdsetup events2 --full --poll
      [09:51 xcp-ng-labs-host02 ~]# mountpoint /var/lib/linstor
      /var/lib/linstor is not a mountpoint
      [09:51 xcp-ng-labs-host02 ~]# drbdsetup events2
      exists resource name:xcp-persistent-database role:Secondary suspended:no force-io-failures:no may_promote:no promotion_score:10103
      exists connection name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 connection:Connected role:Primary
      exists connection name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 connection:Connected role:Secondary
      exists device name:xcp-persistent-database volume:0 minor:1000 backing_dev:/dev/linstor_group/xcp-persistent-database_00000 disk:UpToDate client:no quorum:yes
      exists peer-device name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 local:ipv4:10.100.0.201:7000 peer:ipv4:10.100.0.200:7000 established:yes
      exists peer-device name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 local:ipv4:10.100.0.201:7000 peer:ipv4:10.100.0.202:7000 established:yes
      exists resource name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 role:Primary suspended:no force-io-failures:no may_promote:no promotion_score:10103
      exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 connection:Connected role:Secondary
      exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 connection:Connected role:Secondary
      exists device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 volume:0 minor:1001 backing_dev:/dev/linstor_group/xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0_00000 disk:UpToDate client:no quorum:yes
      exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 local:ipv4:10.100.0.201:7001 peer:ipv4:10.100.0.200:7001 established:yes
      exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 local:ipv4:10.100.0.201:7001 peer:ipv4:10.100.0.202:7001 established:yes
      exists -
      

      Host3:

      [09:51 xcp-ng-labs-host03 ~]# systemctl status linstor-controller
      ● linstor-controller.service - drbd-reactor controlled linstor-controller
         Loaded: loaded (/usr/lib/systemd/system/linstor-controller.service; disabled; vendor preset: disabled)
        Drop-In: /run/systemd/system/linstor-controller.service.d
                 └─reactor.conf
         Active: inactive (dead)
      [09:52 xcp-ng-labs-host03 ~]# systemctl status linstor-satellite
      ● linstor-satellite.service - LINSTOR Satellite Service
         Loaded: loaded (/usr/lib/systemd/system/linstor-satellite.service; enabled; vendor preset: disabled)
        Drop-In: /etc/systemd/system/linstor-satellite.service.d
                 └─override.conf
         Active: active (running) since Thu 2024-05-02 10:10:16 PDT; 23h ago
       Main PID: 1937 (java)
         CGroup: /system.slice/linstor-satellite.service
                 ├─1937 /usr/lib/jvm/jre-11/bin/java -Xms32M -classpath /usr/share/linstor-server/lib/conf:/usr/share/linstor-server/lib/* com.linbit.linstor.core.Satellite --logs=/var/log/linstor-satellite --config-directory=/etc/linstor
                 ├─2151 drbdsetup events2 all
                 └─2435 /usr/sbin/dmeventd
      [09:52 xcp-ng-labs-host03 ~]# systemctl status drbd-reactor
      ● drbd-reactor.service - DRBD-Reactor Service
         Loaded: loaded (/usr/lib/systemd/system/drbd-reactor.service; enabled; vendor preset: disabled)
        Drop-In: /etc/systemd/system/drbd-reactor.service.d
                 └─override.conf
         Active: active (running) since Thu 2024-05-02 10:10:26 PDT; 23h ago
           Docs: man:drbd-reactor
                 man:drbd-reactorctl
                 man:drbd-reactor.toml
       Main PID: 1939 (drbd-reactor)
         CGroup: /system.slice/drbd-reactor.service
                 ├─1939 /usr/sbin/drbd-reactor
                 └─1981 drbdsetup events2 --full --poll
      [09:52 xcp-ng-labs-host03 ~]# mountpoint /var/lib/linstor
      /var/lib/linstor is not a mountpoint
      [09:52 xcp-ng-labs-host03 ~]# drbdsetup events2
      exists resource name:xcp-persistent-database role:Secondary suspended:no force-io-failures:no may_promote:no promotion_score:10103
      exists connection name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 connection:Connected role:Primary
      exists connection name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 connection:Connected role:Secondary
      exists device name:xcp-persistent-database volume:0 minor:1000 backing_dev:/dev/linstor_group/xcp-persistent-database_00000 disk:UpToDate client:no quorum:yes
      exists peer-device name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 local:ipv4:10.100.0.202:7000 peer:ipv4:10.100.0.200:7000 established:yes
      exists peer-device name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 local:ipv4:10.100.0.202:7000 peer:ipv4:10.100.0.201:7000 established:yes
      exists resource name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 role:Secondary suspended:no force-io-failures:no may_promote:no promotion_score:10103
      exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 connection:Connected role:Secondary
      exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 connection:Connected role:Primary
      exists device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 volume:0 minor:1001 backing_dev:/dev/linstor_group/xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0_00000 disk:UpToDate client:no quorum:yes
      exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 local:ipv4:10.100.0.202:7001 peer:ipv4:10.100.0.200:7001 established:yes
      exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 local:ipv4:10.100.0.202:7001 peer:ipv4:10.100.0.201:7001 established:yes
      exists -
      
      

      Will be sending the debug file as a DM.

      Edit: Just as a sanity check, I tried to reboot the master instead of just restarting the toolstack, and the linstor SR seems to be working as expected again. The XOSTOR tab in XOA now populates (it just errored out before) and the SR scan now goes through.

      Edit2: I was able to move a VDI, but then the exact same error started happening again. No idea why.
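
      For anyone following along, the sanity check was roughly this (a sketch only; the SR UUID is the linstor SR from the sr.scan error in my original post below, and drbd-reactorctl comes with the drbd-reactor package shown above):

      # What I had tried first: restart only the toolstack on the master
      xe-toolstack-restart

      # After a full reboot of the master, verify the scan now goes through
      xe sr-scan uuid=e1a9bf4d-26ad-3ef6-b4a5-db98d012e0d9

      # And confirm the linstor controller is active somewhere in the pool
      drbd-reactorctl status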

      posted in XOSTOR
      Maelstrom96M
      Maelstrom96
    • RE: XOSTOR hyperconvergence preview

      @ronan-a Since XOSTOR is supposed to be stable now, I figured I would try it out with a new setup of 3 newly installed 8.2 nodes.

      I used the CLI to deploy it. It all went well, and the SR was ready quickly. I was even able to migrate a disk to the Linstor SR and boot the VM. However, after rebooting the master, the SR no longer allows any disk migration, and manual scans fail. I've tried fully unmounting/remounting the SR and restarting the toolstack, but nothing seems to help. The disk that was already on Linstor is still accessible and the VM is able to boot.
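
      In xe terms, the remount/rescan attempts looked roughly like this (sketch only; <pbd-uuid> is a placeholder for the PBD UUID returned by the first command):

      # List the PBDs attaching the linstor SR to each host
      xe pbd-list sr-uuid=e1a9bf4d-26ad-3ef6-b4a5-db98d012e0d9

      # "Unmount"/"remount" the SR on a host by unplugging and re-plugging its PBD
      xe pbd-unplug uuid=<pbd-uuid>
      xe pbd-plug uuid=<pbd-uuid>

      # Then retry a manual scan
      xe sr-scan uuid=e1a9bf4d-26ad-3ef6-b4a5-db98d012e0d9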

      Here is the error I'm getting:

      sr.scan
      {
        "id": "e1a9bf4d-26ad-3ef6-b4a5-db98d012e0d9"
      }
      {
        "code": "SR_BACKEND_FAILURE_47",
        "params": [
          "",
          "The SR is not available [opterr=Database is not mounted]",
          ""
        ],
        "task": {
          "uuid": "a467bd90-8d47-09cc-b8ac-afa35056ff25",
          "name_label": "Async.SR.scan",
          "name_description": "",
          "allowed_operations": [],
          "current_operations": {},
          "created": "20240502T21:40:00Z",
          "finished": "20240502T21:40:01Z",
          "status": "failure",
          "resident_on": "OpaqueRef:b3e2f390-f45f-4614-a150-1eee53f204e1",
          "progress": 1,
          "type": "<none/>",
          "result": "",
          "error_info": [
            "SR_BACKEND_FAILURE_47",
            "",
            "The SR is not available [opterr=Database is not mounted]",
            ""
          ],
          "other_config": {},
          "subtask_of": "OpaqueRef:NULL",
          "subtasks": [],
          "backtrace": "(((process xapi)(filename lib/backtrace.ml)(line 210))((process xapi)(filename ocaml/xapi/storage_access.ml)(line 32))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 35))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 131))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/xapi/rbac.ml)(line 205))((process xapi)(filename ocaml/xapi/server_helpers.ml)(line 95)))"
        },
        "message": "SR_BACKEND_FAILURE_47(, The SR is not available [opterr=Database is not mounted], )",
        "name": "XapiError",
        "stack": "XapiError: SR_BACKEND_FAILURE_47(, The SR is not available [opterr=Database is not mounted], )
          at Function.wrap (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/_XapiError.mjs:16:12)
          at default (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/_getTaskResult.mjs:11:29)
          at Xapi._addRecordToCache (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/index.mjs:1029:24)
          at file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/index.mjs:1063:14
          at Array.forEach (<anonymous>)
          at Xapi._processEvents (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/index.mjs:1053:12)
          at Xapi._watchEvents (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/index.mjs:1226:14)"
      }
      

      I quickly glanced over the source code and the SM logs to see if I could identify what was going on, but it doesn't seem to be a simple thing.

      Logs from SM:

      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242] LinstorSR.scan for e1a9bf4d-26ad-3ef6-b4a5-db98d012e0d9
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242] Raising exception [47, The SR is not available [opterr=Database is not mounted]]
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242] lock: released /var/lock/sm/e1a9bf4d-26ad-3ef6-b4a5-db98d012e0d9/sr
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242] ***** generic exception: sr_scan: EXCEPTION <class 'SR.SROSError'>, The SR is not available [opterr=Database is not mounted]
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     return self._run_locked(sr)
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     rv = self._run(sr, target)
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/SRCommand.py", line 364, in _run
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     return sr.scan(self.params['sr_uuid'])
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/LinstorSR", line 536, in wrap
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     return load(self, *args, **kwargs)
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/LinstorSR", line 521, in load
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     return wrapped_method(self, *args, **kwargs)
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/LinstorSR", line 381, in wrapped_method
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     return method(self, *args, **kwargs)
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/LinstorSR", line 777, in scan
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     opterr='Database is not mounted'
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]
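
      For anyone hitting the same thing: the error appears to be raised by the LinstorSR driver when the shared xcp-persistent-database volume isn't mounted where the controller expects it, so the quick checks on each host are roughly the following (sketch only; these mirror the systemctl/mountpoint/drbdsetup checks elsewhere in this thread):

      # Which host is currently running the linstor controller?
      drbd-reactorctl status

      # On that host, the shared database volume should be mounted
      mountpoint /var/lib/linstor

      # And its backing DRBD resource should be healthy (Primary/UpToDate there)
      drbdsetup status xcp-persistent-database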
      
      posted in XOSTOR
      Maelstrom96M
      Maelstrom96
    • RE: XOSTOR hyperconvergence preview

      @ronan-a said in XOSTOR hyperconvergence preview:

      @Maelstrom96 We must update our documentation for that. This will probably require executing commands manually during an upgrade.

      Any news on that? We're still pretty much blocked until that's figured out.

      Also, any news on when it will be officially released?

      posted in XOSTOR
      Maelstrom96M
      Maelstrom96
    • RE: XOSTOR hyperconvergence preview

      @Maelstrom96 said in XOSTOR hyperconvergence preview:

      Is there a procedure on how we can update our current 8.2 XCP-ng cluster to 8.3? My understanding is that if I update the host using the ISO, it will effectively wipe all changes that were made to DOM0, including the linstor/sm-linstor packages.

      Any input on this @ronan-a?

      posted in XOSTOR
      Maelstrom96M
      Maelstrom96
    • RE: XOSTOR hyperconvergence preview

      Is there a procedure on how we can update our current 8.2 XCP-ng cluster to 8.3? My understanding is that if I update the host using the ISO, it will effectively wipe all changes that were made to DOM0, including the linstor/sm-linstor packages.
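
      In the meantime, my plan before any ISO upgrade is simply to record what is currently installed in dom0 so it can be reinstalled afterwards (a sketch; exact package names depend on how XOSTOR was deployed):

      # Snapshot the linstor/DRBD related packages currently present in dom0
      rpm -qa | grep -Ei 'linstor|drbd' | sort > /root/xostor-packages-before-upgrade.txt

      # After the upgrade, diff against what survived and reinstall anything missing
      rpm -qa | grep -Ei 'linstor|drbd' | sort | diff /root/xostor-packages-before-upgrade.txt -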

      posted in XOSTOR
      Maelstrom96M
      Maelstrom96
    • RE: XOSTOR hyperconvergence preview

      @gb-123 said in XOSTOR hyperconvergence preview:

      @ronan-a

      VMs would be using LUKS encryption.

      So if only the VDI is replicated and, hypothetically, I lose the master node or any other node actually hosting the VM, then I will have to create the VM again using the replicated disk? Or would it be something like DRBD where there are actually 2 VMs running in Active/Passive mode and there is an automatic switchover? Or would it be that one VM is running and the second gets started automatically when the first is down?

      Sorry for the noob questions. I just wanted to be sure of the implementation.

      The VM metadata is stored at the pool level, meaning you wouldn't have to re-create the VM if its current host fails. However, memory isn't (and can't be) replicated across the cluster, except during a live migration, which temporarily copies the VM's memory to the new host so it can be moved.

      DRBD only replicates the VDI, in other words the disk data, across the active Linstor members. If the VM is stopped, or is terminated because of a host failure, you should be able to start it back up on another host in your pool. By default, though, this requires manual intervention to start the VM, and you'll have to enter your encryption password since it will be a cold boot.

      If you want the VM to restart automatically in case of failure, you can use the HA feature of XCP-ng. This wouldn't avoid having to enter your encryption password since, as explained earlier, the memory isn't replicated and the VM would cold-boot from the replicated VDI. Also, keep in mind that enabling HA adds maintenance complexity and might not be worth it.
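
      If you do go the HA route, the xe side of it is roughly this (sketch only; both UUIDs are placeholders, and the heartbeat SR must be a shared SR):

      # Enable HA on the pool, using a shared SR for the heartbeat/statefile
      xe pool-ha-enable heartbeat-sr-uuids=<shared-sr-uuid>

      # Mark the VM as protected so it restarts automatically after a host failure
      xe vm-param-set uuid=<vm-uuid> ha-restart-priority=restart order=1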

      posted in XOSTOR
      Maelstrom96M
      Maelstrom96