XCP-ng

    XOSTOR 8.3 controller crash with guest OSes shutting down filesystem

    • Dark199

      Hello,

      I am currently testing an XOSTOR volume (XCP-ng 8.3, build of 11 Oct 2024, three hosts) and have experienced a two-part problem:

      1. The LINSTOR controller crashed. I am attaching /var/log/linstor-controller/ErrorReport; excerpt:
        Error message: Failed to start transaction
        Error message:
        Error message: IO Exception: null [90028-197]
        Error message: Reading from nio:/var/lib/linstor/linstordb.mv.db failed; file length 901120 read length 8192 at 0 [1.4.197/1]
        Error message: Input/output error

      As far as I can tell, the controller was immediately started on one of the remaining hosts, but:

      2. The Linux VMs (all three of them) lost access to their disks ("Shutting down filesystem"). They are up-to-date CentOS 7 guests; here is a console screenshot:
        XOSTOR_1.png

      3. After the VMs were rebooted, everything went back to normal without any other action.

      So it seems the biggest issue was the guest OSes giving up at the time of controller crash.

      ErrorReport-679F8267-00000-000001.log.txt

      Can we do something about it?

      • Dark199 @Dark199

        Afterwards, I left two VMs on XOSTOR storage, each on a different host, and "Shutting down filesystem" happened on only one of them, with the following report generated on the LINSTOR controller:

        ErrorReport-67B37339-00000-000000.log.txt

        Kind regards,

        • olivierlambert Vates 🪐 Co-Founder CEO

          Hi,

          XOSTOR isn't officially supported on 8.3 yet.

          • Dark199 @olivierlambert

            @olivierlambert
            Hi,

            Yes, thank you, I am aware of that. I read all the docs and forum threads available, didn't find anything on the subject, and just wanted to share the experience. Should I assume it's a known problem? After all, that's what betas are for 🙂

            Thanks,

            • olivierlambert Vates 🪐 Co-Founder CEO

              Good question, maybe ronan-a or dthenot are aware.

              • ronan-a Vates 🪐 XCP-ng Team @Dark199

                @Dark199 In practice, you should find more information in dmesg or kern.log. I have never seen this error before, and since it impacts VMs, I am afraid it is something quite serious. Are your disks OK? Do you have enough RAM in Dom0?
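
A minimal sketch of the kernel-log check suggested above, assuming the log is available as a plain file such as /var/log/kern.log on the host. The function name `storage_kernel_events` and the pattern list are hypothetical, chosen for illustration rather than taken from any XOSTOR tooling.

```shell
# Hypothetical helper: print kernel log lines that often accompany storage trouble.
# The pattern list is an illustrative assumption, not an official checklist.
storage_kernel_events() {
    # $1: path to a kernel log file (for example /var/log/kern.log)
    grep -iE 'drbd|I/O error|blocked for more than|out of memory' "$1"
}
```

For example, `storage_kernel_events /var/log/kern.log | tail -n 50` would show the most recent matches, and `free -m` run in Dom0 reports how much memory is available there.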

                • Dark199 @ronan-a

                  @ronan-a
                  Hello,

                  I am uploading kern.log and drbd-kern.log for both events.

                  drbd-kern.Feb06.log.txt
                  kern.Feb06.log.txt

                  drbd-kern.Feb17.log.txt
                  kern.Feb17.log.txt

                  The disks and RAM are 100% OK. But the kernel logs make me wonder how XOSTOR should react to a short network outage.
                  The VMs did have a local primary DRBD resource (a diskful volume, so all the data they needed was available on a local disk).

                  # linstor resource list
                  ╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
                  ┊ ResourceName                                    ┊ Node       ┊ Port ┊ Usage  ┊ Conns ┊      State ┊ CreatedOn           ┊
                  ╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
                  ┊ xcp-persistent-database                         ┊ xencc-hp03 ┊ 7000 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2025-02-02 15:28:19 ┊
                  ┊ xcp-persistent-database                         ┊ xenrt-1    ┊ 7000 ┊ InUse  ┊ Ok    ┊   UpToDate ┊ 2025-02-02 15:28:18 ┊
                  ┊ xcp-persistent-database                         ┊ xenrt-2    ┊ 7000 ┊ Unused ┊ Ok    ┊   Diskless ┊ 2025-02-02 15:28:17 ┊
                  ┊ xcp-volume-623a917e-614f-4176-8e58-505248ee9db4 ┊ xencc-hp03 ┊ 7004 ┊ InUse  ┊ Ok    ┊   UpToDate ┊ 2025-02-02 15:35:18 ┊
                  ┊ xcp-volume-623a917e-614f-4176-8e58-505248ee9db4 ┊ xenrt-1    ┊ 7004 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2025-02-02 15:35:17 ┊
                  ┊ xcp-volume-623a917e-614f-4176-8e58-505248ee9db4 ┊ xenrt-2    ┊ 7004 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2025-02-02 15:35:17 ┊
                  ┊ xcp-volume-9dd3dc66-aa58-40f2-aa56-14b8846a4278 ┊ xencc-hp03 ┊ 7007 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2025-02-04 16:18:46 ┊
                  ┊ xcp-volume-9dd3dc66-aa58-40f2-aa56-14b8846a4278 ┊ xenrt-1    ┊ 7007 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2025-02-04 16:18:46 ┊
                  ┊ xcp-volume-9dd3dc66-aa58-40f2-aa56-14b8846a4278 ┊ xenrt-2    ┊ 7007 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2025-02-04 16:18:46 ┊
                  ┊ xcp-volume-e9428d9d-97a7-4a37-a2bb-630f8b5f3f0f ┊ xencc-hp03 ┊ 7005 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2025-02-02 15:42:40 ┊
                  ┊ xcp-volume-e9428d9d-97a7-4a37-a2bb-630f8b5f3f0f ┊ xenrt-1    ┊ 7005 ┊ InUse  ┊ Ok    ┊   UpToDate ┊ 2025-02-02 15:42:40 ┊
                  ┊ xcp-volume-e9428d9d-97a7-4a37-a2bb-630f8b5f3f0f ┊ xenrt-2    ┊ 7005 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2025-02-02 15:42:39 ┊
                  ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
                  
                  • Dark199 @ronan-a

                    @ronan-a
                    [...]
                    64 bytes from 172.27.18.161: icmp_seq=21668 ttl=64 time=0.805 ms
                    64 bytes from 172.27.18.161: icmp_seq=21669 ttl=64 time=0.737 ms
                    64 bytes from 172.27.18.161: icmp_seq=21670 ttl=64 time=0.750 ms
                    64 bytes from 172.27.18.161: icmp_seq=21671 ttl=64 time=0.780 ms
                    64 bytes from 172.27.18.161: icmp_seq=21672 ttl=64 time=0.774 ms
                    64 bytes from 172.27.18.161: icmp_seq=21673 ttl=64 time=0.737 ms
                    64 bytes from 172.27.18.161: icmp_seq=21674 ttl=64 time=0.773 ms
                    64 bytes from 172.27.18.161: icmp_seq=21675 ttl=64 time=0.835 ms
                    64 bytes from 172.27.18.161: icmp_seq=21676 ttl=64 time=0.755 ms
                    1004711/1004716 packets, 0% loss, min/avg/ewma/max = 0.712/1.033/0.775/195.781 ms

                    I am attaching simple ping stats for the last 11 days. I don't think we can blame the network 🙂
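
As a rough cross-check of that summary line, the sketch below recomputes the loss from the two packet counters, assuming they are printed as received/transmitted. The helper name `ping_loss` is hypothetical and only illustrates the arithmetic.

```shell
# Hypothetical helper: recompute packet loss from a ping summary line.
# Assumes the line starts with "<received>/<transmitted> packets".
ping_loss() {
    # $1: summary line, e.g. "1004711/1004716 packets, 0% loss, ..."
    printf '%s\n' "$1" | LC_ALL=C awk -F'[/ ]' '{
        sub(/^[ \t]+/, "")            # drop leading indentation; fields are re-split
        lost = $2 - $1                # transmitted minus received
        if ($2 > 0)
            printf "lost=%d loss_pct=%.4f\n", lost, 100 * lost / $2
    }'
}
```

Called as `ping_loss "1004711/1004716 packets, 0% loss, min/avg/ewma/max = 0.712/1.033/0.775/195.781 ms"`, this prints `lost=5 loss_pct=0.0005`: five packets were lost out of roughly a million, which the summary rounds down to 0%. The summary also records a maximum round-trip time of 195.781 ms.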
