XCP-ng

    Lost access to all servers

    Compute · 36 posts · 6 posters · 9.2k views
    • fred974 @ronan-a

      ronan-a said in Lost access to all servers:

      well first, how many hosts do you have?

      We have four hosts.
      Host1 was the original master (host2 is the new master), and I think the DRBD replication count is 3 (how can I double-check?)
      Host1:

      [21:15 uk ~]# drbdsetup status xcp-persistent-database
      xcp-persistent-database role:Secondary
        disk:Diskless quorum:no
        uk.dc1.xcp-ng-hyper2 connection:Connecting
        uk.dc1.xcp-ng-hyper3 connection:Connecting
        uk.dc1.xcp-ng-hyper4 connection:Connecting
      

      Hosts 2, 3 and 4 show:

      [21:18 uk ~]# drbdsetup status xcp-persistent-database
      # No currently configured DRBD found.
      xcp-persistent-database: No such resource
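A small helper can make `drbdsetup status` output like the above easier to compare across hosts. This is a minimal POSIX sh sketch, not an XCP-ng or LINBIT tool: the function name `drbd_summary` is hypothetical, and it assumes the default (non-verbose) DRBD 9 layout shown in this thread, where a peer only shows a `connection:` field while it is not connected. Multi-volume resources, which print `volume:N` fields, are not handled.

```shell
#!/bin/sh
# Hypothetical helper: summarise `drbdsetup status <resource>` text read on stdin.
# Assumed layout (default DRBD 9 output, as seen in this thread):
#   <resource> role:<role>
#     disk:<state> quorum:<yes|no>
#     <peer> connection:<state>   <- only printed while the peer is not connected
drbd_summary() {
  awk '
    # drbdsetup reports that the resource does not exist on this host.
    /No such resource/ {
      name = $1; sub(/:$/, "", name)
      print "resource " name " missing on this host"; next
    }
    # Resource header: starts in column 1 and carries the local role.
    /^[^ ]/ && /role:/ {
      split($2, r, ":"); print "resource " $1 " role=" r[2]; next
    }
    # Local disk state and quorum, e.g. "  disk:Diskless quorum:no".
    /^ +disk:/ {
      line = ""
      for (i = 1; i <= NF; i++) {
        split($i, kv, ":")
        line = line (i > 1 ? " " : "") kv[1] "=" kv[2]
      }
      print "local " line; next
    }
    # A peer that still shows a connection state is not Connected.
    /connection:/ {
      split($2, c, ":"); print "peer " $1 " not connected (" c[2] ")"
    }
  '
}
```

Usage on a host: `drbdsetup status xcp-persistent-database | drbd_summary`. For host1's output above it would print the resource line, `local disk=Diskless quorum=no`, and one `not connected (Connecting)` line per peer; for hosts 2 to 4 it would print `resource xcp-persistent-database missing on this host`.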
      

      kern.log files:
      host1: host1_kern.log.txt
      host2: host2_kern.log.txt
      host3: host3_kern.log.txt
      host4: host4_kern.log.txt

      Our monitoring reported the first VM down at 11am, which is reflected in the log files. We also take hourly snapshots, so I was wondering whether they could be the cause. I hope the files above help us understand the issue. Also, should I make host1 the master again?

      Thank you

      • ronan-a Vates 🪐 XCP-ng Team @fred974

        fred974 I'll take a look at the logs, thanks. What's the output of lvs? If the database is not active, execute: vgchange -ay linstor_group.

        • fred974 @ronan-a

          ronan-a said in Lost access to all servers:

          Thanks. What's the output of lvs

          host1

          [11:25 uk ~]# lvs
            Device read short 82432 bytes remaining
            Device read short 40960 bytes remaining
            Device read short 40960 bytes remaining
            Device read short 40960 bytes remaining
            Device read short 40960 bytes remaining
            Device read short 98304 bytes remaining
            LV                                                    VG                                                 Attr       LSize    Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
            MGT                                                   VG_XenStorage-f7d16827-19e0-c57d-a720-c7fba180d4af -wi-------    4.00m                                                           
            28b8eb58-a6a2-c2fa-ad1e-b339b531330f                  XSLocalEXT-28b8eb58-a6a2-c2fa-ad1e-b339b531330f    -wi-ao---- <517.40g                                                           
            thin_device                                           linstor_group                                      twi-aotz--    2.18t                    3.28   12.11                           
            xcp-persistent-ha-statefile_00000                     linstor_group                                      Vwi-a-tz--    8.00m thin_device        50.00                                  
            xcp-persistent-redo-log_00000                         linstor_group                                      Vwi-a-tz--  260.00m thin_device        2.31                                   
            xcp-volume-126b1370-6042-40ce-8184-22a771fbf1e4_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        0.11                                   
            xcp-volume-24413a81-84b6-4242-a245-6076d5670bb4_00000 linstor_group                                      Vwi-a-tz--   20.05g thin_device        42.69                                  
            xcp-volume-3c45b809-33b7-40a3-a602-01e1511327e7_00000 linstor_group                                      Vwi-a-tz--   10.03g thin_device        45.23                                  
            xcp-volume-46caa8c3-2585-4296-a756-2d96cf2141df_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        8.10                                   
            xcp-volume-6b125e04-7134-40e3-85ce-19417c186ac5_00000 linstor_group                                      Vwi-a-tz--  <50.12g thin_device        54.75                                  
            xcp-volume-78744846-0432-44e6-a135-021f6b5dc072_00000 linstor_group                                      Vwi-a-tz--   20.00m thin_device        90.00                                  
            xcp-volume-7a83c50f-6bd5-4d7e-89a5-c3dee95bdd0b_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        1.13                                   
            xcp-volume-91a4068e-73cc-402f-87eb-2f631e66d6e2_00000 linstor_group                                      Vwi-a-tz--   20.05g thin_device        64.23                                  
            xcp-volume-ac921e7e-71ab-4ee9-8d61-5a55fe5fc369_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        0.11                                   
            xcp-volume-c1a113f6-9d1e-45f6-9b7d-656327523ce3_00000 linstor_group                                      Vwi-a-tz--   20.00m thin_device        90.00                                  
            xcp-volume-c31378ba-1ec6-4756-ab54-67c49b2ecd51_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        17.40                                  
            xcp-volume-d017c7e9-c2bc-422e-a94c-580d7001f5d0_00000 linstor_group                                      Vwi-a-tz--   20.00m thin_device        90.00                                  
            xcp-volume-d0b249b1-fb43-4013-bd6e-67c5fbdcd9b5_00000 linstor_group                                      Vwi-a-tz--   20.00m thin_device        90.00                                  
            xcp-volume-d8d37107-bb28-4884-9fb2-e771b4df1c70_00000 linstor_group                                      Vwi-a-tz--   10.03g thin_device        45.33                                  
            xcp-volume-fb436790-958a-46a3-b38f-aaca7d6738c8_00000 linstor_group                                      Vwi-a-tz--  <50.12g thin_device        0.11                                   
            xcp-volume-fec259ee-bee0-4118-b5ba-09035aad8ca2_00000 linstor_group                                      Vwi-a-tz--   10.03g thin_device        44.47
          

          host2

          [11:28 uk ~]# lvs
            Device read short 82432 bytes remaining
            Device read short 40960 bytes remaining
            Device read short 40960 bytes remaining
            Device read short 40960 bytes remaining
            Device read short 40960 bytes remaining
            Device read short 98304 bytes remaining
            LV                                                    VG                                                 Attr       LSize    Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
            MGT                                                   VG_XenStorage-f7d16827-19e0-c57d-a720-c7fba180d4af -wi-a-----    4.00m                                                           
            ae8a3b6f-b412-0294-43f6-6c11250c6927                  XSLocalEXT-ae8a3b6f-b412-0294-43f6-6c11250c6927    -wi-ao---- <517.40g                                                           
            thin_device                                           linstor_group                                      twi-aotz--    2.18t                    3.49   12.21                           
            xcp-persistent-database_00000                         linstor_group                                      Vwi-a-tz--    1.00g thin_device        6.03                                   
            xcp-persistent-ha-statefile_00000                     linstor_group                                      Vwi-a-tz--    8.00m thin_device        50.00                                  
            xcp-persistent-redo-log_00000                         linstor_group                                      Vwi-a-tz--  260.00m thin_device        2.31                                   
            xcp-volume-24413a81-84b6-4242-a245-6076d5670bb4_00000 linstor_group                                      Vwi-a-tz--   20.05g thin_device        42.69                                  
            xcp-volume-2ac918c0-1feb-4ad9-97d6-dcc561832b5d_00000 linstor_group                                      Vwi-a-tz--   10.03g thin_device        73.19                                  
            xcp-volume-3c45b809-33b7-40a3-a602-01e1511327e7_00000 linstor_group                                      Vwi-a-tz--   10.03g thin_device        45.23                                  
            xcp-volume-3e025e5e-c339-4e0c-b8ca-eb4e509ce24d_00000 linstor_group                                      Vwi-a-tz--   <4.02g thin_device        51.99                                  
            xcp-volume-46caa8c3-2585-4296-a756-2d96cf2141df_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        8.10                                   
            xcp-volume-6b125e04-7134-40e3-85ce-19417c186ac5_00000 linstor_group                                      Vwi-a-tz--  <50.12g thin_device        54.75                                  
            xcp-volume-78744846-0432-44e6-a135-021f6b5dc072_00000 linstor_group                                      Vwi-a-tz--   20.00m thin_device        90.00                                  
            xcp-volume-7a83c50f-6bd5-4d7e-89a5-c3dee95bdd0b_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        1.13                                   
            xcp-volume-9e32d56e-7c7d-443b-955a-57015b968375_00000 linstor_group                                      Vwi-a-tz--    6.02g thin_device        96.63                                  
            xcp-volume-ac921e7e-71ab-4ee9-8d61-5a55fe5fc369_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        0.11                                   
            xcp-volume-c176df5f-5ef6-46b3-841e-93ab0b5af30e_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        11.33                                  
            xcp-volume-c31378ba-1ec6-4756-ab54-67c49b2ecd51_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        17.40                                  
            xcp-volume-d017c7e9-c2bc-422e-a94c-580d7001f5d0_00000 linstor_group                                      Vwi-a-tz--   20.00m thin_device        90.00                                  
            xcp-volume-f6901916-1ce0-4757-88b6-642b96c4ab80_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        6.03                                   
            xcp-volume-fb436790-958a-46a3-b38f-aaca7d6738c8_00000 linstor_group                                      Vwi-a-tz--  <50.12g thin_device        0.11                                   
            xcp-volume-fec259ee-bee0-4118-b5ba-09035aad8ca2_00000 linstor_group                                      Vwi-a-tz--   10.03g thin_device        44.47
          

          host3

          [11:25 uk ~]# lvs
            Device read short 82432 bytes remaining
            Device read short 40960 bytes remaining
            Device read short 40960 bytes remaining
            Device read short 40960 bytes remaining
            Device read short 40960 bytes remaining
            Device read short 98304 bytes remaining
            LV                                                    VG                                                 Attr       LSize    Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
            MGT                                                   VG_XenStorage-f7d16827-19e0-c57d-a720-c7fba180d4af -wi-------    4.00m                                                           
            5792308f-7a3c-e62d-07c5-21ac24d3a56a                  XSLocalEXT-5792308f-7a3c-e62d-07c5-21ac24d3a56a    -wi-ao---- <517.40g                                                           
            thin_device                                           linstor_group                                      twi-aotz--    2.18t                    3.54   12.23                           
            xcp-persistent-database_00000                         linstor_group                                      Vwi-a-tz--    1.00g thin_device        6.03                                   
            xcp-persistent-ha-statefile_00000                     linstor_group                                      Vwi-a-tz--    8.00m thin_device        50.00                                  
            xcp-persistent-redo-log_00000                         linstor_group                                      Vwi-a-tz--  260.00m thin_device        2.31                                   
            xcp-volume-126b1370-6042-40ce-8184-22a771fbf1e4_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        0.11                                   
            xcp-volume-24413a81-84b6-4242-a245-6076d5670bb4_00000 linstor_group                                      Vwi-a-tz--   20.05g thin_device        42.69                                  
            xcp-volume-2ac918c0-1feb-4ad9-97d6-dcc561832b5d_00000 linstor_group                                      Vwi-a-tz--   10.03g thin_device        73.19                                  
            xcp-volume-3e025e5e-c339-4e0c-b8ca-eb4e509ce24d_00000 linstor_group                                      Vwi-a-tz--   <4.02g thin_device        51.99                                  
            xcp-volume-46caa8c3-2585-4296-a756-2d96cf2141df_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        8.10                                   
            xcp-volume-6b125e04-7134-40e3-85ce-19417c186ac5_00000 linstor_group                                      Vwi-a-tz--  <50.12g thin_device        54.75                                  
            xcp-volume-91a4068e-73cc-402f-87eb-2f631e66d6e2_00000 linstor_group                                      Vwi-a-tz--   20.05g thin_device        64.23                                  
            xcp-volume-9e32d56e-7c7d-443b-955a-57015b968375_00000 linstor_group                                      Vwi-a-tz--    6.02g thin_device        96.63                                  
            xcp-volume-c176df5f-5ef6-46b3-841e-93ab0b5af30e_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        11.33                                  
            xcp-volume-c1a113f6-9d1e-45f6-9b7d-656327523ce3_00000 linstor_group                                      Vwi-a-tz--   20.00m thin_device        90.00                                  
            xcp-volume-d0b249b1-fb43-4013-bd6e-67c5fbdcd9b5_00000 linstor_group                                      Vwi-a-tz--   20.00m thin_device        90.00                                  
            xcp-volume-d8d37107-bb28-4884-9fb2-e771b4df1c70_00000 linstor_group                                      Vwi-a-tz--   10.03g thin_device        45.33                                  
            xcp-volume-f6901916-1ce0-4757-88b6-642b96c4ab80_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        6.03                                   
            xcp-volume-fb436790-958a-46a3-b38f-aaca7d6738c8_00000 linstor_group                                      Vwi-a-tz--  <50.12g thin_device        0.11
          

          host4

          [11:25 uk ~]# lvs
            Device read short 82432 bytes remaining
            Device read short 40960 bytes remaining
            Device read short 40960 bytes remaining
            Device read short 40960 bytes remaining
            Device read short 40960 bytes remaining
            Device read short 98304 bytes remaining
            LV                                                    VG                                                 Attr       LSize    Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
            MGT                                                   VG_XenStorage-f7d16827-19e0-c57d-a720-c7fba180d4af -wi-------    4.00m                                                           
            3d07204c-eec9-caf1-f86a-fab419537889                  XSLocalEXT-3d07204c-eec9-caf1-f86a-fab419537889    -wi-ao---- <517.40g                                                           
            thin_device                                           linstor_group                                      twi-aotz--    2.18t                    2.52   11.73                           
            xcp-persistent-database_00000                         linstor_group                                      Vwi-a-tz--    1.00g thin_device        6.03                                   
            xcp-volume-126b1370-6042-40ce-8184-22a771fbf1e4_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        0.11                                   
            xcp-volume-2ac918c0-1feb-4ad9-97d6-dcc561832b5d_00000 linstor_group                                      Vwi-a-tz--   10.03g thin_device        73.19                                  
            xcp-volume-3c45b809-33b7-40a3-a602-01e1511327e7_00000 linstor_group                                      Vwi-a-tz--   10.03g thin_device        45.23                                  
            xcp-volume-3e025e5e-c339-4e0c-b8ca-eb4e509ce24d_00000 linstor_group                                      Vwi-a-tz--   <4.02g thin_device        51.99                                  
            xcp-volume-78744846-0432-44e6-a135-021f6b5dc072_00000 linstor_group                                      Vwi-a-tz--   20.00m thin_device        90.00                                  
            xcp-volume-7a83c50f-6bd5-4d7e-89a5-c3dee95bdd0b_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        1.13                                   
            xcp-volume-91a4068e-73cc-402f-87eb-2f631e66d6e2_00000 linstor_group                                      Vwi-a-tz--   20.05g thin_device        64.23                                  
            xcp-volume-9e32d56e-7c7d-443b-955a-57015b968375_00000 linstor_group                                      Vwi-a-tz--    6.02g thin_device        96.63                                  
            xcp-volume-ac921e7e-71ab-4ee9-8d61-5a55fe5fc369_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        0.11                                   
            xcp-volume-c176df5f-5ef6-46b3-841e-93ab0b5af30e_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        11.33                                  
            xcp-volume-c1a113f6-9d1e-45f6-9b7d-656327523ce3_00000 linstor_group                                      Vwi-a-tz--   20.00m thin_device        90.00                                  
            xcp-volume-c31378ba-1ec6-4756-ab54-67c49b2ecd51_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        17.40                                  
            xcp-volume-d017c7e9-c2bc-422e-a94c-580d7001f5d0_00000 linstor_group                                      Vwi-a-tz--   20.00m thin_device        90.00                                  
            xcp-volume-d0b249b1-fb43-4013-bd6e-67c5fbdcd9b5_00000 linstor_group                                      Vwi-a-tz--   20.00m thin_device        90.00                                  
            xcp-volume-d8d37107-bb28-4884-9fb2-e771b4df1c70_00000 linstor_group                                      Vwi-a-tz--   10.03g thin_device        45.33                                  
            xcp-volume-f6901916-1ce0-4757-88b6-642b96c4ab80_00000 linstor_group                                      Vwi-a-tz--  <40.10g thin_device        6.03                                   
            xcp-volume-fec259ee-bee0-4118-b5ba-09035aad8ca2_00000 linstor_group                                      Vwi-a-tz--   10.03g thin_device        44.47
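On the question of how to double-check the replica count: LINSTOR names each backing LV `<resource>_<volume number>` (the `_00000` suffix above), and a diskless node has no such LV, so counting on how many hosts a name appears gives the number of disk-backed replicas. Below is a minimal POSIX sh sketch, assuming each host's `lvs` output was saved to its own file; the function name `count_replicas` and file names like `host1.lvs` are hypothetical. Running `linstor resource list` on a host that can reach the controller is the more direct check.

```shell
#!/bin/sh
# Hypothetical helper: count, for each LV in VG linstor_group, on how many of
# the given `lvs` output files (one file per host) it appears.
# Assumption: an LV named <resource>_NNNNN on a host means that host stores a
# disk-backed replica of that resource; diskless hosts have no such LV.
count_replicas() {
  awk '
    $2 == "linstor_group" && $1 != "thin_device" {
      lv = $1
      sub(/_[0-9]+$/, "", lv)            # strip the _00000 volume suffix
      if (!((lv, FILENAME) in seen)) {   # count each host (file) once
        seen[lv, FILENAME] = 1
        count[lv]++
      }
    }
    END { for (lv in count) print count[lv], lv }
  ' "$@" | LC_ALL=C sort -k2,2
}
```

Usage: `count_replicas host1.lvs host2.lvs host3.lvs host4.lvs`. Applied to the four listings above, `xcp-persistent-database` would be counted on 3 hosts (host2, host3 and host4), which fits host1 reporting `disk:Diskless` for that resource.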
          

          ronan-a said in Lost access to all servers:

          If the database is not active, execute: vgchange -ay linstor_group.

          How do I know if the database is active or not?
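At the LVM level, the fifth character of `lv_attr` (the `Attr` column of `lvs`) is the LV state: `a` means active and `-` means inactive. For example, host2's `xcp-persistent-database_00000` shows `Vwi-a-tz--` (active), while host1's `MGT` shows `-wi-------` (inactive). A minimal POSIX sh sketch of that check follows; the function names are hypothetical, and it only covers LVM activation, not whether the DRBD resource on top of the LV is up.

```shell
#!/bin/sh
# Hypothetical helper: classify an LVM LV by the 5th character of lv_attr.
lv_state_from_attr() {
  state_char=$(printf '%s' "$1" | cut -c5)
  case $state_char in
    a) echo active ;;
    -) echo inactive ;;
    *) echo "other ($state_char)" ;;   # e.g. s = suspended; see lvs(8)
  esac
}

# Query a real LV, e.g. linstor_group/xcp-persistent-database_00000 (run as root).
lv_state() {
  attr=$(lvs --noheadings -o lv_attr "$1" 2>/dev/null | tr -d ' ')
  if [ -z "$attr" ]; then
    echo missing      # no such LV on this host, or lvs failed
    return 1
  fi
  lv_state_from_attr "$attr"
}
```

Usage: `lv_state linstor_group/xcp-persistent-database_00000`. If it prints `inactive`, that is the case where `vgchange -ay linstor_group` applies.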

          • fred974 @fred974

            ronan-a Did you get a chance to review the logs? Did you see anything that could help me move forward?
            Thank you

            • ronan-a Vates 🪐 XCP-ng Team @fred974

              fred974 I've been a bit busy, but I can take a look at your problem tomorrow.
              In the worst case, do you have a way to open an SSH connection to your servers?

              • fred974 @ronan-a

                ronan-a thank you very much. Do you want me to open a tunnel via Xen Orchestra?

                • ronan-a Vates 🪐 XCP-ng Team @fred974

                  fred974 If you can yes. 😉 Send me the code using the chat.

                  • fred974 @ronan-a

                    ronan-a Thank you very much for helping fix my pool 🙂

                    • KPS Top contributor @fred974

                      fred974
                      It would be great if you could write a few lines about the issue and how it was fixed.

                      • ronan-a Vates 🪐 XCP-ng Team @KPS

                        KPS The DRBD volume of the LINSTOR database was not created by the driver. We just restarted a few services, then the hosts, to fix that. Unfortunately, we have no explanation for what happened, so I don't have much more useful information to share. However, if someone runs into this situation again, I can help them try to collect more useful logs.

                          • fred974 @ronan-a

                            ronan-a I will let you and the community know if I run into this problem again. I just found out that one of our NFS servers kept rebooting around the same time, so I wonder whether that could have contributed to the issue if the hosts couldn't connect to it. I'm not advanced enough to know whether there is any correlation.

                            • johannes @ronan-a

                              ronan-a

                              I may be having a similar issue to the one you helped fred974 with last month. One of the three hosts in my lab environment (xcp-ng3) dropped out of the SR in the past day or so. I believe it may have coincided with a failed live migration to that host and/or a forced reboot of the host after that migration failed. Here's what I am seeing on the affected host:

                              [09:14 xcp-ng3 ~]# linstor node list
                              Error: Unable to connect to linstor://localhost:3370: [Errno 99] Cannot assign requested address
                              [09:14 xcp-ng3 ~]# drbdadm status
                              xcp-persistent-database role:Secondary
                                disk:UpToDate quorum:no
                                xcp-ng1 connection:StandAlone
                                xcp-ng2 connection:StandAlone
                              

                              The other two hosts, xcp-ng1 and xcp-ng2, are still operating without issue. XO sees xcp-ng3 and does not throw any errors unless I attempt an action that uses the SR (which makes sense). It appears that the LINSTOR controller is not running, as every linstor command results in the connection error above. Thoughts? Are there any other logs you need?
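The `Errno 99` error above only says that the `linstor` client could not connect to `localhost:3370`. A quick way to separate "controller not running" from other causes is to check the service and whether anything listens on TCP 3370. This minimal POSIX sh sketch assumes the controller runs as the systemd unit `linstor-controller` (the upstream LINBIT unit name) and that typically only one host in the pool runs the controller, so "not active" on a given host is not necessarily an error. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if `ss -ltn` output on stdin shows a TCP
# listener on port $1 (Local Address:Port is the 4th column).
port_listening() {
  awk -v p="$1" '$1 == "LISTEN" && $4 ~ (":" p "$") { found = 1 } END { exit !found }'
}

# Check the controller service and the REST port (3370, from the error above).
check_linstor_controller() {
  if systemctl is-active --quiet linstor-controller; then
    echo "linstor-controller: active on this host"
  else
    echo "linstor-controller: not active on this host"
  fi
  if ss -ltn | port_listening 3370; then
    echo "TCP 3370: listening"
  else
    echo "TCP 3370: nothing listening"
  fi
}
```

Running `check_linstor_controller` on each host shows which one, if any, is serving the LINSTOR API.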

                              FWIW, I'd normally just wipe the host and reinstall, but I wanted to bring it to your attention in case there's any value to the project. 😁

                              • olivierlambert Vates 🪐 Co-Founder CEO

                                I think that might be interesting for ronan-a indeed 🙂

                                • johannes

                                  To make matters more interesting, my primary host (xcp-ng1) running XO crashed due to overheating. (I just love the fun curveballs my lab environment throws me! 😆) When I powered that machine back up, I received the same "Unable to connect to linstor://localhost:3370" error as above on xcp-ng1 and of course the XOSTOR SR was not operating with nodes 1 & 3 offline.

                                  After some cursory poking around, I rebooted xcp-ng1 and lo and behold... my XOSTOR SR was fully restored for all three nodes!

                                  Part of this is my ignorance on how the underlying elements of XOSTOR work, so I'm going to throw out some ideas here that may or may not be true. Feel free to correct / educate me!

                                  When I forced a reboot of xcp-ng3, the XOSTOR (LINSTOR?) controller did not automatically rejoin the xcp-ng3 node into the SR for whatever reason when it came back online. Then when xcp-ng1 powered off unexpectedly, the controller (which I believe was running on xcp-ng1 - again I'm not fully sure how this functionality works when distributed across the cluster) again did not resume when that machine powered back on. After a controlled, graceful reboot of xcp-ng1, the controller started and synced back up with all 3 nodes. Does this make sense or seem plausible? 🙂 Any logs that would be valuable?

                                  Edit: One additional detail that I left out: when xcp-ng1 came back online, running 'drbdadm status' resulted in this message: "No currently configured DRBD found." After the graceful reboot, all was working properly again.

                                  • ronan-a Vates 🪐 XCP-ng Team @johannes

                                    johannes Hello, sorry for the delay 🙂. Is everything working in your pool now? There might be a useful log in /var/log/linstor-controller/.
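Each LINSTOR error report starts with a header block (Error time, Node, Class name, Error message), like the one johannes posts in this thread. To skim many reports at once, here is a minimal POSIX sh sketch that extracts those fields from a report read on stdin. The function name is hypothetical, and the `ErrorReport-*.log` file pattern in the usage line is an assumption about how LINSTOR names its report files. When the controller is reachable, `linstor error-reports list` is the built-in way to list them.

```shell
#!/bin/sh
# Hypothetical helper: print the key header fields of a LINSTOR error report
# read on stdin, e.g. "Error time:      2023-06-19 00:39:06".
report_summary() {
  awk '
    /^ *(Error time|Node|Class name|Error message):/ {
      key = $0; sub(/^ */, "", key); sub(/:.*/, "", key)   # text before the first colon
      val = $0; sub(/^[^:]*: */, "", val)                  # text after it, spaces trimmed
      print key ": " val
    }
  '
}
```

Usage (file pattern assumed): `for f in /var/log/linstor-controller/ErrorReport-*.log; do echo "== $f"; report_summary < "$f"; done`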

                                    • johannes @ronan-a

                                      ronan-a My pool is mostly working now, though I'm still seeing some quirks. I believe the problems I've seen so far are due to the hardware instability of one of my hosts, as noted before. While frustrating, it has given me the opportunity to work through some host and SR recovery processes. That's the silver lining. 🙂

                                      The main issue I'm still seeing occurs when I try to start a specific stopped VM on a specific host (xcp-ng3). The VM fails to start and throws this error:

                                      SR_BACKEND_FAILURE_46(, The VDI is not available [opterr=VDI
                                      a9d53abc-f492-4826-b659-744ee87d4d94 not detached cleanly], )
                                      

                                      This VM will start without issue on the other two hosts. XO is not showing any other issues with this VM and is not showing any orphaned VDIs.

                                      I am also seeing these errors in the /var/log/linstor-controller directory:

                                      Host xcp-ng1:

                                      ERROR REPORT 648FDBE6-00000-000000
                                      
                                      ============================================================
                                      
                                      Application:                        LINBIT® LINSTOR
                                      Module:                             Controller
                                      Version:                            1.21.1
                                      Build ID:                           a677db312062add13e9b230b8b902d43a69caf13
                                      Build time:                         2023-03-22T14:05:41+00:00
                                      Error time:                         2023-06-19 00:39:06
                                      Node:                               xcp-ng1
                                      
                                      ============================================================
                                      
                                      Reported error:
                                      ===============
                                      
                                      Category:                           Exception
                                      Class name:                         SocketException
                                      Class canonical name:               java.net.SocketException
                                      Generated at:                       Method 'bind0', Source file 'Net.java, Unknown line number
                                      
                                      Error message:                      Protocol family unavailable
                                      
                                      Call backtrace:
                                      
                                          Method                                   Native Class:Line number
                                          bind0                                    Y      sun.nio.ch.Net:unknown
                                          bind                                     N      sun.nio.ch.Net:461
                                          bind                                     N      sun.nio.ch.Net:453
                                          bind                                     N      sun.nio.ch.ServerSocketChannelImpl:222
                                          bind                                     N      sun.nio.ch.ServerSocketAdaptor:85
                                          bindToChannelAndAddress                  N      org.glassfish.grizzly.nio.transport.TCPNIOBindingHandler:107
                                          bind                                     N      org.glassfish.grizzly.nio.transport.TCPNIOBindingHandler:64
                                          bind                                     N      org.glassfish.grizzly.nio.transport.TCPNIOTransport:215
                                          bind                                     N      org.glassfish.grizzly.nio.transport.TCPNIOTransport:195
                                          bind                                     N      org.glassfish.grizzly.nio.transport.TCPNIOTransport:186
                                          start                                    N      org.glassfish.grizzly.http.server.NetworkListener:711
                                          start                                    N      org.glassfish.grizzly.http.server.HttpServer:256
                                          start                                    N      com.linbit.linstor.api.rest.v1.config.GrizzlyHttpService:314
                                          initialize                               N      com.linbit.linstor.systemstarter.GrizzlyInitializer:88
                                          startSystemServices                      N      com.linbit.linstor.core.ApplicationLifecycleManager:87
                                          start                                    N      com.linbit.linstor.core.Controller:365
                                          main                                     N      com.linbit.linstor.core.Controller:613
                                      
                                      
                                      END OF ERROR REPORT.
                                      

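All the LINSTOR error reports pasted here use the same `Key:   value` header layout. Most of them are the same `TransactionException: Failed to start transaction`, reached from different checkpoints. With that many reports it can help to tally them instead of reading each one. Below is a small helper sketch for that. It is not part of LINSTOR or XCP-ng. It assumes only the text layout visible in these reports (an `ERROR REPORT <id>` line, fields such as `Node:`, `Peer:`, `Class name:` and `Error message:`, and `|_ checkpoint ⇢ ...` lines). The function names `parse_reports` and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Header fields we pick out of each report (assumed layout: "Key:   value").
FIELD_RE = re.compile(r"^\s*(Error time|Node|Peer|Class name|Error message):\s+(.*\S)\s*$")
# Reactor checkpoint lines, e.g. "|_ checkpoint ⇢ Reconnect node(s)".
CHECKPOINT_RE = re.compile(r"checkpoint\s*⇢\s*(.+?)\s*$")


def parse_reports(text):
    """Split pasted error-report text into one dict per 'ERROR REPORT <id>' block."""
    reports = []
    # Everything before the first "ERROR REPORT" header is discarded.
    for chunk in re.split(r"^\s*ERROR REPORT\s+", text, flags=re.M)[1:]:
        fields = {"id": chunk.split()[0], "checkpoints": []}
        for line in chunk.splitlines():
            m = FIELD_RE.match(line)
            if m and m.group(1) not in fields:  # keep the first occurrence only
                fields[m.group(1)] = m.group(2)
            c = CHECKPOINT_RE.search(line)
            if c:
                fields["checkpoints"].append(c.group(1))
        reports.append(fields)
    return reports


def summarize(reports):
    """Count reports per (exception class, error message) pair."""
    return Counter((r.get("Class name"), r.get("Error message")) for r in reports)


if __name__ == "__main__":
    import sys

    for (cls, msg), n in summarize(parse_reports(sys.stdin.read())).most_common():
        print(f"{n:4d}  {cls}: {msg}")
```

For example, you could save the reports from one host to a file and pipe it in with `python3 summarize_reports.py < reports.txt`. The script name is only illustrative. Grouping by class and message makes the pattern easier to see: on xcp-ng2, almost every report is the same transaction failure, plus one `Pool not open`.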
                                      Host xcp-ng2 (each report truncated to fit the post length limit):

                                      ERROR REPORT 648E4932-00000-000012
                                      
                                      ============================================================
                                      
                                      Application:                        LINBIT® LINSTOR
                                      Module:                             Controller
                                      Version:                            1.21.1
                                      Build ID:                           a677db312062add13e9b230b8b902d43a69caf13
                                      Build time:                         2023-03-22T14:05:41+00:00
                                      Error time:                         2023-06-19 00:38:51
                                      Node:                               xcp-ng2
                                      Peer:                               Node: 'xcp-ng1'
                                      
                                      ============================================================
                                      
                                      Reported error:
                                      ===============
                                      
                                      Category:                           RuntimeException
                                      Class name:                         TransactionException
                                      Class canonical name:               com.linbit.linstor.transaction.TransactionException
                                      Generated at:                       Method 'startTransaction', Source file 'ControllerSQLTransactionMgrGenerator.java', Line #32
                                      
                                      Error message:                      Failed to start transaction
                                      
                                      Error context:
                                          Failed to start transaction
                                      
                                      Asynchronous stage backtrace:
                                      
                                          Error has been observed at the following site(s):
                                              |_ checkpoint ⇢ Execute single-stage API UpdateFreeCapacity
                                              |_ checkpoint ⇢ Fallback error handling wrapper
                                          Stack trace:
                                      
                                      Call backtrace:
                                      
                                          Method                                   Native Class:Line number
                                          startTransaction                         N      com.linbit.linstor.transaction.manager.ControllerSQLTransactionMgrGenerator:32
                                      
                                      ERROR REPORT 648E4932-00000-000013
                                      
                                      ============================================================
                                      
                                      Application:                        LINBIT® LINSTOR
                                      Module:                             Controller
                                      Version:                            1.21.1
                                      Build ID:                           a677db312062add13e9b230b8b902d43a69caf13
                                      Build time:                         2023-03-22T14:05:41+00:00
                                      Error time:                         2023-06-19 00:38:51
                                      Node:                               xcp-ng2
                                      Peer:                               Node: 'xcp-ng3'
                                      
                                      ============================================================
                                      
                                      Reported error:
                                      ===============
                                      
                                      Category:                           RuntimeException
                                      Class name:                         ErrorCallbackNotImplemented
                                      Class canonical name:               reactor.core.Exceptions.ErrorCallbackNotImplemented
                                      Generated at:                       <UNKNOWN>
                                      
                                      Error message:                      com.linbit.linstor.transaction.TransactionException: Failed to start transaction
                                      
                                      Call backtrace:
                                      
                                          Method                                   Native Class:Line number
                                      
                                      ERROR REPORT 648E4932-00000-000014
                                      
                                      ============================================================
                                      
                                      Application:                        LINBIT® LINSTOR
                                      Module:                             Controller
                                      Version:                            1.21.1
                                      Build ID:                           a677db312062add13e9b230b8b902d43a69caf13
                                      Build time:                         2023-03-22T14:05:41+00:00
                                      Error time:                         2023-06-19 00:38:51
                                      Node:                               xcp-ng2
                                      Peer:                               RestClient(127.0.0.1; 'PythonLinstor/1.15.1 (API1.0.4): Client 1.15.1')
                                      
                                      ============================================================
                                      
                                      Reported error:
                                      ===============
                                      
                                      Category:                           RuntimeException
                                      Class name:                         TransactionException
                                      Class canonical name:               com.linbit.linstor.transaction.TransactionException
                                      Generated at:                       Method 'startTransaction', Source file 'ControllerSQLTransactionMgrGenerator.java', Line #32
                                      
                                      Error message:                      Failed to start transaction
                                      
                                      Error context:
                                          Modification of node 'xcp-ng1' failed due to an unknown exception.
                                      
                                      Asynchronous stage backtrace:
                                      
                                          Error has been observed at the following site(s):
                                              |_ checkpoint ⇢ Auto-Quorum and -Tiebreaker after node create
                                              |_ checkpoint ⇢ Reconnect node(s)
                                          Stack trace:
                                      
                                      Call backtrace:
                                      
                                          Method                                   Native Class:Line number
                                          startTransaction                         N      com.linbit.linstor.transaction.manager.ControllerSQLTransactionMgrGenerator:32
                                      
                                      ERROR REPORT 648E4932-00000-000015
                                      
                                      ============================================================
                                      
                                      Application:                        LINBIT® LINSTOR
                                      Module:                             Controller
                                      Version:                            1.21.1
                                      Build ID:                           a677db312062add13e9b230b8b902d43a69caf13
                                      Build time:                         2023-03-22T14:05:41+00:00
                                      Error time:                         2023-06-19 00:38:51
                                      Node:                               xcp-ng2
                                      Peer:                               Node: 'xcp-ng2'
                                      
                                      ============================================================
                                      
                                      Reported error:
                                      ===============
                                      
                                      Category:                           RuntimeException
                                      Class name:                         TransactionException
                                      Class canonical name:               com.linbit.linstor.transaction.TransactionException
                                      Generated at:                       Method 'startTransaction', Source file 'ControllerSQLTransactionMgrGenerator.java', Line #32
                                      
                                      Error message:                      Failed to start transaction
                                      
                                      Error context:
                                          Failed to start transaction
                                      
                                      Asynchronous stage backtrace:
                                      
                                          Error has been observed at the following site(s):
                                              |_ checkpoint ⇢ Execute single-stage API ApplyPropsFromStlt
                                              |_ checkpoint ⇢ Fallback error handling wrapper
                                          Stack trace:
                                      
                                      Call backtrace:
                                      
                                          Method                                   Native Class:Line number
                                          startTransaction                         N      com.linbit.linstor.transaction.manager.ControllerSQLTransactionMgrGenerator:32
                                      
                                      ERROR REPORT 648E4932-00000-000016
                                      
                                      ============================================================
                                      
                                      Application:                        LINBIT® LINSTOR
                                      Module:                             Controller
                                      Version:                            1.21.1
                                      Build ID:                           a677db312062add13e9b230b8b902d43a69caf13
                                      Build time:                         2023-03-22T14:05:41+00:00
                                      Error time:                         2023-06-19 00:38:51
                                      Node:                               xcp-ng2
                                      Peer:                               RestClient(127.0.0.1; 'PythonLinstor/1.15.1 (API1.0.4): Client 1.15.1')
                                      
                                      ============================================================
                                      
                                      Reported error:
                                      ===============
                                      
                                      Category:                           RuntimeException
                                      Class name:                         TransactionException
                                      Class canonical name:               com.linbit.linstor.transaction.TransactionException
                                      Generated at:                       Method 'startTransaction', Source file 'ControllerSQLTransactionMgrGenerator.java', Line #32
                                      
                                      Error message:                      Failed to start transaction
                                      
                                      Error context:
                                          Modification of node 'xcp-ng1' failed due to an unknown exception.
                                      
                                      Asynchronous stage backtrace:
                                      
                                          Error has been observed at the following site(s):
                                              |_ checkpoint ⇢ Reconnect node(s)
                                          Stack trace:
                                      
                                      Call backtrace:
                                      
                                          Method                                   Native Class:Line number
                                          startTransaction                         N      com.linbit.linstor.transaction.manager.ControllerSQLTransactionMgrGenerator:32
                                      
                                      ERROR REPORT 648E4932-00000-000017
                                      
                                      ============================================================
                                      
                                      Application:                        LINBIT® LINSTOR
                                      Module:                             Controller
                                      Version:                            1.21.1
                                      Build ID:                           a677db312062add13e9b230b8b902d43a69caf13
                                      Build time:                         2023-03-22T14:05:41+00:00
                                      Error time:                         2023-06-19 00:38:51
                                      Node:                               xcp-ng2
                                      
                                      ============================================================
                                      
                                      Reported error:
                                      ===============
                                      
                                      Category:                           RuntimeException
                                      Class name:                         ErrorCallbackNotImplemented
                                      Class canonical name:               reactor.core.Exceptions.ErrorCallbackNotImplemented
                                      Generated at:                       <UNKNOWN>
                                      
                                      Error message:                      com.linbit.linstor.transaction.TransactionException: Failed to start transaction
                                      
                                      Error context:
                                          Exception thrown by connection observer when outbound connection established
                                      
                                      Call backtrace:
                                      
                                          Method                                   Native Class:Line number
                                      
                                      ERROR REPORT 648E4932-00000-000018
                                      
                                      ============================================================
                                      
                                      Application:                        LINBIT® LINSTOR
                                      Module:                             Controller
                                      Version:                            1.21.1
                                      Build ID:                           a677db312062add13e9b230b8b902d43a69caf13
                                      Build time:                         2023-03-22T14:05:41+00:00
                                      Error time:                         2023-06-19 00:38:51
                                      Node:                               xcp-ng2
                                      Peer:                               Node: 'xcp-ng1'
                                      
                                      ============================================================
                                      
                                      Reported error:
                                      ===============
                                      
                                      Category:                           RuntimeException
                                      Class name:                         IllegalStateException
                                      Class canonical name:               java.lang.IllegalStateException
                                      Generated at:                       Method 'assertOpen', Source file 'BaseGenericObjectPool.java', Line #759
                                      
                                      Error message:                      Pool not open
                                      
                                      Asynchronous stage backtrace:
                                      
                                          Error has been observed at the following site(s):
                                              |_ checkpoint ⇢ Execute single-stage API NotifyDevMgrRunCompleted
                                              |_ checkpoint ⇢ Fallback error handling wrapper
                                          Stack trace:
                                      
                                      Call backtrace:
                                      
                                          Method                                   Native Class:Line number
                                          assertOpen                               N      org.apache.commons.pool2.impl.BaseGenericObjectPool:759
                                      
                                      ERROR REPORT 648E4932-00000-000019
                                      
                                      ============================================================
                                      
                                      Application:                        LINBIT® LINSTOR
                                      Module:                             Controller
                                      Version:                            1.21.1
                                      Build ID:                           a677db312062add13e9b230b8b902d43a69caf13
                                      Build time:                         2023-03-22T14:05:41+00:00
                                      Error time:                         2023-06-19 00:38:51
                                      Node:                               xcp-ng2
                                      
                                      ============================================================
                                      
                                      Reported error:
                                      ===============
                                      
                                      Category:                           RuntimeException
                                      Class name:                         TransactionException
                                      Class canonical name:               com.linbit.linstor.transaction.TransactionException
                                      Generated at:                       Method 'startTransaction', Source file 'ControllerSQLTransactionMgrGenerator.java', Line #32
                                      
                                      Error message:                      Failed to start transaction
                                      
                                      Asynchronous stage backtrace:
                                      
                                          Error has been observed at the following site(s):
                                              |_ checkpoint ⇢ Restore node
                                          Stack trace:
                                      
                                      Call backtrace:
                                      
                                          Method                                   Native Class:Line number
                                          startTransaction                         N      com.linbit.linstor.transaction.manager.ControllerSQLTransactionMgrGenerator:32
                                      
                                      ERROR REPORT 648E4932-00000-000020
                                      
                                      ============================================================
                                      
                                      Application:                        LINBIT® LINSTOR
                                      Module:                             Controller
                                      Version:                            1.21.1
                                      Build ID:                           a677db312062add13e9b230b8b902d43a69caf13
                                      Build time:                         2023-03-22T14:05:41+00:00
                                      Error time:                         2023-06-19 00:38:51
                                      Node:                               xcp-ng2
                                      
                                      ============================================================
                                      
                                      Reported error:
                                      ===============
                                      
                                      Category:                           RuntimeException
                                      Class name:                         TransactionException
                                      Class canonical name:               com.linbit.linstor.transaction.TransactionException
                                      Generated at:                       Method 'startTransaction', Source file 'ControllerSQLTransactionMgrGenerator.java', Line #32
                                      
                                      Error message:                      Failed to start transaction
                                      
                                      Asynchronous stage backtrace:
                                      
                                          Error has been observed at the following site(s):
                                              |_ checkpoint ⇢ Fetch thin capacity info
                                          Stack trace:
                                      
                                      Call backtrace:
                                      
                                          Method                                   Native Class:Line number
                                          startTransaction                         N      com.linbit.linstor.transaction.manager.ControllerSQLTransactionMgrGenerator:32
                                      
                                      ERROR REPORT 648E4932-00000-000021
                                      
                                      ============================================================
                                      
                                      Application:                        LINBIT® LINSTOR
                                      Module:                             Controller
                                      Version:                            1.21.1
                                      Build ID:                           a677db312062add13e9b230b8b902d43a69caf13
                                      Build time:                         2023-03-22T14:05:41+00:00
                                      Error time:                         2023-06-19 00:38:51
                                      Node:                               xcp-ng2
                                      
                                      ============================================================
                                      
                                      Reported error:
                                      ===============
                                      
                                      Category:                           RuntimeException
                                      Class name:                         TransactionException
                                      Class canonical name:               com.linbit.linstor.transaction.TransactionException
                                      Generated at:                       Method 'startTransaction', Source file 'ControllerSQLTransactionMgrGenerator.java', Line #32
                                      
                                      Error message:                      Failed to start transaction
                                      
                                      Asynchronous stage backtrace:
                                      
                                          Error has been observed at the following site(s):
                                              |_ checkpoint ⇢ Fetch thin capacity info
                                          Stack trace:
                                      
                                      Call backtrace:
                                      
                                          Method                                   Native Class:Line number
                                          startTransaction                         N      com.linbit.linstor.transaction.manager.ControllerSQLTransactionMgrGenerator:32
                                      
                                      ERROR REPORT 648E4932-00000-000023
                                      
                                      ============================================================
                                      
                                      Application:                        LINBIT® LINSTOR
                                      Module:                             Controller
                                      Version:                            1.21.1
                                      Build ID:                           a677db312062add13e9b230b8b902d43a69caf13
                                      Build time:                         2023-03-22T14:05:41+00:00
                                      Error time:                         2023-06-19 00:38:51
                                      Node:                               xcp-ng2
                                      Peer:                               RestClient(127.0.0.1; 'PythonLinstor/1.15.1 (API1.0.4): Client 1.15.1')
                                      
                                      ============================================================
                                      
                                      Reported error:
                                      ===============
                                      
                                      Category:                           RuntimeException
                                      Class name:                         TransactionException
                                      Class canonical name:               com.linbit.linstor.transaction.TransactionException
                                      Generated at:                       Method 'startTransaction', Source file 'ControllerSQLTransactionMgrGenerator.java', Line #32
                                      
                                      Error message:                      Failed to start transaction
                                      
                                      Error context:
                                          Modification of node 'xcp-ng1' failed due to an unknown exception.
                                      
                                      Asynchronous stage backtrace:
                                      
                                          Error has been observed at the following site(s):
                                              |_ checkpoint ⇢ Reconnect node(s)
                                          Stack trace:
                                      
                                      Call backtrace:
                                      
                                          Method                                   Native Class:Line number
                                          startTransaction                         N      com.linbit.linstor.transaction.manager.ControllerSQLTransactionMgrGenerator:32
                                      
                                      ERROR REPORT 648E4932-00000-000024
                                      
                                      ============================================================
                                      
                                      Application:                        LINBIT® LINSTOR
                                      Module:                             Controller
                                      Version:                            1.21.1
                                      Build ID:                           a677db312062add13e9b230b8b902d43a69caf13
                                      Build time:                         2023-03-22T14:05:41+00:00
                                      Error time:                         2023-06-19 00:38:51
                                      Node:                               xcp-ng2
                                      
                                      ============================================================
                                      
                                      Reported error:
                                      ===============
                                      
                                      Category:                           RuntimeException
                                      Class name:                         TransactionException
                                      Class canonical name:               com.linbit.linstor.transaction.TransactionException
                                      Generated at:                       Method 'startTransaction', Source file 'ControllerSQLTransactionMgrGenerator.java', Line #32
                                      
                                      Error message:                      Failed to start transaction
                                      
                                      Asynchronous stage backtrace:
                                      
                                          Error has been observed at the following site(s):
                                              |_ checkpoint ⇢ Fetch thin capacity info
                                          Stack trace:
                                      
                                      Call backtrace:
                                      
                                          Method                                   Native Class:Line number
                                          startTransaction                         N      com.linbit.linstor.transaction.manager.ControllerSQLTransactionMgrGenerator:32
                                      

                                      Host xcp-ng3:

                                      ERROR REPORT 64905A1B-00000-000000
                                      
                                      ============================================================
                                      
                                      Application:                        LINBIT® LINSTOR
                                      Module:                             Controller
                                      Version:                            1.21.1
                                      Build ID:                           a677db312062add13e9b230b8b902d43a69caf13
                                      Build time:                         2023-03-22T14:05:41+00:00
                                      Error time:                         2023-06-19 09:37:36
                                      Node:                               xcp-ng3
                                      
                                      ============================================================
                                      
                                      Reported error:
                                      ===============
                                      
                                      Category:                           Exception
                                      Class name:                         SocketException
                                      Class canonical name:               java.net.SocketException
                                      Generated at:                       Method 'bind0', Source file 'Net.java, Unknown line number
                                      
                                      Error message:                      Protocol family unavailable
                                      
                                      Call backtrace:
                                      
                                          Method                                   Native Class:Line number
                                          bind0                                    Y      sun.nio.ch.Net:unknown
                                          bind                                     N      sun.nio.ch.Net:461
                                          bind                                     N      sun.nio.ch.Net:453
                                          bind                                     N      sun.nio.ch.ServerSocketChannelImpl:222
                                          bind                                     N      sun.nio.ch.ServerSocketAdaptor:85
                                          bindToChannelAndAddress                  N      org.glassfish.grizzly.nio.transport.TCPNIOBindingHandler:107
                                          bind                                     N      org.glassfish.grizzly.nio.transport.TCPNIOBindingHandler:64
                                          bind                                     N      org.glassfish.grizzly.nio.transport.TCPNIOTransport:215
                                          bind                                     N      org.glassfish.grizzly.nio.transport.TCPNIOTransport:195
                                          bind                                     N      org.glassfish.grizzly.nio.transport.TCPNIOTransport:186
                                          start                                    N      org.glassfish.grizzly.http.server.NetworkListener:711
                                          start                                    N      org.glassfish.grizzly.http.server.HttpServer:256
                                          start                                    N      com.linbit.linstor.api.rest.v1.config.GrizzlyHttpService:314
                                          initialize                               N      com.linbit.linstor.systemstarter.GrizzlyInitializer:88
                                          startSystemServices                      N      com.linbit.linstor.core.ApplicationLifecycleManager:87
                                          start                                    N      com.linbit.linstor.core.Controller:365
                                          main                                     N      com.linbit.linstor.core.Controller:613
                                      
                                      
                                      END OF ERROR REPORT.
                                      

                                      Sorry, that was a lot of logs, but I don't know exactly what you'd be interested in seeing. It's possible that all of the logs from host xcp-ng2 are from the last time host xcp-ng1 melted down. I'm still trying to determine exactly when that happened overnight. 🙂
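To line the reports up against the time of the xcp-ng1 meltdown, you can pull the `Error time`, `Node`, `Class name`, and `Error message` fields out of each report and sort them. Below is a minimal Python sketch. It assumes the reports are saved as plain text in the format shown above, with each report starting at an `ERROR REPORT <id>` line and each field on its own `Name:   value` line. `summarize_reports` is a hypothetical helper and is not part of LINSTOR.

```python
import re
from typing import Dict, List

# Fields to extract, spelled exactly as they appear in the LINSTOR reports above.
FIELDS = ("Error time", "Node", "Class name", "Error message")


def summarize_reports(text: str) -> List[Dict[str, str]]:
    """Split raw error-report text into reports and extract a few key fields.

    Assumes each report begins with a line 'ERROR REPORT <id>' and that fields
    are written as 'Name:   value' on their own line (leading spaces allowed).
    """
    summaries = []
    # The capture group keeps the report IDs, so the result looks like:
    # [preamble, id1, body1, id2, body2, ...]
    parts = re.split(r"^\s*ERROR REPORT\s+(\S+)\s*$", text, flags=re.MULTILINE)
    for report_id, body in zip(parts[1::2], parts[2::2]):
        entry = {"id": report_id}
        for field in FIELDS:
            match = re.search(
                rf"^\s*{re.escape(field)}:\s+(.+?)\s*$", body, flags=re.MULTILINE
            )
            if match:
                entry[field] = match.group(1)
        summaries.append(entry)
    # 'YYYY-MM-DD HH:MM:SS' timestamps sort correctly as plain strings;
    # reports without an 'Error time' line sort first.
    return sorted(summaries, key=lambda e: e.get("Error time", ""))
```

For example, after saving the reports from each host to a text file, calling `summarize_reports(open(path).read())` returns the entries oldest first. You can then compare them against the time the first VM went down.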

                                      Let me know if you need to see anything else! Thank you!
