XCP-ng

    Rolling Pool Update - host took too long to restart

    Xen Orchestra

    • pdonias Vates 🪐 XO Team @olivierlambert

      @olivierlambert By default, it's 20 minutes. And it's already configurable through xo-server's config by adding:

      [xapiOptions]
      restartHostTimeout = '40 minutes'
      
      • Tristis Oris Top contributor

        Got that issue too. Sometimes a server restart takes longer than usual, so the rolling update is canceled by the timeout.
        Why does it take so long? I don't know. Maybe some startup checks. I can't restart production to see if anything differs.

        Is the timeout really required?

        • nikade Top contributor

          Our Dell R630s with 512 GB of RAM also take a while to reboot, so yes, being able to adjust the value is great.

          • nikade Top contributor @Tristis Oris

            @Tristis-Oris said in Rolling Pool Update - host took too long to restart:

            Got that issue too. Sometimes a server restart takes longer than usual, so the rolling update is canceled by the timeout.
            Why does it take so long? I don't know. Maybe some startup checks. I can't restart production to see if anything differs.

            Is the timeout really required?

            If they have ECC memory, the server will check it, collect diagnostics and so on during boot; that is pretty common on enterprise servers.

            • olivierlambert Vates 🪐 Co-Founder CEO

              Thanks @pdonias, I forgot about this 🙂 I didn't check the docs: have we documented that too?

              • pdonias Vates 🪐 XO Team @olivierlambert

                @olivierlambert It doesn't look like we did. It's documented in the config file but we can add it to the RPU doc too if necessary.

                • olivierlambert Vates 🪐 Co-Founder CEO

                  Let's do that then; it will save us a thread or two in here 🙂

                  • dsiminiuk @olivierlambert

                    @olivierlambert I've made the needed adjustment in the build script to override the default. Now I'm waiting for another set of patches to test it.
                    Thanks, all.

                    • Tristis Oris Top contributor

                      Just installed the latest updates, and the rolling update was again canceled by the timeout. Since that never happened before, I think it began after some updates about 2-3 months ago.

                      • olivierlambert Vates 🪐 Co-Founder CEO

                        It's hard to give an answer because we are not inside your infrastructure. How long did your host take to reboot in the end?

                        • Tristis Oris Top contributor @olivierlambert

                          @olivierlambert According to monitoring, it took 10 min. >.<
                          Maybe some disabled VMs were started after the reboot, so there was not enough memory for the rolling update.

                          But the previous time, the reboot really did take very long.

                          I see a lack of logs here. Nothing told me that the rolling update was canceled.

                          • olivierlambert Vates 🪐 Co-Founder CEO

                            We are introducing an XO task to monitor the RPU process. That will make the whole process easier to track 🙂

                            • Tristis Oris Top contributor

                              Next pool: almost empty, enough memory for the rolling update, and the reboot took 5 min.
                              The 2nd host was not updated.

                              • nikade Top contributor @olivierlambert

                                @olivierlambert said in Rolling Pool Update - host took too long to restart:

                                We are introducing an XO task to monitor the RPU process. That will make the whole process easier to track 🙂

                                This should also be displayed in the pool overview, to make sure other admins don't start other tasks by mistake.
                                In a perfect world there would be an indicator icon next to the pool name, plus a warning in the general pool overview with some kind of notification like "Pool upgrade in progress - Wait for it to complete before starting new tasks" or similar, in a very red/orange/otherwise visible color that you can't miss:

                                [screenshot]

                                • olivierlambert Vates 🪐 Co-Founder CEO

                                  Likely a good idea for XO 6 (because these are XO tasks, not XAPI tasks, and I don't think we planned for the task to have an impact on XAPI objects). But that's a nice idea 🙂 Adding @pdonias to the loop.

                                  • dsiminiuk @olivierlambert

                                    @olivierlambert I finally had a chance to apply patches to the two ProLiant servers with the 20-minute boot time, and everything worked as expected.

                                    • olivierlambert Vates 🪐 Co-Founder CEO

                                      Yay!! Thanks for the feedback @dsiminiuk !

                                      • tuxpowered

                                        This still seems to be an issue for me.
                                        Running XO CE 9321b

                                        The last few updates on 2 different clusters have resulted in failed rolling updates, the most recent one today.
                                        The odd part is that the server is back up in about 8 min and I can attach to it via XO.

                                        The system never reconnects to complete the upgrade process, so I have to manually apply patches, move VMs and restore HA.

                                        Our xo-server/config.toml file has been updated with `restartHostTimeout = '60 minutes'` to try to overcome this, but to no avail.

                                        {
                                          "id": "0m1s8k827",
                                          "properties": {
                                            "poolId": "62d8471c-e515-0d7a-d77f-5ac38a945507",
                                            "poolName": "TESTING-POOL-01",
                                            "progress": 33,
                                            "name": "Rolling pool update",
                                            "userId": "61ea8f96-4e67-468f-a0cf-9d9711482a42"
                                          },
                                          "start": 1727895825871,
                                          "status": "failure",
                                          "updatedAt": 1727897834288,
                                          "tasks": [
                                            {
                                              "id": "go7y3ilbo9u",
                                              "properties": {
                                                "name": "Listing missing patches",
                                                "total": 3,
                                                "progress": 100
                                              },
                                              "start": 1727895825873,
                                              "status": "success",
                                              "tasks": [
                                                {
                                                  "id": "wesndzbzb0r",
                                                  "properties": {
                                                    "name": "Listing missing patches for host dd36b6f8-ffff-4310-aa29-66f312c83930",
                                                    "hostId": "dd36b6f8-ffff-4310-aa29-66f312c83930",
                                                    "hostName": "TESTING-03"
                                                  },
                                                  "start": 1727895825873,
                                                  "status": "success",
                                                  "end": 1727895825874
                                                },
                                                {
                                                  "id": "cib36mtqu1k",
                                                  "properties": {
                                                    "name": "Listing missing patches for host 6fb0707f-2655-4ef2-a0a2-13abe70e3077",
                                                    "hostId": "6fb0707f-2655-4ef2-a0a2-13abe70e3077",
                                                    "hostName": "TESTING-02"
                                                  },
                                                  "start": 1727895825873,
                                                  "status": "success",
                                                  "end": 1727895825874
                                                },
                                                {
                                                  "id": "9zk1dabnwuj",
                                                  "properties": {
                                                    "name": "Listing missing patches for host 1f4b8cd7-e9da-414e-8558-8059a3165b98",
                                                    "hostId": "1f4b8cd7-e9da-414e-8558-8059a3165b98",
                                                    "hostName": "TESTING-01"
                                                  },
                                                  "start": 1727895825874,
                                                  "status": "success",
                                                  "end": 1727895825874
                                                }
                                              ],
                                              "end": 1727895825874
                                            },
                                            {
                                              "id": "ayiur0vq5bn",
                                              "properties": {
                                                "name": "Updating and rebooting"
                                              },
                                              "start": 1727895825874,
                                              "status": "failure",
                                              "tasks": [
                                                {
                                                  "id": "7l55urnrsdj",
                                                  "properties": {
                                                    "name": "Restarting hosts",
                                                    "progress": 22
                                                  },
                                                  "start": 1727895835047,
                                                  "status": "failure",
                                                  "tasks": [
                                                    {
                                                      "id": "y7ba4451qg",
                                                      "properties": {
                                                        "name": "Restarting host 1f4b8cd7-e9da-414e-8558-8059a3165b98",
                                                        "hostId": "1f4b8cd7-e9da-414e-8558-8059a3165b98",
                                                        "hostName": "TESTING-01"
                                                      },
                                                      "start": 1727895835047,
                                                      "status": "failure",
                                                      "tasks": [
                                                        {
                                                          "id": "qtt61zspfwl",
                                                          "properties": {
                                                            "name": "Evacuate",
                                                            "hostId": "1f4b8cd7-e9da-414e-8558-8059a3165b98",
                                                            "hostName": "TESTING-01"
                                                          },
                                                          "start": 1727895835285,
                                                          "status": "success",
                                                          "end": 1727896526200
                                                        },
                                                        {
                                                          "id": "3i41dez1f3q",
                                                          "properties": {
                                                            "name": "Installing patches",
                                                            "hostId": "1f4b8cd7-e9da-414e-8558-8059a3165b98",
                                                            "hostName": "TESTING-01"
                                                          },
                                                          "start": 1727896526200,
                                                          "status": "success",
                                                          "end": 1727896634080
                                                        },
                                                        {
                                                          "id": "jmhv7qbyyid",
                                                          "properties": {
                                                            "name": "Restart",
                                                            "hostId": "1f4b8cd7-e9da-414e-8558-8059a3165b98",
                                                            "hostName": "TESTING-01"
                                                          },
                                                          "start": 1727896634110,
                                                          "status": "success",
                                                          "end": 1727896634248
                                                        },
                                                        {
                                                          "id": "ky40100any9",
                                                          "properties": {
                                                            "name": "Waiting for host to be up",
                                                            "hostId": "1f4b8cd7-e9da-414e-8558-8059a3165b98",
                                                            "hostName": "TESTING-01"
                                                          },
                                                          "start": 1727896634248,
                                                          "status": "failure",
                                                          "end": 1727897834282,
                                                          "result": {
                                                            "message": "Host 1f4b8cd7-e9da-414e-8558-8059a3165b98 took too long to restart",
                                                            "name": "Error",
                                                            "stack": "Error: Host 1f4b8cd7-e9da-414e-8558-8059a3165b98 took too long to restart\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:152:17\n    at /opt/xen-orchestra/@vates/task/index.js:54:40\n    at AsyncLocalStorage.run (node:async_hooks:346:14)\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:41)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:31)\n    at Function.run (/opt/xen-orchestra/@vates/task/index.js:54:27)\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:142:24\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:112:11\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at Xapi.rollingPoolReboot (file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:102:5)\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/patching.mjs:524:7\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at XenServers.rollingPoolUpdate (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/xen-servers.mjs:703:5)\n    at Xo.rollingUpdate (file:///opt/xen-orchestra/packages/xo-server/src/api/pool.mjs:243:3)\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)"
                                                          }
                                                        }
                                                      ],
                                                      "end": 1727897834282,
                                                      "result": {
                                                        "message": "Host 1f4b8cd7-e9da-414e-8558-8059a3165b98 took too long to restart",
                                                        "name": "Error",
                                                        "stack": "Error: Host 1f4b8cd7-e9da-414e-8558-8059a3165b98 took too long to restart\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:152:17\n    at /opt/xen-orchestra/@vates/task/index.js:54:40\n    at AsyncLocalStorage.run (node:async_hooks:346:14)\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:41)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:31)\n    at Function.run (/opt/xen-orchestra/@vates/task/index.js:54:27)\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:142:24\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:112:11\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at Xapi.rollingPoolReboot (file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:102:5)\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/patching.mjs:524:7\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at XenServers.rollingPoolUpdate (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/xen-servers.mjs:703:5)\n    at Xo.rollingUpdate (file:///opt/xen-orchestra/packages/xo-server/src/api/pool.mjs:243:3)\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)"
                                                      }
                                                    }
                                                  ],
                                                  "end": 1727897834282,
                                                  "result": {
                                                    "message": "Host 1f4b8cd7-e9da-414e-8558-8059a3165b98 took too long to restart",
                                                    "name": "Error",
                                                    "stack": "Error: Host 1f4b8cd7-e9da-414e-8558-8059a3165b98 took too long to restart\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:152:17\n    at /opt/xen-orchestra/@vates/task/index.js:54:40\n    at AsyncLocalStorage.run (node:async_hooks:346:14)\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:41)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:31)\n    at Function.run (/opt/xen-orchestra/@vates/task/index.js:54:27)\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:142:24\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:112:11\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at Xapi.rollingPoolReboot (file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:102:5)\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/patching.mjs:524:7\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at XenServers.rollingPoolUpdate (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/xen-servers.mjs:703:5)\n    at Xo.rollingUpdate (file:///opt/xen-orchestra/packages/xo-server/src/api/pool.mjs:243:3)\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)"
                                                  }
                                                }
                                              ],
                                              "end": 1727897834288,
                                              "result": {
                                                "message": "Host 1f4b8cd7-e9da-414e-8558-8059a3165b98 took too long to restart",
                                                "name": "Error",
                                                "stack": "Error: Host 1f4b8cd7-e9da-414e-8558-8059a3165b98 took too long to restart\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:152:17\n    at /opt/xen-orchestra/@vates/task/index.js:54:40\n    at AsyncLocalStorage.run (node:async_hooks:346:14)\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:41)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:31)\n    at Function.run (/opt/xen-orchestra/@vates/task/index.js:54:27)\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:142:24\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:112:11\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at Xapi.rollingPoolReboot (file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:102:5)\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/patching.mjs:524:7\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at XenServers.rollingPoolUpdate (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/xen-servers.mjs:703:5)\n    at Xo.rollingUpdate (file:///opt/xen-orchestra/packages/xo-server/src/api/pool.mjs:243:3)\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)"
                                              }
                                            }
                                          ],
                                          "end": 1727897834288,
                                          "result": {
                                            "message": "Host 1f4b8cd7-e9da-414e-8558-8059a3165b98 took too long to restart",
                                            "name": "Error",
                                            "stack": "Error: Host 1f4b8cd7-e9da-414e-8558-8059a3165b98 took too long to restart\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:152:17\n    at /opt/xen-orchestra/@vates/task/index.js:54:40\n    at AsyncLocalStorage.run (node:async_hooks:346:14)\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:41)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:31)\n    at Function.run (/opt/xen-orchestra/@vates/task/index.js:54:27)\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:142:24\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:112:11\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at Xapi.rollingPoolReboot (file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:102:5)\n    at file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/patching.mjs:524:7\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)\n    at XenServers.rollingPoolUpdate (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/xen-servers.mjs:703:5)\n    at Xo.rollingUpdate (file:///opt/xen-orchestra/packages/xo-server/src/api/pool.mjs:243:3)\n    at Task.runInside (/opt/xen-orchestra/@vates/task/index.js:169:22)\n    at Task.run (/opt/xen-orchestra/@vates/task/index.js:153:20)"
                                          }
                                        }
                                        
                                        • olivierlambert Vates 🪐 Co-Founder CEO

                                          So you are saying XO doesn't detect when the host is back online?

                                          • tuxpowered @olivierlambert

                                            @olivierlambert It has not, for me, on 2 different clusters. And I made sure I was on the latest XO release before attempting it.

                                            This also occurred about a month ago, but I was not on the current release then. So this time I updated to the current XO first, then proceeded with the rolling pool update.

                                            One of the clusters is local to the XO VM (the VM runs on that cluster).
                                            The other is reached over a VPN connection. Both failed with timeouts while the machines were up.

                                            I also verified this by connecting to the iLO on both systems. These are DL360 Gen10 systems with 2.5-5 GB internet connections and at least 128 GB of RAM, so no slow machines. All disks are also SSDs.

                                            Not sure if any of that really helps; it's only to point out that the systems are not slow. They were observed coming back online via iLO, and by going into Settings > Servers I could even reconnect the master node.

                                            The pattern seems to be that it always migrates the VMs off the master node and reboots it, but never reconnects afterwards. The only way to recover is to manually update each node, move the VMs, and then reactivate HA.

                                            This also started about 2 months ago; it had been working wonderfully before. Maybe something changed?
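
The step that fails in the log, "Waiting for host to be up", is a wait with a deadline: poll until the host is seen up, and give up with "took too long to restart" when the timeout expires. The sketch below shows that general pattern only. It is not xo-server's code; `is_host_up` is a hypothetical callback standing in for whatever check XO really performs, and the clock and sleep functions are injectable so the logic can be tested without actually waiting.

```python
import time
from typing import Callable


class HostRestartTimeout(Exception):
    """Raised when the host is not seen up before the deadline."""


def wait_for_host(
    is_host_up: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll is_host_up() until it returns True; return the seconds waited.

    Raises HostRestartTimeout once timeout_s has elapsed without success.
    """
    start = clock()
    deadline = start + timeout_s
    while True:
        if is_host_up():
            return clock() - start
        now = clock()
        if now >= deadline:
            raise HostRestartTimeout(f"host took too long to restart ({timeout_s:.0f} s)")
        # Never sleep past the deadline, so the final check happens on time.
        sleep(min(poll_interval_s, deadline - now))
```

One consequence of this pattern: a host that is running but never reported up (for example because the check cannot reach it) fails at the deadline exactly like one that is still booting, which would fit hosts that are visibly back on iLO yet still time out.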
