XCP-ng

    Update strategy for a consistent XCP-ng pool

    • borivoj-tydlitat

      Hello XCP-ng community,

      We are running XCP-ng on 9 hosts in 3 pools. To maintain continuous operation of the cluster, we perform rolling updates for security and other fixes, one host at a time, in a weekly maintenance window. The whole process typically takes 3-4 windows, i.e. spans 2-3 weeks. If a new update is published during that time, version skew can occur between the components installed on the hosts, and that skew has already disrupted cluster operation for us - specifically, VM backup via XenOrchestra stopped working. (And yes, we did follow the documentation on upgrading the pool master first.)

      Is there a good practice for this kind of scenario, to make sure that the update cycle results in consistent versions installed across the cluster? I can imagine recording the package versions installed by yum upgrade on the first host and then scripting the updates on the subsequent hosts to use the same package versions, but maybe there is a better way?
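
      For illustration, the kind of recording I have in mind - nothing more than capturing the exact package set right after the first host is updated (the file name is arbitrary):

          yum update                            # update the first host as usual
          rpm -qa | sort > reference.pkglist    # records name-version-release.arch of every installed package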

      Thank you.

      • olivierlambert (Vates 🪐 Co-Founder CEO)

        Hi,
        Thanks for your feedback. Can you explain why you are not updating the whole pool in the same scheduled window? A pool is not meant to run with mixed update levels; that is known to potentially cause issues (as you discovered).

        • borivoj-tydlitat @olivierlambert

          Hi @olivierlambert - the reason is that we typically bundle the physical host reboot with other updates (e.g. host firmware, software running in the host's VMs). Also, the software stack running in the VMs on the hosts often requires special care when shutting down (for example Kubernetes node VMs running production workloads where some components are a bit fragile, or the Ceph filesystem, which is HA but may take a long time to recover after a node is taken down, etc.). In many cases we also cannot use VM migration - especially for VMs using large local storage. So far, the procedure has been to schedule a 2-hour maintenance window every week, which typically allows us to update 2-3 hosts. I have read this post https://xcp-ng.org/forum/topic/7200/patching-to-a-specific-version/4 , but digging into the behavior of yum update, it looks like it cannot update to a specific version (unlike yum install).
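
          To illustrate the difference I mean (the package name is only an example and the version string is made up):

              yum update xcp-ng-xapi-plugins                         # with a bare name, this goes to the newest version in the enabled repos
              yum install xcp-ng-xapi-plugins-1.7.2-1.xcpng8.2       # install accepts an explicit name-version-release, so a version can be pinned
              yum upgrade-to xcp-ng-xapi-plugins-1.7.2-1.xcpng8.2    # upgrade-to also accepts an explicit version (used later in this thread)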

          • olivierlambert (Vates 🪐 Co-Founder CEO)

            Adding @stormi to the loop so we can think about something.

            My first approach would then be to reduce the pool size: a 3-host pool will fit in the maintenance window, so all the nodes can be brought fully up to date and kept consistent.

            • borivoj-tydlitat @olivierlambert

              Of course, we can also try negotiating an expansion of the window with the business / management, so that the entire pool update fits in it. But that may not solve the entire problem, as there are various other preparation processes that need to happen between the maintenance windows. Also, breaking up a pool is inconvenient - it complicates management and reduces the options for moving VMs around (we make limited but essential use of that). I am asking here in the hope that we can find a technical solution within the existing XCP-ng features.

              • stormi (Vates 🪐 XCP-ng Team)

                You could host a local mirror (look up reposync), update from it, and then stop syncing it when you start your maintenance operations.
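
                Roughly (the repo IDs are the ones used elsewhere in this thread; paths and the URL are illustrative):

                    # On the machine hosting the mirror (needs the yum-utils and createrepo packages):
                    reposync --repoid=xcp-ng-base --repoid=xcp-ng-updates --download_path=/srv/mirror/xcp-ng
                    createrepo /srv/mirror/xcp-ng/xcp-ng-base
                    createrepo /srv/mirror/xcp-ng/xcp-ng-updates

                    # On each host, point the repo definitions under /etc/yum.repos.d/ at the mirror,
                    # e.g. baseurl=http://mirror.example.lan/xcp-ng/xcp-ng-updates
                    # then stop running reposync until the whole pool has been updated.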

                However, I must stress that it's not good for a pool to be in a heterogeneous state for so long.

                • borivoj-tydlitat @stormi

                  @stormi and @olivierlambert thank you for your advice.

                  I did some exploration on my side, too, and I think we have two workable strategies:

                  1. Use a reposync mirror of the xcp-ng-base and xcp-ng-updates repos on a shared filesystem visible to all hosts. Sync it, update the master, stop syncing, and gradually update the remaining hosts.

                  2. Use a variation of the rpm -qa based approach discussed earlier: update the master, collect the package state with rpm -qa > reference.pkglist, run yum upgrade-to $(cat reference.pkglist) on each of the remaining hosts, then check for any irregularities with yum check-update or yum --assumeno upgrade (e.g. due to packages installed only on some hosts) and resolve those manually, roughly as sketched below.
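
                  A minimal sketch of what I intend to run on each remaining host, once reference.pkglist has been collected on the updated master and copied over (packages installed only on some hosts may need to be filtered out of the list first):

                      yum upgrade-to $(cat reference.pkglist)    # pin the host to the master's package versions
                      yum check-update                           # anything still listed here deserves a manual look
                      yum --assumeno upgrade                     # dry run showing what a plain upgrade would still change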

                  That's a good point about a pool being in a heterogeneous state for too long - we will definitely reconsider our maintenance procedures.
                  We will try this approach in our upcoming maintenance window and report back here on how we fared.

                  • olivierlambert (Vates 🪐 Co-Founder CEO)

                    Thanks, and also thank you for your feedback; it's important for us to understand the pain points so we can improve our product 🙂

                    Keep us posted!
