    Shared Storage Redundancy Testing

    Xen Orchestra · 13 Posts · 4 Posters
    • mauzilla

      We're busy with redundancy testing in our test bench for the network storage we plan to add. We intend to move our local storage to shared storage on TrueNAS servers, with NFS shares connected to our pool. We will have 2 TrueNAS servers with identical pool names and NFS sharing (so ultimately only a single IP address changes). We will then add to the pool a single NFS share pointing to the primary TrueNAS server. In the event that this server fails (for whatever reason), we would like to simply change the IP address of the share in the pool so that it connects to the secondary server, which in theory should hold an identical dataset to the primary share.

      In our test bench we have set up a pool with 2 hosts. We also have 2 TrueNAS servers already configured, which are replicating a set of test VMs to each other.

      Our initial experience has been a bit strange. Even after shutting down the TrueNAS server, the SR still shows as "connected" in the pool, and the VMs keep running (albeit with no storage behind them, so only on what is left in RAM). We forcefully shut down all of the VMs (this is a test bench, so we want to replicate a real-world scenario where we need to switch to the failover storage). In the pool's storage view the SR stays connected, but we're unable to disconnect it even with no VMs running on it. I suspect this is because it cannot "disconnect" from the NFS mount point while the actual server is offline.

      This leaves us with a bit of a problem, and we're hoping others can help here:

      • As both TrueNAS servers should be identical in storage (the NFS mount points and pools are named exactly the same), we figured it would be as simple as changing the NFS shared storage IP to point to the new server, but this seems to be problematic. What would be the best way to simply update the IP address of the storage?
    • Forza @mauzilla

        @mauzilla it's unfortunately not possible to change the IP or DNS name of an existing SR. It has to be dropped and recreated. I suppose you could swap the IP addresses on the NFS servers, but even that may not help.

        • mauzilla @Forza

           @Forza I assume we should recreate the PBD, or is there another way to achieve the above? It seems like a real-world issue someone may face if they replicate their NAS to a secondary as a failover.

          • olivierlambert (Vates 🪐 Co-Founder & CEO)

            PBD remove and recreate will do the trick, no need to remove the SR.
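
             For reference, a rough sketch of what that looks like with the xe CLI (the UUIDs, the 10.0.0.2 address and the /mnt/tank/xcp export path are placeholders; in a pool, the unplug/destroy/create steps have to be done for each host's PBD):

             # find the PBDs of the existing NFS SR, and note their device-config
             xe pbd-list sr-uuid=<SR_UUID>
             xe pbd-param-list uuid=<PBD_UUID>

             # per host: unplug and destroy the old PBD
             xe pbd-unplug uuid=<PBD_UUID>
             xe pbd-destroy uuid=<PBD_UUID>

             # per host: recreate it pointing at the secondary TrueNAS (copy the old
             # device-config keys and change only the server address), then plug it back in
             xe pbd-create sr-uuid=<SR_UUID> host-uuid=<HOST_UUID> device-config:server=10.0.0.2 device-config:serverpath=/mnt/tank/xcp
             xe pbd-plug uuid=<NEW_PBD_UUID>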

            • mauzilla @olivierlambert

               @olivierlambert stupid question, but would that then be "business as usual"? I.e. if the storage has the replicated data on it (or some version / snapshot), then in theory, after the PBD recreation, the VMs will automatically pick up their individual VHDs?

              • olivierlambert (Vates 🪐 Co-Founder & CEO)

                You can't remove a PBD as long as you have one running disk on it. So you'll need to migrate or shutdown any VM using a disk on it, then delete the PBD and recreate it. You can use the SR maintenance mode button in XO to make it easier.
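
                 A quick way to check which VMs still have disks on the SR before doing that (a sketch; the UUIDs are placeholders):

                 # list the VDIs stored on the SR
                 xe vdi-list sr-uuid=<SR_UUID> params=uuid,name-label

                 # for a given VDI, see which VM attaches it and whether it's currently plugged
                 xe vbd-list vdi-uuid=<VDI_UUID> params=vm-name-label,currently-attached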

                • Forza @olivierlambert

                  @olivierlambert said in Shared Storage Redundancy Testing:

                  You can't remove a PBD as long as you have one running disk on it. So you'll need to migrate or shutdown any VM using a disk on it, then delete the PBD and recreate it. You can use the SR maintenance mode button in XO to make it easier.

                  Does this mean the mapping between the disk, snapshots and VM is preserved?

                  It would be great if this procedure was implemented as an easy-to-use tool in XO/XOA.

                  • mauzilla

                     We will test this today and let you know. Ultimately the use case here is to be able to make use of a failover NAS (which is replicated at NAS level) so that it's a simpler process to switch to a failover in the event of failure (else there is no practical point in replicating the VHDs between external storage servers if we cannot "switch" to another NAS).

                     I will let you know the outcome, but I agree with @Forza that if this does work, it would be a great addition to the GUI to allow for a "switch to failover" scenario.

                    • nikade (Top contributor) @Forza

                      @Forza said in Shared Storage Redundancy Testing:

                      @olivierlambert said in Shared Storage Redundancy Testing:

                      You can't remove a PBD as long as you have one running disk on it. So you'll need to migrate or shutdown any VM using a disk on it, then delete the PBD and recreate it. You can use the SR maintenance mode button in XO to make it easier.

                      Does this mean the mapping between the disk, snapshots and VM is preserved?

                      It would be great if this procedure was implemented as an easy-to-use tool in XO/XOA.

                       If the storage is 100% replicated and looks the same, the snapshots should map to the VMs correctly, since the paths would be identical.
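
                       One way to sanity-check that from dom0 (a sketch, assuming the default file-based NFS SR layout where each VDI is stored as <VDI_UUID>.vhd under the SR mount point; the path below is the usual location on XCP-ng 8.x):

                       # VDI UUIDs that XAPI expects to find on the SR
                       xe vdi-list sr-uuid=<SR_UUID> params=uuid --minimal

                       # VHD files actually present on the mounted share
                       ls /run/sr-mount/<SR_UUID>/*.vhd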

                      • nikade (Top contributor) @mauzilla

                        @mauzilla said in Shared Storage Redundancy Testing:

                         We will test this today and let you know. Ultimately the use case here is to be able to make use of a failover NAS (which is replicated at NAS level) so that it's a simpler process to switch to a failover in the event of failure (else there is no practical point in replicating the VHDs between external storage servers if we cannot "switch" to another NAS).

                         I will let you know the outcome, but I agree with @Forza that if this does work, it would be a great addition to the GUI to allow for a "switch to failover" scenario.

                         We're using NFS with failover on a Dell PowerStore 1000T; it works pretty well, and the NAS presents just one IP, so we don't have to reconfigure or unplug/plug any VBDs.
                         When a node fails, the VMs just continue running and the secondary node takes over within seconds, so there is really nothing happening except a really short hiccup.

                        • Forza @nikade

                          @nikade said in Shared Storage Redundancy Testing:

                          @Forza said in Shared Storage Redundancy Testing:

                          Does this mean the mapping between the disk, snapshots and VM is preserved?

                          It would be great if this procedure was implemented as an easy-to-use tool in XO/XOA.

                           If the storage is 100% replicated and looks the same, the snapshots should map to the VMs correctly, since the paths would be identical.

                           I meant the procedure to remove and recreate the PBD, as @olivierlambert mentioned.

                          • mauzilla

                             @olivierlambert, we're simulating the PBD disconnect to see what would happen in production. The NAS was shut down (while the VMs were still running), and we then forcefully shut down the VMs.

                             Running xe pbd-unplug gets stuck (and I assume this is likely because dom0 is unable to umount the now-stale NFS mount point). This could normally be resolved with a lazy unmount if one has access to dom0, but obviously we only interact with it through XAPI, so I'm not sure if there is an option to achieve this?

                             What we're trying to do is avoid a reboot if a NAS fails (as it may affect the entire pool and not just one host). Any ideas?

                            • olivierlambert (Vates 🪐 Co-Founder & CEO)

                              You can lazy umount a failed network share, then PBD unplug will work.
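
                               Roughly, from dom0 on each affected host (a sketch; assumes the SR is mounted at the usual /run/sr-mount/<SR_UUID> location):

                               # lazily detach the stale NFS mount so nothing blocks on it anymore
                               umount -l /run/sr-mount/<SR_UUID>

                               # the unplug should now go through
                               xe pbd-unplug uuid=<PBD_UUID>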
