XCP-ng

    continuous replication job - no progress

    Solved Xen Orchestra
    21 Posts 4 Posters 3.3k Views
      techjeff @tony

      @tony I hadn't actually put together that the "remotes" are features of XO and therefore would be mounted to my XOCE instance. Thank you for bringing that to my attention. Now that you mention it, I think "of course!"

      I was able to "touch" a test file on both remotes without issue while logged on as a non-root user:

      2021-09-14 09_51_53-Window.png

        tony @techjeff

        @techjeff So network permissions are probably not the cause, but I'm still not sure why it hasn't worked right out of the box. Did you ever try replicating before setting up the NIC bonding? I only ask because your MAC address (FE:FF:FF:FF:FF:FF) is a bit weird. Another thing you can try is a delta backup instead of replication, just to see if that goes through.

          techjeff @tony

          @tony The MAC address of the NIC Bond seemed funny to me too, but it has been that way every time I have created a bond interface with xcp-ng -- and I've made bone-headed mistakes and had to start over a few times over the years as I've learned the platform 😂. It has seemed to work like that so I didn't bother to look into it further.

          I had also tried to set up a custom xapi-ssl.pem certificate from memory on the host that happens to be my pool master at the moment. I didn't take notes last time, so it's possible I made a mistake, and that could be causing some issues as well.

          All of this has happened because I didn't know how (and frankly still don't know for sure yet) to properly back up a host and I needed to shift hard drives around through my pool and NAS to make best use of my resources. I ended up thrashing through vlans, vifs, and vbds using xe because they were tied to the host that was lost without a backup >_<

          I know, I'm piling complications onto this topic, but I'm just trying to focus on one thing at a time right now..

          I haven't tried the replication before NIC bonding, but I could certainly give that a try. I will probably try to do a delta first before playing with nic bonding.

          Thanks again for the help.

            olivierlambert Vates 🪐 Co-Founder CEO

            If you are using XO from the sources, can you update to the latest commit on master?

              techjeff @olivierlambert

              @olivierlambert I am using XO from sources. I haven't yet tried Tony's suggestions, but because a lot can change when tracking the master branch, I'll try updating then share my results.

                olivierlambert Vates 🪐 Co-Founder CEO

                That's the concept of being on the sources: you track the master branch. You are very likely on master (unless you didn't follow our doc, in which case there's little we can do to help). Just pull and rebuild, it might work now 🙂

                  techjeff @olivierlambert

                  @olivierlambert I just confirmed that I am indeed tracking master. I am using a third party tool that basically follows the steps in your doc. I understand you have no obligation to support any procedure other than the official doc, but I do appreciate your suggestions and insights -- this is a home system, nothing is running in "production" per se, but I like when I can get things working as expected 🙂

                  I have updated to the latest commit on master and rebuilt, and I can confirm that my continuous replication to my local storage is working. Once that completes, I'll try replication to my NFS shared SR as well. Assuming that works, I'll try my metadata backup to my NFS Remotes, then finally I'll share my results here.

                  Thanks again!

                    olivierlambert Vates 🪐 Co-Founder CEO

                    Good, so the fix @julien-f pushed on master solved it 🙂

                    In general, every time you have an issue, follow the guidelines: get on the latest commit on master and check if you still have the problem 🙂 This will reduce the burden of answering multiple times for a problem that is already fixed.

                    Thanks for the feedback! 👍

                      techjeff @olivierlambert

                      @olivierlambert Now that you mention it, I do recall reading that we should always be on master if we have issues -- I will commit that to memory and be sure to use that as a first step 🙂

                      It's not often that we get to speak directly to the developers of quality projects and less often still that they are polite and helpful. Thank you for the assistance!

                        olivierlambert Vates 🪐 Co-Founder CEO

                        You are very welcome, happy to learn that it works for you 🙂

                          HeMaN @olivierlambert

                          @olivierlambert
                          I also experienced failed backups yesterday, on XOCE 5.82.1 / 5.87.0

                                  "message": "connect ECONNREFUSED 192.168.20.71:443",
                                  "name": "Error",
                                  "stack": "Error: connect ECONNREFUSED 192.168.20.71:443\n    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16)\n    at TCPConnectWrap.callbackTrampoline (internal/async_hooks.js:131:17)"
                          
                          

                          I first suspected the xcp-ng updates, but I have had successful backups after installing the patches, so I ruled that out.

                          I had also updated XOCE this week, so I started looking into that.
                          I updated to the current version of XOCE, 5.82.2 / 5.87.0, and backups are working again.

                          The only thing I did not figure out right away was how to kill the still-active tasks from the failed backup jobs. A restart of the toolstack on the xcp-ng server fixed that as well 🙂

                          Of course after all the troubleshooting I went to the forum and found this topic. Should have done that first thing 😄

                            olivierlambert Vates 🪐 Co-Founder CEO

                            Always remember: as sources users, your "role" is to stay as close as possible to the latest master commit. This way, you are actively helping the project to spot issues that have been missed during our usual dev review process 🙂

                              techjeff

                              @olivierlambert -- this may be related to xcp-ng bug 5929, but perhaps I don't know what types of traffic I need to allow through my firewall to allow xoc to communicate with my storage network.

                              I am again on the latest commit to master as of today.

                              My Default Migration Network was set to my Storage network (10.0.60.0/24).
                              My storage server is 10.0.60.2 which hosts my default NFS SR as well as my xo nfs remotes.
                              xcp-ng-1 management interface is 10.0.10.11 with 10.0.60.11 on storage net
                              xcp-ng-2 (pool master) management interface is 10.0.10.12 with 10.0.60.12 on storage net

                              My xoc instance only had one interface on management network with address 10.0.10.10 and backups didn't work.
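
The addressing above can be checked mechanically. The following is only a small sketch using Python's standard `ipaddress` module, with the addresses copied from this post; the helper name `on_network` is made up for illustration. It shows that the management-only xoc address sits outside the default migration network, so any traffic from xoc to the hosts' storage addresses has to be routed (here, through the firewall).

```python
import ipaddress

# Default migration (storage) network, as stated in the post.
MIGRATION_NETWORK = ipaddress.ip_network("10.0.60.0/24")


def on_network(address: str, network=MIGRATION_NETWORK) -> bool:
    """Return True when `address` belongs to `network` (hypothetical helper)."""
    return ipaddress.ip_address(address) in network


# Addresses taken from the post; the labels are only for readability.
interfaces = {
    "xoc, management interface": "10.0.10.10",
    "xoc, added storage interface": "10.0.60.10",
    "xcp-ng-2 (pool master), storage address": "10.0.60.12",
}

for label, address in interfaces.items():
    print(f"{label}: {address} in {MIGRATION_NETWORK}? {on_network(address)}")
```
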

                              I added firewall rules allowing 10.0.10.10 to access 10.0.60.11 and 10.0.60.12 via tcp/udp 111, 443, and 2049, but that had no impact on backups. I saw via pftop that 10.0.10.10 (xoc) was contacting my pool master, xcp-ng-2, via 10.0.60.12:443 when I tried to start the backup. I didn't pay super-close attention, but the connections did not stay active and disappeared after 30 seconds to 1 minute.
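
One way to see whether firewall rules like these actually let TCP traffic through is to probe the ports from the XO machine. This is a minimal sketch, not part of XO: the function names `probe_tcp` and `probe_all` are hypothetical, only TCP is tested (the UDP side of ports 111 and 2049 is not), and the classification is a rough guide. A "refused" result usually means something answered with a reset, which matches the ECONNREFUSED error quoted earlier in the thread, while a "timeout" more often points to packets being silently dropped.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connection and classify the outcome (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer (or a rejecting firewall) answered with a TCP reset.
        return "refused"
    except socket.timeout:
        # No answer at all within `timeout` seconds.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host.
        return f"error: {exc}"


def probe_all(hosts, ports, timeout: float = 3.0) -> dict:
    """Probe every host/port pair and return {(host, port): result}."""
    return {
        (host, port): probe_tcp(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

Run from the xoc VM, a call such as `probe_all(["10.0.60.11", "10.0.60.12"], [111, 443, 2049])` would report each pair; the hosts and ports are the ones named in this post.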

                              I then added an interface to xoc on my storage network with address 10.0.60.10 to avoid the firewall altogether and started a CR job which proceeded almost immediately.

                              I then disconnected the xoc interface on the storage network, configured my default migration network back to the Management network, disabled my firewall rules and the backups worked again.

                              According to @julien-f in xcp-ng bug 5929, xo has used the pool's default migration network for stats, backups, etc. since 5.62, which might not have been a good idea because xo might not always be able to access the storage network. Wouldn't xo always want to communicate with the pool master via the master's management interface, even if the management interface is not on the default migration network? I had always assumed that hosts only listen for xapi commands on the management interface's IP -- is that correct or a misunderstanding on my part?

                                olivierlambert Vates 🪐 Co-Founder CEO

                                Ping @julien-f and/or @fohdeesha
