XCP-ng

    XO task watcher issue/CR broken

      julien-f Vates 🪐 Co-Founder XO Team @Gheppy

      I still don't fully understand the issue, and I'm not sure whether it comes from XO or from XCP-ng/XenServer, but the latest version integrates a workaround: if you encounter a Premature close error during CR, you can add the following to your xo-server configuration file (usually /etc/xo-server/config.toml):

      [xapiOptions]
      ignorePrematureClose = true
      

      It's not enabled by default, and won't be until we completely understand the root cause and it's properly fixed.
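
To double-check that the option really landed in the right section of the file, something like the following could be used. This is only a minimal sketch: `has_ignore_premature_close` is a hypothetical helper name, not part of XO, and the demo runs against a throwaway file rather than the real configuration.

```shell
#!/bin/sh
# Hypothetical helper (not part of XO): succeeds when the given TOML file sets
# ignorePrematureClose = true inside the [xapiOptions] section.
has_ignore_premature_close() {
  awk '
    # Every section header updates whether we are inside [xapiOptions].
    /^[ \t]*\[/ { in_section = ($0 ~ /^[ \t]*\[xapiOptions\][ \t]*$/) }
    # Only count the option when it appears inside that section.
    in_section && /^[ \t]*ignorePrematureClose[ \t]*=[ \t]*true/ { found = 1 }
    END { exit(found ? 0 : 1) }
  ' "$1"
}

# Demo on a throwaway file containing the snippet from this post.
demo=$(mktemp)
printf '[xapiOptions]\nignorePrematureClose = true\n' > "$demo"

if has_ignore_premature_close "$demo"; then
  echo "workaround enabled"
else
  echo "workaround not enabled"
fi
```

On a real install, the argument would be the path of your own configuration file. The check only reads the file, so xo-server still has to be restarted before a changed option takes effect.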

      Gheppy Thanks a lot for your tests and feedback 🙂

      Andrew Thank you very much for the test appliance, it was an invaluable help investigating this. If you can keep it online for the time being I'll probably have further tests to do with it next week 🙏

        Andrew Top contributor @julien-f

        julien-f I updated XO Source to current master and added the new ignorePrematureClose=true option. Backup ran the CR correctly again.

        Yes, I can leave the XOA test tunnel up for testing. I'm happy to help you help me!

          olivierlambert Vates 🪐 Co-Founder CEO

          Andrew do you still have the issue without the ignorePrematureClose?

            Andrew Top contributor @olivierlambert

            olivierlambert Yes. There are still problems on the new code without the option set: on CR, 90% of the VMs fail and 10% finish correctly.

              olivierlambert Vates 🪐 Co-Founder CEO

              Thanks for your precious feedback 👍

                Gheppy

                Just for info:
                I installed the latest version of XOCE, commit 083db67df9e1645a2f8fe2fac564b3aecf30d55e.
                With ignorePrematureClose = true, CR works and can be used in case of disaster.
                I managed to start the VM itself (that is, the CR copy) without problems.
                At the moment, copying the VM that CR creates on the second server is very slow (18Mb max), even though the same connection reaches 400Mb for a CR copy. I want to make a copy of the CR VM and start it.

                  magicker @Gheppy

                  Gheppy weird.. DR is now working for me but CR is still never ending

                    Gheppy @magicker

                    magicker
                    You need to add this to the config file.
                    For me, it is located at /opt/xen-orchestra/packages/xo-server/.xo-server.toml

                    [xapiOptions]
                    ignorePrematureClose = true
                    
                      magicker @Gheppy

                      Gheppy Yes.. putting that in place seemed to fix DR.. but not CR.

                        olivierlambert Vates 🪐 Co-Founder CEO

                        🤔 Any log or something magicker ? Are you sure it's a CR started after modifying the config and restarting xo-server?

                          magicker @olivierlambert

                          olivierlambert actually.. just tried CR again... and this time it is working.. will test with a few more vms.

                            magicker @olivierlambert

                            olivierlambert spoke too soon.. CR works in one direction but not the other
                            https://images.dx3webs.com/kgSC4E.png

                              julien-f Vates 🪐 Co-Founder XO Team @magicker

                              magicker Any differences between the two hosts? XCP-ng versions maybe?

                                JamfoFL @julien-f

                                Jumping in as another "victim"... I have lost almost all ability to run any kinds of backups.

                                System information:

                                XO from Sources
                                XO-Server: 5.109.0
                                XO-Web: 5.111.0
                                Commit: d4bbf

                                I had no issues running any kinds of backups prior to applying the latest commits two weeks ago, Monday, February 6, 2023. After applying that commit, everything seemed to be OK until late that evening. At that time, one of my CR jobs started and never completed. All of the subsequent jobs failed, of course, because another job was already running. I was able to delete the job and restarted the toolstacks and rebuilt a new job. This started to spread until, eventually, all of my CR jobs would fail. That issue has now spread and my DR jobs are now failing, too. Out of the seven backup jobs that were working with no issues prior to that date, I am now only able to run one job successfully, and that is a DR job.

                                The DR and CR jobs are written to different repositories (the CR jobs are written to the storage repository on one of the other XCP-ng servers and the DR jobs are written to an NFS share).

                                I tried running the cr-issue branch update that julien-f recommended, and modified my config.toml file with the ignorePrematureClose = true code that was also recommended. Nothing works.

                                If I try to run a DR job now, even a brand-new one, it looks like it wants to start and create the snapshot, but then fails and actually shows it will start in 53.086 years.

                                Anything I can get, just let me know what to pull and I'll add it here... hopefully that will help. Right now, I'm able to only back up a single VM!

                                  julien-f Vates 🪐 Co-Founder XO Team @JamfoFL

                                  JamfoFL cr-issue was a test branch that I've just removed, please try on master.

                                    JamfoFL @julien-f

                                    julien-f OK... I just updated again and there's no change.

                                    It certainly looks like a snapshot is created (I see one attached to the VM right now), but nothing ever progresses. Right now, the job is listed with this status:

                                    [XO] Exporting content of VDI VXCTEST-22-02 Primary (on XCPHOST1) 0%

                                    It's been there for about 10 minutes. I will allow this to continue to run to see if there's ever any progress... but this is usually where things jam up and the job never completes.

                                      julien-f Vates 🪐 Co-Founder XO Team @JamfoFL

                                      JamfoFL If you can find the last commit on which it worked for you, that would be very helpful 🙏

                                      One way that you could find the correct commit without checking out every one of them is to use a binary search with git bisect.

                                      First you need to find an older commit that did not exhibit the issue, then you can start the search:

                                      # Move to XO directory
                                      cd xen-orchestra
                                      
                                      # Ensure you are up-to-date on `master`
                                      git checkout master
                                      git pull --ff-only
                                      
                                      # Tell git to start the search
                                      git bisect start
                                      
                                      # Current version is bad
                                      git bisect bad
                                      
                                      # Tell git which commit was good
                                      git bisect good <commit id>
                                      
                                      # Now that git knows the good and the bad commit, it will select a commit to test
                                      

                                      Testing a commit:

                                      # Re-install dependencies and rebuild XO
                                      yarn && yarn build
                                      
                                      # Run xo-server
                                      ./packages/xo-server/dist/cli.mjs
                                      
                                      # Test XO to see if it has the problem
                                      
                                      # Interrupt xo-server with Ctrl+C
                                      
                                      # If the problem is present:
                                      git bisect bad
                                      
                                      # If the problem is absent:
                                      git bisect good
                                      
                                      # Continue testing commits until git tells you which commit is the first with the problem, example:
                                      #
                                      # Bisecting: 0 revisions left to test after this (roughly 0 steps)
                                      # [31f850c19c2c2ad5054c292d0f22e9237869dc04] fix(xo-server-transport-email): log async errors
                                      

                                      After the search, to go back to the initial state:

                                      git bisect reset
                                      
                                        JamfoFL @julien-f

                                        julien-f Thank you, Julien-f!

                                        I ran through that procedure (sorry it took so long) and the last commit where everything works as expected was bf51b94. Of course, I could be back to the point where it might need to run all day before one fails, but I was, at least, able to run one of each type of backup on that commit.

                                        Is there any way I can revert my environment back to that commit so I can have the security of backups again? Or should I ride this out while you take a look?

                                        Thanks!

                                          olivierlambert Vates 🪐 Co-Founder CEO

                                          Just to be sure, can you try the very next commit, 263c23ae8f1bc6f3f32ab5fa02c3800b29db8d37? Do a git checkout 263c23ae8f1bc6f3f32ab5fa02c3800b29db8d37, then yarn and yarn build, and re-run xo-server with ./packages/xo-server/dist/cli.mjs. Start a backup job and report 🙂
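
Written out as a script, those steps could look roughly like this. It is a minimal sketch: `try_commit` and the `DRY_RUN` variable are illustrative names, not XO tooling, and the function is meant to be run from inside the xen-orchestra directory.

```shell
#!/bin/sh
# Sketch: check out one commit, rebuild XO, and start xo-server.
# try_commit and DRY_RUN are illustrative names, not part of XO.
# With DRY_RUN set, the commands are only printed instead of executed.
try_commit() {
  commit=$1
  run=${DRY_RUN:+echo}
  $run git checkout "$commit" &&
    $run yarn &&
    $run yarn build &&
    $run ./packages/xo-server/dist/cli.mjs
}

# Preview the steps for the commit mentioned above without running anything.
DRY_RUN=1 try_commit 263c23ae8f1bc6f3f32ab5fa02c3800b29db8d37
```

Without `DRY_RUN`, each step only runs if the previous one succeeded. After `git checkout <commit id>`, the working tree stays on that commit (detached HEAD) until another branch or commit is checked out, for example with `git checkout master`.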

                                            JamfoFL @olivierlambert

                                            olivierlambert I think I'm a bit confused. After I finished the testing as julien-f recommended and ran git bisect reset, it put me back on the very latest commit (at the time). Right now, it shows me on commit 890b46b and nothing is working again.

                                            Is there a way I can revert my current build back to bf51b94 permanently? I would assume that, after that point, I would then use your suggestions moving forward?

                                            Sorry if I'm not understanding correctly!
