XCP-ng

    XO task watcher issue/CR broken

      julien-f Vates 🪐 Co-Founder XO Team @magicker

      magicker Any differences between the two hosts? XCP-ng versions maybe?

        JamfoFL @julien-f

        Jumping in as another "victim"... I have lost almost all ability to run any kind of backup.

        System information:

        XO from Sources
        XO-Server: 5.109.0
        XO-Web: 5.111.0
        Commit: d4bbf

        I had no issues running any kind of backup before applying the latest commits two weeks ago, on Monday, February 6, 2023. After applying them, everything seemed to be OK until late that evening. At that time, one of my CR jobs started and never completed. All of the subsequent jobs failed, of course, because another job was already running. I was able to delete the job, restart the toolstacks, and build a new job. The problem then spread until, eventually, all of my CR jobs failed, and now my DR jobs are failing, too. Of the seven backup jobs that worked with no issues prior to that date, I am now able to run only one successfully, and that is a DR job.

        The DR and CR jobs are written to different repositories (the CR jobs are written to the storage repository on one of the other XCP-ng servers and the DR jobs are written to an NFS share).

        I tried running the cr-issue branch update that julien-f recommended, and modified my config.toml file with the ignorePrematureClose = true setting that was also recommended. Neither change helped.

        If I try to run a DR job now, even a brand-new one, it looks like it wants to start and create the snapshot, but then fails and actually shows it will start in 53.086 years.

        If there is anything I can collect, just let me know what to pull and I'll add it here... hopefully that will help. Right now, I'm able to back up only a single VM!

          julien-f Vates 🪐 Co-Founder XO Team @JamfoFL

          JamfoFL cr-issue was a test branch that I've just removed; please try on master.

            JamfoFL @julien-f

            julien-f OK... I just updated again and there's no change.

            It certainly looks like a snapshot is created (I see one attached to the VM right now), but nothing ever progresses. Right now, the job is listed with this status:

            [XO] Exporting content of VDI VXCTEST-22-02 Primary (on XCPHOST1) 0%

            It's been there for about 10 minutes. I will let it continue to run to see if there's ever any progress... but this is usually where things jam up and the job never completes.

              julien-f Vates 🪐 Co-Founder XO Team @JamfoFL

              JamfoFL If you can find the latest commit on which it worked for you, that would be very helpful 🙏

              One way that you could find the correct commit without checking out every one of them is to use a binary search with git bisect.

              First you need to find an older commit that did not exhibit the issue, then you can start the search:

              # Move to XO directory
              cd xen-orchestra
              
              # Ensure you are up-to-date on `master`
              git checkout master
              git pull --ff-only
              
              # Tell git to start the search
              git bisect start
              
              # Current version is bad
              git bisect bad
              
              # Tell git which commit was good
              git bisect good <commit id>
              
              # Now that git knows the good and the bad commit, it will select a commit to test
              

              Testing a commit:

              # Re-install dependencies and rebuild XO (build only if the install succeeded)
              yarn && yarn build
              
              # Run xo-server
              ./packages/xo-server/dist/cli.mjs
              
              # Test XO to see if it has the problem
              
              # Interrupt xo-server with Ctrl+C
              
              # If the problem is present:
              git bisect bad
              
              # If the problem is absent:
              git bisect good
              
              # Continue testing commits until git tells you which commit is the first with the problem, example:
              #
              # Bisecting: 0 revisions left to test after this (roughly 0 steps)
              # [31f850c19c2c2ad5054c292d0f22e9237869dc04] fix(xo-server-transport-email): log async errors
              

              After the search, to go back to the initial state:

              git bisect reset
              
                JamfoFL @julien-f

                julien-f Thank you, Julien-f!

                I ran through that procedure (sorry it took so long) and the last commit where everything works as expected was bf51b94. Of course, I could be back to the point where it might need to run all day before one fails, but I was, at least, able to run one of each type of backup on that commit.

                Is there any way I can revert my environment back to that commit so I can have the security of backups again? Or should I ride this out while you take a look?

                Thanks!

                  olivierlambert Vates 🪐 Co-Founder CEO

                  Just to be sure, please try the very next commit, 263c23ae8f1bc6f3f32ab5fa02c3800b29db8d37: run git checkout 263c23ae8f1bc6f3f32ab5fa02c3800b29db8d37, then yarn and yarn build, and re-run xo-server with ./packages/xo-server/dist/cli.mjs. Start a backup job and report back 🙂

                    JamfoFL @olivierlambert

                    olivierlambert I think I'm a bit confused. After I finished the testing as julien-f recommended and ran git bisect reset, it put me back on the very latest commit (at the time). Right now, it shows me on commit 890b46b and nothing is working again.

                    Is there a way I can revert my current build back to bf51b94 permanently? I would assume that, after that point, I would then use your suggestions moving forward?

                    Sorry if I'm not understanding correctly!

                      olivierlambert Vates 🪐 Co-Founder CEO

                      Latest commit on master branch is 2de9984945a970b7e2273e69c4c89662156d1824.

                      Are you sure you are on master? (Run git branch to check.)

                        JamfoFL @olivierlambert

                        olivierlambert Thank you... yes, shortly after I replied, I did figure out how to "roll back" and am currently on commit bf51b945c5348ba76459e603be87416d3415b264, which is the last commit I was on when everything worked. With my XO on that commit, I am now able to run my routine backups and am in the process of doing so! I had no backups due to the error, so I want to get at least one good one for my test machines to make sure I'm covered.

                        I ran the "git branch" as you asked and can confirm I see two branches listed, master and cr-issue. The "*" was next to master (and it was highlighted in green), so I am assuming that is the current branch I am on.

                        So... now that everything appears to be working again, are you saying you would like me to run "git checkout 2de9984945a970b7e2273e69c4c89662156d1824" to update to the latest version? Does that have the effect of bypassing all of the commits between the working bf51b and the very latest 2de99? I'd hate to have done all of this and got my backups working only to break them again. I do realize I need to get up-to-date at some point!

                        Thanks!

                          olivierlambert Vates 🪐 Co-Founder CEO

                          You can't stay on one commit forever "because it works". As a source user, you need to stay up to date on a regular basis, otherwise you might run into other issues you can't see at first.

                          But in your case, your useful contribution to the project can be to check the commit just after the "working one" for you (so the commit just after bf51b94) and see if it still works. If it doesn't, we can be sure that this commit is the first faulty one. It will help julien-f to find the bug 🙂

                          You can always roll back instantly to your last known-good commit.

                          ⚠ If you don't rebuild after a commit change, this won't change anything. You must rebuild every time you change a commit or update to the latest commit!
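
On the earlier question of whether checking out the latest commit "bypasses" the commits in between: it does not. A commit contains the cumulative state of all of its ancestors, so moving to a newer commit brings in every change made since the older one. The sketch below shows this on a throwaway repository; history.txt and the four demo commits are made up for illustration and stand in for the XO history.

```shell
#!/bin/sh
# Demo: roll back to a known-good commit, then move forward again.
# Assumption: "history.txt" is a hypothetical file created for this demo.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"

# Four commits, each appending one line to the same file.
for n in 1 2 3 4; do
  echo "change $n" >> history.txt
  git add history.txt
  git commit -q -m "commit $n"
done

branch=$(git symbolic-ref --short HEAD)   # name of the current branch
latest=$(git rev-parse HEAD)              # "commit 4"
known_good=$(git rev-parse HEAD~2)        # "commit 2"

# Roll back: HEAD is now detached at the known-good commit.
git checkout -q "$known_good"
lines_at_good=$(wc -l < history.txt | tr -d ' ')

# Count the commits that moving to the latest commit would bring in.
pending=$(git rev-list --count "$known_good..$latest")

# Move forward: the working tree now contains all intermediate changes.
git checkout -q "$latest"
lines_at_latest=$(wc -l < history.txt | tr -d ' ')

# Return to the branch itself rather than staying on a detached HEAD.
git checkout -q "$branch"

echo "known good: $lines_at_good lines, pending commits: $pending, latest: $lines_at_latest lines"
```

In the XO tree, each of these checkouts would have to be followed by yarn and yarn build before restarting xo-server, as the warning above says.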

                            JamfoFL @olivierlambert

                            olivierlambert OK... I am happy to help! 😊

                            Let me take a little time to capture good, working backups of my lab environment so I have that peace of mind, and then I will start applying each of the subsequent commits until I can report the specific commit that starts the issue.

                            This may take a little time: when it first happened (on Monday, February 6th), everything seemed to work for several hours after I applied the three commits that were published that day:

                            • 2f65a86aa08a8c05de0f5d864994b560f528d364
                            • 2a70ebf66711030cf6e277aeabc64037548b9a6b
                            • 55920a58a32e01a3ea0b966c8f6f542f835e936a

                            It wasn't until well into Monday evening that the first CR backups started to fail, and everything seemed to cascade from there. So, I will incrementally apply each commit and let it run for a day or two (unless it fails sooner, of course). Once I have the definitive commit where things break, I will report back right away.

                            Thank you for all your help!
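
Stepping forward one commit at a time means repeatedly answering "which commit comes right after the one I am on?". One way to ask git, sketched here on a throwaway repository (notes.txt and the five demo commits are made up for illustration), is to list the commits between the known-good one and the branch tip in oldest-first order and take the first entry.

```shell
#!/bin/sh
# Demo: find the commit that directly follows a known-good commit.
# Assumption: "notes.txt" is a hypothetical file created for this demo.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"

for n in 1 2 3 4 5; do
  echo "change $n" >> notes.txt
  git add notes.txt
  git commit -q -m "commit $n"
done

good=$(git rev-parse HEAD~3)   # "commit 2" plays the role of the last working commit

# List commits after $good up to HEAD, oldest first, and keep the first one.
# head is used instead of "-n 1" because git applies the count limit before --reverse.
next=$(git rev-list --reverse --ancestry-path "$good..HEAD" | head -n 1)
next_subject=$(git log -1 --format=%s "$next")

echo "next commit to test: $next_subject"
```

In the XO checkout, the same rev-list command with the real known-good hash and master in place of HEAD would give the next commit to check out, rebuild, and test.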

                              olivierlambert Vates 🪐 Co-Founder CEO

                              Thank you for helping us to track down the issue, this is truly helpful 👍

                                JamfoFL @julien-f

                                OK... just an update. Everything has been working perfectly on bf51b94 for the past few days, so I have just updated to the next commit after that one, 263c23a.

                                I will continue to monitor and if things keep working, will move to the next commit in line.

                                  julien-f Vates 🪐 Co-Founder XO Team @JamfoFL

                                  JamfoFL Thank you 🙂

                                    JamfoFL @julien-f

                                    julien-f Everything still working well after the weekend. Switched to commit 2f65a86.

                                    Will report back!

                                      JamfoFL @JamfoFL

                                      All still working... have moved up to commit 55920a58a32e01a3ea0b966c8f6f542f835e936a.

                                        JamfoFL @JamfoFL

                                        Now on 9f4fce9daa75b418d099928d1ab37a0b2cdd3078...

                                          julien-f Vates 🪐 Co-Founder XO Team @JamfoFL

                                          JamfoFL Thank you for all your testing 🙂

                                          We fixed a connection issue last Friday, and I'm currently working on additional fixes. I'll let you know as soon as they are available so that you can test the latest changes!

                                            JamfoFL @julien-f

                                            julien-f Awesome! Thank you for the update.
