XCP-ng

    XOA 5.107.2 Backup Failure via SMB and S3 (Backblaze)

    • planedrop (Top contributor)

      This is a new one: I just updated XOA to 5.107.2 and now my backups are no longer working.

      I have support and can put in a ticket, but figured it's better to try here first.

      On the backups to Backblaze I am getting the error "Fail to connect to any Nbd client", and on my SMB backups I just get a "Footer1 !== footer2" error.

      What's important here is that it's only about half my VMs, and this is a single host setup, so the NBD client issues don't really make sense to me, unless I'm misunderstanding something about NBD.

      Anyone else seeing issues with backups after this update?

      I'm also not seeing any consistent pattern. It isn't specific to Windows VMs, for example; it seems random.

      • planedrop (Top contributor) @planedrop

        Going to give it more time, but restarting all the backups seems to have fixed the issue. Unsure why they would fail once and then resume just fine though.

        • planedrop (Top contributor) @planedrop

          I'm still seeing this issue and trying to pinpoint it, but haven't had any luck. Each VM seems to have about a 50/50 chance of failing or succeeding, but the logs don't really lead me to anything and I can't find any consistent reason why it would be happening.

          This is only since going to 5.107.2 as well, wasn't happening on the previous version (which I unfortunately don't recall the version number of).

          • olivierlambert (Vates 🪐 Co-Founder CEO)

            Do you have the issue on the stable release channel?

            • planedrop (Top contributor) @olivierlambert

              @olivierlambert Good question, I am on Latest by mistake in this environment actually.

              Is it safe to roll back to Stable channel even though I am already on latest?

              • olivierlambert (Vates 🪐 Co-Founder CEO)

                Yes it should be safe 🙂

                • planedrop (Top contributor) @olivierlambert

                  @olivierlambert I will give this a shot and report back. It may be a day or so, one of the backups is still running (very large VM over S3 so takes a while) but once it's done I will go back and see if the failures go away.

                  • ravenet

                    I was seeing similar broken backups in 5.107. Reverting to 5.106.4 seems to have resolved it so far. I have a case open with logs submitted.

                    • planedrop (Top contributor) @ravenet

                      @ravenet @olivierlambert yeah going back to 5.106 seems to have resolved the issue. I want to give it one more day before saying 100% that it did, but all VMs in both my backup jobs last night finished properly.

                      • olivierlambert (Vates 🪐 Co-Founder CEO)

                        Okay thanks, the backup code made a big leap in latest, so there's maybe something fishy in there 🤔 @florent : might be a lead to find a potential bug in the new code

                        • planedrop (Top contributor) @olivierlambert

                          @olivierlambert Happy to help in any way that I can as well!

                          Notably, I am not seeing any issues doing backups to SMB or S3 with my lab at home which is on the latest. My lab is XCP-ng 8.3 though, rather than 8.2 like this production setup (which will be getting upgraded to 8.3 now that it's LTS), so maybe something specific with the new backup code and 8.2?

                          • olivierlambert (Vates 🪐 Co-Founder CEO)

                            It's more likely related to your XO backup code than XCP-ng version (my gut feeling ATM)

                            • planedrop (Top contributor) @olivierlambert

                              @olivierlambert Gotcha. I'll see if I can get this issue to replicate in my lab at all but so far my backups have been smooth over there.

                              I'll try to re-create more similar backup jobs in the lab as well, maybe it's a specific setting or something on my jobs.

                              • ravenet @olivierlambert

                                @olivierlambert

                                @florent already jumped in on our case and submitted a fix for the "_removeUnusedSnapshots don t handle vdi related to multiple VMs" error that we were seeing.
                                We have a VDI that won't coalesce, so I need to reopen that case. I think the error above was triggered by this angry VDI and my previous attempt to fix it.

                                It was also noted that 5.107 was ignoring our backup concurrency setting and was running all 24 VMs at once instead of the 3 we had set. Reverting to 5.106.4 resolved this. I'm waiting for an update on what in 5.107 causes this setting to be ignored. We're in different timezones, so I assume I'll hear tonight.

                                I'm on 8.3 as well, fully updated with the latest patches.

                                • planedrop (Top contributor) @ravenet

                                  @ravenet All of my errors seemed related to NBD access, so if the concurrency setting was being ignored, that might be the source of the issue I was seeing.

                                  I'll watch my lab as well and see if the concurrency is being respected or not on the latest from the sources build.

                                  Glad to see you were on 8.3, so not related to me being on 8.2.

                                  • ravenet @planedrop

                                    @planedrop I was getting a lot of NBD errors as well, so I'm not sure whether it was simply ignoring the concurrency setting, or moving on to the next backup after an NBD communication error while leaving the previous backups marked as active. Either way, there's a bug if it leaves a backup 'active' and then starts another one beyond the configured concurrency limit.
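
For illustration only, the behaviour expected of a concurrency limit can be sketched in a few lines of Python. This is not Xen Orchestra's backup code, and it says nothing about where the actual 5.107 bug lives. The names `run_backup`, `run_all` and `simulate` are hypothetical, the "NBD failure" is simulated, and the numbers (24 VMs, limit of 3) are taken from the posts above. The point it shows: a job that fails must release its slot and stop counting as active, so the number of running jobs never exceeds the limit.

```python
import asyncio


async def run_backup(vm_id, sem, state):
    """Hypothetical stand-in for one VM backup; odd-numbered VMs fail."""
    async with sem:  # the slot is released on success and on failure
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        try:
            await asyncio.sleep(0.001)  # simulated data transfer
            if vm_id % 2:
                raise ConnectionError(f"VM {vm_id}: simulated NBD failure")
            return "ok"
        finally:
            # A failed job must stop counting as active before the next starts.
            state["active"] -= 1


async def run_all(vm_count=24, limit=3):
    sem = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    # return_exceptions=True keeps one failure from cancelling the other jobs.
    results = await asyncio.gather(
        *(run_backup(i, sem, state) for i in range(vm_count)),
        return_exceptions=True,
    )
    failed = sum(isinstance(r, Exception) for r in results)
    return state["peak"], failed


def simulate(vm_count=24, limit=3):
    """Return (peak number of simultaneously active jobs, number of failures)."""
    return asyncio.run(run_all(vm_count, limit))
```

Calling `simulate()` returns the peak number of jobs that were active at the same time together with the failure count. In this sketch the peak stays at the limit even though half of the jobs fail, which is the invariant the last post says was being violated.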
