XCP-ng

    Host crash during backup

    Scheduled Pinned Locked Moved Backup
    13 Posts 5 Posters 1.4k Views 4 Watching
    • julien-fJ Offline
      julien-f Vates 🪐 Co-Founder XO Team @Mathieu

      @Mathieu "Interrupted" means that XO was stopped abruptly during the backup run. XO has no information about why, only that the backup run could not finish.

      As @Danp said, in this case you need to check out the pool's and host's logs to figure out why it crashed.

      • MathieuM Offline
        Mathieu @julien-f

        @julien-f

        Hello,

        I can definitely link the crash to the backup job: it occurred a few minutes after I started the job, and it has happened several times.
        The backup was a brand-new job, just a full backup of 8 VMs from a single host to an NFS remote, with no delta involved.

        I've checked xensource.log, daemon.log, kernel.log and SMlog. Nothing obvious stands out to me, but maybe I'm missing something.
        If you want to have a look, the host crashed at about 09:20:40 and came back up at 09:24.

        /var/crash is empty, nothing there.

        The pool is for now a single host, nothing fancy on that side.

        Should I check other log files?

        Many thanks,
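
For anyone retracing this step, here is a minimal Python sketch for pulling out only the log lines around the crash time (about 09:20:40 above). It assumes a syslog-style timestamp prefix such as `Jun  3 09:20:40 host process: message`; that format, the helper names `lines_in_window` and `scan_file`, and the sample lines are illustrative assumptions, not taken from this host's logs.

```python
import re
from datetime import time

# Assumption: lines start with a syslog-style prefix, e.g.
# "Jun  3 09:20:40 host process: message". Adjust if your logs differ.
_TS = re.compile(r"^\w{3}\s+\d{1,2}\s+(\d{2}):(\d{2}):(\d{2})\s")


def lines_in_window(lines, start, end):
    """Return the lines whose time of day falls within [start, end]."""
    selected = []
    for line in lines:
        match = _TS.match(line)
        if match is None:
            continue  # skip lines without a recognizable timestamp
        hour, minute, second = (int(group) for group in match.groups())
        if start <= time(hour, minute, second) <= end:
            selected.append(line)
    return selected


def scan_file(path, start, end):
    """Apply lines_in_window to a log file (hypothetical helper)."""
    with open(path, errors="replace") as handle:
        return lines_in_window(handle, start, end)


# Made-up sample lines for illustration only, not real XCP-ng output.
sample = [
    "Jun  3 09:15:02 host1 xapi: task started",
    "Jun  3 09:20:31 host1 SM: attaching disk",
    "Jun  3 09:24:05 host1 kernel: booting",
]
```

Calling `scan_file("/var/log/xensource.log", time(9, 18), time(9, 21))` would then return only the last few minutes before the crash, which is usually where the useful lines are. The sketch compares time of day only, so it should be run against the log file covering the day of the crash.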

        • DanpD Offline
          Danp Pro Support Team

          What type of server hardware are you using? Have you performed a test on the RAM to check for issues?

          • MathieuM Offline
            Mathieu @Danp

            @Danp
            It's a barebone server from ASRock Rack 1U4LW-X570/2L2T RPSU
            CPU : AMD Ryzen 9 5950X
            RAM : 4 x 32 GB ECC DDR4

            I'm not on site right now. I will run a memtest as soon as possible to see whether any errors appear.

            Except for these crashes, the host had been stable with the VMs for two weeks.

            • MathieuM Offline
              Mathieu @Mathieu

              So, after a night of memtest and a few hours of CPU stress testing, there were no crashes and no errors with the RAM modules.

              In the meantime, I moved my XO VM to a different host (outside of the pool I'm backing up) and now everything seems OK, with no more crashes.
              I still don't have an explanation for the host crash, but at least I got the backup working.

              • BrantleyHobbsB Offline
                BrantleyHobbs @Mathieu

                @Mathieu did this ever happen to you again?

                I recently started having this happen as well, on a host that has been running rock solid for 6-8 months now. Coincidentally also a Ryzen 9 (a 7900X). The crashes don't always happen, but they are pretty frequent, and in some cases it has caused data corruption.

                • MathieuM Offline
                  Mathieu @BrantleyHobbs

                  @BrantleyHobbs
                  No more crashes during backups for a long time now.
                  My pool has been upgraded with a second host, and XO is running on one of them without any trouble.

                  • DanpD Offline
                    Danp Pro Support Team @BrantleyHobbs

                    @BrantleyHobbs You may want to provide some additional details on your setup, e.g.:

                    • Version of XCP-ng
                    • Are the hosts fully patched?
                    • Current XO version or commit
                    • etc

                    Have you checked the /var/crash directory on the crashed host to see if kernel crash logs were captured? https://docs.xcp-ng.org/troubleshooting/log-files/
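
As a rough illustration of that check, the Python sketch below lists whatever is present in a crash directory, newest first, so entries can be matched against the time of the crash. Only the `/var/crash` path comes from this thread; the function name `list_crash_entries` is hypothetical, and the sketch makes no assumption about what the entries contain.

```python
from datetime import datetime
from pathlib import Path


def list_crash_entries(crash_dir="/var/crash"):
    """Return (name, modification time) pairs, newest first.

    Returns an empty list when the directory is missing or empty.
    """
    root = Path(crash_dir)
    if not root.is_dir():
        return []
    entries = []
    for entry in root.iterdir():
        modified = datetime.fromtimestamp(entry.stat().st_mtime)
        entries.append((entry.name, modified))
    # Sort by modification time so the most recent crash comes first.
    return sorted(entries, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, modified in list_crash_entries():
        print(f"{modified:%Y-%m-%d %H:%M:%S}  {name}")
```

An empty result means no crash data was captured, which is itself worth mentioning when asking for help.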

                    • BrantleyHobbsB Offline
                      BrantleyHobbs @Danp

                      @Danp Fully patched 8.3 (through the most recent May patches). XO commit d810e (master commit e3a58). I usually update XO around the first of each month, so it's a little bit behind master.

                      There are crash logs in /var/crash. I can provide a log bundle if needed, or copy/paste some info here if I know what to provide.

                      • BrantleyHobbsB Offline
                        BrantleyHobbs @BrantleyHobbs

                        Friends, I am once again here to make stupid mistakes so that you don't have to: it appears that the reason my backups were causing the machine to crash/hang is that:
                        A) I was making DR replicas to local storage (the same device the host is booting from; not the same partition)

                        B) I was filling it up.

                        Again, this is not a production environment, simply a home lab, and I'm a bit resource poor, and I was looking for something other than my main disk repository for quick recovery. Making use of unused disk space left over on the boot device seemed like a Good Idea at the time.

                        I added an additional physical disk specifically for DR replication and all my problems stopped.
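
For anyone who wants to catch this before it bites, here is a minimal Python sketch that reports how full a filesystem is, using the standard library's `shutil.disk_usage`. The function names and the 90% threshold are arbitrary choices for illustration, and the path to check is an assumption: point it at wherever the replication target is actually mounted on your host.

```python
import shutil


def used_fraction(path):
    """Return the used share (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # avoid dividing by zero on pseudo-filesystems
    return usage.used / usage.total


def is_nearly_full(path, threshold=0.9):
    """True when the filesystem is at or above the given usage threshold."""
    return used_fraction(path) >= threshold


if __name__ == "__main__":
    # Assumption: "/" stands in for the mount point you care about.
    target = "/"
    print(f"{target}: {used_fraction(target):.0%} used")
    if is_nearly_full(target):
        print("Warning: storage is nearly full")
```

Running something like this from cron before the backup job starts would at least leave a warning behind instead of a hung host.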

                        Hope that helps someone in the future.

                        • P Offline
                          Pilow @BrantleyHobbs

                          @BrantleyHobbs [image]

                          Facepalm situation! Thanks for the feedback 😄
