XCP-ng

    VM Shuts down on 24th of every month

    • A Offline
      asai
      last edited by olivierlambert

      Greetings,

      We have a VM that has hard shut down on the 24th of every month for the last 3 months. There's nothing in the crontab (CentOS 7), and the Xen logs show entries like the ones below. Can anyone give us some advice on where to start looking for solutions? Thank you for any assistance you can render.

      Feb 24 23:59:42 monota xenopsd-xc: [debug|monota|7 ||xenops] EPOLL error on domain-17, close QMP socket
      Feb 24 23:59:43 monota xenopsd-xc: [debug|monota|17 |Parallel:task=3310620.atoms=2.(VBD.unplug vm=6e91bf5a-6f02-4597-6e37-30698b4e539c)|xenops] Device.Generic.hard_shutdown about to blow away backend and error paths
      Feb 24 23:59:43 monota xenopsd-xc: [debug|monota|15 |Parallel:task=3310620.atoms=2.(VBD.unplug vm=6e91bf5a-6f02-4597-6e37-30698b4e539c)|xenops] Device.Generic.hard_shutdown about to blow away backend and error paths
      Feb 24 23:59:43 monota xenopsd-xc: [debug|monota|17 |Parallel:task=3310620.atoms=2.(VBD.unplug vm=6e91bf5a-6f02-4597-6e37-30698b4e539c)|xenops] xenstore-rm /local/domain/0/error/backend/vbd3/17
      Feb 24 23:59:43 monota xenopsd-xc: [debug|monota|17 |Parallel:task=3310620.atoms=2.(VBD.unplug vm=6e91bf5a-6f02-4597-6e37-30698b4e539c)|xenops] xenstore-rm /local/domain/17/error/device/vbd/768
      Feb 24 23:59:43 monota xenopsd-xc: [debug|monota|15 |Parallel:task=3310620.atoms=2.(VBD.unplug vm=6e91bf5a-6f02-4597-6e37-30698b4e539c)|xenops] xenstore-rm /local/domain/0/error/backend/vbd3/17
      Feb 24 23:59:43 monota xenopsd-xc: [debug|monota|15 |Parallel:task=3310620.atoms=2.(VBD.unplug vm=6e91bf5a-6f02-4597-6e37-30698b4e539c)|xenops] xenstore-rm /local/domain/17/error/device/vbd/5696
      Feb 24 23:59:44 monota xenopsd-xc: [debug|monota|10 |events|xenops] Device.Generic.hard_shutdown about to blow away backend and error paths
      Feb 24 23:59:44 monota xenopsd-xc: [debug|monota|10 |events|xenops] xenstore-rm /local/domain/0/error/backend/vif/17
      Feb 24 23:59:44 monota xenopsd-xc: [debug|monota|10 |events|xenops] xenstore-rm /local/domain/17/error/device/vif/0
      Feb 24 23:59:45 monota xapi: [debug|monota|561 |xapi events D:34fe87c9d9cc|helpers] Helpers.call_api_functions failed to logout: Server_error(SESSION_INVALID, [ OpaqueRef:ce31d273-1eaf-440b-8632-69a6282b278c ]) (ignoring)
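
One way to check whether the shutdowns really cluster on one calendar day is to count the `hard_shutdown` entries per date across the host logs. The following is only a minimal sketch: it assumes every line starts with a syslog-style `Mon DD HH:MM:SS` prefix like the excerpt above, the function name `hard_shutdown_days` is made up for illustration, and the sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Syslog-style prefix as seen in the excerpt: "Feb 24 23:59:43 host ..."
LINE_RE = re.compile(r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+\d{2}:\d{2}:\d{2}\s")


def hard_shutdown_days(lines):
    """Count log lines mentioning hard_shutdown, grouped by (month, day)."""
    counts = Counter()
    for line in lines:
        if "hard_shutdown" not in line:
            continue
        match = LINE_RE.match(line)
        if match:
            counts[(match.group("month"), int(match.group("day")))] += 1
    return counts


# Sample lines taken from the excerpt above.
sample = [
    "Feb 24 23:59:42 monota xenopsd-xc: [debug|monota|7 ||xenops] EPOLL error on domain-17, close QMP socket",
    "Feb 24 23:59:43 monota xenopsd-xc: [debug|monota|17 |Parallel:task=3310620.atoms=2.(VBD.unplug vm=6e91bf5a-6f02-4597-6e37-30698b4e539c)|xenops] Device.Generic.hard_shutdown about to blow away backend and error paths",
    "Feb 24 23:59:44 monota xenopsd-xc: [debug|monota|10 |events|xenops] Device.Generic.hard_shutdown about to blow away backend and error paths",
]

result = hard_shutdown_days(sample)
for (month, day), count in sorted(result.items()):
    print(f"{month} {day}: {count} hard_shutdown line(s)")
```

In practice the `lines` argument could be an open file handle for each current or rotated log, with the resulting counters added together.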
      
      • olivierlambertO Offline
        olivierlambert Vates 🪐 Co-Founder CEO
        last edited by

        Do you have any logs in the guest operating system?

        • DanpD Offline
          Danp Pro Support Team @olivierlambert
          last edited by

          Are you using XO? Any backups or jobs scheduled to run only on the 24th?

          • A Offline
            asai @Danp
            last edited by

            @danp and @olivierlambert ,

            Thanks for the response. There are no logs in the guest VM, just a cutoff in /var/log/messages.

            I am using XO with Delta Backups, but over the last 3 months they have not consistently run on the 24th; only last month did a backup happen on the 24th. They're scheduled for Mon., Wed., and Fri.

            Yes, this is a very odd thing. I've been running Xen VMs since 2007 and have never seen this kind of thing.

            • D Offline
              D_J @asai
              last edited by

              @asai Did you ever figure it out? Just happened to me today (oddly enough, on the 24th) with the same messages in the log...

              • A Offline
                asai @D_J
                last edited by

                @d_j , man that's weird.

                It hasn't happened again, but it happened 3 months in a row. Dec. - Feb.

                Super weird.

                • D Offline
                  D_J @asai
                  last edited by

                  @asai Thanks for the response! Not only do I have the same messages, but the C drive was wiped (almost as if every file was deleted except those that were open; the server has 6 other drives, which weren't affected). I suspect an issue with the Storage Repository, because it's the only drive on that repository. I do have a weekly backup that runs on Sundays through Xen Orchestra. So it's extremely odd that it happened on the 24th; I was only googling the error message from the log! Ugh!

                  I ordered a new server ($$$$) which I'll transition all the VMs to and then I'll wipe and reload the current one and test/validate it from scratch since I don't trust it at this point.

                  • D Offline
                    dredknight @D_J
                    last edited by dredknight

                    Hey everyone,
                    I bumped into this topic because we had a similar issue with one of our test VMs. A full day of logs is attached: logs.tgz.txt

                    We found that one of the VMs shut down on the 5th of April. You can see the issues in the log after 14:10:00.

                    Run grep error <the log file> to see the specific messages.
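
If the plain grep output is too noisy, the same filter can be narrowed to the window after 14:10:00. This is a rough sketch only: it assumes syslog-style lines whose third whitespace-separated field is `HH:MM:SS`, the function name `errors_after` is hypothetical, and the sample lines are invented placeholders rather than real output from the attached logs.

```python
from datetime import time


def errors_after(lines, start=time(14, 10, 0), needle="error"):
    """Yield lines containing `needle` whose HH:MM:SS field is at or after `start`."""
    for line in lines:
        if needle not in line:  # case-sensitive, like plain grep
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            stamp = time.fromisoformat(fields[2])
        except ValueError:
            continue  # third field is not a timestamp; skip the line
        if stamp >= start:
            yield line


# Placeholder lines for illustration only (not taken from any real log).
sample = [
    "Apr  5 14:09:59 host placeholder: error before the window",
    "Apr  5 14:10:03 host placeholder: error inside the window",
    "Apr  5 14:11:00 host placeholder: informational line",
]

matches = list(errors_after(sample))
for line in matches:
    print(line)
```

Comparing only the time of day is enough here because the attachment covers a single day; for multi-day logs the date fields would need to be checked as well.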

                    We are still investigating and are not yet sure what the problem is, but it seems to be related to storage.

                    XCP-ng is on the latest version, 8.2.1.

                    • D Offline
                      dredknight @dredknight
                      last edited by

                      I re-uploaded the logs because .tgz is not an allowed format. I added a .txt extension, so just remove it and extract.

                      • olivierlambertO Offline
                        olivierlambert Vates 🪐 Co-Founder CEO
                        last edited by

                        Hi,

                        I'm not sure it's related at all. Do you have HA enabled in your pool? How many hosts do you have? What's your shared storage?

                        • D Offline
                          dredknight @olivierlambert
                          last edited by

                          @olivierlambert this is just a single host in a single cluster, with only local SSD storage.

                          It is managed by CloudStack. We are running high-performance tests on that server, so there is nothing of value on it. I thought the logs could help find the issue, if one actually exists.

                          • olivierlambertO Offline
                            olivierlambert Vates 🪐 Co-Founder CEO
                            last edited by

                            It's hard to tell with just one log file. I can see that the VM is ordered to shut down (which doesn't sound like a problem in itself), but I can't tell why.

                            • D Offline
                              dredknight @olivierlambert
                              last edited by

                              @olivierlambert we couldn't find a reason either.
                              We will run more tests in the following weeks and report if we find anything of value.
