    Issue after latest host update

    • olivierlambert (Vates, Co-Founder CEO)

      Could you do a mem test on the current master?

      What kind of storage are you using?

      • nikade (Top contributor)

        dmesg looks fine, so something else is probably borked here.
        You wrote in the Reddit thread that you were able to start VMs, but that they never actually came up and the task stayed stuck at 1.000 progress. Is that still the case after electing a new master?

        If yes, check "xentop" on the host where the VM was started to see if it's consuming resources.

        • olivierlambert (Vates, Co-Founder CEO)

          Yeah I'm baffled because this is not something we've seen before on a "normal" setup, I really wonder where the problem lies 🤔

          • RealTehreal

            @olivierlambert The issue started on all three hosts after the latest update via yum update. I can't imagine three devices all developing faulty memory, one right after another. Before the issue, I used an NFS share as VM storage, but I have since deployed XOA on local storage (LVM). Same issue on all three hosts.

            @nikade First, I'll redeploy XOA on the pool master and take a look at xentop. According to xsconfig, every VM runs with one vCore at 100% all the time and doesn't respond to anything. xe vm-list always lists them as running, though. In this state, the only way to shut down VMs is a forced shutdown, since they don't react to the soft shutdown command.

            I've never had such issues, either. I've been running my setup for about a year now and have done several updates via the CLI. I'm just as baffled that everything suddenly went down the flush.

            • RealTehreal

              xentop shows XOA consuming 100.0 CPU (%), meaning one core. But quick deployment is stuck at "almost there", until it times out. The VM is still consuming one CPU core, while not being accessible.
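
The checks described in the posts above (sampling per-domain CPU with xentop, listing the VMs that xe reports as running, and forcing a shutdown) could be scripted roughly as follows. This is a minimal sketch and not something posted in the thread: the function names `sample_domains`, `list_running_vms`, and `force_shutdown` are hypothetical, and it assumes the commands are run as root in dom0 on the affected host.

```shell
#!/bin/sh
# Sketch only: the helper names are hypothetical; run as root in dom0.

# Take two non-interactive xentop samples one second apart. The first batch
# sample typically has no previous reading to compare against, so read the
# CPU(%) column from the second one.
sample_domains() {
    xentop -b -d 1 -i 2
}

# List the VMs that xe reports as running (UUID and name only).
list_running_vms() {
    xe vm-list power-state=running params=uuid,name-label
}

# Force-shut down one VM by UUID, for guests that ignore a clean shutdown.
force_shutdown() {
    if [ -z "$1" ]; then
        echo "usage: force_shutdown <vm-uuid>" >&2
        return 2
    fi
    xe vm-shutdown uuid="$1" force=true
}
```

A typical use would be to call `list_running_vms`, copy the UUID of the stuck guest, and pass it to `force_shutdown`.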

              • nikade (Top contributor)

                I can't really understand what happened, to be honest; I've done this many times without issues.
                What can you see in the console tab of the VM when you start it? Or in the stats tab?

                • john.c

                  @RealTehreal What's the state of the network stack? Is it up, and what's the activity percentage?

                  • RealTehreal

                    @nikade @john-c I'm not sure... how would I find that out? At least I can SSH into the hosts without ever getting disconnected.

                    • RealTehreal, replying to @nikade

                      @nikade said in Issue after latest host update:

                      I can't really understand what happened, to be honest; I've done this many times without issues.
                      What can you see in the console tab of the VM when you start it? Or in the stats tab?

                      I can't see anything, because XOA itself is inaccessible, since it's a VM, and VMs won't start into a usable state.

                      • john.c, replying to @RealTehreal

                        Is there anything in the XCP-ng 8.2.1 host logs, either from the attempts to start the VM or in general? They may hold clues about any underlying issues.

                        Any relevant logs from the NFS storage server would also help, as they may reveal something causing issues on its end.

                        • olivierlambert (Vates, Co-Founder CEO)

                          Any specific MTU settings?

                          • olivierlambert (Vates, Co-Founder CEO)

                            A way to check if it's not network related would be using a local SR to boot a VM and see if it works.

                            • RealTehreal, replying to @john.c

                              @john-c I already took a look at dmesg and /var/log/xensource.log (I crawled through >1k log lines) and couldn't find anything revealing. The NFS server is unrelated because, as stated before, I currently use only the hosts' local storage to eliminate possible external issues.

                              • RealTehreal, replying to @olivierlambert

                                @olivierlambert That's what I'm doing, to make sure it's not a network-related issue.

                                • RealTehreal, replying to @olivierlambert

                                  @olivierlambert I didn't change anything, at least. Just yum update and it went down the flush.

                                  • olivierlambert (Vates, Co-Founder CEO)

                                    I'm not sure the yum update is really related. It could be a coincidence, otherwise we would have been swamped in similar reports. Or it's a very specific combo that's unseen elsewhere.

                                    What kind of hardware are we talking about?

                                    • RealTehreal, replying to @olivierlambert

                                      @olivierlambert I finally made some progress. And it really seems to be update related.

                                      I took one of the hosts and plugged a display and keyboard into it. When booting up, I can choose to use an older version of Xen from the boot menu. Doing so makes VMs work again.

                                      Culprit: Xen 4.13.5-9.39 (current default)
                                      Working: Xen 4.13.4-9.19.1 (which I can choose from boot menu)

                                      All three hosts are Fujitsu Futro 740 thin clients.
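
To see which Xen build a host is actually running after an update, something like the following could be used in dom0. It is a rough sketch rather than anything from the thread: the helper name `classify_xen_version` is hypothetical, the two version strings are simply the ones reported in this post, and reading the `xen_version` field from `xl info` in exactly this format is an assumption about the host's output.

```shell
#!/bin/sh
# Sketch only: classify_xen_version is a hypothetical helper.
# The two versions are the ones reported in this thread for a Futro S740;
# nothing here implies that other hardware is affected.
classify_xen_version() {
    case "$1" in
        4.13.5-9.39)   echo "reported-broken" ;;
        4.13.4-9.19.1) echo "reported-working" ;;
        *)             echo "unknown" ;;
    esac
}

# Only query the hypervisor when xl is available (i.e. when run in dom0).
# Assumption: xl info prints a line of the form "xen_version : <version>".
if command -v xl >/dev/null 2>&1; then
    running=$(xl info | awk -F': *' '$1 ~ /^xen_version/ { print $2 }')
    echo "Running Xen ${running}: $(classify_xen_version "$running")"
fi
```

Running this once after a normal boot and once after picking the older entry from the boot menu should show whether the hypervisor version really changed between the two.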

                                      • john.c, replying to @RealTehreal

                                        What's the BIOS version of the Fujitsu Futro 740, and what's the exact model, please? There are many Fujitsu Futro 740 thin clients, so you could be using any one of them.

                                        • RealTehreal, replying to @john.c

                                          @john-c
                                          Model: FUJITSU FUTRO S740/D3544-A1
                                          BIOS: V5.0.0.13 R1.13.0 for D3544-A1x (09/23/2022)

                                          • john.c, replying to @RealTehreal

                                            Thanks, that will help. It makes it possible to identify any issues specific to that device, as well as its specific CPU and that CPU's functions and features, especially its instruction set capabilities.
