XCP-ng


    desperately searching for solution for xe command timeouts and xcp-ng crashes

    • olivierlambert
      olivierlambert Vates 🪐 Co-Founder🦸 CEO 🧑‍💼 last edited by

      Hi,

      I would start by reading the XAPI log, xensource.log, which is where everything XAPI does is recorded.

      See https://docs.xcp-ng.org/Troubleshooting for more details

      • C
        chcnetconsulting @olivierlambert last edited by olivierlambert

        @olivierlambert Thank you, I was already doing this. I do not really understand what is going on.
        The server is in a private network, not accessible from the bad internet 😉

        I do not understand why there is a "Session authentication failed" error.

        This is happening every 2 seconds:

        # tail -f /var/log/xensource.log
        Jan 10 13:04:23 ahbxen1 xapi: [debug||963 |org.xen.xapi.xenops.classic events D:ae7d342ca10e|xenops] Processing event: ["Vm","5d51c38b-5260-2903-ae21-4bbe607fb99c"]
        Jan 10 13:04:23 ahbxen1 xapi: [debug||963 |org.xen.xapi.xenops.classic events D:ae7d342ca10e|xenops] xenops event on VM 5d51c38b-5260-2903-ae21-4bbe607fb99c
        Jan 10 13:04:23 ahbxen1 xenopsd-xc: [debug||72617 |org.xen.xapi.xenops.classic events D:ae7d342ca10e|xenops_server] VM.stat 5d51c38b-5260-2903-ae21-4bbe607fb99c
        Jan 10 13:04:23 ahbxen1 xapi: [debug||963 |org.xen.xapi.xenops.classic events D:ae7d342ca10e|xenops] xenopsd event: processing event for VM 5d51c38b-5260-2903-ae21-4bbe607fb99c
        Jan 10 13:04:23 ahbxen1 xapi: [debug||963 |org.xen.xapi.xenops.classic events D:ae7d342ca10e|xenops] Supressing VM.allowed_operations update because guest_agent data is largely the same
        Jan 10 13:04:23 ahbxen1 xapi: [debug||963 |org.xen.xapi.xenops.classic events D:ae7d342ca10e|xenops] xenopsd event: Updating VM 5d51c38b-5260-2903-ae21-4bbe607fb99c domid 14 guest_agent
        Jan 10 13:04:27 ahbxen1 xapi: [debug||5755278 INET :::80||dummytaskhelper] task dispatch:session.logout D:cfcb0866d8ec created by task D:b4fe67a1bb65
        Jan 10 13:04:30 ahbxen1 xapi: [debug||5755279 UNIX /var/lib/xcp/xapi||cli] xe pool-param-get uuid=Stopping ha-lizard= (via= systemctl):= [= OK= ]= param-name=other-config param-key=XenCenter.CustomFields.ha-lizard-enabled username=root password=(omitted)
        Jan 10 13:04:30 ahbxen1 xapi: [ info||5755279 UNIX /var/lib/xcp/xapi|session.login_with_password D:655f9bce2d1c|xapi] Session.create trackid=24d93f0af567eed5f81fb70f2557487e pool=false uname=root originator=cli is_local_superuser=true auth_user_sid= parent=trackid=9834f5af41c964e225f24279aefe4e49
        Jan 10 13:04:32 ahbxen1 xcp-rrdd: [ info||7 ||rrdd_main] memfree has changed to 5363956 in domain 6
        Jan 10 13:04:35 ahbxen1 xapi: [ info||5755280 INET :::80|session.login_with_password D:2d12df15927e|xapi] Failed to locally authenticate user root from HTTP request from Internet with User-Agent: xmlrpclib.py/1.0.1 (by www.pythonware.com): Authentication failure
        Jan 10 13:04:35 ahbxen1 xapi: [debug||5755282 UNIX /var/lib/xcp/xapi||cli] xe host-list name-label=ahbxen1 minimal=true username=root password=(omitted)
        Jan 10 13:04:35 ahbxen1 xapi: [ info||5755282 UNIX /var/lib/xcp/xapi|session.login_with_password D:87466e0161dd|xapi] Session.create trackid=249a86916b75d708a2c52adb1f011eed pool=false uname=root originator=cli is_local_superuser=true auth_user_sid= parent=trackid=9834f5af41c964e225f24279aefe4e49
        Jan 10 13:04:40 ahbxen1 xapi: [debug||1038 scanning_thread|SR scanner D:f2340ef7fc82|xapi_sr] Automatically scanning SRs = [ OpaqueRef:cf346a4f-4981-412d-a057-9b386d8bd2d6 ]
        Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] session.login_with_password D:2d12df15927e failed with exception Server_error(SESSION_AUTHENTICATION_FAILED, [ root; Authentication failure ])
        Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] Raised Server_error(SESSION_AUTHENTICATION_FAILED, [ root; Authentication failure ])
        Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] 1/8 xapi Raised at file ocaml/xapi/xapi_session.ml, line 405
        Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] 2/8 xapi Called from file ocaml/xapi/xapi_session.ml, line 40
        Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] 3/8 xapi Called from file ocaml/xapi/xapi_session.ml, line 40
        Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] 4/8 xapi Called from file ocaml/xapi/server_helpers.ml, line 83
        Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] 5/8 xapi Called from file ocaml/xapi/server_helpers.ml, line 99
        Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] 6/8 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
        Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] 7/8 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
        Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] 8/8 xapi Called from file lib/backtrace.ml, line 177
        Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace]
        Jan 10 13:04:42 ahbxen1 xapi: [ info||5755283 UNIX /var/lib/xcp/xapi||cli] xe message-create name=HA-Lizard - xe_wrapper priority=1 body=xe_wrapper: COMMAND: xe pool-param-get pool-uuid=c1dbc848-aa29-1603-2af7-078466842ac2 username=root password=(omitted)
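
To see how often the failed logins occur and where they come from, the log can be summarised instead of watched with tail. In the excerpt above, the failing login arrives over `INET :::80` with the User-Agent `xmlrpclib.py`, while the `xe` calls arrive over the UNIX socket. The following is only a rough sketch: the function names are made up for illustration, and it assumes the log lines keep the format shown above.

```shell
# Sketch: summarise failed XAPI logins in a xensource.log-style file.
# Function names are hypothetical; the default path is the one used in the post.

count_auth_failures() {
    logfile="${1:-/var/log/xensource.log}"
    # Each failed login writes one "Failed to locally authenticate" info line;
    # the backtrace lines that follow it are deliberately not counted.
    grep -c 'Failed to locally authenticate' "$logfile"
}

list_auth_failure_agents() {
    logfile="${1:-/var/log/xensource.log}"
    # Extract the User-Agent of every failed login and count distinct values.
    grep 'Failed to locally authenticate' "$logfile" \
        | sed -n 's/.*User-Agent: \(.*\): Authentication failure$/\1/p' \
        | sort | uniq -c | sort -rn
}
```

Running `list_auth_failure_agents` on the host should show whether a single client is responsible for all the failures, which may help narrow down which script is sending the bad credentials.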
        
        • olivierlambert
          olivierlambert Vates 🪐 Co-Founder🦸 CEO 🧑‍💼 last edited by

          1. Do you have the same root password on both machines?
          2. Do you have enough disk space on both hosts?
          3. Double-check that you don't have an IP conflict somewhere.
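
For the disk space check, something along these lines could be run on each host. It is a minimal sketch only: the function name is hypothetical, and the two paths are just examples of filesystems worth looking at, not an official list.

```shell
# Sketch: print how full the filesystem holding a given path is, as a bare number.
# usage_pct is a hypothetical helper name.
usage_pct() {
    path="${1:-/var/log}"
    # df -P prints one header line followed by one data line per filesystem;
    # column 5 is the used capacity, e.g. "42%". Strip the "%" and print it.
    df -P "$path" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example paths (an assumption): the root filesystem and the log directory.
for p in / /var/log; do
    if [ -d "$p" ]; then
        echo "$p: $(usage_pct "$p")% used"
    fi
done
```

A value close to 100 on either host would be worth fixing before looking any further.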
          • C
            chcnetconsulting last edited by chcnetconsulting

            It seems to be an HA-Lizard related problem.

            As I said, this cluster ran flawlessly for 2 years. These weird problems only started in the last 4 weeks.

            There seems to be a REST API request with a bad password, although it is possible to send those xe commands without a password...

            Kind regards
            Christoph

            • C
              chcnetconsulting @olivierlambert last edited by

              @olivierlambert
              disk space - ok
              network - no conflicts
              The root password is also the same on both machines.

              kr
              Christoph

              • olivierlambert
                olivierlambert Vates 🪐 Co-Founder🦸 CEO 🧑‍💼 last edited by

                So go check with the HA-Lizard people then 🙂

                • tjkreidl
                  tjkreidl Ambassador 📣 @chcnetconsulting last edited by

                  @chcnetconsulting Are you running the latest version of HA-Lizard? They are very responsive to issues if you contact them. And, yes, check whether you see anything odd in the logs. Also make sure your hosts are properly time-synced with each other.

                  • C
                    chcnetconsulting @tjkreidl last edited by

                    @tjkreidl Hi, it is 2.2.3-1, the latest version. All of a sudden it seems that the cluster is working again... no more timeouts. Totally weird.

                    BUT - XCP-ng Center takes a long time to synchronize the hosts.

                    Before the timeouts ended, I had fixed a bug in /etc/ha-lizard/ha-lizard.func at line 645, where they were reading a pool-param-get that never worked. Apparently this was running much too fast and spawning so many processes that the REST API was dead.
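
The log excerpt earlier in the thread shows `xe pool-param-get` being called with `uuid=Stopping ha-lizard= (via= systemctl):= ...`, so the uuid argument appears to have been filled with the output of another command. A guard like the one below would reject such a value before `xe` is ever started. This is an illustrative sketch, not the actual HA-Lizard code or its fix: `is_uuid` and `pool_other_config` are made-up names, and the 10 second limit is an arbitrary assumption.

```shell
# Sketch: refuse to call xe with a value that is not shaped like a UUID.
# is_uuid and pool_other_config are hypothetical helpers, not part of HA-Lizard.
is_uuid() {
    printf '%s\n' "$1" | grep -Eq \
        '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

pool_other_config() {
    pool_uuid="$1"
    key="$2"
    if ! is_uuid "$pool_uuid"; then
        echo "refusing to query pool: '$pool_uuid' is not a UUID" >&2
        return 2
    fi
    # timeout(1) from coreutils bounds a hung xe call (10 s is an assumed value),
    # so a tight loop cannot pile up stuck xe processes.
    timeout 10 xe pool-param-get uuid="$pool_uuid" \
        param-name=other-config param-key="$key"
}
```

With a guard like this, a bad value fails fast with a clear message instead of producing a failing `xe` call on every loop iteration.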

                    [Screenshot: cbf84da1-4052-4c60-9867-0701e5d44ef4-image.png]

                    This fixed a million error notifications in the error log. After restarting ha-lizard (service ha-lizard restart), everything returned to normal.

                    Although the timeouts are history, the wrong query which creates the backtrace (authentication error) is not fixed yet. This is probably an issue the guys at HA-Lizard know how to fix.

                    Thank you for helping me out!
                    kind regards
                    Christoph

                    • tjkreidl
                      tjkreidl Ambassador 📣 @chcnetconsulting last edited by

                      @chcnetconsulting Glad to hear and nice debugging work! Yes, the HA-Lizard folks are very responsive and I'm sure will have this taken care of in the next release.
                      I published an article originally on xenserver.org back in 2016 on tests and improvements to HA-Lizard I did in cooperation with the company, but alas, the chart with all the findings didn't translate properly when taken over by this site. I may have the original squirreled away somewhere.
                      https://xenserver.pl/citrix-xenserver/xenserver-high-availability-alternative-ha-lizard-2/9122

                      • A
                        Ajmind 0 @chcnetconsulting last edited by

                        @chcnetconsulting

                        Just to mention here that your problem(s) have been addressed in the most recent version of HA-Lizard (2.3.1).

                        Simply upgrade and you will be happy again 🙂
