XCP-ng
    chcnetconsulting

    Latest posts made by chcnetconsulting

    • cluster slave no connection to pool

      Hi friends of xcp-ng

      I'm facing a lot of issues here, all probably because I decided years ago to go for the ha-lizard solution. It is actually quite a nice one, but there have been technical problems, sync problems with the DRBD disks, etc. After several rescue actions, we have ended up in the following state:

      server1: master and pool master, all VMs are running here.
      server2: Pool master is unreachable, nothing is configured, no network interfaces, no storage resources, nothing. No VMs are running here.

      Can I fix the issue by using "Remove this Host from the Pool"?

      I would expect the following:

      • everything is deleted and reset on server2
      • the server reboots and we have a fresh machine without anything on it.
      • I could then create local storage and migrate all machines from server1 (i.e. from the former pool) to server2, in order to remove the pool as well.

      In other words, is it possible to remove a dead slave from the pool without causing big problems for the remaining master?
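For reference, the xe CLI has a host-forget command that is meant for pool members the master can no longer reach. The sketch below only prints the commands instead of running them, so they can be reviewed first. The function name print_forget_plan and the UUID argument are made up for this illustration. As far as I know, host-forget only drops the host record from the pool database and does not contact or reset the forgotten machine, so server2 would most likely still need a reinstall before it can be reused (that part is an assumption, not something confirmed in this thread).

```shell
#!/bin/sh
# Dry run: print the xe commands for forgetting an unreachable pool member.
# Nothing is executed here; run the printed lines on the pool master by hand.
# print_forget_plan is a hypothetical helper name, not part of xe.
print_forget_plan() {
    host_uuid=$1
    # Step 1: list the hosts to find the UUID of the dead member.
    echo "xe host-list params=uuid,name-label,enabled"
    # Step 2: make the master forget that host without contacting it.
    echo "xe host-forget uuid=$host_uuid"
}
```

Usage: print_forget_plan followed by the UUID of server2 prints the two commands in the order they would be run on server1.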

      Another issue: because of problems with the VDI chain, my backups won't work either... altogether quite an unpleasant situation.
      Is there an easy way to fix this VDI chain issue, e.g. by deleting all snapshots?

      BR
      Christoph

      posted in XCP-ng
    • RE: desperately searching for solution for xe command timeouts and xcp-ng crashes

      @tjkreidl hi, it is 2.2.3-1, the latest version. All of a sudden, it seems that the cluster is working again... no timeouts anymore. Totally weird.

      BUT - XCP-ng Center takes a long time to synchronize the hosts.

      Before the timeouts ended, I had fixed a bug in /etc/ha-lizard/ha-lizard.func at line 645, where they were calling a pool-param-get that never worked. Obviously this was running much too fast and spawning so many processes that the REST API was dead.

      cbf84da1-4052-4c60-9867-0701e5d44ef4-image.png

      This fixed a million error notifications in the error log. After restarting ha-lizard (service ha-lizard restart), everything returned to normal.

      Although the timeouts are history, the wrong query which creates the backtrace (authentication error) is not fixed yet. This is probably an issue the guys at ha-lizard know how to fix.
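The logged command in the earlier post of this thread shows pool-param-get being called with uuid= set to the text "Stopping ha-lizard (via systemctl): [ OK ]" instead of a UUID. A guard like the following would refuse such a call before it reaches xapi. This is only a sketch and not the actual ha-lizard code: the function names is_uuid and get_ha_enabled are invented here, while the xe parameters are copied from the logged command.

```shell
#!/bin/sh
# Succeed only if $1 is a single line in canonical UUID form (8-4-4-4-12 hex).
is_uuid() {
    # grep matches line by line, so reject multi-line input first.
    [ "$(printf '%s' "$1" | wc -l)" -eq 0 ] || return 1
    printf '%s\n' "$1" |
        grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Hypothetical wrapper: query the ha-lizard-enabled flag only when the pool
# UUID is well-formed, instead of passing arbitrary text to xe.
get_ha_enabled() {
    pool_uuid=$1
    if ! is_uuid "$pool_uuid"; then
        echo "refusing to call xe: invalid pool uuid '$pool_uuid'" >&2
        return 1
    fi
    xe pool-param-get uuid="$pool_uuid" param-name=other-config \
        param-key=XenCenter.CustomFields.ha-lizard-enabled
}
```

With this guard, a garbage value fails fast with one message on stderr rather than producing a xapi error and backtrace on every polling cycle.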

      Thank you for helping me out!
      kind regards
      Christoph

      posted in Compute
    • RE: desperately searching for solution for xe command timeouts and xcp-ng crashes

      @olivierlambert
      disk space - ok
      network - no conflicts
      root password is also the same on both machines.

      kr
      Christoph

      posted in Compute
    • RE: desperately searching for solution for xe command timeouts and xcp-ng crashes

      It seems to be an ha-lizard-related problem.

      As I said, this cluster ran flawlessly for 2 years; only in the last 4 weeks have these weird problems appeared.

      There seems to be a rest-api request with a bad password, although it is possible to send those xe commands without a password...

      Kind regards
      Christoph

      posted in Compute
    • RE: desperately searching for solution for xe command timeouts and xcp-ng crashes

      @olivierlambert Thank you, I was doing this already. I do not really understand what is going on...
      The server is in a private network, not accessible from the bad internet 😉

      I do not understand why there is a "Session authentication failed" error...

      This is happening every 2 seconds:

      # tail -f /var/log/xensource.log
      Jan 10 13:04:23 ahbxen1 xapi: [debug||963 |org.xen.xapi.xenops.classic events D:ae7d342ca10e|xenops] Processing event: ["Vm","5d51c38b-5260-2903-ae21-4bbe607fb99c"]
      Jan 10 13:04:23 ahbxen1 xapi: [debug||963 |org.xen.xapi.xenops.classic events D:ae7d342ca10e|xenops] xenops event on VM 5d51c38b-5260-2903-ae21-4bbe607fb99c
      Jan 10 13:04:23 ahbxen1 xenopsd-xc: [debug||72617 |org.xen.xapi.xenops.classic events D:ae7d342ca10e|xenops_server] VM.stat 5d51c38b-5260-2903-ae21-4bbe607fb99c
      Jan 10 13:04:23 ahbxen1 xapi: [debug||963 |org.xen.xapi.xenops.classic events D:ae7d342ca10e|xenops] xenopsd event: processing event for VM 5d51c38b-5260-2903-ae21-4bbe607fb99c
      Jan 10 13:04:23 ahbxen1 xapi: [debug||963 |org.xen.xapi.xenops.classic events D:ae7d342ca10e|xenops] Supressing VM.allowed_operations update because guest_agent data is largely the same
      Jan 10 13:04:23 ahbxen1 xapi: [debug||963 |org.xen.xapi.xenops.classic events D:ae7d342ca10e|xenops] xenopsd event: Updating VM 5d51c38b-5260-2903-ae21-4bbe607fb99c domid 14 guest_agent
      Jan 10 13:04:27 ahbxen1 xapi: [debug||5755278 INET :::80||dummytaskhelper] task dispatch:session.logout D:cfcb0866d8ec created by task D:b4fe67a1bb65
      Jan 10 13:04:30 ahbxen1 xapi: [debug||5755279 UNIX /var/lib/xcp/xapi||cli] xe pool-param-get uuid=Stopping ha-lizard= (via= systemctl):= [= OK= ]= param-name=other-config param-key=XenCenter.CustomFields.ha-lizard-enabled username=root password=(omitted)
      Jan 10 13:04:30 ahbxen1 xapi: [ info||5755279 UNIX /var/lib/xcp/xapi|session.login_with_password D:655f9bce2d1c|xapi] Session.create trackid=24d93f0af567eed5f81fb70f2557487e pool=false uname=root originator=cli is_local_superuser=true auth_user_sid= parent=trackid=9834f5af41c964e225f24279aefe4e49
      Jan 10 13:04:32 ahbxen1 xcp-rrdd: [ info||7 ||rrdd_main] memfree has changed to 5363956 in domain 6
      Jan 10 13:04:35 ahbxen1 xapi: [ info||5755280 INET :::80|session.login_with_password D:2d12df15927e|xapi] Failed to locally authenticate user root from HTTP request from Internet with User-Agent: xmlrpclib.py/1.0.1 (by www.pythonware.com): Authentication failure
      Jan 10 13:04:35 ahbxen1 xapi: [debug||5755282 UNIX /var/lib/xcp/xapi||cli] xe host-list name-label=ahbxen1 minimal=true username=root password=(omitted)
      Jan 10 13:04:35 ahbxen1 xapi: [ info||5755282 UNIX /var/lib/xcp/xapi|session.login_with_password D:87466e0161dd|xapi] Session.create trackid=249a86916b75d708a2c52adb1f011eed pool=false uname=root originator=cli is_local_superuser=true auth_user_sid= parent=trackid=9834f5af41c964e225f24279aefe4e49
      Jan 10 13:04:40 ahbxen1 xapi: [debug||1038 scanning_thread|SR scanner D:f2340ef7fc82|xapi_sr] Automatically scanning SRs = [ OpaqueRef:cf346a4f-4981-412d-a057-9b386d8bd2d6 ]
      Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] session.login_with_password D:2d12df15927e failed with exception Server_error(SESSION_AUTHENTICATION_FAILED, [ root; Authentication failure ])
      Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] Raised Server_error(SESSION_AUTHENTICATION_FAILED, [ root; Authentication failure ])
      Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] 1/8 xapi Raised at file ocaml/xapi/xapi_session.ml, line 405
      Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] 2/8 xapi Called from file ocaml/xapi/xapi_session.ml, line 40
      Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] 3/8 xapi Called from file ocaml/xapi/xapi_session.ml, line 40
      Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] 4/8 xapi Called from file ocaml/xapi/server_helpers.ml, line 83
      Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] 5/8 xapi Called from file ocaml/xapi/server_helpers.ml, line 99
      Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] 6/8 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
      Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] 7/8 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
      Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace] 8/8 xapi Called from file lib/backtrace.ml, line 177
      Jan 10 13:04:40 ahbxen1 xapi: [error||5755280 INET :::80||backtrace]
      Jan 10 13:04:42 ahbxen1 xapi: [ info||5755283 UNIX /var/lib/xcp/xapi||cli] xe message-create name=HA-Lizard - xe_wrapper priority=1 body=xe_wrapper: COMMAND: xe pool-param-get pool-uuid=c1dbc848-aa29-1603-2af7-078466842ac2 username=root password=(omitted)
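To put a number on how often the failure occurs, the log can be summarized per minute. A minimal sketch, assuming the line format shown above (month, day and time as the first three fields) and the "Failed to locally authenticate" message as the marker for one failed login. The function name auth_failures_per_minute is made up for this example.

```shell
#!/bin/sh
# Count failed local logins per minute in a xapi log file.
# Usage: auth_failures_per_minute /var/log/xensource.log
auth_failures_per_minute() {
    # Each failed login produces one "Failed to locally authenticate" line.
    # The key is "Mon DD HH:MM", built from the first three fields.
    awk '/Failed to locally authenticate/ {
             n[$1 " " $2 " " substr($3, 1, 5)]++
         }
         END { for (k in n) print n[k], k }' "$1" | sort -k2
}
```

Each output line has the form "count month day hour:minute", which makes it easy to see whether the failures follow a regular interval, as a polling script would produce.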
      
      posted in Compute
    • desperately searching for solution for xe command timeouts and xcp-ng crashes

      Hi all,
      For a few weeks now I have been facing weird and faulty behavior of our xcp-ng cluster with ha-lizard and iscsi-ha, with DRBD mirroring of the storage partition. This setup worked without any changes or problems for two years; now it is crashing repeatedly.

      After successfully starting the cluster, and after DRBD has finished updating the clustered partition, everything works fine for a few minutes or half an hour (xe host-list and xe vm-list respond quickly, as they should). Then, after 15 minutes, each request to xe host-list, xe vm-list or any other command takes an incredibly long time before a result is delivered (1 minute or longer).

      If it keeps running even longer, it becomes totally unresponsive.
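One way to pin down when the slowdown starts is to time an xe command at a fixed interval and keep the timestamps for comparison with /var/log/xensource.log. A rough sketch, assuming a POSIX shell on the host. The function names and the 60 second default interval are arbitrary choices for this example.

```shell
#!/bin/sh
# Print how many whole seconds a command takes, e.g.:
#   elapsed_seconds xe host-list
elapsed_seconds() {
    start=$(date +%s)
    # Output and exit status are ignored; only the duration matters here.
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $((end - start))
}

# Measure `xe host-list` repeatedly and print one timestamped line per sample.
# $1 is the pause between samples in seconds (default: 60). Runs until killed.
monitor_xe_latency() {
    while :; do
        echo "$(date '+%b %d %H:%M:%S') xe host-list took $(elapsed_seconds xe host-list)s"
        sleep "${1:-60}"
    done
}
```

Redirecting the output of monitor_xe_latency to a file gives a simple record of when response times jump from about zero to a minute or more.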

      DNS resolution is working flawlessly.
      If I switch off the second cluster server, everything runs from one server and there are no timeouts so far. So this is somehow related to the pool communication. I have no idea what could be wrong.

      Naturally, this behaviour kills any XCP-ng Center or Xen Orchestra connection, and it is not possible to work with the cluster.

      Virtual machines are working normally.

      Today I did a cold start of the cluster, but no joy.

      Does anybody have an idea how to debug or fix this?

      kind regards
      Christoph

      posted in Compute