XCP-ng
    Latest posts made by felibb

    • RE: XO server loses pool and hosts momentarily, timeout error

      So I booted an XO VM I had created earlier, from before the bookworm upgrade, moved its network to the Management LAN, and installed 6444f88 (the latest commit as of writing this). No more timeouts. It seems the Debian version doesn't matter here and the code is fine (as we had more or less determined already), but perhaps traffic between my networks is not routed fast enough for XO? It makes sense to put a mgmt appliance into the mgmt LAN, of course; however, the older code I tested (pre-bfb8d3b) did not have this issue. So maybe the combination of the new undici piece and slightly higher latency is causing it for me?
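
One way to put a number on the latency hypothesis above would be to time TCP handshakes from the XO VM to the pool master on each network. The following is only a minimal sketch: it assumes the master accepts TCP connections on port 443, and the function names are made up for illustration. It measures connection setup time only, not the duration of an actual XAPI call.

```python
import socket
import statistics
import time


def tcp_connect_times(host: str, port: int = 443, samples: int = 5,
                      timeout: float = 3.0) -> list[float]:
    """Time `samples` TCP handshakes to host:port, in milliseconds."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection returns once the TCP handshake has completed
        with socket.create_connection((host, port), timeout=timeout):
            results.append((time.perf_counter() - start) * 1000)
    return results


def summarize(times_ms: list[float]) -> str:
    """Format min / median / max of the measured times."""
    return (f"min {min(times_ms):.2f} ms, "
            f"median {statistics.median(times_ms):.2f} ms, "
            f"max {max(times_ms):.2f} ms")
```

Usage would look like `print(summarize(tcp_connect_times("192.0.2.10")))`, where `192.0.2.10` is a placeholder address to be replaced with the pool master's IP. Running it once from each LAN would show whether the routed path is noticeably slower.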

      posted in Management
      felibb
    • RE: XO server loses pool and hosts momentarily, timeout error

      (Replying to my previous post, and a bit off-topic for the thread: after manually installing https://github.com/xenserver/xe-guest-utilities/releases/tag/v8.4.0, I now see the IP in the GUI, but XOA says "Management agent 8.3.60-1 detected".)

      posted in Management
      felibb
    • RE: XO server loses pool and hosts momentarily, timeout error

      @felibb said in XO server loses pool and hosts momentarily, timeout error:

      upgrade my old XO to bookworm + latest commit in master

      Welp, that didn't help much: I am still seeing timeouts. Also, neither XO nor XOA shows the VM's own IP in the GUI anymore. The dist-upgrade renamed the interface from eth0 to enX0, so I had to edit /etc/network/interfaces to get the network back up. I can connect now, but the GUI still says "No IP record". Management agent 8.0.50-1 detected, in case it matters.

      Fresh VM setup to be tested another day.

      posted in Management
      felibb
    • RE: XO server loses pool and hosts momentarily, timeout error

      @olivierlambert both channels seem to work fine, yes.

      posted in Management
      felibb
    • RE: XO server loses pool and hosts momentarily, timeout error

      @olivierlambert right, XO vs. XOA, gotcha. XOA seems to work fine: no timeouts for about half an hour. I did select the "Management" LAN for it.

      I think the next step for me would be to upgrade my old XO to bookworm + latest commit in master. Then I can probably try a fresh VM with bookworm + XO at the latest commit in master + interface in the mgmt LAN.

      posted in Management
      felibb
    • RE: XO server loses pool and hosts momentarily, timeout error

      @Andrew said in XO server loses pool and hosts momentarily, timeout error:

      some issues with undici that were resolved in a later commit 0794a63

      Tried 79c9ef0 (1 day older than 0794a63), seeing timeouts.

      @olivierlambert said in XO server loses pool and hosts momentarily, timeout error:

      1. Can you try to use XOA in latest release channel in the same environment and see if you also have the issue?

      Unsure I understand what you are referring to, can you please clarify?

      2. Is your XO far away from the pool in terms of network latency?

      I would expect the latency to be quite low: the XOA VM lives on the same pool and has an IP in the same subnet as the 10Gx2 bond interface on each host. However, this is not the same 1G network as the one marked with the blue "Management" bubble in the Host network tab; the two are different subnets. Can this have an effect?

      3. Your OS is Debian 11, IDK if that could cause the problem (XOA is on Debian 12).

      dist-upgrade is fast and easy, I can definitely try that.

      @julien-f said in XO server loses pool and hosts momentarily, timeout error:

      If you can, please test the xen-api-blocking branch and let me know if that helps.

      ce15ef6 deployed, seeing timeouts.

      posted in Management
      felibb
    • RE: XO server loses pool and hosts momentarily, timeout error

      @olivierlambert thanks for the tip. Looks like bfb8d3b29e4f9531dda368f6624652479682b69d is the culprit, and the comment mentions "http-request-plus → undici", which seems to be what you referred to above. Some earlier commits had weird glitches, like not displaying any VMs or any storage, but they did not time out.

      posted in Management
      felibb
    • RE: XO server loses pool and hosts momentarily, timeout error

      Okay, tried a few at random and narrowed it down to this:

      • 0ccfd4b / 2024.03.14: has timeouts
      • 18dea2f / 2024.02.08: does not have timeouts

      These two have about 100 commits between them. Any suggestions on how to narrow it down further?
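
A range like this is what binary search (the idea behind `git bisect`) is meant for: with a known-good and a known-bad commit about 100 apart, around 7 test deployments should be enough to find the first bad one. Below is a minimal sketch of the idea. It assumes the commits are in a list ordered oldest to newest and that the regression, once introduced, stays present; `has_timeouts` is a hypothetical check standing in for "deploy this commit and watch the logs", and the history is simulated with placeholder names.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     has_timeouts: Callable[[str], bool]) -> str:
    """Return the first commit that shows timeouts.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if has_timeouts(commits[mid]):
            bad = mid   # regression was introduced at mid or earlier
        else:
            good = mid  # regression was introduced after mid
    return commits[bad]


# Simulated history: 101 placeholder commits, regression introduced at index 63.
history = [f"commit-{i:03d}" for i in range(101)]
tested = []


def simulated_check(commit: str) -> bool:
    tested.append(commit)
    return int(commit.split("-")[1]) >= 63


culprit = first_bad_commit(history, simulated_check)
print(culprit, "found after", len(tested), "tests")
```

With a real checkout, `git bisect start`, `git bisect bad 0ccfd4b`, and `git bisect good 18dea2f` drive the same search and also cope with non-linear history; each step then only needs a rebuild and a few minutes of watching for the timeout.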

      posted in Management
      felibb
    • RE: XO server loses pool and hosts momentarily, timeout error

      Same issue with the latest commit. Hunting for a commit that may or may not work is a wild goose chase, and I don't really have the time for it, especially since I agree it is hard to tell where the problem lies. XCP-ng can easily be the culprit here; I hope I didn't imply that XO has to be at fault. I just didn't see any errors in /var/log/xensource.log, but maybe I wasn't looking in the right place. I was mostly hoping for some debugging hints I hadn't thought of myself.

      posted in Management
      felibb
    • XO server loses pool and hosts momentarily, timeout error

      XO server: 2 vCPU, 4GiB RAM
      OS: Debian 11 / 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64 GNU/Linux
      Node.js version: v18.20.2
      Yarn version: 1.22.19
      XO version: https://github.com/vatesfr/xen-orchestra/commit/771b04acc4480cf138a0c476968d7c613bb8147d
      XCP-NG server version: 8.2.1
      Environment: 3 hosts, HA, shared storage

      The problem is that the pool, hosts, and VMs (all inventory except one manually added server) disappear from the web UI every 3-6 min, only to reappear automagically after exactly 1 min.

      There were no network changes that could explain the timeouts. Everything was working fine until last week. In fact, I think it started after the server patch/update to 8.2.1 (I don't recall from which version), which is the only significant change I made, but there are no errors in the server logs.

      xo-server logs this when it loses the pool (but nothing when the pool reappears):

      May  8 17:06:32 xo-ce xo-server[328]: _watchEvents TimeoutError: operation timed out
      May  8 17:06:32 xo-ce xo-server[328]:     at Promise.timeout (/opt/xo/xo-builds/xen-orchestra-202405070909/node_modules/promise-toolbox/timeout.js:11:16)
      May  8 17:06:32 xo-ce xo-server[328]:     at Xapi.apply (file:///opt/xo/xo-builds/xen-orchestra-202405070909/packages/xen-api/index.mjs:773:37)
      May  8 17:06:32 xo-ce xo-server[328]:     at Xapi._call (/opt/xo/xo-builds/xen-orchestra-202405070909/node_modules/limit-concurrency-decorator/src/index.js:85:24)
      May  8 17:06:32 xo-ce xo-server[328]:     at Xapi._watchEvents (file:///opt/xo/xo-builds/xen-orchestra-202405070909/packages/xen-api/index.mjs:1198:31) {
      May  8 17:06:32 xo-ce xo-server[328]:   call: {
      May  8 17:06:32 xo-ce xo-server[328]:     method: 'event.from',
      May  8 17:06:32 xo-ce xo-server[328]:     params: [ [Array], '00000000000063727552,00000000000063699698', 60.1 ]
      May  8 17:06:32 xo-ce xo-server[328]:   }
      May  8 17:06:32 xo-ce xo-server[328]: }
      May  8 17:09:32 xo-ce xo-server[328]: _watchEvents TimeoutError: operation timed out
      May  8 17:09:32 xo-ce xo-server[328]:     at Promise.timeout (/opt/xo/xo-builds/xen-orchestra-202405070909/node_modules/promise-toolbox/timeout.js:11:16)
      May  8 17:09:32 xo-ce xo-server[328]:     at Xapi.apply (file:///opt/xo/xo-builds/xen-orchestra-202405070909/packages/xen-api/index.mjs:773:37)
      May  8 17:09:32 xo-ce xo-server[328]:     at Xapi._call (/opt/xo/xo-builds/xen-orchestra-202405070909/node_modules/limit-concurrency-decorator/src/index.js:85:24)
      May  8 17:09:32 xo-ce xo-server[328]:     at Xapi._watchEvents (file:///opt/xo/xo-builds/xen-orchestra-202405070909/packages/xen-api/index.mjs:1198:31) {
      May  8 17:09:32 xo-ce xo-server[328]:   call: {
      May  8 17:09:32 xo-ce xo-server[328]:     method: 'event.from',
      May  8 17:09:32 xo-ce xo-server[328]:     params: [ [Array], '00000000000063727963,00000000000063699698', 60.1 ]
      May  8 17:09:32 xo-ce xo-server[328]:   }
      May  8 17:09:32 xo-ce xo-server[328]: }
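
To quantify how often the pool drops, the timestamps of the `_watchEvents TimeoutError` lines can be extracted and the gaps between them computed. This is a minimal sketch that assumes syslog-style lines like the ones above. The log does not include the year, so it is passed in as an assumption, and month names are parsed using the default English locale.

```python
import re
from datetime import datetime

# Matches only the first line of each timeout entry, capturing "May  8 17:06:32"
TIMEOUT_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) .*_watchEvents TimeoutError"
)


def timeout_gaps(lines, year=2024):
    """Return the gaps, in seconds, between consecutive timeout entries."""
    stamps = []
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            # Collapse the padding syslog adds for single-digit days ("May  8")
            day_time = " ".join(match.group(1).split())
            stamps.append(
                datetime.strptime(f"{year} {day_time}", "%Y %b %d %H:%M:%S")
            )
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


sample = [
    "May  8 17:06:32 xo-ce xo-server[328]: _watchEvents TimeoutError: operation timed out",
    "May  8 17:06:32 xo-ce xo-server[328]:   call: {",
    "May  8 17:09:32 xo-ce xo-server[328]: _watchEvents TimeoutError: operation timed out",
]
print(timeout_gaps(sample))  # prints [180.0]
```

For the two entries shown, the gap is 180 seconds, which matches the "every 3-6 min" observation. Feeding in a longer stretch of the log would show whether the intervals are regular.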
      

      Some XCP-ng forum posts from 2023 talked about downgrading to Node.js v18 as a solution to a similar timeout issue, but I am already on v18. I would be grateful for any hints and can share more info.

      posted in Management
      felibb