Change/Remove Master from Pool error - missing column

cigarman

Hi Team,
I want to remove the master from our production pool.

I get the following error:

xe pool-designate-new-master host-uuid=<<other pool server uuid>>

The server failed to handle your request, due to an internal error. The given message may give details useful for debugging the problem.
message: missing column
<extra>: host
<extra>: https_only

I made sure that the current pool master was fully updated first, and rebooted it.
I then updated all other hosts in the pool (5 in total).

yum update
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Excluding mirror: updates.xcp-ng.org
 * xcp-ng-base: mirrors.xcp-ng.org
Excluding mirror: updates.xcp-ng.org
 * xcp-ng-updates: mirrors.xcp-ng.org
No packages marked for update
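One way to double-check that every host really is at the same level is to compare the XAPI version each host reports. The `missing column ... https_only` message looks like one host's XAPI database lacking a field that another host knows about, but that is a guess and the thread does not confirm it. This is a minimal sketch, not an official tool. It assumes bash on a pool member with the `xe` CLI, and that each host's `software-version` map has an `xapi` key, so check `xe host-param-get uuid=<uuid> param-name=software-version` on your version first. Both helper names are hypothetical.

```shell
# Hypothetical helpers. Assumes bash, the xe CLI, and that each host's
# software-version map has an "xapi" key (an assumption: verify it first).

# Print "<host uuid> <xapi version>" for every host in the pool
host_xapi_versions() {
  local uuid
  # --minimal prints comma-separated UUIDs; split them on commas
  for uuid in $(xe host-list params=uuid --minimal | tr ',' ' '); do
    echo "$uuid $(xe host-param-get uuid="$uuid" param-name=software-version param-key=xapi)"
  done
}

# Succeed only if every host reports the same xapi version
pool_versions_match() {
  local distinct
  distinct=$(host_xapi_versions | awk '{print $2}' | sort -u | wc -l)
  [ "$distinct" -eq 1 ]
}
```

Running `host_xapi_versions` prints one line per host. `pool_versions_match || echo "hosts differ"` flags a partially updated pool.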

I have even shut down the master (losing the connection in XO) and run the designate-new-master command from the command line on the other servers, which, of course, does nothing.

There are no VMs on the current pool master, and I have also tried putting it into maintenance mode first.
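Both checks in this paragraph (no VMs on the master, and maintenance mode) can also be done from the CLI. This is a sketch with hypothetical helper names, assuming bash and the `xe` CLI. `enter_maintenance` only approximates XO's maintenance mode: it disables the host, then evacuates it.

```shell
# Hypothetical helpers; $1 is a host UUID. Assumes bash and the xe CLI.

# Count VMs resident on the host, excluding the control domain (dom0)
resident_vm_count() {
  local uuids
  uuids=$(xe vm-list resident-on="$1" is-control-domain=false params=uuid --minimal)
  if [ -z "$uuids" ]; then
    echo 0
  else
    # --minimal output is comma-separated, one UUID per entry
    echo "$uuids" | tr ',' '\n' | wc -l
  fi
}

# Approximation of maintenance mode: stop new VMs from starting on the
# host, then migrate any running VMs to other pool members
enter_maintenance() {
  xe host-disable uuid="$1" && xe host-evacuate uuid="$1"
}
```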

In XO (built from sources), I have tried to detach the master, which generates the following error in the logs:

host.detach
{
  "host": "d2d01b9a-e6d1-4dd0-90e6-f101a251c7b1"
}
{
  "code": "INTERNAL_ERROR",
  "params": [
    "Xapi_pool.Cannot_eject_master"
  ],
  "call": {
    "method": "pool.eject",
    "params": [
      "OpaqueRef:90ba839f-61a2-4f96-b485-df6332fb0bce"
    ]
  },
  "message": "INTERNAL_ERROR(Xapi_pool.Cannot_eject_master)",
  "name": "XapiError",
  "stack": "XapiError: INTERNAL_ERROR(Xapi_pool.Cannot_eject_master)
    at Function.wrap (/opt/xo/xo-builds/xen-orchestra-202310302357/packages/xen-api/src/_XapiError.js:16:12)
    at /opt/xo/xo-builds/xen-orchestra-202310302357/packages/xen-api/src/transports/json-rpc.js:35:21
    at runNextTicks (node:internal/process/task_queues:60:5)
    at processImmediate (node:internal/timers:447:9)
    at process.callbackTrampoline (node:internal/async_hooks:130:17)"
}
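The log shows that XO's Detach is sent to XAPI as `pool.eject`, and XAPI refuses that call for the current master (`Xapi_pool.Cannot_eject_master`). The master role has to move to another host before the old master can leave the pool. The guard below makes that check explicit. It is a sketch with a hypothetical helper name, assuming bash and the `xe` CLI. `xe pool-eject` itself may still ask for interactive confirmation.

```shell
# Hypothetical helper; $1 is the UUID of the host to remove from the pool.
# Assumes bash and the xe CLI on a pool member.
safe_eject() {
  local host_uuid="$1" master_uuid
  # The pool's "master" field holds the current master's host UUID
  master_uuid=$(xe pool-list params=master --minimal)
  if [ "$host_uuid" = "$master_uuid" ]; then
    # XAPI rejects this case with Cannot_eject_master
    echo "refusing: $host_uuid is the pool master; move the master role first" >&2
    return 1
  fi
  xe pool-eject host-uuid="$host_uuid"
}
```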

I ran xe host-list on all 5 servers to get the other hosts' UUIDs. I then tried the xe pool-designate-new-master host-uuid= command on the current master and on all the other servers, with the same error message each time.

I have even tried the Windows controller MSI via the GUI, which appears to time out and fail.
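Before running `xe pool-designate-new-master`, it helps to confirm which UUID belongs to which host and which host is currently the master. The following may help with that. It is a minimal sketch, assuming bash on any pool member with the `xe` CLI. `show_pool_hosts` is a hypothetical helper name.

```shell
# Hypothetical helper: list pool hosts as "<uuid> <name-label>", marking the master.
# Assumes bash and the xe CLI on a pool member.
show_pool_hosts() {
  local master_uuid uuid name
  # The pool's "master" field holds the current master's host UUID
  master_uuid=$(xe pool-list params=master --minimal)
  # --minimal prints comma-separated UUIDs; split them on commas
  for uuid in $(xe host-list params=uuid --minimal | tr ',' ' '); do
    name=$(xe host-param-get uuid="$uuid" param-name=name-label)
    if [ "$uuid" = "$master_uuid" ]; then
      echo "$uuid $name (master)"
    else
      echo "$uuid $name"
    fi
  done
}
```

After `show_pool_hosts`, the target would be passed as `xe pool-designate-new-master host-uuid=<UUID of a host not marked master>`.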

[Screenshot: host list]

[Screenshot: XO errors screen]

cigarman

Okay Team,
For my own reference, and for anyone else finding this:

I was able to recover the pool by completing the following steps.
As this is our production cluster, I did it late at night; thankfully, with no downtime.

1: I shut down the old master, which had no VMs running.

2: On another host that I wanted to become the master, I SSH'd in and ran the following command:
xe pool-emergency-transition-to-master

I got the following response:

Host agent will restart and transition to master in 10.000 seconds...

3: I then went into XO => Settings => Servers and connected to the new master.

4: After a good 10-15 seconds, I ran the following command on the new master:
xe pool-recover-slaves

After a few seconds, the UUIDs of the other hosts in the pool appeared:

xe pool-recover-slaves
f600ea3a-cc02-4f24-a15e-938756feb00c
d123525c-6022-43c9-8c23-d0f1ca567219
16c7c955-f65e-4d5b-96e7-787085c2d25f

And all hosts are showing correctly in XO!
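The CLI part of the steps above (steps 2 and 4) can be condensed into a sketch. Run it on the host being promoted, and only after the old master has been shut down as in step 1. Reconnecting XO (step 3) still happens in the XO UI. The helper name, the default 25 s settle time, and the 30-try poll limit are assumptions, not values from the post. The post only shows the 10 s restart notice and a 10-15 s wait.

```shell
# Hypothetical helper, run ON the host being promoted, and only after the old
# master has been shut down. $1 = seconds to wait for the agent restart;
# the 25 s default and the 30-try poll limit are assumptions.
promote_this_host() {
  local settle_secs="${1:-25}" tries=0
  xe pool-emergency-transition-to-master || return 1
  # The agent announces a restart in ~10 s; wait past that point
  sleep "$settle_secs"
  # Poll until the local XAPI answers CLI calls again
  until xe host-list --minimal >/dev/null 2>&1; do
    tries=$((tries + 1))
    if [ "$tries" -ge 30 ]; then
      return 1
    fi
    sleep 2
  done
  # Re-point the other members at this master; prints their UUIDs
  xe pool-recover-slaves
}
```

A fixed wait followed by polling is used because XAPI may still answer during the announced 10 s window, before it restarts as master.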

cigarman

I've found this post:
https://docs.xenserver.com/en-us/citrix-hypervisor/dr/machine-failures.html#master-failures

It looks like I can run

xe pool-emergency-transition-to-master

on one of the other members of the pool.
Is this the best/safest way to change the pool master?

@olivierlambert Do you think this is a solution to the errors?

Thanks

olivierlambert

Hi,

Are all your hosts in the pool fully updated? (not just the master)

cigarman

@olivierlambert Yes, I updated the master and rebooted it first.
Then updated the other hosts and rebooted each of those.

olivierlambert

Good to know you made it

What about the old primary after that? Did you start it again, or just remove it for good?

cigarman

@olivierlambert Good question.
Once I booted it back up, it rejoined the pool as a slave without issues.

But I have since removed it from the pool and am reinstalling it as we speak.
We're moving data centres, so I'm splitting the pool up to do a partial move.

Thanks for your interest, and assistance.

olivierlambert

Okay, glad to know it works! Good luck with the rest of your operations!