XCP-ng 8.2 Rolling Update Error

cyrus104

Is there a way to look up an OpaqueRef, and does it point to a specific service/host/VM/etc.?
I checked all running VMs, including XO and XCP-ng: 33 GB assigned/in use. Each host has 64 GB of RAM. I do have cores over-provisioned if all the VMs are forced onto one host, but I can fix that. With a rolling update dividing the VMs among 2 of the 3 hosts at any given time, there shouldn't be any over-provisioning of cores.

olivierlambert

I don't remember where to look up an OpaqueRef.

Try to find the error in the XCP-ng host logs (/var/log/xensource.log); you might get more details there.

cyrus104

There are more debug messages, but these seem like the most relevant.

Apr  3 20:02:18 localhost xapi: [debug||1864 ||vgpuops] vGPUs allocated to VM (OpaqueRef:ee6254af-44af-4bcf-8382-d9b7542d2b3c) are:
Apr  3 20:02:56 localhost xapi: [ warn||1864 ||rbac_audit] cannot marshall arguments for the action VM.migrate_send: name and value list lengths don't match. str_names=[session_id,vm,dest,live,vdi_map,vif_map,options,vgpu_map,], xml_values=[S(OpaqueRef:4489cc5b-fbe9-44d4-a60f-627b8960338f),S(OpaqueRef:ee6254af-44af-4bcf-8382-d9b7542d2b3c),{SM:S(http://10.100.10.31/services/SM?session_id=OpaqueRef:cd0a7342-5155-434d-a8ca-2e9f7dcf8460);host:S(OpaqueRef:ac2e50bc-1f80-417a-8eea-a7ce955d4c24);xenops:S(http://10.100.10.31/services/xenops?session_id=OpaqueRef:cd0a7342-5155-434d-a8ca-2e9f7dcf8460);session_id:S(OpaqueRef:cd0a7342-5155-434d-a8ca-2e9f7dcf8460);master:S(http://10.100.10.31/)},B(true),{OpaqueRef:a4398d4e-946c-4c83-ad5f-b4dc1c3ddeee:S(OpaqueRef:4c00dfd9-a30c-47a4-8367-e8a7d7c53cb6);OpaqueRef:188682c3-1b45-4ae4-98cf-0dfff1d90d56:S(OpaqueRef:4c00dfd9-a30c-47a4-8367-e8a7d7c53cb6)},{},{force:S(false)},]
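On the question of what an OpaqueRef points to: in these lines each reference appears as `OpaqueRef:` followed by a UUID-shaped identifier, and the surrounding text often says what it is. The vgpuops line labels `OpaqueRef:ee6254af-44af-4bcf-8382-d9b7542d2b3c` as a VM, and the rbac_audit line prints the argument names (`session_id`, `vm`, `dest`, ...) next to the values, where the same reference sits in the `vm` position. A minimal Python sketch that pulls the distinct references out of a log line, so each one can then be searched for or resolved through XAPI (for example `VM.get_record` in the XenAPI bindings, for a reference already known to be a VM). The regex assumes the UUID-shaped form shown above; the helper name `extract_opaque_refs` is illustrative only.

```python
import re

# "OpaqueRef:" followed by a UUID-shaped id (8-4-4-4-12 hex digits).
# Special values such as "OpaqueRef:NULL" are intentionally not matched.
OPAQUE_REF_RE = re.compile(
    r"OpaqueRef:([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})",
    re.IGNORECASE,
)


def extract_opaque_refs(line: str) -> list[str]:
    """Return the distinct OpaqueRef ids in `line`, lowercased, in order of first appearance."""
    seen: dict[str, None] = {}  # dicts keep insertion order, so this dedupes in order
    for match in OPAQUE_REF_RE.finditer(line):
        seen.setdefault(match.group(1).lower(), None)
    return list(seen)


if __name__ == "__main__":
    line = ("xapi: [debug||1864 ||vgpuops] vGPUs allocated to VM "
            "(OpaqueRef:ee6254af-44af-4bcf-8382-d9b7542d2b3c) are:")
    print(extract_opaque_refs(line))  # ['ee6254af-44af-4bcf-8382-d9b7542d2b3c']
```

Run over the rbac_audit line, the same function lists the session, VM, host and mapping references in the order they appear, which is a quick way to pick the next value to grep for.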

olivierlambert

Indeed, those 2 lines aren't relevant at all. Can you paste more lines into https://paste.vates.fr/, for example?

cyrus104

@olivierlambert
https://paste.vates.fr/?f2f854418b6493c8#B1Wrjkb9wVQTY1BKvTvcz1r4WZmXHnTB2n52UjLx4Rof

I used: cat /var/log/xensource.log | grep -i ee6254af-44af-4bcf-8382-d9b7542d2b3c

The logs I sent are from host 1; I'm not sure if I need to run that command on all of the hosts.
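Each host runs its own xapi and writes its own /var/log/xensource.log, so an operation that spans hosts, such as a migration, can leave entries on more than one of them; running the same search on every host is the safe default. The `grep -i` search above can be sketched in Python as follows. It is a minimal sketch, assuming (hypothetically) that rotated copies of the log may be gzip-compressed, which is why files ending in `.gz` are opened through `gzip`; the helper names `lines_matching` and `grep_log` are illustrative only.

```python
import gzip
from pathlib import Path
from typing import Iterable, Iterator


def lines_matching(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Yield each line containing `needle`, ignoring case (like `grep -i`)."""
    wanted = needle.lower()
    for line in lines:
        if wanted in line.lower():
            yield line.rstrip("\n")


def grep_log(path: str, needle: str) -> list[str]:
    """Search one log file; names ending in .gz are decompressed on the fly.

    Treating .gz files as rotated logs is an assumption about the rotation setup.
    """
    p = Path(path)
    opener = gzip.open if p.suffix == ".gz" else open
    with opener(p, "rt", errors="replace") as fh:
        return list(lines_matching(fh, needle))
```

On each host, `grep_log("/var/log/xensource.log", "ee6254af-44af-4bcf-8382-d9b7542d2b3c")` returns the same lines as the command above, with trailing newlines stripped.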

olivierlambert

You are in HA. Please disable HA first before doing a rolling pool update.

This error comes from HA: while one host is down for a reboot, if one of the remaining hosts also failed, there wouldn't be enough memory left to deploy the HA plan.

edit: I'm opening an issue on the XO GitHub repo to check for HA and disable it before doing a rolling pool update (RPU). Thanks for the feedback!
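The HA explanation above amounts to a capacity check. A deliberately simplified Python sketch, assuming memory is the only constraint and treating the survivors' free memory as one pool: take away the host that is rebooting, assume the worst case that the largest of the remaining hosts also fails, and see whether what is left can still hold the HA-protected VMs. XAPI's real failover planner places individual VMs on individual hosts (a VM's memory cannot be split across hosts), so it is stricter than this sum. The `Host` class, `survives_failures` helper, host names and GiB figures are hypothetical, not taken from this pool.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_memory_gib: float  # memory available for guests, after dom0 and overheads


def survives_failures(hosts: list[Host], protected_vm_memory_gib: float,
                      rebooting: set[str], failures_to_tolerate: int) -> bool:
    """Simplified HA capacity check (pooled memory only, no per-VM placement).

    Hosts in `rebooting` are unavailable. Of the remaining hosts, the
    `failures_to_tolerate` largest are assumed to fail (worst case); the
    rest must together have enough free memory for the protected VMs.
    """
    remaining = sorted(
        (h.free_memory_gib for h in hosts if h.name not in rebooting),
        reverse=True,
    )
    if failures_to_tolerate >= len(remaining):
        return False  # no host would be left to restart VMs on
    survivors = remaining[failures_to_tolerate:]
    return sum(survivors) >= protected_vm_memory_gib


if __name__ == "__main__":
    # Hypothetical pool: three hosts with 48 GiB free each, 60 GiB of protected VMs.
    hosts = [Host("host1", 48), Host("host2", 48), Host("host3", 48)]
    print(survives_failures(hosts, 60, rebooting=set(), failures_to_tolerate=1))      # True
    print(survives_failures(hosts, 60, rebooting={"host1"}, failures_to_tolerate=1))  # False
```

With all three hosts up, the plan survives one failure; with one host rebooting, the same load no longer does, which matches the advice to disable HA before a rolling pool update.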

cyrus104

@olivierlambert
Awesome, thanks for helping to troubleshoot. I'll go ahead and disable that, then do the update.

I've been having to switch between XO and XCP-ng Center to get certain features to work.

The HA configuration in Center is super easy, with good feedback on the heartbeat. Will this type of info be incorporated into XO?
While trying to troubleshoot this, I was looking at the VM memory options: setting fixed vs. automatic memory is much easier in XCP-ng Center's Memory tab than in XO.
On the other hand, XO detects that all of my VMs have the management agent properly installed (with IP, OS, and agent version being reported), while XCP-ng Center only reports 1/4 of the VMs as having the management agent installed.

olivierlambert

We are aware of those shortcomings, and they'll be fixed in XO 6.

olivierlambert

CC @pdonias and @julien-f so they can check whether we have this noted for XO 6.

cyrus104

Great to hear, looking forward to the updates.