XCP-ng
    jimmymiller

    Posts

    • RE: Backups started to fail again (overall status: failure, but both snapshot and transfer returns success)

      @marcoi I've definitely been through all those steps before, but I'll try them again. The VM in question is a test machine I built specifically for this troubleshooting, and I've rebuilt this XO three times, including not bringing over any exports/configurations/backups from previous installations -- I simply pointed a vanilla XO (running similar code) at the same XCP-ng pool. I've created a different volume and NFS export on the Synology, so even that started blank, and each time I've gone through the test I've preemptively emptied the directories before connecting the XO to the NFS export. The only two things I haven't tried are rebuilding the XCP-ng host from scratch and sending to a different physical remote. Guess it's time to start down that path.

      The fact that I can write data to the NFS mount from within the XO VM even while the XCP-ng export is in progress is the part that gets me and makes me think this might be an XO problem vs. an underlying infrastructure problem. Oh well, time to try rebuilding these.

      Hammer down!

      posted in Backup
      jimmymiller
    • RE: Backups started to fail again (overall status: failure, but both snapshot and transfer returns success)

      @olivierlambert & @marcoi The option is 'vers=3' w/o quotes.

      I tried this recommendation and made some progress in that 'ls' and 'df' don't lock up, but the export still seems to hang.
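
      For what it's worth, a quick way to double-check which NFS version the XO VM actually negotiated for the remote -- a small sanity check, assuming nfs-common's nfsstat is available (it should be, since the mount works at all):

      # Show the negotiated NFS version and mount options for the remote
      nfsstat -m
      # Or read the options straight from the kernel's mount table
      grep nfs /proc/mounts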

      Below is a screenshot of network throughput on the XO. The plateau on the left is a 'dd if=/dev/random' to a 'dummyData.txt' before the backup is kicked off. I then stop the dd and kick off the backup job (middle). After about a minute, the job hangs. I can still 'ls' and 'df' at this point, which is good, but the export job just seems stuck at 22% ('xe task-list' below). For the heck of it I tried another 'dd if=/dev/random', this time to a dummyData2.txt, which is the plateau on the right. While this dd is underway the backup is still supposed to be ongoing. So the directory is still writeable, but the job is just hanging up?

      The number where the job hangs seems to vary too -- sometimes it will go ~30%, other times it stops at 10%. I left a job of this exact VM running last night and it actually finished, but it took 5 hrs to move 25G according to the backup report. It took <6 minutes to move 40G with the simple, inefficient 'dd' command. Granted, XO might be pulling (from the host) and pushing (to the NFS remote) simultaneously, but 5 hours doesn't seem right.

      Screenshot 2025-08-18 at 13.27.11.png

      XO output before and while backup is ongoing:

      root@xo:/run/xo-server/mounts/1f33cf95-36d6-4c5b-b99f-71bb47008dd7# ls -lha
      total 0
      drwxrwxrwx 1 1027 users 56 Aug 18 13:12 .
      drwxr-xr-x 3 root root  60 Aug 18 13:11 ..
      -rwxrwxrwx 1 1024 users  0 Aug 18 13:11 .nfs000000000000017400000001
      root@xo:/run/xo-server/mounts/1f33cf95-36d6-4c5b-b99f-71bb47008dd7# dd if=/dev/random of=/run/xo-server/mounts/1f33cf95-36d6-4c5b-b99f-71bb47008dd7/dummyData.txt bs=1M
      ^C36868+0 records in
      36868+0 records out
      38658899968 bytes (39 GB, 36 GiB) copied, 368.652 s, 105 MB/s
      
      root@xo:/run/xo-server/mounts/1f33cf95-36d6-4c5b-b99f-71bb47008dd7# df -h
      Filesystem                                    Size  Used Avail Use% Mounted on
      udev                                          3.9G     0  3.9G   0% /dev
      tmpfs                                         794M  584K  793M   1% /run
      /dev/mapper/xo--vg-root                 28G  5.3G   22G  20% /
      tmpfs                                         3.9G     0  3.9G   0% /dev/shm
      tmpfs                                         5.0M     0  5.0M   0% /run/lock
      /dev/xvda2                                    456M  119M  313M  28% /boot
      /dev/mapper/xo--vg-var                  15G  403M   14G   3% /var
      /dev/xvda1                                    511M  5.9M  506M   2% /boot/efi
      tmpfs                                         794M     0  794M   0% /run/user/1000
      192.168.32.10:/volume12/XCPBackups/VMBackups  492G   42G  451G   9% /run/xo-server/mounts/1f33cf95-36d6-4c5b-b99f-71bb47008dd7
      root@xo:/run/xo-server/mounts/1f33cf95-36d6-4c5b-b99f-71bb47008dd7# ls -lha
      total 37G
      drwxrwxrwx 1 1027 users 164 Aug 18 13:20 .
      drwxr-xr-x 3 root root   60 Aug 18 13:11 ..
      -rwxrwxrwx 1 1024 users 37G Aug 18 13:20 dummyData.txt
      -rwxrwxrwx 1 1024 users   0 Aug 18 13:11 .nfs000000000000017400000001
      -rwxrwxrwx 1 1024 users   0 Aug 18 13:20 .nfs000000000000017700000002
      drwxrwxrwx 1 1024 users 154 Aug 18 13:20 xo-vm-backups
      root@xo:/run/xo-server/mounts/1f33cf95-36d6-4c5b-b99f-71bb47008dd7# cd
      root@xo:~# df -h
      Filesystem                                    Size  Used Avail Use% Mounted on
      udev                                          3.9G     0  3.9G   0% /dev
      tmpfs                                         794M  584K  793M   1% /run
      /dev/mapper/xo--vg-root                 28G  5.3G   22G  20% /
      tmpfs                                         3.9G     0  3.9G   0% /dev/shm
      tmpfs                                         5.0M     0  5.0M   0% /run/lock
      /dev/xvda2                                    456M  119M  313M  28% /boot
      /dev/mapper/xo--vg-var                  15G  404M   14G   3% /var
      /dev/xvda1                                    511M  5.9M  506M   2% /boot/efi
      tmpfs                                         794M     0  794M   0% /run/user/1000
      192.168.32.10:/volume12/XCPBackups/VMBackups  492G   42G  450G   9% /run/xo-server/mounts/1f33cf95-36d6-4c5b-b99f-71bb47008dd7
      
      root@xo:~# ls -lha /run/xo-server/mounts/1f33cf95-36d6-4c5b-b99f-71bb47008dd7
      total 37G
      drwxrwxrwx 1 1027 users 164 Aug 18 13:20 .
      drwxr-xr-x 3 root root   60 Aug 18 13:11 ..
      -rwxrwxrwx 1 1024 users 37G Aug 18 13:20 dummyData.txt
      -rwxrwxrwx 1 1024 users   0 Aug 18 13:11 .nfs000000000000017400000001
      -rwxrwxrwx 1 1024 users   0 Aug 18 13:20 .nfs000000000000017700000002
      drwxrwxrwx 1 1024 users 154 Aug 18 13:20 xo-vm-backups
      root@xo:~# dd if=/dev/random of=/run/xo-server/mounts/1f33cf95-36d6-4c5b-b99f-71bb47008dd7/dummyData2.txt bs=1M
      ^C7722+0 records in
      7722+0 records out
      8097103872 bytes (8.1 GB, 7.5 GiB) copied, 78.9522 s, 103 MB/s
      
      root@conductor:~# ls /run/xo-server/mounts/1f33cf95-36d6-4c5b-b99f-71bb47008dd7
      dummyData2.txt	dummyData.txt  xo-vm-backups
      
      [13:32 xcp05 ~]# xe task-list
      uuid ( RO)                : ba902e8a-7f08-9de9-8ade-e879ffb35e11
                name-label ( RO): Exporting content of VDI shifter through NBD
          name-description ( RO):
                    status ( RO): pending
                  progress ( RO): 0.222
      

      Any other ideas before I go down the route of trying to carve out a virtual NAS instance?

      Thanks again for your input.

      posted in Backup
      jimmymiller
    • RE: Backups started to fail again (overall status: failure, but both snapshot and transfer returns success)

      @olivierlambert This particular instance is a compiled version of XO, but yes, it's XO that locks up. I'd be all over submitting a ticket if this was in XOA.

      Mind you, I can trigger the lockup immediately after sending 250G of dummy data (using dd if=/dev/random) to the exact same directory using the exact same XO. I simply deleted that dummy data, kicked off the backup job, and boom: ls & df lock up after just a few minutes.

      Separately, on the host, the xe export task will just sit there. Sometimes some progress happens, other times it will sit for days w/o any progress.

      I've tried rebuilding XO from scratch (without importing an older XO config) and it happens the exact same way on the newer XO. I tried creating a separate empty volume on the Synology and a different NFS export/remote -- same problem. I'm at the point of trying to rebuild the XCP-ng hosts, but I'm not really sure that's the problem, because the hosts seem happy up until I kick off the backup job. VMs can migrate, and there are only local SRs, so we aren't dealing with any type of storage connection problem apart from the NFS used for backups, and these are two different hosts (Dell PowerEdge R450s). The only common things I can find are that the hosts are in the same pool, the backup target hardware has remained unchanged, and they reside on the same physical network (Ubiquiti-based switches). I might try rebuilding these hosts and carving out a dummy virtual TrueNAS instance just to see if I can get a different result, but I'm out of ideas on things to try after that.

      posted in Backup
      jimmymiller
    • RE: Backups started to fail again (overall status: failure, but both snapshot and transfer returns success)

      Did anyone ever find a true resolution to this? I'm seeing a similar situation where a remote works just fine until I initiate a backup job.

      I mounted a remote via NFS (v4) and generated 260G of random data (using a simple dd command just to prove I can write something to the share), but when I initiate a backup job it hangs after <10G of backup data is transferred. Sometimes it will pause for several minutes and then allow a little bit more through, but it will also sometimes just hang there for hours.

      Latest XO commit (70d59) on a fully updated but mostly vanilla Debian 12, XCP-ng 8.3 with latest patches as of today (Aug 17, 2025). Remote target is a Synology DS1520+ with latest patches applied. Only a 1G connection, but the network is not busy by any means.

      Moments before the backup, 'df' and 'ls' of the respective directory worked fine. After the backup is initiated and appears to pause with <10G transferred, both commands lock up. The job in XCP-ng also seems to not want to let go.

      root@xo:~# date ; df -h
      Sun Aug 17 10:20:45 PM EDT 2025
      Filesystem                                    Size  Used Avail Use% Mounted on
      udev                                          3.9G     0  3.9G   0% /dev
      tmpfs                                         794M  572K  793M   1% /run
      /dev/mapper/xo--vg-root                 28G  5.3G   22G  20% /
      tmpfs                                         3.9G     0  3.9G   0% /dev/shm
      tmpfs                                         5.0M     0  5.0M   0% /run/lock
      /dev/xvda2                                    456M  119M  313M  28% /boot
      /dev/mapper/xo--vg-var                  15G  403M   14G   3% /var
      /dev/xvda1                                    511M  5.9M  506M   2% /boot/efi
      tmpfs                                         794M     0  794M   0% /run/user/1000
      192.168.32.10:/volume12/XCPBackups/VMBackups  492G   17M  492G   1% /run/xo-server/mounts/1f33cf95-36d6-4c5b-b99f-71bb47008dd7
      root@xo:~# date ; ls -lha /run/xo-server/mounts/1f33cf95-36d6-4c5b-b99f-71bb47008dd7/
      Sun Aug 17 10:21:23 PM EDT 2025
      total 0
      drwxrwxrwx 1 1027 users 56 Aug 17 22:20 .
      drwxr-xr-x 3 root root  60 Aug 17 22:20 ..
      -rwxrwxrwx 1 1024 users  0 Aug 17 22:20 .nfs000000000000016400000001
      root@xo:~# date ; df -h
      Sun Aug 17 10:22:28 PM EDT 2025
      ^C
      root@xo:~#
      
      [22:19 xcp05 ~]# date ; xe task-list
      Sun Aug 17 22:20:00 EDT 2025
      [22:20 xcp05 ~]# date ; xe task-list
      Sun Aug 17 22:22:49 EDT 2025
      uuid ( RO)                : 8c7cd101-9b5e-3769-0383-60beea86a272
                name-label ( RO): Exporting content of VDI shifter through NBD
          name-description ( RO):
                    status ( RO): pending
                  progress ( RO): 0.101
      
      
      [22:22 xcp05 ~]# date ; xe task-cancel uuid=8c7cd101-9b5e-3769-0383-60beea86a272
      Sun Aug 17 22:23:13 EDT 2025
      [22:23 xcp05 ~]# date ; xe task-list
      Sun Aug 17 22:23:28 EDT 2025
      uuid ( RO)                : 8c7cd101-9b5e-3769-0383-60beea86a272
                name-label ( RO): Exporting content of VDI shifter through NBD
          name-description ( RO):
                    status ( RO): pending
                  progress ( RO): 0.101
      

      I'm only able to clear this task by restarting the toolstack (or host), but the issue returns as soon as I try to initiate another backup.
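
      For reference, a minimal sketch of what clearing the stuck task looks like on the host, assuming the cancel doesn't take effect on its own (the task UUID is a placeholder; xe-toolstack-restart is the stock XCP-ng toolstack restart script):

      # Try to cancel the stuck export task first
      xe task-cancel uuid=<task-uuid>
      # If the task stays pending, restart the toolstack on the host
      xe-toolstack-restart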

      Host where VM resides network throughput:
      Screenshot 2025-08-17 at 22.30.33.png

      Nothing shows up in dmesg, and journalctl on XO just shows the backup starting:
      Aug 17 22:21:56 xo xo-server[724]: 2025-08-18T02:21:56.031Z xo:backups:worker INFO starting backup
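
      (For anyone reproducing this, the snippet below is roughly how both logs can be followed live while a backup runs -- a sketch, assuming xo-server logs to the journal under the "xo-server" identifier, as the line above suggests:)

      # Follow xo-server output on the XO VM while the backup runs
      journalctl -f -t xo-server
      # Watch for kernel/NFS complaints at the same time
      dmesg -wT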

      Last bit of the VM host SMlog:

      Aug 17 22:22:02 xcp05 SM: [10246] lock: released /var/lock/sm/3a1ad8c0-7f3a-4c16-9ec1-e8c315ac0c31/vdi
      Aug 17 22:22:02 xcp05 SM: [10246] lock: closed /var/lock/sm/b6ee0b6d-5093-b462-594f-ba2a3c3bd14c/sr
      Aug 17 22:22:02 xcp05 SM: [10246] lock: closed /var/lock/sm/3a1ad8c0-7f3a-4c16-9ec1-e8c315ac0c31/vdi
      Aug 17 22:22:02 xcp05 SM: [10246] lock: closed /var/lock/sm/lvm-b6ee0b6d-5093-b462-594f-ba2a3c3bd14c/3a1ad8c0-7f3a-4c16-9ec1-e8c315ac0c31
      Aug 17 22:22:02 xcp05 SM: [10246] lock: closed /var/lock/sm/lvm-b6ee0b6d-5093-b462-594f-ba2a3c3bd14c/1fb38e78-4700-450e-800d-2d8c94158046
      Aug 17 22:22:02 xcp05 SM: [10246] lock: closed /var/lock/sm/lvm-b6ee0b6d-5093-b462-594f-ba2a3c3bd14c/2ef8ae60-8beb-47fc-9cc2-b13e91192b14
      Aug 17 22:22:02 xcp05 SM: [10246] lock: closed /var/lock/sm/lvm-b6ee0b6d-5093-b462-594f-ba2a3c3bd14c/lvchange-p
      Aug 17 22:22:36 xcp05 SM: [10559] lock: opening lock file /var/lock/sm/b6ee0b6d-5093-b462-594f-ba2a3c3bd14c/sr
      Aug 17 22:22:36 xcp05 SM: [10559] LVMCache created for VG_XenStorage-b6ee0b6d-5093-b462-594f-ba2a3c3bd14c
      Aug 17 22:22:36 xcp05 fairlock[3358]: /run/fairlock/devicemapper acquired
      Aug 17 22:22:36 xcp05 fairlock[3358]: /run/fairlock/devicemapper sent '10559 - 625.897991605'
      Aug 17 22:22:36 xcp05 SM: [10559] ['/sbin/vgs', '--readonly', 'VG_XenStorage-b6ee0b6d-5093-b462-594f-ba2a3c3bd14c']
      Aug 17 22:22:36 xcp05 SM: [10559]   pread SUCCESS
      Aug 17 22:22:36 xcp05 fairlock[3358]: /run/fairlock/devicemapper released
      Aug 17 22:22:36 xcp05 SM: [10559] Entering _checkMetadataVolume
      Aug 17 22:22:36 xcp05 SM: [10559] LVMCache: will initialize now
      Aug 17 22:22:36 xcp05 SM: [10559] LVMCache: refreshing
      Aug 17 22:22:36 xcp05 fairlock[3358]: /run/fairlock/devicemapper acquired
      Aug 17 22:22:36 xcp05 fairlock[3358]: /run/fairlock/devicemapper sent '10559 - 625.93107908'
      Aug 17 22:22:36 xcp05 SM: [10559] ['/sbin/lvs', '--noheadings', '--units', 'b', '-o', '+lv_tags', '/dev/VG_XenStorage-b6ee0b6d-5093-b462-594f-ba2a3c3bd14c']
      Aug 17 22:22:37 xcp05 SM: [10559]   pread SUCCESS
      Aug 17 22:22:37 xcp05 fairlock[3358]: /run/fairlock/devicemapper released
      Aug 17 22:22:37 xcp05 SM: [10559] lock: closed /var/lock/sm/b6ee0b6d-5093-b462-594f-ba2a3c3bd14c/sr
      Aug 17 22:32:51 xcp05 SM: [14729] lock: opening lock file /var/lock/sm/b6ee0b6d-5093-b462-594f-ba2a3c3bd14c/sr
      Aug 17 22:32:51 xcp05 SM: [14729] LVMCache created for VG_XenStorage-b6ee0b6d-5093-b462-594f-ba2a3c3bd14c
      Aug 17 22:32:51 xcp05 fairlock[3358]: /run/fairlock/devicemapper acquired
      Aug 17 22:32:51 xcp05 SM: [14729] ['/sbin/vgs', '--readonly', 'VG_XenStorage-b6ee0b6d-5093-b462-594f-ba2a3c3bd14c']
      Aug 17 22:32:51 xcp05 fairlock[3358]: /run/fairlock/devicemapper sent '14729 - 1240.916404088'
      Aug 17 22:32:51 xcp05 SM: [14729]   pread SUCCESS
      Aug 17 22:32:51 xcp05 fairlock[3358]: /run/fairlock/devicemapper released
      Aug 17 22:32:51 xcp05 SM: [14729] Entering _checkMetadataVolume
      Aug 17 22:32:51 xcp05 SM: [14729] LVMCache: will initialize now
      Aug 17 22:32:51 xcp05 SM: [14729] LVMCache: refreshing
      Aug 17 22:32:51 xcp05 fairlock[3358]: /run/fairlock/devicemapper acquired
      Aug 17 22:32:51 xcp05 fairlock[3358]: /run/fairlock/devicemapper sent '14729 - 1240.953398589'
      Aug 17 22:32:51 xcp05 SM: [14729] ['/sbin/lvs', '--noheadings', '--units', 'b', '-o', '+lv_tags', '/dev/VG_XenStorage-b6ee0b6d-5093-b462-594f-ba2a3c3bd14c']
      Aug 17 22:32:52 xcp05 SM: [14729]   pread SUCCESS
      Aug 17 22:32:52 xcp05 fairlock[3358]: /run/fairlock/devicemapper released
      Aug 17 22:32:52 xcp05 SM: [14729] lock: closed /var/lock/sm/b6ee0b6d-5093-b462-594f-ba2a3c3bd14c/sr
      

      Looking for other ideas.

      posted in Backup
      jimmymiller
    • RE: CBT: the thread to centralize your feedback

      Has anyone seen issues migrating VDIs once CBT is enabled? We're seeing VDI_CBT_ENABLED errors when we try to live migrate disks between SRs. Disabling CBT on the disk obviously allows the migration to move forward. 'Users' with limited access don't seem to see the specifics of the error, but as admins we get a VDI_CBT_ENABLED error. Ideally we'd want to be able to migrate VDIs with CBT still enabled, or maybe the VDI migration process could disable CBT temporarily, migrate, then re-enable it?
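
      (For now the manual workaround looks roughly like this -- a sketch, with the UUIDs as placeholders:)

      # Temporarily disable CBT on the VDI, migrate it, then re-enable it
      xe vdi-disable-cbt uuid=<vdi-uuid>
      xe vdi-pool-migrate uuid=<vdi-uuid> sr-uuid=<destination-sr-uuid>
      xe vdi-enable-cbt uuid=<vdi-uuid>
      # Note: if backups rely on CBT, the next delta will likely fall back to a full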

      User errors:
      Screenshot 2024-08-07 at 17.42.07.png

      Admins see:

      {
        "id": "7847a7c3-24a3-4338-ab3a-0c1cdbb3a12a",
        "resourceSet": "q0iE-x7MpAg",
        "sr_id": "5d671185-66f6-a292-e344-78e5106c3987"
      }
      {
        "code": "VDI_CBT_ENABLED",
        "params": [
          "OpaqueRef:aeaa21fc-344d-45f1-9409-8e1e1cf3f515"
        ],
        "task": {
          "uuid": "9860d266-d91a-9d0e-ec2a-a7752fa01a6d",
          "name_label": "Async.VDI.pool_migrate",
          "name_description": "",
          "allowed_operations": [],
          "current_operations": {},
          "created": "20240807T21:33:29Z",
          "finished": "20240807T21:33:29Z",
          "status": "failure",
          "resident_on": "OpaqueRef:8d372a96-f37c-4596-9610-1beaf26af9db",
          "progress": 1,
          "type": "<none/>",
          "result": "",
          "error_info": [
            "VDI_CBT_ENABLED",
            "OpaqueRef:aeaa21fc-344d-45f1-9409-8e1e1cf3f515"
          ],
          "other_config": {},
          "subtask_of": "OpaqueRef:NULL",
          "subtasks": [],
          "backtrace": "(((process xapi)(filename ocaml/xapi/xapi_vdi.ml)(line 470))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 4696))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 199))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 203))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 42))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 51))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 4708))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 4711))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 35))((process xapi)(filename ocaml/xapi/helpers.ml)(line 1503))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 4705))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 35))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/xapi/rbac.ml)(line 205))((process xapi)(filename ocaml/xapi/server_helpers.ml)(line 95)))"
        },
        "message": "VDI_CBT_ENABLED(OpaqueRef:aeaa21fc-344d-45f1-9409-8e1e1cf3f515)",
        "name": "XapiError",
        "stack": "XapiError: VDI_CBT_ENABLED(OpaqueRef:aeaa21fc-344d-45f1-9409-8e1e1cf3f515)
          at Function.wrap (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/_XapiError.mjs:16:12)
          at default (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/_getTaskResult.mjs:13:29)
          at Xapi._addRecordToCache (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/index.mjs:1033:24)
          at file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/index.mjs:1067:14
          at Array.forEach (<anonymous>)
          at Xapi._processEvents (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/index.mjs:1057:12)
          at Xapi._watchEvents (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/index.mjs:1230:14)"
      }
      posted in Backup
      jimmymiller
    • RE: Shared SR (two pools)

      @olivierlambert Okay. I'll give it a shot.

      posted in Xen Orchestra
      jimmymiller
    • RE: Shared SR (two pools)

      @HolgiB For this use, it's actually a virtual TrueNAS instance sitting on a LUN mapped to the source XCP-ng pool. I know there are in-OS options using zfs send|receive, but the point is to get an understanding of what we would do without that convenience.

      I know Xen and VMware do things differently, but having VMFS in the mix allowed us to unmount a datastore, move the mapping to a new host, mount that datastore, then just point that host at the existing LUN and quickly import the VMX (for a full VM config) or the VMDKs (by configuring a new VM to use those existing disks). This completely eliminated the need to truly copy the data -- we were just changing which host had access to it. We didn't use it very often because VMware handled moving VMs with big disks pretty well, but it was our ace-in-the-hole if storage vMotion wasn't an option.

      posted in Xen Orchestra
      jimmymiller
    • RE: Shared SR (two pools)

      @olivierlambert Well even a cold migration seemed to fail. Bah!

      The LUN/SR is dedicated to just the one VM: 1 x 16G disk for the OS and 20 x 1.5T disks for data. Inside the VM, I'm using ZFS to stripe them all together into a single zpool. Because this is ZFS, I know I could theoretically do a ZFS replication job to another VM, but I'm also using this as a test to figure out how I'm going to move the larger VMs we have that don't have the convenience of an in-OS replication option. For our larger VMs we almost always dedicate LUNs specifically to them, and we have block-based replication options on our arrays, so in theory we should be able to fool any device into thinking the replica is a legit pre-existing SR.

      No snaps -- the data on this VM is purely an offsite backup target so we didn't feel the need to backup the backup of the backup.

      Let me try testing the forget SR + connect to a different pool approach (rough sketch below). I swear I tried this before, but when I went to create the SR it flagged the LUN as having a pre-existing SR and would only let me reprovision a new SR rather than map the existing one.
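
      A rough outline of that sequence on the CLI, assuming an iSCSI LUN (lvmoiscsi) -- all UUIDs and device-config values are placeholders, and on a multi-host destination pool the pbd-create/pbd-plug pair is needed once per host:

      # On the source pool: detach the SR and forget it (the data on the LUN is preserved)
      xe pbd-unplug uuid=<pbd-uuid>
      xe sr-forget uuid=<sr-uuid>

      # On the destination pool: re-introduce the existing SR instead of creating a new one
      xe sr-introduce uuid=<sr-uuid> type=lvmoiscsi name-label="Moved SR" shared=true content-type=user
      xe pbd-create sr-uuid=<sr-uuid> host-uuid=<host-uuid> \
        device-config:target=<iscsi-target-ip> device-config:targetIQN=<target-iqn> device-config:SCSIid=<scsi-id>
      xe pbd-plug uuid=<new-pbd-uuid>   # UUID returned by pbd-create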

      posted in Xen Orchestra
      jimmymiller
    • Shared SR (two pools)

      Re: Shared SR between two pools?

      I need to move a sizable (~30T) VM between two pools of compute nodes. I can do this move cold, but I'd rather not have the VM offline for several days, which is what it's going to look like if I do a standard cold VM migration.

      As I understand it, SRs are essentially locked to a specific pool (particularly the pool master). Is it possible to basically unmount (i.e. forget) the SR on one pool, remount it on the target pool, and then just import the VM while it continues to reside on the same LUN?

      VMware made this pretty easy with VMFS/VMX/VMDKs, but it seems like Xen may not be as flexible.

      posted in Xen Orchestra
      jimmymiller
    • RE: Migrating VM Status

      Well, I guess that was the coalesce process, because now that one has just stopped. Any ideas on how to find out why it worked fine for essentially 2 days and then stopped?

      I get the impression a live migration may not be the best way to move a 30T VM between pools? Maybe a warm migration will work better, but I'm also curious how much capacity we're going to need on the source in order to complete this move.

      posted in Management
      jimmymiller
    • RE: Migrating VM Status

      @Danp

      Hrm. xe task-list is showing nothing, but there is clearly something still happening based on the stats.

      Screenshot 2024-06-19 at 11.47.39.png

      posted in Management
      jimmymiller
    • Migrating VM Status

      I'm in the process of live migrating a large VM (~30T) from one host to another. The process had been going smoothly for the last 2 days, but now the task no longer appears in XO. The VM status shows "Busy (migrate_send)", but the task isn't visible in the XO task list. Is there a timeout in XO for long-running tasks? Is there a way to actually verify the status of the task and whether the VM is still moving? According to the stats, there is still IO on the SR, so it appears to still be in progress.
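
      (A couple of host-side queries that should show whether the migration is still alive -- a sketch, with the VM UUID as a placeholder:)

      # Check whether the migrate still shows up as an in-flight operation on the VM
      xe vm-param-get uuid=<vm-uuid> param-name=current-operations
      # List all tasks with their progress, in case the task simply isn't surfaced in XO
      xe task-list params=uuid,name-label,status,progress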

      posted in Management
      jimmymiller
    • OIDC with EntraID

      Has anyone out there gotten the XO OIDC plugin to work with EntraID? My security folks are looking for any documentation that could help them configure things on their end to work with XO. Tech support folks are looking into this as well, but I figured I'd put something out there to see if the broader community has made it work.

      posted in Management
      jimmymiller
    • RE: Installing Guest Tools on AlamaLinux 9 Issue

      @stevewest15 Not to call this a "fix" because I know they aren't always the latest 'n greatest, but have you tried just installing from the EPEL repo? Seems to work for us.

      # Enable the EPEL repository (it provides the xe-guest-utilities-latest package)
      dnf install -y epel-release
      # Install the guest tools and enable/start the service
      dnf install -y xe-guest-utilities-latest
      systemctl enable xe-linux-distribution.service
      systemctl start xe-linux-distribution.service
      
      posted in Management
      jimmymiller
    • XO token expiration (Web)

      We have a request from our security folks to force expiration of tokens if users go idle for a specific amount of time. Is there a means in XO to do this?

      posted in Management
      jimmymiller
    • LDAP Sync Cron?

      Is there a means of croning the LDAP group sync process? We'd obviously prefer any LDAP changes to be instant, but because XO uses a manual sync, it'd be nice if we could tell our customers "it will happen at x & y each day."

      posted in Management
      jimmymiller
    • RE: Xcp-ng 8.2 no more recognized with Cloudstack since last update

      @AlexanderK Yes. The hosts and the CS management server are on the same L2 with no firewalls. I think it has to do with me wanting to use an "L2 network topology" out of the gate, but from the reading I'm doing, that's definitely atypical. I've blown away the zone and reconstructed it in what I think is a "normal" CS config now.

      I've gotten the hosts attached at least, but now I'm seeing issues getting the SystemVMs up and running. I did check the 'use local host storage' option in the zone config, and I can see CS is at least touching/renaming the local SR.

      Secondary Storage Vm creation failure in zone [xxxxxx]. Error details: Unable to allocate capacity on zone [2] due to [null].
      Console proxy creation failure. Zone [xxxxxx]. Error details: Unable to orchestrate start VM instance {"id":20,"instanceName":"v-20-VM","type":"ConsoleProxy","uuid":"f138df9e-92c4-4cfd-9333-7d8396c436e6"} due to [Unable to acquire lock on VMTemplateStoragePool: 31].
      

      My storage network is completely private and everything is hanging off a single bond0 where I have VLANs carved down to the XCP-ng host, but I haven't configured any networks within XO. I assume CS was supposed to take care of that?

      I'm also not seeing the VMs show up in XO. Am I supposed to see instances and SystemVMs in XO?

      Thanks for any help.

      posted in Compute
      jimmymiller
    • RE: Xcp-ng 8.2 no more recognized with Cloudstack since last update

      @AlexanderK Just started playing with CS. I'm trying to see if this might be a better fit for our end users than the current XOA. XOA is okay, but it's definitely more focused on admins of the full XCP-ng stack than on users who just want to deploy VMs. It's possible XO 6 might solve a lot of my issues, but we're looking around for other options that use XCP-ng under the covers while maybe giving customers a better frontend experience.

      At this point I'm just trying to get a rather vanilla XCP-ng host in my lab connected to CS so I can learn the architecture better, but I'm running into issues. I won't deny that I'm a noob with CS, so some of the components are still foreign to me, but simply connecting a host is giving me trouble and the documentation seems a little spotty. Any ideas? This might need to go into a new thread.

      Error:
      "Cannot transit agent status with event AgentDisconnected for host....Unable to transition to a new state from Creating via AgentDisconnected"

      Screenshot 2024-05-12 at 12.52.47.png

      posted in Compute
      jimmymiller
    • Understanding templates

      How can we create a template from a VM that provides the same install options as the default XO templates? In our case we usually create empty shells and then roll with PXE to install the OS, whether it's Windows, Linux, etc. I'm trying to find a way to take a default template, tweak it a little (mainly to enable HA as a default), and then redeploy it as a template that offers the exact same options during the VM create process as the built-in templates. In particular, the "install settings" are different, and I also want to deselect the "fast clone" option as a default.
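
      (A sketch of the direction I'd look in for the install-settings part: the built-in templates carry an other-config:install-methods key that appears to drive those choices, so comparing it and copying it over to the custom template might help. All UUIDs are placeholders, and this doesn't address the fast-clone default:)

      # Compare the metadata of the built-in template and the custom one
      xe template-param-get uuid=<builtin-template-uuid> param-name=other-config
      xe template-param-get uuid=<custom-template-uuid> param-name=other-config
      # Copy the key that (presumably) drives the "install settings" list
      xe template-param-set uuid=<custom-template-uuid> other-config:install-methods=cdrom,nfs,http,ftp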

      Default template replica:
      Screenshot 2024-05-04 at 09.41.41.png

      Template created from a VM using the same template:Screenshot 2024-05-04 at 09.42.27.png

      posted in Management
      jimmymiller
    • RE: Backup Health Check Procedure

      @olivierlambert Same behavior. Should I open a case with support?

      I have noticed that if I do a manual health check outside of the backup job, it behaves as I'd expect: restore the VM >> power on >> management tools detected >> power off >> destroy.

      posted in Backup
      jimmymiller