Cannot use cloudConfig while creating VM with RPC call
Hello everyone. I'm currently having issues using Terraform to create a new VM (using provider
) and I think it might actually be an issue here instead of the provider.
I've described the details in the issue here, but the TLDR is that the provider seems to correctly forward my cloud config into the vm.create RPC call, yet it has no effect in the newly created VM. I've described everything I've tried and noticed, and I'm at my wits' end. Once again, I'm sorry for crossposting the issue from a different repository, but I think I'd end up here anyway -
@TheiLLeniumStudios I messed up my previous fix, a new one was merged yesterday:
That's a question for @ddelnano
Have you had a chance to take a look please? @ddelnano
@sMteX sorry for the late reply. I responded to your GitHub issue with some questions and comments.
Let's keep the discussion there, and we can summarize the conclusion here once we get to the bottom of it.
@ddelnano I'm running into a very similar issue. I didn't want to create a separate issue on GitHub, so I added my comments on the old one here:
I would really appreciate it if you could help me investigate the root cause and fix it. I'll be happy to try things out and provide more details if required.
Is your VM created on a slave host in a pool?
@olivierlambert Yes, the problematic VMs are on a slave host in the pool. I'm explicitly assigning an affinity host + local SR to every new VM based on even/odd numbering. I create a total of 6 VMs, 4 of which are on the master host and boot with the cloud-config drive just fine. The other 2 get scheduled on the slave, where their cloud config gets corrupted and Talos cannot read it.
@olivierlambert after tracing through the xen-orchestra code, I found where the problem originates from. I see this error in XO whenever a cloud config drive is being created:
2023-05-05T22:51:22.141Z xo:xapi WARN importVdiContent: { error: Error: 404 Not Found at Object.assertSuccess (/home/node/xen-orchestra/node_modules/http-request-plus/index.js:138:19) at httpRequestPlus (/home/node/xen-orchestra/node_modules/http-request-plus/index.js:205:22) at Xapi.putResource (/home/node/xen-orchestra/packages/xen-api/src/index.js:508:22) at Xapi.importContent (/home/node/xen-orchestra/@xen-orchestra/xapi/vdi.js:138:7) at Xapi.createCloudInitConfigDrive (file:///home/node/xen-orchestra/packages/xo-server/src/xapi/index.mjs:1332:5) at Xo.<anonymous> (file:///home/node/xen-orchestra/packages/xo-server/src/api/vm.mjs:211:11) at Api.#callApiMethod (file:///home/node/xen-orchestra/packages/xo-server/src/xo-mixins/api.mjs:417:20) { originalUrl: '', url: '', pool_master: host { uuid: 'd1e916dc-8f8a-4aec-98c1-206c4c8144b0', name_label: '', name_description: 'Default install', memory_overhead: 1161101312, allowed_operations: [Array], current_operations: [Object], API_version_major: 2, API_version_minor: 20, API_version_vendor: 'XenSource', API_version_vendor_implementation: {}, enabled: true, software_version: [Object], other_config: [Object], capabilities: [Array], cpu_configuration: {}, sched_policy: 'credit', supported_bootloaders: [Array], resident_VMs: [Array], logging: {}, PIFs: [Array], suspend_image_sr: 'OpaqueRef:b90d54ba-44e7-8181-a4a2-6593276e73cd', crash_dump_sr: 'OpaqueRef:b90d54ba-44e7-8181-a4a2-6593276e73cd', crashdumps: [], patches: [], updates: [], PBDs: [Array], host_CPUs: [Array], cpu_info: [Object], hostname: '', address: '', metrics: 'OpaqueRef:655a27bd-de06-75cb-d231-8460d949d70a', license_params: [Object], ha_statefiles: [], ha_network_peers: [], blobs: {}, tags: [], external_auth_type: '', external_auth_service_name: '', external_auth_configuration: {}, edition: 'xcp-ng', license_server: [Object], bios_strings: [Object], power_on_mode: '', power_on_config: {}, local_cache_sr: 'OpaqueRef:b90d54ba-44e7-8181-a4a2-6593276e73cd', 
chipset_info: [Object], PCIs: [Array], PGPUs: [Array], PUSBs: [Array], ssl_legacy: false, guest_VCPUs_params: {}, display: 'enabled', virtual_hardware_platform_versions: [Array], control_domain: 'OpaqueRef:58fb5c9b-18c4-6578-803a-d9b8db989ec9', updates_requiring_reboot: [], features: [], iscsi_iqn: '', multipathing: false, uefi_certificates: '', certificates: [Array], editions: [Array], pending_guidances: [], tls_verification_enabled: true, last_software_update: '19700101T00:00:00Z', https_only: false }, SR: SR { uuid: '8c7e88d7-25b4-02e2-5a4b-6e1e3ead70cf', name_label: 'minisforum-hm80-02 SSD', name_description: '', allowed_operations: [Array], current_operations: {}, VDIs: [Array], PBDs: [Array], virtual_allocation: 25404899328, physical_utilisation: 289996800, physical_size: 901115478016, type: 'ext', content_type: 'user', shared: false, other_config: [Object], tags: [], sm_config: [Object], blobs: {}, local_cache_enabled: true, introduced_by: 'OpaqueRef:NULL', clustered: false, is_tools_sr: false }, VDI: VDI { uuid: 'c8ccacc8-38bc-4d46-9649-ecfb65e24da6', name_label: 'XO CloudConfigDrive', name_description: '', allowed_operations: [Array], current_operations: {}, SR: 'OpaqueRef:13a829ed-3408-f42e-695e-f14adf469fb3', VBDs: [], crash_dumps: [], virtual_size: 10485760, physical_utilisation: 3584, type: 'user', sharable: false, read_only: false, other_config: {}, storage_lock: false, location: 'c8ccacc8-38bc-4d46-9649-ecfb65e24da6', managed: true, missing: false, parent: 'OpaqueRef:NULL', xenstore_data: {}, sm_config: {}, is_a_snapshot: false, snapshot_of: 'OpaqueRef:NULL', snapshots: [], snapshot_time: '19700101T00:00:00Z', tags: [], allow_caching: false, on_boot: 'persist', metadata_of_pool: '', metadata_latest: false, is_tools_iso: false, cbt_enabled: false } } }
This seems to come from this specific line of code:
There's also a comment there by one of the devs which suggests that it could be a potential bug.
Basically, because the warning is ignored, an empty cloud-init drive gets created without the correct data inside, and the VM creation still succeeds.
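To illustrate that failure mode, here is a minimal sketch. It is not the actual xen-orchestra code: `importContent`, `createConfigDriveLenient` and `createConfigDriveStrict` are hypothetical names, and the 404 is simulated. It only shows how logging the import error instead of rethrowing it lets the caller report success with an empty drive.

```javascript
// Hypothetical stand-in for the import step; it always fails like the 404 in the log.
async function importContent(vdi, content) {
  throw new Error('404 Not Found')
}

// Lenient variant: the error is only logged, so the caller never learns the drive is empty.
async function createConfigDriveLenient(content) {
  const vdi = { name_label: 'XO CloudConfigDrive', content: '' }
  try {
    await importContent(vdi, content)
    vdi.content = content
  } catch (error) {
    console.warn('importVdiContent:', error.message)
  }
  return vdi
}

// Strict variant: the error propagates, so VM creation can fail loudly instead.
async function createConfigDriveStrict(content) {
  const vdi = { name_label: 'XO CloudConfigDrive', content: '' }
  await importContent(vdi, content)
  vdi.content = content
  return vdi
}
```

With the lenient variant the returned drive has empty content and no exception reaches the caller, which matches what I'm seeing: the VM is created, but the guest has nothing to read.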
I think importVdiContent is ignoring the SR id and is trying to clone the VDI on the master instead of the slave, hence the 404
await this.putResource(cancelToken, stream, '/import_raw_vdi/', {
  query: {
    format,
    vdi: ref,
  },
  task: await this.task_create(`Importing content into VDI ${await this.getField('VDI', ref, 'name_label')}`),
})
This returns 404. I'm trying to understand what /import_raw_vdi/ does. Also, this task gets created on the master host instead of the slave, even for the VMs that are scheduled on the slave host. Is that why it returns a 404? If so, how does the filesystem for the cloud config drive get created on the slave? Maybe that needs to be moved over to the master as well? I don't know if I'm understanding this correctly, so feel free to correct me if this is not how it works and my assumptions are wrong.
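To make the question concrete, here is a rough sketch of what I would expect the request target to look like. This is an assumption on my part, not how xen-api actually resolves the host: `pickImportHost`, `buildImportUrl`, `hostAddressBySr` and the `task_id` query parameter name are all hypothetical. The idea is simply that a local (non-shared) SR is only reachable from the host it is attached to, so the upload would have to end up there.

```javascript
// Hypothetical helper: choose which host address should receive the import.
// A non-shared (local) SR is only attached to one host, so prefer that host.
function pickImportHost({ sr, poolMasterAddress, hostAddressBySr }) {
  if (sr.shared) return poolMasterAddress
  return hostAddressBySr[sr.uuid] ?? poolMasterAddress
}

// Hypothetical helper: build the upload URL for the chosen host.
// The query parameter names are assumptions for illustration.
function buildImportUrl(hostAddress, { vdiRef, format, taskRef }) {
  const url = new URL('/import_raw_vdi/', `https://${hostAddress}`)
  url.searchParams.set('format', format)
  url.searchParams.set('vdi', vdiRef)
  url.searchParams.set('task_id', taskRef)
  return url.toString()
}
```

If the real code always targets the pool master and relies on a redirect to reach the slave, then a redirect that is not followed (or not issued) would explain a 404 for VDIs on a slave's local SR.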
The problem has already been identified by @julien-f
Expect a fix in the next few days or in our next XOA patch release next week
@olivierlambert Awesome! I use XO built from git source so will monitor the commits then. Thanks!
@olivierlambert Just tried the latest changes from xen-orchestra and the problem persists. I get the exact same error produced in XO when importing CloudConfigDrive on a slave:
2023-05-13T09:03:41.719Z xo:xapi WARN importVdiContent: { error: Error: 404 Not Found at Object.assertSuccess (/home/node/xen-orchestra/packages/xen-api/node_modules/http-request-plus/index.js:140:19) at httpRequestPlus (/home/node/xen-orchestra/packages/xen-api/node_modules/http-request-plus/index.js:207:22) at Xapi.putResource (/home/node/xen-orchestra/packages/xen-api/src/index.js:508:22) at Xapi.importContent (/home/node/xen-orchestra/@xen-orchestra/xapi/vdi.js:138:7) at Xapi.createCloudInitConfigDrive (file:///home/node/xen-orchestra/packages/xo-server/src/xapi/index.mjs:1332:5) at Xo.<anonymous> (file:///home/node/xen-orchestra/packages/xo-server/src/api/vm.mjs:217:11) at Api.#callApiMethod (file:///home/node/xen-orchestra/packages/xo-server/src/xo-mixins/api.mjs:417:20) { originalUrl: '', url: '', pool_master: host { uuid: 'd1e916dc-8f8a-4aec-98c1-206c4c8144b0', name_label: '', name_description: 'Default install', memory_overhead: 1161101312, allowed_operations: [Array], current_operations: [Object], API_version_major: 2, API_version_minor: 20, API_version_vendor: 'XenSource', API_version_vendor_implementation: {}, enabled: true, software_version: [Object], other_config: [Object], capabilities: [Array], cpu_configuration: {}, sched_policy: 'credit', supported_bootloaders: [Array], resident_VMs: [Array], logging: {}, PIFs: [Array], suspend_image_sr: 'OpaqueRef:b90d54ba-44e7-8181-a4a2-6593276e73cd', crash_dump_sr: 'OpaqueRef:b90d54ba-44e7-8181-a4a2-6593276e73cd', crashdumps: [], patches: [], updates: [], PBDs: [Array], host_CPUs: [Array], cpu_info: [Object], hostname: '', address: '', metrics: 'OpaqueRef:655a27bd-de06-75cb-d231-8460d949d70a', license_params: [Object], ha_statefiles: [], ha_network_peers: [], blobs: {}, tags: [], external_auth_type: '', external_auth_service_name: '', external_auth_configuration: {}, edition: 'xcp-ng', license_server: [Object], bios_strings: [Object], power_on_mode: '', power_on_config: {}, local_cache_sr: 
'OpaqueRef:b90d54ba-44e7-8181-a4a2-6593276e73cd', chipset_info: [Object], PCIs: [Array], PGPUs: [Array], PUSBs: [Array], ssl_legacy: false, guest_VCPUs_params: {}, display: 'enabled', virtual_hardware_platform_versions: [Array], control_domain: 'OpaqueRef:58fb5c9b-18c4-6578-803a-d9b8db989ec9', updates_requiring_reboot: [], features: [], iscsi_iqn: '', multipathing: false, uefi_certificates: '', certificates: [Array], editions: [Array], pending_guidances: [], tls_verification_enabled: true, last_software_update: '19700101T00:00:00Z', https_only: false }, SR: SR { uuid: '8c7e88d7-25b4-02e2-5a4b-6e1e3ead70cf', name_label: 'minisforum-hm80-02 SSD', name_description: '', allowed_operations: [Array], current_operations: {}, VDIs: [Array], PBDs: [Array], virtual_allocation: 45583695872, physical_utilisation: 290037760, physical_size: 901115478016, type: 'ext', content_type: 'user', shared: false, other_config: [Object], tags: [], sm_config: [Object], blobs: {}, local_cache_enabled: true, introduced_by: 'OpaqueRef:NULL', clustered: false, is_tools_sr: false }, VDI: VDI { uuid: '24d348fa-a8db-45b9-a623-724e64883a55', name_label: 'XO CloudConfigDrive', name_description: '', allowed_operations: [Array], current_operations: {}, SR: 'OpaqueRef:13a829ed-3408-f42e-695e-f14adf469fb3', VBDs: [], crash_dumps: [], virtual_size: 10485760, physical_utilisation: 3584, type: 'user', sharable: false, read_only: false, other_config: {}, storage_lock: false, location: '24d348fa-a8db-45b9-a623-724e64883a55', managed: true, missing: false, parent: 'OpaqueRef:NULL', xenstore_data: {}, sm_config: {}, is_a_snapshot: false, snapshot_of: 'OpaqueRef:NULL', snapshots: [], snapshot_time: '19700101T00:00:00Z', tags: [], allow_caching: false, on_boot: 'persist', metadata_of_pool: '', metadata_latest: false, is_tools_iso: false, cbt_enabled: false } } }
It still tries to use the master and results in a 404
I'm running XO with the following version:
Please provide the commit you are using, not the version
@olivierlambert sure. Using commit 17b275629109160ea3840de0fc70a1faf41bd392
Okay that's good. Are you sure you did rebuild everything?
Yes, I did rebuild everything properly. I even built a new container using the Dockerfiles from here at the commit above:
and then pushed and tested it, and got the same result -
Pinging @julien-f
@julien-f what's weird is that the
points to the correct Host (Slave) but the originalUrl
points to the Master. I'm not sure which one receives the request since it is unclear from the warning. My suspicion is that it is one of the two (assuming that the affinity host is the Slave):
- The VDI content, i.e., the Cloud Config, gets built on the Master, but calling /import_raw_vdi triggers the Slave endpoint, i.e.,
- (Most likely) The Cloud Config gets built on the Slave, but calling /import_raw_vdi triggers the Master endpoint, i.e.,
I think #2 is most likely what's happening because I see an Import VDI hanging task on the Master host for every VM that was scheduled on the Slave. I saw it through XOA in the tasks section
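One way to tell the two cases apart would be to compare the two URLs attached to the error. The helper below is only a sketch: `describeImportFailure` is a hypothetical name, and it assumes that `originalUrl` is where the request was first sent and `url` is where it finally ended up, which I have not confirmed in the http-request-plus source.

```javascript
// Hypothetical helper: compare the two URLs carried by the error object.
// Assumption: `originalUrl` is the first target, `url` is the final one.
function describeImportFailure({ originalUrl, url }, masterAddress) {
  const first = new URL(originalUrl).hostname
  const last = new URL(url).hostname
  return {
    sentToMaster: first === masterAddress,
    redirected: first !== last,
    answeredBy: last === masterAddress ? 'master' : 'other host',
  }
}
```

If `sentToMaster` and `redirected` are both true, the request started on the Master and the 404 came from the host it was redirected to. If `redirected` is false, the Master itself answered with the 404.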