@manilx champion!! That's excellent! Thank you so much!
Posts made by ptunstall
-
RE: 2 hosts in a pool, how to force VM to boot on a specific host. (GPUs involved)
-
2 hosts in a pool, how to force VM to boot on a specific host. (GPUs involved)
Hi! LOVE XCP-ng, it's been the backbone of my little growing VFX studio.
I have a pool with two identical hosts. Each has the same GPUs in the same slots, and to the OS the slots have the same IDs. Up until this point I have been running one host per pool, but I decided to try adding more hosts to a pool to take advantage of the pool features. How do I assign a VM to a specific host? I have to assign the GPU passthrough manually, so this kind of breaks the "fluidity" of VMs floating between hosts. (For example, if I assign GPU c1 to a VM and there are two c1's in the pool, how do I assign c1 on host 1 and not c1 on host 2?)
Or maybe I'm not assigning GPUs correctly and there's a better way to manage these types of VMs?
-
RE: error -104
@tuxen No GPUs were removed; only 2 were added. The only PCIe item removed was a NIC, but I didn't remove it from dom0 or assign it to any VMs; it was just in the system.
-
RE: error -104
@tjkreidl Yes, this node had 12 GPUs in it running in a bare-metal environment a year ago before being repurposed.
-
RE: error -104
Additionally, I noticed that when SSHed into the node and working with the xe CLI, some commands don't go through:
[16:19 gpuhost05 ~]# xe vm-list
uuid ( RO) : b8e7a3c8-e68e-ac45-2dec-b04b4fc5426b
name-label ( RW): vast-ws14
power-state ( RO): halted

uuid ( RO) : c6b78b22-1153-4622-a5a1-1a0880b2d68f
name-label ( RW): Control domain on host: gpuhost05
power-state ( RO): running

[16:19 gpuhost05 ~]# xe vm-list
uuid ( RO) : b8e7a3c8-e68e-ac45-2dec-b04b4fc5426b
name-label ( RW): vast-ws14
power-state ( RO): halted

uuid ( RO) : c6b78b22-1153-4622-a5a1-1a0880b2d68f
name-label ( RW): Control domain on host: gpuhost05
power-state ( RO): running

[16:19 gpuhost05 ~]# xe vm-list
uuid ( RO) : b8e7a3c8-e68e-ac45-2dec-b04b4fc5426b
name-label ( RW): vast-ws14
power-state ( RO): halted

uuid ( RO) : c6b78b22-1153-4622-a5a1-1a0880b2d68f
name-label ( RW): Control domain on host: gpuhost05
power-state ( RO): running

[16:19 gpuhost05 ~]# xe vm-list
Error: Connection refused (calling connect )
[16:19 gpuhost05 ~]#
I try to start a VM manually:
[16:11 gpuhost05 ~]# xe vm-start uuid=b8e7a3c8-e68e-ac45-2dec-b04b4fc5426b
Lost connection to the server.
[16:12 gpuhost05 ~]# xe vm-start uuid=b8e7a3c8-e68e-ac45-2dec-b04b4fc5426b
Lost connection to the server.
[16:12 gpuhost05 ~]# xe vm-start uuid=b8e7a3c8-e68e-ac45-2dec-b04b4fc5426b
Lost connection to the server.
[16:12 gpuhost05 ~]# xe vm-list
uuid ( RO) : b8e7a3c8-e68e-ac45-2dec-b04b4fc5426b
name-label ( RW): vast-ws14
power-state ( RO): halted

uuid ( RO) : c6b78b22-1153-4622-a5a1-1a0880b2d68f
name-label ( RW): Control domain on host: gpuhost05
power-state ( RO): running

[16:12 gpuhost05 ~]# xe vm-start uuid=b8e7a3c8-e68e-ac45-2dec-b04b4fc5426b
Lost connection to the server.
-
RE: error -104
We just encountered this again.
I added 2 new GPUs to the node and removed 1 (unused) NIC. Nothing else was changed in the system, just those 3 PCIe changes. The already installed and assigned GPUs were not removed or changed at all. Full error:
server.enable
{
  "id": "565d1ea8-582c-4596-ae1f-d96f95ef2c37"
}
{
  "errno": -104,
  "code": "ECONNRESET",
  "syscall": "write",
  "url": "https://10.169.4.124/jsonrpc",
  "call": {
    "method": "session.login_with_password",
    "params": "* obfuscated *"
  },
  "message": "write ECONNRESET",
  "name": "Error",
  "stack": "Error: write ECONNRESET at WriteWrap.onWriteComplete [as oncomplete] (node:internal/stream_base_commons:94:16) at WriteWrap.callbackTrampoline (node:internal/async_hooks:130:17)"
}
I can SSH into the node without issue.
I was looking over this: https://xcp-ng.org/docs/api.html
Tried this:
xe-toolstack-restart
I get this error now:
server.enable
{
  "id": "88698db1-9b95-4ca8-b690-98395145f282"
}
{
  "errno": -111,
  "code": "ECONNREFUSED",
  "syscall": "connect",
  "address": "10.169.4.124",
  "port": 443,
  "url": "https://10.169.4.124/jsonrpc",
  "call": {
    "method": "session.login_with_password",
    "params": "* obfuscated *"
  },
  "message": "connect ECONNREFUSED 10.169.4.124:443",
  "name": "Error",
  "stack": "Error: connect ECONNREFUSED 10.169.4.124:443 at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1300:16) at TCPConnectWrap.callbackTrampoline (node:internal/async_hooks:130:17)"
}
I will try the suggested same-version upgrade and report back.
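For anyone comparing the two dumps: the `errno` values look like ordinary Linux socket error numbers, which Node.js reports negated. Here is a minimal Python sketch for decoding them, assuming it runs on a Linux machine (the numeric values differ on other platforms). The `decode_node_errno` helper is a made-up name for illustration, not part of XO or XCP-ng.

```python
import errno
import os


def decode_node_errno(value):
    """Map a negated errno, as Node.js reports it, to (name, description)."""
    code = abs(value)
    # errno.errorcode maps numeric codes to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # The two values from the error dumps in this thread.
    for value in (-104, -111):
        print(value, *decode_node_errno(value))
```

As far as I understand it, ECONNRESET means the connection was opened and then dropped by the other side, while ECONNREFUSED means nothing was accepting connections on that port at all, which would fit with the toolstack having just been restarted.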
-
RE: error -104
While I was able to solve this issue the first time it popped up for us by returning the GPUs to dom0, it happened again 2 weeks ago and I was unable to get it working again. We had to reinstall the host entirely. I'm sure this is user error on our part and we're missing something. I'd very much like to know the proper workflow to solve this, as XCP-ng is the backbone of our entire virtual VFX production suite.
We used this command to push the GPUs back to dom0:
/opt/xensource/libexec/xen-cmdline --delete-dom0 xen-pciback.hide
-
RE: error -104
@olivierlambert It was a PCI passthrough issue. A device changed from 83: to 84: for no reason at all... All good now!
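In case it helps someone else, here is a rough Python sketch of the check that would have caught this: compare the addresses hidden from dom0 against the addresses the host currently reports. It assumes the 83:/84: prefix is the PCI bus number, that the hide value uses the `(dddd:bb:dd.f)` form, and that the list of present devices is gathered separately (for example from `lspci -D`). The function names and sample addresses are hypothetical.

```python
import re

# Matches a full PCI address (domain:bus:device.function) inside parentheses.
BDF_PATTERN = re.compile(
    r"\(([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\)"
)


def hidden_addresses(hide_value):
    """Extract PCI addresses from a xen-pciback.hide value."""
    return [addr.lower() for addr in BDF_PATTERN.findall(hide_value)]


def stale_addresses(hide_value, present):
    """Return hidden addresses that no longer match any present device."""
    present_set = {addr.lower() for addr in present}
    return [a for a in hidden_addresses(hide_value) if a not in present_set]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    hide = "xen-pciback.hide=(0000:83:00.0)(0000:83:00.1)"
    present = ["0000:84:00.0", "0000:84:00.1"]
    print(stale_addresses(hide, present))
```

Any address this returns is hidden in the boot configuration but no longer corresponds to a device, which is a hint that the hardware was re-enumerated after a PCIe change.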
-
error -104
Hi, we're getting this error via the web interface when we try to reconnect to a host, but we can SSH into the node just fine.
server.enable
{
  "id": "0e1f7b1c-9cc5-4c31-ae55-32185fcb637d"
}
{
  "errno": -104,
  "code": "ECONNRESET",
  "syscall": "write",
  "url": "https://10.169.4.120/jsonrpc",
  "call": {
    "method": "session.login_with_password",
    "params": "* obfuscated *"
  },
  "message": "write ECONNRESET",
  "name": "Error",
  "stack": "Error: write ECONNRESET at WriteWrap.onWriteComplete [as oncomplete] (node:internal/stream_base_commons:94:16) at WriteWrap.callbackTrampoline (node:internal/async_hooks:130:17)"
}
Apologies if this has been asked before. I searched for the error number and ECONNRESET but didn't find anything relevant to this issue.
-
RE: IMPORT_ERROR after trying to import VM I just exported from XOA
@danp Sweet that worked! Thank you!!
This thing's pretty kickass. I was just able to duplicate a VM that was rendering heavy CG to a completely different node and had the duplicate VM rendering in less than 5 minutes...
Thank you so much for the assistance!
-
RE: IMPORT_ERROR after trying to import VM I just exported from XOA
- XOA says it's: Current version: 5.64.0
- 2 separate VMs act like this.
Thank you so much for the assistance!
-
IMPORT_ERROR after trying to import VM I just exported from XOA
I export a VM using the export function. It downloads an XVA file. I go to Import and try to import the same exported XVA, and I get this error:
IMPORT_ERROR(INTERNAL_ERROR: [ (Failure Int64.of_string) ])
I'm using the latest version and JUST updated after starting the "trial", hoping that'd fix it.
Full dump below:
HTTP handler of vm.import undefined { "code": "IMPORT_ERROR", "params": [ "INTERNAL_ERROR: [ (Failure Int64.of_string) ]" ], "url": "https://10.169.3.20/import/?sr_id=OpaqueRef%3A804ca2b5-bca5-46be-96e9-d53c0f0560d5&session_id=OpaqueRef%3A3fb77c2a-4f6c-4515-821c-615833193a10&task_id=OpaqueRef%3Ad8efd0de-9d27-4221-81e0-4a6a84b67563", "task": { "uuid": "2fe83d05-46b4-ec3b-9aa4-aa3c08f3b42a", "name_label": "[XO] VM import", "name_description": "", "allowed_operations": [], "current_operations": {}, "created": "20211220T07:54:00Z", "finished": "20211220T07:54:00Z", "status": "failure", "resident_on": "OpaqueRef:bd272387-df2d-49ed-a73e-2d2324f089af", "progress": 1, "type": "<none/>", "result": "", "error_info": [ "IMPORT_ERROR", "INTERNAL_ERROR: [ (Failure Int64.of_string) ]" ], "other_config": {}, "subtask_of": "OpaqueRef:NULL", "subtasks": [], "backtrace": "(((process xapi)(filename lib/backtrace.ml)(line 210))((process xapi)(filename ocaml/xapi/import.ml)(line 1966))((process xapi)(filename ocaml/xapi/server_helpers.ml)(line 100)))" }, "pool_master": { "uuid": "333eb4d9-3349-4ddc-a5d1-46d2df8ada21", "name_label": "microservices02", "name_description": "Default install", "memory_overhead": 2050740224, "allowed_operations": [ "vm_migrate", "provision", "vm_resume", "evacuate", "vm_start" ], "current_operations": {}, "API_version_major": 2, "API_version_minor": 16, "API_version_vendor": "XenSource", "API_version_vendor_implementation": {}, "enabled": true, "software_version": { "product_version": "8.2.0", "product_version_text": "8.2", "product_version_text_short": "8.2", "platform_name": "XCP", "platform_version": "3.2.0", "product_brand": "XCP-ng", "build_number": "release/stockholm/master/7", "hostname": "localhost", "date": "2020-11-05", "dbv": "0.0.1", "xapi": "1.20", "xen": "4.13.1-9.6.1.xcpng8", "linux": "4.19.0+1", "xencenter_min": "2.16", "xencenter_max": "2.16", "network_backend": "openvswitch", "db_schema": "5.601" }, "other_config": { 
"rpm_patch_installation_time": "1639958987.567", "iscsi_iqn": "iqn.2021-12.com.example:52b13bbc", "agent_start_time": "1639947685.", "boot_time": "1639947566." }, "capabilities": [ "xen-3.0-x86_64", "xen-3.0-x86_32p", "hvm-3.0-x86_32", "hvm-3.0-x86_32p", "hvm-3.0-x86_64", "" ], "cpu_configuration": {}, "sched_policy": "credit", "supported_bootloaders": [ "pygrub", "eliloader" ], "resident_VMs": [ "OpaqueRef:fbd83ece-378e-49d3-b7ee-4e623c5d8f11", "OpaqueRef:65672e13-7237-48aa-ae86-81ca87d0d5fc" ], "logging": {}, "PIFs": [ "OpaqueRef:7528ccad-65ed-4406-8d9f-accc20579152" ], "suspend_image_sr": "OpaqueRef:804ca2b5-bca5-46be-96e9-d53c0f0560d5", "crash_dump_sr": "OpaqueRef:804ca2b5-bca5-46be-96e9-d53c0f0560d5", "crashdumps": [], "patches": [], "updates": [], "PBDs": [ "OpaqueRef:31530dbf-e1e2-49a8-9bf5-f08c17c96205", "OpaqueRef:6fb0d154-616c-4677-8046-4acce2c7ed79", "OpaqueRef:e1c4a4c0-7d2a-4e64-b150-0438006187cb", "OpaqueRef:ced26859-34f2-47cf-8dcc-04695f1c11ac", "OpaqueRef:7d8febbb-ef02-4cc4-812d-9d4a017a5854" ], "host_CPUs": [ "OpaqueRef:15f91497-9ba5-4583-9ab2-53714bb150ec", "OpaqueRef:fce4e763-b928-44b6-b1f4-1006d4c17f4c", "OpaqueRef:cd48fb2f-f82e-40a6-84f1-d1fdecca66cc", "OpaqueRef:c934f047-2771-4370-85a4-413c1cc4b996", "OpaqueRef:2d16f513-029a-42b7-8d69-f7fa9bff3774", "OpaqueRef:27514a9b-5d4f-410e-8b46-3b87b6aa5d79", "OpaqueRef:1a756cb1-9ed7-4372-b3c5-ea63a5674def", "OpaqueRef:33b8055e-3ba4-4120-8c0f-e5fad72609e6", "OpaqueRef:7849117a-f4bb-4cf3-99e5-0e683e90283d", "OpaqueRef:1c48fb98-bc13-422f-842a-2571a0489fbe", "OpaqueRef:90710810-2fff-429d-b74b-a04fbf4e646d", "OpaqueRef:a3116f7a-e241-4701-94d9-dd4ea1053379", "OpaqueRef:2f631fcc-a2e6-4fbe-a3a8-e2e03b77ad23", "OpaqueRef:9eff703c-47f1-42d8-a87f-445955104b45", "OpaqueRef:84601156-d935-4435-b1af-8d0f4e022a76", "OpaqueRef:08bb40ae-cc21-4d88-9e64-b7abac34de0f", "OpaqueRef:a4e9de92-1927-44cb-9e6e-3234cdce71d5", "OpaqueRef:07c9b725-3543-4444-a5af-e2b7f3031457", "OpaqueRef:d4a79110-f4e8-42a2-a2b0-57f487d732e7", 
"OpaqueRef:b3a382d1-53d4-4e6b-bc81-9cc0650abda9", "OpaqueRef:2e3a415b-0ee1-4aac-b39f-98a11ba542ab", "OpaqueRef:172cd507-5bf3-46ca-81a6-9c2dd9ec7931", "OpaqueRef:a6e34196-984c-485c-98ef-2918782d528c", "OpaqueRef:a319e5ca-a4a2-493a-8ab7-30ff66f1fe7a", "OpaqueRef:1448e470-f643-4264-83f0-78a9cb7f582c", "OpaqueRef:6c6e9316-cf05-4201-977f-48fcfd1d472e", "OpaqueRef:9616f8c3-ecd4-4b91-b4a8-ba80a678b4a6", "OpaqueRef:c1382984-0d11-4cb1-82aa-cbef62e255f9", "OpaqueRef:9e9a73ce-ce2b-4e30-9dac-0514597510d7", "OpaqueRef:bb3ec57d-cdee-4cb6-a342-55ef30e24f25", "OpaqueRef:459118c6-d095-49b0-bdbe-963e56bc9d38", "OpaqueRef:de06dfee-40c5-43d6-b16c-4b0c9a18cb85" ], "cpu_info": { "cpu_count": "32", "socket_count": "2", "vendor": "GenuineIntel", "speed": "2900.004", "modelname": "Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz", "family": "6", "model": "45", "stepping": "7", "flags": "fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid pni pclmulqdq est ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx hypervisor lahf_lm ssbd ibrs ibpb stibp xsaveopt", "features": "1fbee3ff-bfebfbff-00000001-2c100800", "features_pv": "1fc9cbf5-96b82203-2991cbf5-00000003-00000001-00000000-00000000-00000000-00001000-8c000400-00000000-00000000-00000000-00000000-00000000", "features_hvm": "1fcbfbff-97ba2223-2d93fbff-00000403-00000001-00000000-00000000-00000000-00001000-9c000400-00000000-00000000-00000000-00000000-00000000", "features_hvm_host": "1fcbfbff-97ba2223-2d93fbff-00000403-00000001-00000000-00000000-00000000-00001000-9c000400-00000000-00000000-00000000-00000000-00000000", "features_pv_host": "1fc9cbf5-96b82203-2991cbf5-00000003-00000001-00000000-00000000-00000000-00001000-8c000400-00000000-00000000-00000000-00000000-00000000" }, "hostname": "microservices02", "address": "10.169.3.20", "metrics": "OpaqueRef:01637ccc-393e-4f31-b20e-9a17a6874380", "license_params": { "restrict_vswitch_controller": "false", 
"restrict_lab": "false", "restrict_stage": "false", "restrict_storagelink": "false", "restrict_storagelink_site_recovery": "false", "restrict_web_selfservice": "false", "restrict_web_selfservice_manager": "false", "restrict_hotfix_apply": "false", "restrict_export_resource_data": "false", "restrict_read_caching": "false", "restrict_cifs": "false", "restrict_health_check": "false", "restrict_xcm": "false", "restrict_vm_memory_introspection": "false", "restrict_batch_hotfix_apply": "false", "restrict_management_on_vlan": "false", "restrict_ws_proxy": "false", "restrict_vlan": "false", "restrict_qos": "false", "restrict_pool_attached_storage": "false", "restrict_netapp": "false", "restrict_equalogic": "false", "restrict_pooling": "false", "enable_xha": "true", "restrict_marathon": "false", "restrict_email_alerting": "false", "restrict_historical_performance": "false", "restrict_wlb": "false", "restrict_rbac": "false", "restrict_dmc": "false", "restrict_checkpoint": "false", "restrict_cpu_masking": "false", "restrict_connection": "false", "platform_filter": "false", "regular_nag_dialog": "false", "restrict_vmpr": "false", "restrict_vmss": "false", "restrict_intellicache": "false", "restrict_gpu": "false", "restrict_dr": "false", "restrict_vif_locking": "false", "restrict_storage_xen_motion": "false", "restrict_vgpu": "false", "restrict_integrated_gpu_passthrough": "false", "restrict_vss": "false", "restrict_guest_agent_auto_update": "false", "restrict_pci_device_for_auto_update": "false", "restrict_xen_motion": "false", "restrict_guest_ip_setting": "false", "restrict_ad": "false", "restrict_nested_virt": "false", "restrict_live_patching": "false", "restrict_set_vcpus_number_live": "false", "restrict_pvs_proxy": "false", "restrict_igmp_snooping": "false", "restrict_rpu": "false", "restrict_pool_size": "false", "restrict_cbt": "false", "restrict_usb_passthrough": "false", "restrict_network_sriov": "false", "restrict_corosync": "true", "restrict_zstd_export": "false", 
"restrict_pool_secret_rotation": "true" }, "ha_statefiles": [], "ha_network_peers": [], "blobs": {}, "tags": [], "external_auth_type": "", "external_auth_service_name": "", "external_auth_configuration": {}, "edition": "xcp-ng", "license_server": { "address": "localhost", "port": "27000" }, "bios_strings": { "bios-vendor": "American Megatrends Inc.", "bios-version": "02.30.66.01", "system-manufacturer": "ZTsystems", "system-product-name": "Z1100IF Config DA", "system-version": "1.0", "system-serial-number": "206307450127", "baseboard-manufacturer": "ZTsystems", "baseboard-product-name": "A9DRPF", "baseboard-version": "TBD", "baseboard-serial-number": "5F48NP1641", "oem-1": "Xen", "oem-2": "MS_VM_CERT/SHA1/bdbeb6e0a816d43fa6d3fe8aaef04c2bad9d3e3d", "oem-3": "TBD", "hp-rombios": "" }, "power_on_mode": "", "power_on_config": {}, "local_cache_sr": "OpaqueRef:NULL", "chipset_info": { "iommu": "true" }, "PCIs": [ "OpaqueRef:d88f8c21-4b98-4ad2-9954-13118da04b3b", "OpaqueRef:dd31350a-ee7e-4bd0-a082-c1751cf7bff1", "OpaqueRef:c17fa13e-ce86-4974-9742-a4f83af5604e", "OpaqueRef:263192ee-370b-4203-a371-6cd880fbd11c", "OpaqueRef:bdae9dc3-b59a-4b63-979c-ab69a6889bb9", "OpaqueRef:8a1aca48-3d28-43ac-aefb-381562ee328e" ], "PGPUs": [ "OpaqueRef:aef220da-ca48-44e6-a207-1c73692086d4" ], "PUSBs": [], "ssl_legacy": false, "guest_VCPUs_params": {}, "display": "enabled", "virtual_hardware_platform_versions": [ 0, 1, 2 ], "control_domain": "OpaqueRef:65672e13-7237-48aa-ae86-81ca87d0d5fc", "updates_requiring_reboot": [], "features": [], "iscsi_iqn": "iqn.2021-12.com.example:52b13bbc", "multipathing": false, "uefi_certificates": "", "certificates": [], "editions": [ "xcp-ng" ] }, "SR": { "uuid": "92e3e5a2-72c9-510f-4e9b-e50295917a4e", "name_label": "Local storage", "name_description": "", "allowed_operations": [ "vdi_enable_cbt", "vdi_list_changed_blocks", "unplug", "plug", "pbd_create", "vdi_disable_cbt", "update", "pbd_destroy", "vdi_resize", "vdi_clone", "vdi_data_destroy", "scan", 
"vdi_snapshot", "vdi_mirror", "vdi_create", "vdi_destroy", "vdi_set_on_boot" ], "current_operations": {}, "VDIs": [ "OpaqueRef:59b27083-e45c-41ea-a934-df4e8dfb898e", "OpaqueRef:e67a7a26-0e81-4ca7-9ed7-1d4537155dcb" ], "PBDs": [ "OpaqueRef:ced26859-34f2-47cf-8dcc-04695f1c11ac" ], "virtual_allocation": 75161927680, "physical_utilisation": 75329699840, "physical_size": 979634225152, "type": "lvm", "content_type": "user", "shared": false, "other_config": { "i18n-original-value-name_label": "Local storage", "i18n-key": "local-storage" }, "tags": [], "sm_config": { "allocation": "thick", "use_vhd": "true", "devserial": "scsi-1ATA_InIand_SATA_SSD_IBYTMC211100100101" }, "blobs": {}, "local_cache_enabled": false, "introduced_by": "OpaqueRef:NULL", "clustered": false, "is_tools_sr": false }, "message": "IMPORT_ERROR(INTERNAL_ERROR: [ (Failure Int64.of_string) ])", "name": "XapiError", "stack": "XapiError: IMPORT_ERROR(INTERNAL_ERROR: [ (Failure Int64.of_string) ]) at Function.wrap (/usr/local/lib/node_modules/xo-server/node_modules/xen-api/src/_XapiError.js:16:12) at _default (/usr/local/lib/node_modules/xo-server/node_modules/xen-api/src/_getTaskResult.js:11:29) at Xapi._addRecordToCache (/usr/local/lib/node_modules/xo-server/node_modules/xen-api/src/index.js:898:24) at forEach (/usr/local/lib/node_modules/xo-server/node_modules/xen-api/src/index.js:932:14) at Array.forEach (<anonymous>) at Xapi._processEvents (/usr/local/lib/node_modules/xo-server/node_modules/xen-api/src/index.js:922:12) at Xapi._watchEvents (/usr/local/lib/node_modules/xo-server/node_modules/xen-api/src/index.js:1087:14)" }