RE: botched pool patching and now we can't change pool master
Resolved. Patched the rest of the hosts and restarted all the toolstacks for good measure. I can now move the master role and deploy new VMs.
-
RE: botched pool patching and now we can't change pool master
@Danp To answer your earlier question about the state of the patched hosts: three hosts are currently fully patched, with one of them acting as the current pool master. I've tried to promote each of the other two and get the same "Cannot restore" error.
For the sake of completeness I just tried promoting one of the non-patched hosts. Same error.
And I'm still unable to deploy a new VM ("NOT_SUPPORTED_DURING_UPGRADE").
-
RE: botched pool patching and now we can't change pool master
@Danp I believe so. I was not driving at the time.
-
RE: botched pool patching and now we can't change pool master
@Danp I was attempting to change the master because we'll be retiring the current master host. In any case, if I can't promote another host, that suggests our production pool is wedged in some other way. The same goes for not being able to create a new VM.
-
RE: botched pool patching and now we can't change pool master
@Danp Rebooted it again to be sure. No change; I still get the first error.
-
botched pool patching and now we can't change pool master
One of our engineers who should know better patched a host in our 8.3.0 pool before patching the master and then tried to promote it. Later they emptied, patched and rebooted the current master. I'm not sure exactly what else they did in their flailing before I stepped in a few hours ago.
Right now the current master is patched, but I'm unable to change the master: I get the error "Cannot restore on this host because it was saved on an incompatible version." I tried restarting the toolstack on every host in the pool, and the next attempt returned a different error, "Cannot forward messages because the host cannot be contacted. The host may be switched off or there may be network connectivity problems," but that may just have been the dust not yet settling from the toolstack restarts. I waited ten minutes, tried again, and got the first error again.
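In case the exact commands matter, the CLI equivalent of the master change I'm attempting boils down to the following (the host UUID is a placeholder):
xe pool-designate-new-master host-uuid=<uuid-of-host-to-promote>
and the toolstack restarts amount to running xe-toolstack-restart on each host (or the equivalent action in XOA).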
I could continue emptying, patching, and rebooting the rest of the hosts, but I don't want to leave this pool in an unknown state and find out later that things are broken under the hood.
I'd be grateful for any guidance; this is our production pool and if I'm not confident it's healthy we'll start the painful process of creating a new pool and slow cross-pool VM migrations.
Update: I also can't create new VMs.
-
RE: Change VM attribute "name" (NOT "name-label")
After reading up on xl, you're not wrong. I'll fix the script and firmly chastise the guy who wrote it (in the mirror). I think I went with xl because it was easier to split the results in Python.
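Rough sketch of the fix, assuming name-labels are unique in our pool: have the script look the VM up with xe instead, something like (VM_NAME being whatever name the script was given)
xe vm-list name-label="$VM_NAME" params=uuid --minimal
and then drive the export off the returned UUID rather than whatever name xl reports.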
-
RE: Change VM attribute "name" (NOT "name-label")
In this case I'm only using it to return VM info for an export script. The script takes the VM name as an argument and then walks the hosts and VM lists looking for it. But the VM was cloned, and its "name" comes back as <name-label><snapshot_stamp><_COPY>.
-
Change VM attribute "name" (NOT "name-label")
I'm trying to change the "name" attribute of a VM as returned by the "xl vm-list" command. This is not the same as the "name-label" attribute returned by "xe vm-list" and shown in XOA, and there's no sign of any such attribute in the auto-completion for the "xe vm-param-set" command.
I could clone the VM to the correct name but that feels like overkill.
-
RE: Live Migrate fails with `Failed_to_suspend` error
@randyrue Confirmed that after cold booting a VM to a new host I can then migrate it live to another new host.
-
RE: Live Migrate fails with `Failed_to_suspend` error
We're having the same trouble on xcp-ng 8.2.1 and XOA 5.95.2. The VMs are Ubuntu 20, 22, and 24 LTS, plus some SuSE. Failed migrations are keeping us from vacating hosts for patching. Shutting a VM down lets us move it, but some production VMs require significant planning to take down.
-
RE: PXE Boot a VM and use HTTP for kernel/initrd download
I did just find this: "you should realize that GRUB may be relying on the network support of the UEFI firmware at that point. To support boot over HTTP, the firmware needs to support UEFI specification version 2.5 or greater."
Finding even more information: uefi.org says v2.6 was released in January 2016. And VMs use the TianoCore UEFI code; I don't know what version, or how to tell.
Anybody know what UEFI version a VM is running in its "firmware"?
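Partial answer to my own question, not yet verified on one of our VMs: inside a booted Linux guest the kernel logs the spec version the firmware advertises, so something like
dmesg | grep "EFI v"
should show a line of the form "efi: EFI vX.YY by <vendor>".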
-
PXE Boot a VM and use HTTP for kernel/initrd download
Hello,
We're unable to PXE boot a VM in UEFI mode for an unattended install because the TFTP download of the kernel and initrd is so slow that it times out. The kernel loads in a few minutes; the initrd fails after 45 minutes or so. This is not a new problem, and I've found many posts describing it but none with a real solution. Some posts claim the problem is block size, and I've tried every suggestion there with no change.
Other posts say to use HTTP instead of TFTP, but if I specify an HTTP source, GRUB just treats it as a local path and I get "file not found."
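For concreteness, the kind of grub.cfg entry I've been experimenting with looks roughly like this (server address and paths are placeholders, and the exact syntax has varied with the suggestions I've tried):
linux (http,192.0.2.10)/ubuntu/vmlinuz
initrd (http,192.0.2.10)/ubuntu/initrd
My understanding is that the (http,server) device form only works if the grubx64.efi in use has GRUB's http module built in or loadable, which is part of what I'm asking below.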
I wasn't initially sure this was even an xcp-ng question. It seems like it might depend on whether the grubx64.efi I'm using supports HTTP as a device; FWIW, I'm using the EFI file that came with the Ubuntu 22.04 LTS install media. But it also looks like it's determined by whether the VM's "BIOS" supports it. Anybody know?
I'm also finding references to iPXE as an alternative. That can involve flashing the PXE firmware on a physical device, but they also offer some kind of chainloading alternative.
Has anybody made PXE-installing a UEFI VM work?
-
RE: xcp-ng Pro "alert" in the XOA pools page
You make a good point about costs not just disappearing because we externalize them. Gonna have to think about how we can pay our way without actually being able to pay. Active participation in the forums, maybe.
-
RE: xcp-ng Pro "alert" in the XOA pools page
Thank you for your reply, and my bad for using the word "freeware" carelessly. And yes, while I have opinions about cluttering an interface with big red alerts that aren't actually problems (a signal-to-noise thing that desensitizes users to real problems), I can live with it as a cost of getting this stack for free. Have I mentioned how glad we are to be rid of Citrix? Not just the cost, but their increasingly convoluted, noisy, and unusable sales and support systems.
-
xcp-ng Pro "alert" in the XOA pools page
There's a new great big red "bang" in the pools page that links to sales info for paid support for xcp-ng.
I recognize that folks need to make a living and that freeware comes with tradeoffs but my understanding is that xcp-ng came about as a response to Citrix making their hypervisor into crippleware. We switched to xcp-ng and XOA because we're a non-profit with almost literally no money for IT. Should we be concerned that xcp-ng might go the same way Citrix did?
And is there a way to acknowledge / disable those red alerts in the XOA interface? This is not a "critical error" situation.