After installing updates: 0 bytes free, Control domain memory = 0B
-
@yann
6 hosts had originally been running 8.3 for half a year. A bit of time had passed since I had last checked for updates when I noticed several were waiting to be installed. I accepted this in XO, and honestly did not check what the individual updates did. The names and texts are Greek to me (and I speak Norwegian :-).
When the updates were installed I only rebooted one of the six hosts: XCP-ng-005. I do not remember if this was because I was interrupted by a phone call or colleague or something. When I checked back after the reboot I noticed that none of the VMs had come up. Then I discovered all the other symptoms.
Since no VM could be started on the rebooted host, and all were running fine on the remaining 5 hosts, I chose not to reboot any of the other hosts. Fearing that what happened to XCP-ng-005 will happen to the rest upon reboot, I will delay doing so until Vates tells me when it suits them best. Perhaps Monday morning is extra busy because the weekend has passed. If so, I will wait until Tuesday or Wednesday.
If it turns out that the remaining hosts reboot fine and the sole reason for my worries was XCP-ng-005's broken mirror, then I will LOL. Then I will ask if we can have a more visible notification in XO when a RAID breaks, so we won't have to scour logs to discover such important events.
RAID install:
The only servers I originally had 8.2 on were XCP-ng-001 and 002. On those I could install 8.2 fine. They had only one drive each; I could not afford more back then.
XCP-ng-001 died after some time. This was a small testing/learning server. No big loss.
Then I put 003, 004 and 005 together using different hardware than 001 and 002. Here, trying to install 8.2 failed: black screen midway through the install.
In the forums I read that 8.3 had better driver support for modern PC hardware. This worked fine. Two more small servers were added later to host only a handful of VMs (SQL, file server, file backup). These were also installed with 8.3 to "standardize" on this version, which I had come to rely on. I had forgotten that this was a beta.
I had long planned on upgrading XCP-ng-002 to 8.3. But since this host only has one drive I planned on emptying VMs over to other hosts, then install the second drive and clean-install it with 8.3 instead of upgrading it. Installing XCP-ng is a walk in the park, and I have had better experience with clean-installs rather than upgrades when it comes to OSes.
In short: No currently running hosts have been upgraded.
Only time I've even tried upgrading was recently when experimenting on XCP-ng-008.
Learning that v8.3 is beta and not recommended for production environments I was desperate since only one of my hosts could install 8.2 without black screen.
Then stormi gave me an 8.2.1 image with additional drivers. This installed fine on the hardware I use.
I then did some testing with upgrading and downgrading between 8.2 and 8.3 to learn what worked and what did not. Learning that downgrading will not preserve VMs (it states this clearly in the installation routine), I did a clean install of 8.2.1 on XCP-ng-008 before re-creating all VMs from the failed 005 host on 008 (took me 22 hours of hard concentrated work). So I have been presented with the RAID step during upgrade from 8.2 to 8.3, and remember having navigated into it to see that both drives were still selected. But the servers in production are all clean-installed and not upgraded.
I fear I have not replied to your question. If you do not find the answer in my text then please help me understand.
-
@Dataslak said in After installing updates: 0 bytes free, Control domain memory = 0B:
Since this happened on six servers simultaneously when applying updates through XO I guess we may have found an error ?
Nothing tells us it happened on 6 servers. For now, all we know for sure is it happened on one. The rest, you didn't reboot, so they are in a state where it's normal that you can't start new VMs, since Xen was updated from version 4.13 to 4.17 and requires a reboot.
-
@stormi
I am not as experienced as you with what the tabs of a host may display when updates have been installed and the host is waiting for a reboot. I have delayed restarts before until nightfall to reduce the burden on customers, and I have not seen hosts behave like this:
And:
etc.
If these are possible behaviors after updates are applied then I will add this to my experience and not be so worried next time.
In my ignorance I calculated bad odds for:
- Me randomly choosing to restart the one host with a broken mirror (1/6)
- Host "chose" to boot from the drive that had not been updated properly due to drive failure (1/2)
- But updated properly enough so the update engine did not report any updates (1/A)
  (or perhaps did not do so because the host was in service?) (1/1)
- Grub chose to boot from the failed drive (1/2)
- And succeeded because drive did not fail enough for the RAID-system to choose the fully functional drive (est. 1/4)
- XO does not report with a red triangle that RAID has been broken (Would that be a nice feature?) (1/B)
Odds: 1 : (96 × A × B)
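The arithmetic behind that figure can be checked with a small sketch. It simply multiplies the listed factors, which assumes the events are independent (a failing drive probably makes several of them correlated, so this is illustrative only). The values passed for A and B in the example are made-up placeholders, not estimates from this thread.

```python
from fractions import Fraction

# Factors with known (or estimated) values, in the order listed above.
known_factors = [
    Fraction(1, 6),  # rebooting the one host with the broken mirror
    Fraction(1, 2),  # host boots from the drive that was not updated properly
    Fraction(1, 1),  # update engine silent because the host was in service
    Fraction(1, 2),  # Grub chose to boot from the failed drive
    Fraction(1, 4),  # drive did not fail enough for RAID to drop it (estimate)
]


def combined_odds(factors):
    """Multiply the individual probabilities (assumes independence)."""
    result = Fraction(1)
    for factor in factors:
        result *= factor
    return result


def odds_with_unknowns(a, b):
    """Include the two unknown factors 1/A and 1/B (hypothetical values)."""
    return combined_odds(known_factors) * Fraction(1, a) * Fraction(1, b)


known = combined_odds(known_factors)
print(f"Known part: 1 in {known.denominator}")  # prints "Known part: 1 in 96"

# Placeholder values for A and B, purely for illustration.
print(f"With A=10, B=10: 1 in {odds_with_unknowns(10, 10).denominator}")
```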
Another nice feature would be if this alert system could provide S.M.A.R.T. alerts to better help operators manage drives in their hosts. Currently I feel a bit in the dark on how drives are faring in the hosts.
(I hope you will forgive me for my word salad.)
Can you confidently tell me that restarting another host will work fine?
If the VMs do not come back up then I have no more vacant hardware to establish new ones on (008 was the one I had for backup).
And it will cost me >20 hours of hard labour to do so.
And my colleagues will have to deal with angry customers who may not feel safe purchasing more VMs from me. I aim to build a business that in turn will pay 1000 USD per host per year to you. I just need help to start up this arm of the company.
If the sales team will get in touch with me so I can negotiate a subscription cost with them, then I hope to unlock features in XO that allow me to better utilize the hardware I have, reduce risks, and free up time I can use to develop my product and sell more.
-
I'm not sure to get it, this cluster running on 8.3 beta is a production cluster?
-
@olivierlambert said in After installing updates: 0 bytes free, Control domain memory = 0B:
I'm not sure to get it, this cluster running on 8.3 beta is a production cluster?
Yeah seems like.
@Dataslak do you have a host that you can reboot without any big impact?
If yes, please do so to see if we can reproduce this issue. It might be a bug in 8.3 or just a problem with the host that you rebooted; time will tell.
-
@nikade
I do not have any more hosts I can reboot without impact (as in 10 to 20 hours of work + annoyed customers). The only one I had (008) has been used to re-construct the VMs from the failed 005 host. 005 has now become the backup host. If I know I can get help within (a couple of) hours then I will absolutely reboot another host to see if this reproduces the issue. When do you recommend I do so? (Guesstimating what time periods are least busy for you?)
-
@olivierlambert wrote "I'm not sure to get it, this cluster running on 8.3 beta is a production cluster? "
The first two hosts I built ran 8.2 great. Then I got more customers and built three more hosts with newer components (due to shortage of the ones I used initially). But on these newer machines the installation of 8.2 failed: Black screen midway through the installation. In the forums I read about others having the same problem, solving it by using 8.3.
I do not remember what I read or thought about this being a beta, or if I believed it to be a newer version having arrived. I tried 8.3 and it has worked beautifully. Would it be an idea to inform/remind users about 8.3 being in beta?
Stormi has recently given me an 8.2.1 image that I installed on one of the newer hosts (008) and it worked. I then spent two full days with ~2hrs sleep to re-create the VMs from 005 on it.
I will start converting all hosts to 8.2.1 as soon as I learn what level of XO I need to buy and what tool I must use in order to be able to migrate VMs from 8.3 to 8.2.
If this is not possible then I will be very happy to hear your recommendation on what to do / not do. Re-creating all VMs is a task I hope to avoid. Would it then be best to keep running 8.3, but use a test server to install any updates and test function before rollout? (As I get more income to this business and can pay down debts, I plan on procuring a test environment to do things on before implementing in production. If Vates would grant customers rebated licenses on such non-production servers then that would lower the threshold.)
I am very grateful and happy for your help - as well as your colleagues and contributors.
-
@Dataslak good to hear it's starting to work out, always a pain when something like this happens in production.
Are those hosts in the same datacenter and/or network? If yes, you can probably migrate the VMs between your hosts to avoid downtime and having to re-create all the VMs when you want to rebuild a host. We do that at work all the time and it works pretty well. If the VM is larger than 500 GB we try to avoid it, but anything below that seems to be fine. Just remember a VM migration may not go on for longer than 24h, hence my question whether the hosts are in the same network.
-
@nikade
Thank you for your kind words!
Yes, the hosts are located physically in the same room. But due to a mix of issues with slow performance from the 1 Gbps network cards (cheap non-Intel cards with poor driver availability), me not having the money to purchase an XO license initially, and other things and worries, I have yet to gather the hosts into a pool. I will get to that when I find room to breathe. Thank you for the tip that a VM migration cannot go on for more than 24 hours. Luckily I run small VMs of 35 GB.
May I ask if you have any experience migrating hosts between 8.2.1 and 8.3?
-
@yann
With you having to rummage through lots of forum threads reading large amounts of text of high intricacy written by very different people struggling to explain their problems/questions I would be amazed if you caught 100% of intent and meaning at all times. You are human and not AI ?
I usually say that the sender has the greatest responsibility in a conversation since they know their question/issue the best. If you think you misread something then it is most likely me who failed to explain well enough.
-
@Dataslak said in After installing updates: 0 bytes free, Control domain memory = 0B:
If these are possible behaviors after updates are applied then I will add this to my experience and not be so worried next time.
It's definitely possible behaviour that an update requires a reboot immediately after applying. But it's rare, limited to major upgrades, and here it happened because it is still a beta release.
-
So, to summarize:
- The root cause of the symptoms on host 005 is grub booting from an unsynced disk. Why exactly this happened is still a mystery, unless I've skipped important data when I caught up with the messages above.
- There's no reason to think this would happen to the other hosts, since we established that the disks are correctly synced. A reboot should be safe and not cause the same issues as what happened on 005.
- To limit the risks, it would be good to migrate VMs away from one host before rebooting it. However, with the major change in version in the Xen component, I can't promise migration will really work as expected (VM crashing, VM refusing to move in one direction or the other...).
The safest is probably to fix host 005 first. I assume removing the bad disk and then booting should be enough?
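
Before rebooting another host, the "disks are correctly synced" point can be re-checked by hand. The sketch below assumes the host mirror is a Linux md software RAID that exposes its state in `/proc/mdstat`; the sample text is made up for illustration and is not taken from host 005. It flags any array whose status shows fewer active members than expected (for example `[2/1] [U_]`).

```python
import re

# Made-up example of a degraded two-disk mirror (not real output from this thread).
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md127 : active raid1 sda[0]
      488254464 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""

ARRAY_RE = re.compile(r"^(md\w*)\s*:")
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return names of md arrays that report missing or failed members."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded


print(degraded_arrays(SAMPLE_MDSTAT))  # prints ['md127']
```

On a real host the text would come from reading `/proc/mdstat` instead of the sample string; an empty list would suggest that all arrays report their full set of members.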
-
@Dataslak said in After installing updates: 0 bytes free, Control domain memory = 0B:
@nikade
Thank you for your kind words!
Yes, the hosts are located physically in the same room. But due to a mix of issues with slow performance from the 1 Gbps network cards (cheap non-Intel cards with poor driver availability), me not having the money to purchase an XO license initially, and other things and worries, I have yet to gather the hosts into a pool. I will get to that when I find room to breathe. Thank you for the tip that a VM migration cannot go on for more than 24 hours. Luckily I run small VMs of 35 GB.
May I ask if you have any experience migrating hosts between 8.2.1 and 8.3?
Migrating from 8.2.1 to 8.3 should be fine; from 8.3 to 8.2.1 might be a problem.
I think that if your VMs are only 35 GB it should be fine to migrate them between hosts even on a 1 Gbps network. You can also do it without having them in a pool, so I think you should try with a test VM and see how long it takes.
-
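
As a rough sanity check on those numbers, here is a back-of-the-envelope estimate of transfer time. The `efficiency` parameter is an assumption (real migrations rarely reach line rate, and the slow network cards mentioned above could do much worse), so a test migration is still the only reliable measurement.

```python
def transfer_seconds(size_gb, link_gbps, efficiency=0.5):
    """Rough transfer time in seconds for size_gb (decimal GB) over a link.

    efficiency is the assumed fraction of the nominal link speed actually
    achieved; 0.5 is a guess, not a measured value.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_gb * 8e9
    return bits / (link_gbps * 1e9 * efficiency)


# A 35 GB VM over 1 Gbps at 50% efficiency: 560 seconds, a bit over 9 minutes.
print(transfer_seconds(35, 1) / 60)

# A 500 GB VM under the same assumptions, in hours (well under the 24 h limit).
print(transfer_seconds(500, 1) / 3600)
```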
@nikade @olivierlambert @stormi @Danp @yann
Just wanted to say to you all:
Thank you for your contributions and kind helpful assistance which has helped me through this crisis.
I would have been in deep trouble without you. I respect your expertise, and appreciate deeply that you are working so hard to help us dumb users. I have learned a lot, and hope one day to become skilled enough to at least help other new users on this forum.
Best wishes
Aslak
-
@Dataslak said in After installing updates: 0 bytes free, Control domain memory = 0B:
@nikade @olivierlambert @stormi @Danp @yann
Just wanted to say to you all:
Thank you for your contributions and kind helpful assistance which has helped me through this crisis.
I would have been in deep trouble without you. I respect your expertise, and appreciate deeply that you are working so hard to help us dumb users. I have learned a lot, and hope one day to become skilled enough to at least help other new users on this forum.
Best wishes
Aslak
Happy everything worked out, this is what this community is all about.
I've gotten a lot of help and given some too, it's all about helping out with the things that you can.
With time you'll be able to help out more and more and more