@manilx For host.stop, what does the bypassBackupCheck do? I think bypassEvacuate is pretty clear: that must mean "don't try to migrate running VMs to another server before stopping." I feel like I would want to bypass whatever the backup check is because I've lost power and the server needs to be shut down.
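If I'm reading that right, for a power-loss shutdown I'd expect to call something like this (my guess at the call; I haven't confirmed what the backup check actually covers):

```bash
# Guess only: bypassEvacuate = don't migrate running VMs away first,
# bypassBackupCheck = skip whatever backup-related check would otherwise block the stop.
xo-cli host.stop id=<host-uuid> bypassEvacuate=true bypassBackupCheck=true
```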
-
RE: Why does emergencyShutdown take a lot longer than shutting down the host from the console?
@manilx That is great to hear, thank you.
I will definitely share it with the community. Still trying to iron out some of the wrinkles. Testing requires me to let all my servers get shut down so I'm limited in how frequently I can test the solution.
This weekend was my third pull-the-plug test and it was the closest to totally working. In fact, I think this test did totally work, but due to network changes I had to shut down xcp because I've found it gets really mad if you change anything about its network while it's running. It was when I used the physical console to shut it down that I was shocked at how fast it shut down. That's why I posted to ask about the difference.
I think changing my script to just use host.stop will resolve my last concerns about the script. Having a faster shutdown for xcp might also allow me to go back to my original design where I let the more important VMs live a bit longer. I originally staged when VMs got shut down so the important ones could survive a 5 or 10 minute power outage. Turns out with xcp taking 8 minutes to shut down after the VMs were down, I had to change my script to start closing everything as soon as it was clear that this wasn't just a small power blip.
When I post it though, everyone will need to recognize that I'm no bash coder. I write code, but this is the only bash stuff I've done so it could be rough.
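For what it's worth, the staged approach I'm describing is nothing fancier than something like this (a sketch only; the UUIDs, timings, and priorities are placeholders, and I haven't confirmed how host.stop treats a VM that is still running):

```bash
#!/bin/bash
# Sketch of a staged NUT shutdown - all UUIDs and timings are placeholders.
LOW_PRIORITY_VMS=("uuid-of-vm-1" "uuid-of-vm-2")   # stop these as soon as power fails
HIGH_PRIORITY_VMS=("uuid-of-vm-3")                 # give these a grace period
HOSTS=("uuid-of-host-a" "uuid-of-host-b")

# Stage 1: cleanly stop the expendable VMs right away.
for vm in "${LOW_PRIORITY_VMS[@]}"; do
    xo-cli vm.stop id="$vm"
done

# Stage 2: wait out a short outage before touching the important VMs.
sleep 300
for vm in "${HIGH_PRIORITY_VMS[@]}"; do
    xo-cli vm.stop id="$vm"
done

# Stage 3: stop the hosts once their VMs are down.
for host in "${HOSTS[@]}"; do
    xo-cli host.stop id="$host"
done
```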
-
RE: Why does emergencyShutdown take a lot longer than shutting down the host from the console?
@olivierlambert I understood what emergencyShutdownHost does; I was surprised that it seems to take a long time even if all the VMs were stopped before executing it. There should be nothing to suspend, but it still takes about 8 minutes for the server to finish the shutdown.
I will start using host.stop instead of host.emergencyShutdownHost for the hosts that have no running VMs. Realistically, when having NUT shut it down, I'd rather the host just issue a clean shutdown command to any running VMs. I'm not sure what host.stop will do if there is a running VM. If it would politely ask it to stop then that would be perfect; if it yanks the virtual power cord then I wouldn't like that.
The ideal is that no host has a running VM by the time I want to shut it down, but since NUT runs in a VM at the moment, one host will have that one running VM. I'll be in a bit of a race condition if I issue host.stop immediately followed by shutdown now. It's virtual murder-suicide, but the murderer's life depends on the murdered. Can Linux shut down before xcp kills it? Might be an interesting test.
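The sequence I'm describing would look roughly like this (placeholder UUIDs; whether host.stop politely stops or yanks the cord on the surviving VM is exactly the open question):

```bash
#!/bin/bash
# Sketch only: stop the hosts with no running VMs first, then the host
# running this NUT VM, then shut this VM down and hope the ordering works.
EMPTY_HOSTS=("uuid-of-empty-host-1" "uuid-of-empty-host-2")
LAST_HOST="uuid-of-host-running-this-nut-vm"

for host in "${EMPTY_HOSTS[@]}"; do
    xo-cli host.stop id="$host"
done

# The murder-suicide part: ask the last host to stop, then shut ourselves down.
xo-cli host.stop id="$LAST_HOST"
shutdown now
```
-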
RE: Performing automated shutdown during a power failure using a USB-UPS with NUT - XCP-ng 8.2
@nomad What do you mean by the grand reconfiguration? Is there a new version that changes how things work?
-
RE: Why does emergencyShutdown take a lot longer than shutting down the host from the console?
@olivierlambert No, I don't use any VM as an SR in XO. Other than the local storage SRs, DVD, removable media and such, the only SRs are hosted on a Synology array or on my UNRAID server.
-
Why does emergencyShutdown take a lot longer than shutting down the host from the console?
This weekend I was working on my servers and I needed to shut them down. I closed all the VMs, then from the physical console I used the UI to tell the host to shut down. It shut down in about 30 to 60 seconds and powered off.
When I test my NUT scripts, they shut down all the VMs and then issue the host.emergencyShutdown command, and it takes the hosts about 8 minutes to shut down.
Any reason for that difference? Is there a command I can issue through the xo-cli that would cause the faster shutdown?
Another advantage is that the shutdown command from the console actually turns the power off, while emergencyShutdown shuts everything down but doesn't power down the hardware; at least it never has for me.
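For concreteness, here's what my NUT script calls today versus what I'm hoping has the faster behavior (the parameter name is my guess; xo-cli --list-commands should show the real signatures):

```bash
# What my NUT script issues today (takes about 8 minutes per host for me):
xo-cli host.emergencyShutdownHost id=<host-uuid>

# What I'm hoping matches the fast shutdown I get from the console:
xo-cli host.stop id=<host-uuid>
```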
-
RE: XO instance UI unreachable during backups
@Danp Ah, thank you for that.
I need to restructure things a bit but I was already thinking I would do that. The issue is that this VM also runs NUT so it's the last VM running before shutting down the servers. I reduced the memory because it takes a LONG time to suspend a VM with 16GB RAM but doesn't take long to shut one down. Between the 8 to 10 minutes it takes for XCP-ng to shut down and the time it takes to suspend a VM with 16GB RAM, I don't think my batteries will last that long.
I'll have to move NUT into a leaner VM that doesn't handle backups. That's something I was thinking I would do anyway because if there was a power outage during a backup I don't think my NUT script would be able to do what it needs to do. Based on the cron job I run to make sure my xo-cli registration is good, the xo-cli stuff won't run when the VM is hammered like this.
Thanks for helping me understand why this happens and how to fix it.
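For reference, the registration-check cron job I mentioned is just something along these lines (the re-register syntax and credentials are placeholders and depend on the xo-cli version):

```bash
#!/bin/bash
# Rough sketch of my cron check: make a cheap xo-cli call, re-register if it fails.
if ! xo-cli --list-objects type=host > /dev/null 2>&1; then
    # Placeholder re-registration; newer xo-cli versions use a token instead.
    xo-cli --register https://xo.example.lan nut-user "$(cat /root/.xo-pass)"
fi
```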
-
XO instance UI unreachable during backups
I've noticed recently that when my backups are running, they totally slam the CPUs and the web UI is inaccessible. What can I do to improve this? Should I give the instance more than 4 CPU cores? I used to give the instance 16GB RAM but it never went higher than 2GB so I reduced it. Could that cause this?
I can still SSH into the instance but I have little way to know how much of the backup is complete or which backups are finished. I've seen this multiple times a week for the last month or so.
This goes on for hours and without the web UI I can't even gauge how much time might be left. I came in this weekend because I'm trying to improve my network setup to hopefully help with things like this but I can't shut down the servers and tear the network apart when XO is at some unknown point in the backup. Going to try to come back tomorrow and see if it's finished, well, I'll be smarter tomorrow and check the status from home first.
Currently running XO from source commit 1bc0f (two commits behind current due to taking a couple days off last week).
-
RE: NVIDIA Tesla M40 for AI work in Ubuntu VM - is it a good idea?
@gskger said in NVIDIA Tesla M40 for AI work in Ubuntu VM - is it a good idea?:
Nvidia RTX A2000 12GB
I am curious how your testing goes. That sounds like a great card. Not as expensive as the T4 so might be more reasonable for me to consider.
-
RE: Moving management network to another adapter and backups now fail
I'm bummed to hear that it isn't tolerant of changes. When I set up xcp originally, I gave it the first 10Gb port as the management interface and that's on the main LAN, no VLAN. Now I was wanting to move management off of the main LAN and onto a dedicated VLAN on the second 10Gb port. I've been nervous to make that change because I don't want to break something, it seems that concern was well founded. I was actually planning on posting today to ask about how to best move the management interface into a VLAN on a separate port.
Feels like I just have to live with everything on the same port and I won't be able to isolate the management or backup traffic like I want to. Maybe I could move the backups onto a separate VLAN or does that happen through the management interface? I think I need to dive back into the docs.
-
RE: How can I duplicate backup settings to a different XO instance? Should I?
@austinw I have three instances, all on different hosts, one on a host that isn't running XCP.
Is that a weird form of the 3, 2, 1 rule for backups? 3 instances on 2 different host OSs, 1 outside the xcp pool.
I've had times that I needed to tweak the XO instance so it's definitely nice to have another one to do it with. I just realized that if that XO instance went down, like if I lost the host it's running on, then I'd have a tougher time restoring backups because the other instances don't know about the backups. Note, the host running that XO instance is not one of my main hosts for the pool and it doesn't run any business-critical VMs, lest someone chastise me for handling backups on the hardware that's running the VMs being backed up.
-
How can I duplicate backup settings to a different XO instance? Should I?
I have XO running in a few different places. One of them handles backups and I think of that XO as my main version. It occurred to me that if I lost that instance for some reason then it might be tough to restore one of my backups. Therefore, does it make sense to somehow duplicate the backup configuration to other installs of XO so they can see the backups, or would they just be confused because they have no record of the ID used for the backup?
Would it make sense to use the XO Config under settings to just copy the entire XO config from one install to another? Is there a downside to that?
It's my intent that the various XO installs all manage the same pool. I certainly don't want them all performing the same backups though so I'd obviously have to disable those backups on the other installs.
Good idea? Bad idea?
-
RE: NVIDIA Tesla M40 for AI work in Ubuntu VM - is it a good idea?
@gskger Yeah, looks like it would be too tight. Ouch, those T4s are an order of magnitude more expensive. I'm definitely not interested in going that route.
-
RE: NVIDIA Tesla M40 for AI work in Ubuntu VM - is it a good idea?
@gskger I'm so glad you put pics of your server in that other post. I looked at it last month when you posted it but when I looked at it again yesterday, I realized that those cards might not fit in my R730xd because I have the center drive rack with 4 extra internal drives. I'm concerned that one of those cards would not work with those drives in place since they basically totally cover the memory and CPUs. I'm also concerned that with those center drives, the airflow out of the row of case fans is less direct and might cause heat problems for a video card like that.
-
Unexpected VM autostart behavior on pool reboot
I want to make sure it's not my expectations that are off.
I'm running XO from source and it's up to date, currently commit 0603b.
I came in on a Sunday because there were 44 patches released for xcp-ng. I did the pool update so it updated all three hosts in my pool (Pool Master, Server A and Server B). Then I shut down all the VMs except the one VM running XO. The VM running XO is not on the Pool Master, it's on Server B. I do not have shared storage so I did not want to do a rolling pool reboot.
I rebooted Pool Master. When it came back up and the VMs it ran started to come up, I rebooted Server A. Then I switched to an XO instance (also from source and also current) that isn't running on any xcp host and I used it to shut down the XO instance running on Server B. Soon after I did that, two of the VMs on Server B booted back up. Both of those VMs are set to auto-start.
First question: Did those VMs likely start because the pool master had restarted so it was dutifully restarting the auto-start VMs that weren't running? Or maybe the pool master reboot caused Server B's toolstack to restart which auto-started the VMs?
Then I shut those auto-started VMs down again and waited for the XO instance on Server B to shut down; it was taking quite a while for some reason. Then, using the separate XO instance, I rebooted Server B.
After Server B booted up, none of the three VMs with auto-start set started automatically. I thought maybe there was a delay for some reason so I waited about 5 minutes but those VMs didn't start.
Second question: Is that expected behavior because the pool master is really in charge of restarting VMs or was it because I had manually shut them down before the reboot so they were trying to maintain that same state?
In retrospect I think I should have used the non-xcp-hosted instance of XO to do all of this stuff so I could have shut down all the VMs and then performed a pool reboot without it trying to manually migrate a VM between machines. Then all three hosts would have rebooted at the same time and maybe that would have been better.
I have a fairly small infrastructure so I manage that kind of stuff manually and a migration for the remaining VM would have taken a long time. I think this is only my second or third time to patch xcp so I'm still learning the best way to handle it.
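Side note in case it helps anyone who finds this later: my understanding is that auto power-on is driven by other-config flags on both the pool and the individual VMs, which can be checked from dom0 with something like this (my understanding only, not verified against the latest release):

```bash
# Is auto power-on enabled at the pool level?
xe pool-param-get uuid=<pool-uuid> param-name=other-config param-key=auto_poweron

# Is it enabled for a given VM?
xe vm-param-get uuid=<vm-uuid> param-name=other-config param-key=auto_poweron
```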
-
RE: VDIs attached to control domain can't be forgotten because they are attached to the control domain
@rtjdamen I have become suspicious that my backups might not be as messed up as I thought. Yesterday I noticed that my backups were again in the started state long after they would normally be completed. Because some of those backups go to the UNRAID server I mentioned, I decided to do some digging. That UNRAID server does have incoming network activity, so I became suspicious that the backups are working, just very slowly. I checked it again this morning and one of the backups completed after 16 hours, the other one after 21 hours. In this case I think the issue is that UNRAID is using the 1Gb adapter instead of the 10Gb adapter.
In the future I'm going to be more careful about deciding that a backup is stuck. I'd like to figure out if there's a way to get more insight into what is happening in the backup, like an ongoing percentage complete and a data transfer speed or total. It would be nice not to have to look at the receiving side for traffic and assume that's the backup, plus for some of my backup targets it isn't as easy to tell what the incoming traffic is.
Now I have to dare rebooting the UNRAID server again to see if I can get it to use the right network connection. It must have gotten out of whack when I reset the BIOS and I need to get it back in whack.
-
RE: VDIs attached to control domain can't be forgotten because they are attached to the control domain
@rtjdamen Good to know, thank you. I'm not likely to try to do that without fully understanding what I'm doing. I will just continue to reboot the host if this happens again. It did happen again a week or two ago. My infrastructure isn't so complex that it's impossible to reboot the host, it's just annoying because I have to stay late so nobody is using the servers and I've had servers that failed to boot up after a restart that should have been trivial so I'm always a bit nervous. That was years ago and not when running XCP-ng but it left an emotional scar. I just had the same thing happen with my UNRAID server last Friday, had to clear the BIOS settings to get it to boot again.
-
RE: VDIs attached to control domain can't be forgotten because they are attached to the control domain
@andrewperry Sorry for the delay in responding; I think you posted while I was on vacation. I ended up rebooting the host and have not had the problem return since. Uh oh, I hope I didn't just jinx myself.
-
RE: NVIDIA Tesla M40 for AI work in Ubuntu VM - is it a good idea?
@gskger Great to know, thank you.
I actually already have Ollama installed and running with Open WebUI so I can ask it ahead of time to see what it would be like. I've installed some more code-specific models that are better suited to that kind of question.
Running the case fans full blast all the time would be a non-starter so I'm glad you let me know that as I count the costs of attempting this.