@Forza Sorry, you were correct, I just mixed in another new issue. NFS is currently used only for backups. All my SRs are in local storage. It just so happens that my backups are now failing not only because of the NFS issue but also because of a VDI issue. I think that's a side effect of the NFS problem: the backup got interrupted, so now the VDI is stuck attached to dom0. I should have made that clearer or never mentioned the VDI issue at all.
Best posts made by CodeMercenary
-
RE: All NFS remotes started to timeout during backup but worked fine a few days ago
-
RE: All NFS remotes started to timeout during backup but worked fine a few days ago
@Forza Seems you are correct about showmount. On UNRAID, running v4, showmount says there are no clients connected. I previously assumed that meant XO only connected during the backup. When I look at /proc/fs/nfsd/clients I see the connections.
On Synology, running v3, showmount does show the XO IP connected. Synology is supposed to support v4 and I have the share set to allow v4 but XO had trouble connecting that way. Synology is pretty limited in what options it lets me set for NFS shares.
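For anyone wanting to check the v4 side without relying on showmount, here is a minimal Python sketch that reads the client entries under /proc/fs/nfsd/clients. It assumes each client directory contains an info file made of "key: value" lines (that is how recent Linux kernels appear to expose it, but verify on your own server). The function names and the sample address in the usage note are made up for illustration.

```python
from pathlib import Path


def parse_client_info(text: str) -> dict[str, str]:
    """Parse 'key: value' lines into a dict, stripping surrounding quotes."""
    info: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not key/value pairs
        info[key.strip()] = value.strip().strip('"')
    return info


def list_nfsv4_clients(root: str = "/proc/fs/nfsd/clients") -> list[dict[str, str]]:
    """Return one dict per NFSv4 client found under root (empty if none)."""
    base = Path(root)
    if not base.is_dir():
        return []
    clients = []
    for info_file in sorted(base.glob("*/info")):
        try:
            clients.append(parse_client_info(info_file.read_text()))
        except OSError:
            continue  # a client may disconnect while we are reading
    return clients


if __name__ == "__main__":
    # Assumed field names ("address", "name"); adjust to what your kernel prints.
    for client in list_nfsv4_clients():
        print(client.get("address", "?"), client.get("name", "?"))
```

Run on the NFS server itself (usually as root), this should list the XO address during and between backups if the v4 session stays open, which is the thing showmount cannot tell you.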
Synology doesn't have rpcdebug available. I'll see if I can figure out how to get more logging info about NFS.
-
RE: One VM backup was stuck, now backups for that VM are failing with "parent VHD is missing"
@olivierlambert Well, last night the backup completed just fine despite me taking no action.
I updated the XO to the latest commit when I got in this morning so hopefully the issues I had back in June don't come back.
-
RE: Possible to use "xo-cli vm.set blockedOperations=<object>"?
@julien-f Thank you, that's super helpful and even easier than I thought it would be.
-
RE: Import from ESXi 6 double importing vmdk file?
@florent Yeah, it's a lot of data, thankfully my other VMs are not nearly as large. I'm still not sure why it failed when none of the virtual drives are 2TB. The largest ones are configured with a 1.82TB max so even the capacity of the drive is less than the max.
I'm moving ahead with a file level sync attempt to see if that works.
To be clear, this post is at least as much about helping you figure out what's wrong, so other people don't have the same issue, as it is about making this import work for my VM. With the flood you are getting from VMware refugees, I figure I'm not the only person with large drives to import. In other words, if there's something I can do to help you figure out why it fails then I'm willing to help.
-
RE: NVIDIA Tesla M40 for AI work in Ubuntu VM - is it a good idea?
@gskger Yeah, looks like it would be too tight. Ouch, those T4s are an order of magnitude more expensive. I'm definitely not interested in going that route.
-
RE: Seeking advice on debugging unexplained change in server fan speed
@DustinB Nothing useful yet. I rebooted the servers and explored a bit in the BIOS to see if there were any relevant settings, or to at least tweak some things to see if it would reset whatever went wrong in the reboot in mid-December. While doing that I found that one of the two impacted servers was a version behind for the BIOS as well as for the iDRAC so I updated both of them. Unfortunately, that made no change to the fan speeds.
I've been out sick all of this week, so far, but I'll be looking into this more when I get back to the office. I've read about ways to manually control the fans but I'd rather not have to depend on a script running somewhere that makes those kinds of decisions. I'd much rather have iDRAC, or whatever normally controls it, handle it like it used to.
-
RE: Help: Clean shutdown of Host, now no network or VMs are detected
@olivierlambert It was DR. I was testing DR a while ago and after running it once I disabled the backup job so these backups have just been sitting on the server. I don't think I've rebooted that server since running that backup.
Latest posts made by CodeMercenary
-
RE: Seeking advice on debugging unexplained change in server fan speed
@DustinB I forgot to mention that I did look for firmware for the fans and I see nothing on Dell's downloads for the R630 that indicates there is any fan-related firmware at all. That's why I started trying to tweak the settings in the BIOS and iDRAC related to power and cooling, to see if I could get it to go back to the way it was.
-
RE: Seeking advice on debugging unexplained change in server fan speed
@DustinB I wish I had asked the question here earlier. I asked it a little while ago on ServerFault.com, figuring that was the best place for this question since it has nothing to do with XCP-ng. Nobody has answered and one person even downvoted it without saying why.
If you use ServerFault and you answer over there, I'll mark it as an answer if this works, so you can get some internet points.
https://serverfault.com/questions/1169753/what-might-cause-server-fans-to-double-in-rpm-after-a-simple-reboot
-
RE: Seeking advice on debugging unexplained change in server fan speed
@DustinB Interesting, I'll see if there's fan firmware I can update. It's so strange that they were fine and a reboot made them do this. One of the systems is running the fans at full speed, which gives them a high-pitched whine. It's rather annoying, and also not great for the fans I imagine.
-
Seeking advice on debugging unexplained change in server fan speed
Back in mid-December I came into the office on a weekend to test my power-outage handling via NUT. I unplugged the UPSs, monitored all the VMs getting shut down and then the servers shut down. I never allowed the UPSs to totally lose power, just kept them unplugged long enough to trigger the server shutdown.
I have two PowerEdge R630s and one R730. When I rebooted the servers, the R630s seemed louder than normal. That's typical on startup but they continued to be louder once booted. The R730 did not seem any different.
I have LibreNMS set up to monitor the servers and the graphs of fan speed confirmed my feelings of them being louder. The fan speeds have doubled on one server and increased by four times on the other but the CPU workload has not changed at all.
The other server is even more dramatic and it is the more lightly loaded of the servers.
As you can see, the fan speeds have remained high ever since the reboot.
Over this last weekend we had a power outage so the servers shut down. After rebooting, the fans are still running fast, so it wasn't just a simple reboot needed to fix this.
LibreNMS isn't capturing CPU usage for some reason but here's the CPU usage from XO. It has not changed significantly in months.
The system board and CPU temps dropped at the same time of course, with all that extra airflow. Note, those temps are in F, not C.
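Since the temps above are reported in F, here is a small Python sketch for converting readings to C when comparing against other tools. The function names are made up for illustration. Note that a temperature drop converts without the 32 degree offset, which is easy to get wrong.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert an absolute temperature reading from Fahrenheit to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def fahrenheit_delta_to_celsius(delta_f: float) -> float:
    """Convert a temperature difference (a drop or rise) from F to C.

    Differences scale by 5/9 only; the 32 degree offset cancels out.
    """
    return delta_f * 5.0 / 9.0


if __name__ == "__main__":
    # Hypothetical reading, not taken from the graphs in this thread.
    print(round(fahrenheit_to_celsius(95.0), 1))
    print(round(fahrenheit_delta_to_celsius(18.0), 1))
```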
Any ideas of things to look for in the BIOS, iDRAC and/or LibreNMS that might indicate why this would have changed? There were no updates of the BIOS or anything associated with that reboot in December and another reboot has not changed it back. Are there possibly BIOS settings that would tell the server to run fans full speed and maybe those settings randomly changed?
Our servers are near our offices so this significant increase in sound output annoys people. I don't mind when servers are loud because they need to be loud but doubling the noise without any reason is quite annoying.
-
RE: Why does emergencyShutdown take a lot longer than shutting down the host from the console?
@manilx For host.stop, what does the bypassBackupCheck do? I think bypassEvacuate is pretty clear, that must mean "don't try to migrate running VMs to another server before stopping." I feel like I would want to bypass whatever the backup check is because I've lost power and the server needs to be shut down.
-
RE: Why does emergencyShutdown take a lot longer than shutting down the host from the console?
@manilx That is great to hear, thank you.
I will definitely share it with the community. Still trying to iron out some of the wrinkles. Testing requires me to let all my servers get shut down so I'm limited in how frequently I can test the solution.
This weekend was my third pull-the-plug test and it was the closest to totally working. In fact, I think this test did totally work, but due to network changes I had to shut down xcp because I've found it gets really mad if you change anything about its network while it's running. It was while using the physical console to shut it down that I was shocked at how fast it shut down. That's why I posted to ask about the difference.
I think changing my script to just use host.stop will resolve my last concerns about the script. Having a faster shutdown for xcp might also allow me to go back to my original design, where I let the more important VMs live a bit longer. I originally staged when VMs got shut down so the important ones could survive a 5 or 10 minute power outage. Turns out with xcp taking 8 minutes to shut down after the VMs were down, I had to change my script to start to close everything as soon as it was clear that this wasn't just a small power blip.
When I post it though, everyone will need to recognize that I'm no bash coder. I write code but this is the only bash stuff I've done so it could be rough.
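To make the staging trade-off above concrete, here is a rough Python sketch of the arithmetic, not the actual script. It works out how long the important VMs can keep running on battery before shutdown has to begin. Only the 8 minute host shutdown figure comes from this thread. The UPS runtime, VM shutdown time, faster host shutdown time and safety margin are assumed values for illustration, and the function name is made up.

```python
def grace_period_seconds(ups_runtime_s: int, host_shutdown_s: int,
                         vm_shutdown_s: int, safety_margin_s: int = 60) -> int:
    """Seconds the important VMs may keep running before shutdown must start.

    Everything that still has to happen on battery (VM shutdown, host
    shutdown, plus a safety margin) is subtracted from the UPS runtime.
    Returns 0 when there is no slack, meaning shutdown must start at once.
    """
    remaining = ups_runtime_s - host_shutdown_s - vm_shutdown_s - safety_margin_s
    return max(0, remaining)


if __name__ == "__main__":
    ups_runtime = 15 * 60   # assumed UPS runtime under load
    vm_shutdown = 2 * 60    # assumed time for the VMs to stop cleanly

    # Host shutdown taking about 8 minutes, as observed in this thread.
    slow = grace_period_seconds(ups_runtime, 8 * 60, vm_shutdown)
    # Assumed faster host shutdown of about 1 minute.
    fast = grace_period_seconds(ups_runtime, 1 * 60, vm_shutdown)

    print(f"grace with slow host shutdown: {slow} s")
    print(f"grace with fast host shutdown: {fast} s")
```

With these assumed numbers the slow host shutdown leaves too little slack to ride out a 5 or 10 minute outage, which matches why everything had to start closing early.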
-
RE: Why does emergencyShutdown take a lot longer than shutting down the host from the console?
@olivierlambert I understood what emergencyShutdownHost does, I was surprised that it seems to take a long time even if all the VMs were stopped before executing it. There should be nothing to suspend but it still takes about 8 minutes for the server to finish the shutdown.
I will start using host.stop instead of host.emergencyShutdownHost for the hosts that have no running VMs. Realistically, when having NUT shut it down, I'd rather the host just issue a clean shutdown command to any running VMs. I'm not sure what host.stop will do if there is a running VM. If it would politely ask it to stop then that would be perfect, if it yanks the virtual power cord then I wouldn't like that.
The ideal is that no host has a running VM by the time I want to shut it down but since NUT runs in a VM at the moment, one host will have that one running VM. I'll be in a bit of a race condition if I issue host.stop immediately followed by shutdown now. It's virtual murder-suicide but the murderer's life depends on the murdered. Can Linux shut down before xcp kills it? Might be an interesting test.
-
RE: Performing automated shutdown during a power failure using a USB-UPS with NUT - XCP-ng 8.2
@nomad What do you mean by the grand reconfiguration? Is there a new version that changes how things work?
-
RE: Why does emergencyShutdown take a lot longer than shutting down the host from the console?
@olivierlambert No, I don't use any VM as an SR in XO. Other than the local storage SRs, DVD, removable media and such, the only SRs are hosted on a Synology array or on my UNRAID server.