@Forza Sorry, you were correct, I just mixed in another new issue. NFS is currently used only for backups. All my SRs are in local storage. It just happened that I now have backups failing not only because of the NFS issue but also because of the VDI issue. I think that's a side effect of the NFS problem interrupting the backup, which left the VDI stuck attached to dom0. I should have made that clearer or never mentioned the VDI issue at all.
Best posts made by CodeMercenary
-
RE: All NFS remotes started to timeout during backup but worked fine a few days ago
-
RE: All NFS remotes started to timeout during backup but worked fine a few days ago
@Forza Seems you are correct about showmount. On UNRAID, running v4, showmount says there are no clients connected. I previously assumed that meant XO only connected during the backup. When I look at /proc/fs/nfsd/clients I see the connections.
On Synology, running v3, showmount does show the XO IP connected. Synology is supposed to support v4 and I have the share set to allow v4 but XO had trouble connecting that way. Synology is pretty limited in what options it lets me set for NFS shares.
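For anyone wanting to repeat the /proc/fs/nfsd/clients check on a Linux NFSv4 server, here is a minimal sketch. It assumes each client gets a numbered directory containing an info file of "key: value" lines (fields such as address and name), which is what recent kernels expose; the function names are made up for illustration, and reading the real path usually needs root.

```python
from pathlib import Path


def parse_client_info(text: str) -> dict[str, str]:
    """Parse the "key: value" lines of one nfsd client info file."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so "ip:port" values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields


def list_nfsd_clients(root: str = "/proc/fs/nfsd/clients") -> list[dict[str, str]]:
    """Return one dict per client directory found under root."""
    base = Path(root)
    if not base.is_dir():
        # No NFSv4 server running here, or the kernel lacks this interface.
        return []
    return [parse_client_info(info.read_text()) for info in sorted(base.glob("*/info"))]
```

Calling list_nfsd_clients() on the NFS server and printing the address field of each entry should show the XO IP while it is connected, even when showmount reports no clients.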
Synology doesn't have rpcdebug available. I'll see if I can figure out how to get more logging info about NFS.
-
RE: One VM backup was stuck, now backups for that VM are failing with "parent VHD is missing"
@olivierlambert Well, last night the backup completed just fine despite me taking no action.
I updated the XO to the latest commit when I got in this morning so hopefully the issues I had back in June don't come back.
-
RE: Possible to use "xo-cli vm.set blockedOperations=<object>"?
@julien-f Thank you, that's super helpful and even easier than I thought it would be.
-
RE: Import from ESXi 6 double importing vmdk file?
@florent Yeah, it's a lot of data, thankfully my other VMs are not nearly as large. I'm still not sure why it failed when none of the virtual drives are 2TB. The largest ones are configured with a 1.82TB max so even the capacity of the drive is less than the max.
I'm moving ahead with a file level sync attempt to see if that works.
To be clear, this post is as much, or more, about helping you figure out what's wrong so other people don't have the same issue as it is about making this import work for my VM. With the flood you are getting from VMware refugees, I figure I'm not the only person with large drives to import. In other words, if there's something I can do to help you figure out why it fails then I'm willing to help.
-
RE: NVIDIA Tesla M40 for AI work in Ubuntu VM - is it a good idea?
@gskger Yeah, looks like it would be too tight. Ouch, those T4s are an order of magnitude more expensive. I'm definitely not interested in going that route.
-
RE: Seeking advice on debugging unexplained change in server fan speed
@DustinB Nothing useful yet. I rebooted the servers and explored a bit in the BIOS to see if there were any settings, or to at least tweak some things to see if it would reset whatever went wrong in the reboot in mid December. While doing that I found that one of the two impacted servers was a version behind for the BIOS as well as for the iDRAC so I updated both of them. Unfortunately, that made no change to the fan speeds.
I've been out sick all of this week so far, but I'll be looking into this more when I get back to the office. I've read about ways to manually control the fans but I'd rather not have to depend on a script running somewhere that makes those kinds of decisions. I'd much rather have iDRAC, or whatever normally controls it, handle it like it used to.
-
RE: Help: Clean shutdown of Host, now no network or VMs are detected
@olivierlambert It was DR. I was testing DR a while ago and after running it once I disabled the backup job so these backups have just been sitting on the server. I don't think I've rebooted that server since running that backup.
Latest posts made by CodeMercenary
-
RE: Our future backup code: test it!
@flakpyro Oh thank you. I try to go through every update post but I must have missed that one.
It worked.
Makes a big difference for this VM if I back up a partially filled 250GB disk vs a largely full 8,250GB set of disks when 8TB of that doesn't need to be backed up.
-
RE: Our future backup code: test it!
Will this add the ability to control which disks on the VM will be backed up? I'd love to be able to select specific disks on a VM to back up and leave others out.
Maybe configured at the VM level rather than the backup level. Flag a disk as not needing backup and then the regular backup procedure would ignore it. However, I could also see why it might be better to control it by creating a specific backup for that VM so you could have different backup schedules, some that back up those extra disks and some that don't. I have no need to ever back up the extra disks at the moment though.
-
RE: Help: Clean shutdown of Host, now no network or VMs are detected
Looking through the kern.log files I found stuff I thought might be interesting but as I scroll, I see it happening a lot so I then wonder if it's normal. I see sets of three events:
Out of memory: Kill process ###
Killed process ###
oom_reaper: reaped process ###
I wonder if it was starving for memory and having to kill off processes to survive then eventually died. I see these 90+ times in one log file and over 200 times in another. Don't know if this is just normal activity or indication of a problem.
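To get a feel for whether those kills are routine or a sign of trouble, a rough tally per process name might help. This is only a sketch: it assumes the kernel lines look like "Out of memory: Kill process <pid> (<name>) ...", which is the usual shape on kernels that log the three events above, and summarize_oom_kills is a hypothetical helper, not part of any tool.

```python
import re
import sys
from collections import Counter

# Matches the first line of each three-line set; the "(name)" part is
# optional so that lines without a process name are still counted.
OOM_KILL = re.compile(r"Out of memory: Kill process (\d+)(?: \(([^)]*)\))?")


def summarize_oom_kills(lines):
    """Count OOM kills per process name from an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = OOM_KILL.search(line)
        if match:
            counts[match.group(2) or "unknown"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python oom_summary.py /path/to/kern.log
    with open(sys.argv[1], errors="replace") as log:
        for name, count in summarize_oom_kills(log).most_common():
            print(f"{count:5d}  {name}")
```

If one process name accounts for nearly all of the kills, that points at a specific culprit rather than general memory starvation.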
-
RE: Help: Clean shutdown of Host, now no network or VMs are detected
@olivierlambert Having trouble finding the reboot in the log files because I don't know what to look for. I have nearly 1GB of log files from that day and unfortunately, I don't recall when the reboot happened. Is there something I can grep for in the log that would indicate the reboot and I can backtrack from there?
Tried feeding the GB of log files to Phi4 in my Ollama server but so far it has not been any help finding anything either. Well, it found how to make my office a lot louder by running the server fans at full speed for a few minutes but that wasn't helpful.
-
RE: What's the recommended way to reboot after applying XCP-ng patches?
@DustinB Yeah, that's what I will try to remember to do next time. This is still worlds easier for me than trying to keep my ESXi hosts up to date was back when I used them. So very much better than that. I very rarely updated those at all.
-
RE: Seeking advice on debugging unexplained change in server fan speed
So, a bit after I originally posted this, one of the two servers' fans slowed back down and I don't know why. I only noticed it weeks later.
Then this morning we had a power outage and all the servers were shut down. When I booted them back up when power was restored, the other server was running the fans at normal speed. No idea why it went back, I didn't do anything to fix it.
After that reboot, though, a single fan in the server that originally didn't have that problem is now running fast. That makes me wonder if the fan is failing so I'm looking to find some spares to keep around.
Any future reboots are going to make me a bit stressed wondering if the fans will speed up again.
I did install the pool patches today and that reboot didn't impact the fans, thankfully. I wish I understood what happened but if it happens again I might use this docker container to take over control of them: https://github.com/tigerblue77/Dell_iDRAC_fan_controller_Docker
-
RE: What's the recommended way to reboot after applying XCP-ng patches?
With the newest patches I decided to try the way you guys suggested. Install Pool Patches followed by Rolling Pool Reboot.
The Rolling Pool Reboot gives me an error:
CANNOT_EVACUATE_HOST(VM_REQUIRES_SR:<OpaqueRef guid list>
So, it appears that I can't use that when I'm running local storage. I tried just rebooting the master and got the same error.
Shutting down all the VMs first allowed me to use the Rolling Pool Reboot. That's still better than my prior attempts. I only have about a dozen VMs running on three servers so it's not a huge deal to shut them down individually. However, it rebooted the master then rebooted one of the other hosts, then when it tried to reboot the third host I got the CANNOT_EVACUATE_HOST error again because the master had restarted all the VMs on the third host.
I think I'll just need to install the patches on the pool, disable all the hosts, reboot the pool, then enable the hosts again. That's probably the smoothest method without having shared storage.
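As a rough illustration of that order of operations, the sketch below only builds the list of xe commands; it does not run anything. It assumes the patches are already installed and the VMs are already shut down, that the master is rebooted first, and that hosts are addressed by UUID; reboot_plan and the placeholder UUIDs are hypothetical.

```python
def reboot_plan(master: str, others: list[str]) -> list[str]:
    """Return the ordered steps: disable all hosts, reboot each, re-enable all."""
    hosts = [master, *others]  # master goes first

    # Disable every host up front so nothing restarts VMs on a host that is
    # still waiting for its reboot.
    plan = [f"xe host-disable uuid={host}" for host in hosts]

    for host in hosts:
        plan.append(f"xe host-reboot uuid={host}")
        # Manual checkpoint, not a command: confirm the host is reachable again.
        plan.append(f"# wait until {host} is back online before continuing")

    plan += [f"xe host-enable uuid={host}" for host in hosts]
    return plan


if __name__ == "__main__":
    for step in reboot_plan("<master-uuid>", ["<host2-uuid>", "<host3-uuid>"]):
        print(step)
```

Printing the plan instead of executing it keeps a human in the loop for the wait steps, which matter most when there is no shared storage to migrate VMs onto.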
-
RE: Help: Clean shutdown of Host, now no network or VMs are detected
@olivierlambert I do have a /var/crash folder but it has nothing in it except a file from a year ago named .sacrificial-space-for-logs.
The Xen logs are verbose. Any suggestions of text to grep to find what I'm looking for? Other than the obvious error I should search for.
Currently looking in xensource.log.* for error lines to see if I can figure anything out.
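In case it helps anyone doing the same search, here is a small sketch for grepping a set of rotated logs in one pass. It assumes some rotations may be gzip-compressed and that a case-insensitive match on "error" is a reasonable first filter; grep_logs and open_log are illustrative names, and the log location in the usage note is an assumption.

```python
import glob
import gzip
import re


def open_log(path):
    """Open a log as text, transparently handling .gz rotations."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def grep_logs(pattern, file_glob, ignore_case=True):
    """Yield (path, line_number, line) for every line matching pattern."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for path in sorted(glob.glob(file_glob)):
        with open_log(path) as handle:
            for number, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield path, number, line.rstrip("\n")
```

For example, iterating over grep_logs("error", "/var/log/xensource.log.*") and printing each tuple gives file, line number and text together, which makes it easier to backtrack from a hit to the surrounding context.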
-
RE: Help: Clean shutdown of Host, now no network or VMs are detected
Just want to document that this happened again, on the same host.
My XO (from source) that manages my backups ran the backups last night. I know it was active up until at least around 5:30am but by the time I got into the office it was inaccessible by browser, ssh and ping. Other XO instances showed that it was running but the Console tab didn't give me access to its console.
A few hours later I found that other VMs on that same host had become inaccessible in the same fashion and also had no console showing in XO.
An hour or two later I found that XO showed the host as being missing from the pool, which it had not been earlier in the day. When I checked the physical console for that host, I found it had red text "<hostname> login: root (automatic login)" and did not respond to the keyboard other than putting more red text on screen of whatever I typed. I hit ctrl-alt-del and it didn't seem to do anything so I typed random things, then XCP-ng started rebooting. I'm guessing it was 60 to 120 seconds after my first ctrl-alt-delete.
When it came back up it could not find the pool master and said it had no network interfaces. I was able to solve it by doing another emergency network reset.
Would be nice if this wouldn't happen, makes me super nervous about stability. Thankful that the two times this has happened, it was on my least mission critical server. However, it's the server that handles backups so it's still stressful to have it go down. Also makes me wonder if there might be something wrong with that server's hardware.
-
RE: Common Virtualization Tasks in XCP-ng
A PowerShell SDK? Oh, you just made me a rather happy guy.