Best posts made by CodeMercenary
-
RE: All NFS remotes started to timeout during backup but worked fine a few days ago
@Forza Sorry, you were correct, I just mixed in another new issue. NFS is currently used only for backups. All my SRs are in local storage. It just happened that I now have backups failing not only because of the NFS issue but also because of the VDI issue, but I think the VDI problem is a side effect of the NFS problem interrupting the backup, leaving the VDI stuck attached to dom0. I should have made that clearer, or never mentioned the VDI issue at all.
-
RE: All NFS remotes started to timeout during backup but worked fine a few days ago
@Forza Seems you are correct about `showmount`. On UNRAID, running v4, `showmount` says there are no clients connected. I previously assumed that meant XO only connected during the backup. When I look at `/proc/fs/nfsd/clients` I see the connections.
On Synology, running v3, `showmount` does show the XO IP connected. Synology is supposed to support v4 and I have the share set to allow v4, but XO had trouble connecting that way. Synology is pretty limited in what options it lets me set for NFS shares.
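For anyone else hitting the same confusion: `showmount` only reports NFSv3 mounts, while NFSv4 clients are tracked under `/proc/fs/nfsd/clients` on a Linux NFS server. A minimal sketch of pulling client addresses out of those `info` files — the sample file contents below are hypothetical, but mirror the real layout:

```shell
# Recreate the /proc/fs/nfsd/clients layout with a sample 'info' file
# (hypothetical contents modeled on a Linux kernel NFS server).
mkdir -p /tmp/nfsd-demo/clients/1
cat > /tmp/nfsd-demo/clients/1/info <<'EOF'
clientid: 0x5f1e2d3c00000001
address: "10.0.0.42:683"
status: confirmed
minor version: 1
EOF
# On a real server, point this at /proc/fs/nfsd instead of /tmp/nfsd-demo:
grep -h '^address:' /tmp/nfsd-demo/clients/*/info | cut -d'"' -f2
```

This prints one address per connected v4 client, which is the information `showmount` never shows for NFSv4.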
Synology doesn't have `rpcdebug` available. I'll see if I can figure out how to get more logging info about NFS.
-
RE: One VM backup was stuck, now backups for that VM are failing with "parent VHD is missing"
@olivierlambert Well, last night the backup completed just fine despite me taking no action.
I updated XO to the latest commit when I got in this morning, so hopefully the issue I had back in June doesn't come back.
-
RE: Possible to use "xo-cli vm.set blockedOperations=&lt;object&gt;"?
@julien-f Thank you, that's super helpful and even easier than I thought it would be.
-
RE: Import from ESXi 6 double importing vmdk file?
@florent Yeah, it's a lot of data, thankfully my other VMs are not nearly as large. I'm still not sure why it failed when none of the virtual drives are 2TB. The largest ones are configured with a 1.82TB max so even the capacity of the drive is less than the max.
I'm moving ahead with a file level sync attempt to see if that works.
To be clear, this post was as much or more about helping you figure out what's wrong, so other people don't have the same issue, as it was about making this import work for my VM. With the flood of VMware refugees you're getting, I figure I'm not the only person with large drives to import. In other words, if there's something I can do to help you figure out why it fails, I'm willing to help.
-
RE: NVIDIA Tesla M40 for AI work in Ubuntu VM - is it a good idea?
@gskger Yeah, looks like it would be too tight. Ouch, those T4s are an order of magnitude more expensive. I'm definitely not interested in going that route.
-
RE: Seeking advice on debugging unexplained change in server fan speed
@DustinB Nothing useful yet. I rebooted the servers and explored a bit in the BIOS to see if there were any settings, or to at least tweak some things to see if it would reset whatever went wrong in the reboot in mid December. While doing that I found that one of the two impacted servers was a version behind for the BIOS as well as for the iDRAC so I updated both of them. Unfortunately, that made no change to the fan speeds.
I've been out sick all of this week so far, but I'll be looking into this more when I get back to the office. I've read about ways to manually control the fans, but I'd rather not depend on a script running somewhere making those kinds of decisions; I'd much rather have iDRAC, or whatever normally controls it, handle it like it used to.
-
RE: Help: Clean shutdown of Host, now no network or VMs are detected
@olivierlambert It was DR. I was testing DR a while ago and after running it once I disabled the backup job so these backups have just been sitting on the server. I don't think I've rebooted that server since running that backup.
Latest posts made by CodeMercenary
-
RE: Help: Clean shutdown of Host, now no network or VMs are detected
Looking through the kern.log files I found stuff I thought might be interesting but as I scroll, I see it happening a lot so I then wonder if it's normal. I see sets of three events:
Out of memory: Kill process ###
Killed process ###
oom_reaper: reaped process ###
I wonder if it was starving for memory and having to kill off processes to survive, then eventually died. I see these 90+ times in one log file and over 200 times in another. I don't know if this is just normal activity or an indication of a problem.
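One quick way to judge whether those OOM kills are routine or a real problem is to tally the victims per process name — if the same service keeps getting killed, that points somewhere specific. A sketch against a sample log (the lines below are hypothetical, but follow the standard kernel OOM message format):

```shell
# Sample kern.log excerpt (hypothetical) showing the three-event OOM pattern.
cat > /tmp/kern.sample.log <<'EOF'
Jan 10 03:12:01 host kernel: Out of memory: Kill process 4312 (qemu-dm) score 901 or sacrifice child
Jan 10 03:12:01 host kernel: Killed process 4312 (qemu-dm) total-vm:2097152kB
Jan 10 03:12:01 host kernel: oom_reaper: reaped process 4312 (qemu-dm), now anon-rss:0kB
Jan 10 04:40:22 host kernel: Out of memory: Kill process 5120 (xapi) score 450 or sacrifice child
Jan 10 04:40:22 host kernel: Killed process 5120 (xapi) total-vm:524288kB
Jan 10 04:40:22 host kernel: oom_reaper: reaped process 5120 (xapi), now anon-rss:0kB
EOF
# Tally OOM kills per process name (run against the real kern.log* on the host):
grep -oE 'Out of memory: Kill process [0-9]+ \(([^)]+)\)' /tmp/kern.sample.log \
  | sed 's/.*(\(.*\))/\1/' | sort | uniq -c | sort -rn
```

If one process name dominates the tally, that is the memory hog (or the designated victim) worth investigating first.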
-
RE: Help: Clean shutdown of Host, now no network or VMs are detected
@olivierlambert Having trouble finding the reboot in the log files because I don't know what to look for. I have nearly 1GB of log files from that day and unfortunately, I don't recall when the reboot happened. Is there something I can grep for in the log that would indicate the reboot and I can backtrack from there?
Tried feeding the GB of log files to Phi4 in my Ollama server but so far it has not been any help finding anything either. Well, it found how to make my office a lot louder by running the server fans at full speed for a few minutes but that wasn't helpful.
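For the record, there is a reliable marker to grep for: every boot writes a fresh `Linux version` banner into kern.log, so each match gives a reboot timestamp to backtrack from. A sketch with a hypothetical log excerpt:

```shell
# Sample kern.log (hypothetical): the 'Linux version' banner marks a fresh boot.
cat > /tmp/kern.sample.log <<'EOF'
Dec 15 02:10:11 xcp1 kernel: usb 1-1.2: USB disconnect, device number 4
Dec 15 02:14:55 xcp1 kernel: Linux version 4.19.0+1 (mockbuild@localhost) #1 SMP
Dec 15 02:14:55 xcp1 kernel: Command line: console=hvc0 ro root=LABEL=root
EOF
# Each hit is a boot; note the timestamp and read the log backwards from there:
grep -n 'Linux version' /tmp/kern.sample.log
```

On the host itself, `last reboot` (which reads /var/log/wtmp) also lists boot times and is often the quickest cross-check.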
-
RE: What's the recommended way to reboot after applying XCP-ng patches?
@DustinB Yeah, that's what I'll try to remember to do next time. This is still worlds easier for me than keeping my ESXi hosts up to date was back when I used them. So very much better than that; I very rarely updated those at all.
-
RE: Seeking advice on debugging unexplained change in server fan speed
So, a bit after I originally posted this, one of the two servers' fans slowed back down, and I don't know why. I only noticed it weeks later.
Then this morning we had a power outage and all the servers were shut down. When I booted them back up when power was restored, the other server was running the fans at normal speed. No idea why it went back, I didn't do anything to fix it.
After that reboot though, a single fan in the server that originally didn't have the problem is now running fast. That makes me wonder if the fan is failing, so I'm looking for some spares to keep around.
Any future reboots are going to make me a bit stressed wondering if the fans will speed up again.
I did install the pool patches today and that reboot didn't impact the fans, thankfully. I wish I understood what happened but if it happens again I might use this docker container to take over control of them: https://github.com/tigerblue77/Dell_iDRAC_fan_controller_Docker
-
RE: What's the recommended way to reboot after applying XCP-ng patches?
With the newest patches I decided to try the way you guys suggested. Install Pool Patches followed by Rolling Pool Reboot.
The Rolling Pool Reboot gives me an error:
CANNOT_EVACUATE_HOST(VM_REQUIRES_SR: &lt;OpaqueRef guid list&gt;)
So, it appears that I can't use that when I'm running local storage. I tried just rebooting the master and got the same error.
Shutting down all the VMs first allowed me to use the Rolling Pool Reboot. That's still better than my prior attempts. I only have about a dozen VMs running on three servers so it's not a huge deal to shut them down individually. However, it rebooted the master, then rebooted one of the other hosts, then when it tried to reboot the third host I got the `CANNOT_EVACUATE_HOST` error again because the master had restarted all the VMs on the third host.
I think I'll just need to install the patches on the pool, disable all the hosts, reboot the pool, then enable the hosts again. That's probably the smoothest method without shared storage.
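If it helps anyone else with a local-storage-only pool, the disable-all / reboot / re-enable sequence can be scripted with the stock `xe` CLI. This is only a sketch of the idea (the selectors are standard xe syntax, but verify against your own pool before trusting it with production VMs):

```shell
# Sketch: shut down VMs, disable every host, then reboot and re-enable.
# Written to a file so it can be reviewed before running on the pool master.
cat > /tmp/patch-reboot.sh <<'EOF'
#!/bin/bash
set -e
# 1. Cleanly shut down running guests; with local SRs they cannot be evacuated.
for vm in $(xe vm-list is-control-domain=false power-state=running --minimal | tr ',' ' '); do
    xe vm-shutdown uuid="$vm"
done
# 2. Disable all hosts so the master will not restart VMs onto hosts
#    that are still waiting for their own reboot.
for h in $(xe host-list --minimal | tr ',' ' '); do
    xe host-disable uuid="$h"
done
# 3. Reboot slaves first and the master last, then re-enable each host:
#        xe host-reboot uuid=<host-uuid>
#        xe host-enable uuid=<host-uuid>
EOF
bash -n /tmp/patch-reboot.sh && echo "syntax OK"
```

The reboot step is left as comments because the ordering (slaves first, master last, waiting for each to come back) is pool-specific.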
-
RE: Help: Clean shutdown of Host, now no network or VMs are detected
@olivierlambert I do have a /var/crash folder but it has nothing in it except a file from a year ago named `.sacrificial-space-for-logs`.
The Xen logs are verbose. Any suggestions of text to grep for to find what I'm looking for, other than the obvious `error`?
Currently looking in xensource.log.* for error lines to see if I can figure anything out.
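For the grep itself, xapi tags each xensource.log line with a severity, so filtering on the `error`/`warn` tags cuts the noise way down. A sketch against hypothetical sample lines (the real format is similar, with the severity in square brackets near the start of each line):

```shell
# Hypothetical xensource.log excerpt; xapi prefixes each line with a severity tag.
cat > /tmp/xensource.sample.log <<'EOF'
Jun 10 05:30:00 xcp1 xapi: [ info||123 |VM.start|xapi] Starting VM
Jun 10 05:30:02 xcp1 xapi: [error||124 |VM.start|xapi] Raised Server_error(HOST_OFFLINE)
Jun 10 05:30:03 xcp1 xapi: [ warn||125 |VM.start|xapi] Retrying operation
EOF
# Keep only error/warn lines (run against /var/log/xensource.log* on the host):
grep -E '\[ ?(error|warn)' /tmp/xensource.sample.log
```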
-
RE: Help: Clean shutdown of Host, now no network or VMs are detected
Just want to document that this happened again, on the same host.
My XO (from source) that manages my backups, ran the backups last night. I know it was active up until at least around 5:30am but by the time I got into the office it was inaccessible by browser, ssh and ping. Other XO instances showed that it was running but the Console tab didn't give me access to its console.
A few hours later I found that other VMs on that same host had become inaccessible in the same fashion and also had no console showing in XO.
An hour or two later I found that XO showed the host as missing from the pool, which it had not been earlier in the day. When I checked the physical console for that host, it showed red text "&lt;hostname&gt; login: root (automatic login)" and did not respond to the keyboard other than echoing whatever I typed as more red text on screen. I hit ctrl-alt-del and it didn't seem to do anything, so I typed random things, then XCP-ng started rebooting. I'm guessing it was 60 to 120 seconds after my first ctrl-alt-del.
When it came back up it could not find the pool master and said it had no network interfaces. I was able to solve it by doing another emergency network reset.
Would be nice if this wouldn't happen, makes me super nervous about stability. Thankful that the two times this has happened, it was on my least mission critical server. However, it's the server that handles backups so it's still stressful to have it go down. Also makes me wonder if there might be something wrong with that server's hardware.
-
RE: Common Virtualization Tasks in XCP-ng
A powershell SDK? Oh, you just made me a rather happy guy.
-
Long delays at 46% when creating or starting a new VM
I've been trying to create a new Ubuntu 24.04 VM today and it's behaving very weird. All my hosts have local storage. I've tried creating VMs on two of my hosts, one with a 500GB drive (thin) on SSD and one with a 50GB drive (thin) on HDD. I attached the ubuntu boot disk to the VM.
When I tell XO (from source) to create the VM, the creation process hangs at 46% for 15 or 20 minutes. It normally completes within a matter of seconds.
They were both set to autostart but neither started. Once the creation task finally finished, I tried to start them. Again, they hung at 46% for 15 or 20 minutes and then didn't start.
I realized my XO was last updated on Monday and there were more updates so I updated it and it did not change the behavior.
The only log entry that seems like it might be related is:
vm.start
{
  "id": "4921a56a-c099-92d9-4ee4-939d8bc1adcf",
  "bypassMacAddressesCheck": false,
  "force": false
}
{
  "name": "HeadersTimeoutError",
  "code": "UND_ERR_HEADERS_TIMEOUT",
  "message": "Headers Timeout Error",
  "call": {
    "duration": 300806,
    "method": "VM.start",
    "params": [ "* session id *", "OpaqueRef:b66b9e3e-ac91-47d9-81ce-383de2ed326f", false, false ]
  },
  "stack": "HeadersTimeoutError: Headers Timeout Error
    at FastTimer.onParserTimeout [as _onTimeout] (/opt/xo/xo-builds/xen-orchestra-202502270017/node_modules/undici/lib/dispatcher/client-h1.js:642:28)
    at Timeout.onTick [as _onTimeout] (/opt/xo/xo-builds/xen-orchestra-202502270017/node_modules/undici/lib/util/timers.js:162:13)
    at listOnTimeout (node:internal/timers:581:17)
    at processTimers (node:internal/timers:519:7)"
}
There was also this but I didn't cancel anything around when this happened:
vm.start { "id": "4921a56a-c099-92d9-4ee4-939d8bc1adcf", "bypassMacAddressesCheck": false, "force": false, "host": "f23f627d-007e-499e-97cf-678574566429" } { "message": "task canceled" }
How do I go about investigating this?
Update: I figured it out. A few weeks ago I changed the IP addresses in XO for backups and SMB ISO storage so they would use the 10GbE network adapter in that server. I've had no trouble with backups since then; however, it appears the SMB ISO storage is no longer accessible from XO/XCP. I added a new NFS ISO storage pointing at the same location and it works. Now the VMs will boot.
I think I'll leave this here in case someone else runs into the same issue. If you have an ISO image attached to the VM and the server can't get to it, it will get stuck during boot and will kick out that timeout error without indicating that it timed out on connecting to the ISO storage.
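To turn that lesson into a quick pre-flight check: from dom0 you can confirm each ISO SR's PBD is actually attached before blaming the VM. A sketch using standard `xe` commands (written to a file here for review; adapt to your pool before relying on it):

```shell
# Sketch: verify ISO SRs are attached before starting VMs with ISOs mounted.
cat > /tmp/check-iso-srs.sh <<'EOF'
#!/bin/bash
# List ISO SRs, then report each one's PBD attach state.
for sr in $(xe sr-list content-type=iso --minimal | tr ',' ' '); do
    name=$(xe sr-param-get uuid="$sr" param-name=name-label)
    attached=$(xe pbd-list sr-uuid="$sr" params=currently-attached --minimal)
    echo "$name: currently-attached=$attached"
done
EOF
bash -n /tmp/check-iso-srs.sh && echo "syntax OK"
```

If `currently-attached` shows false (or the share's host doesn't answer a ping), fix the SR first; an unreachable ISO SR produces exactly the silent hang described above.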
-
RE: What's the recommended way to reboot after applying XCP-ng patches?
@kagbasi-ngc Oh, that's quite interesting. I have less than a dozen VMs that I consider production so it's not a big deal to start them manually. I've thought about putting the host into maintenance mode while doing this but my main XO instance is on one of those hosts. I do have an XO instance running in a VM on my UNRAID host just so I have a way to recover if one of the XCP-hosted instances doesn't run.
I'll have to try the smart reboot, that's a great tip, thank you.
I've thought about offering to help our church with computer stuff but there's already a guy that handles it and I don't know if I want to go down that rabbit hole. He seems to do a good job but I don't think he's as technical as I am and I doubt he has server experience.