@R2rho Yeah that is really surprising.
I suppose it could be some kind of wider hardware incompatibility or something, but still crazy either way.
Glad you got that somewhat sorted out though.
@olivierlambert Yup, I've had exactly that a few times, usually on used boards.
@R2rho If possible, however annoying, I would also take the CPU out and use a flashlight to check the motherboard for bent pins.
@olivierlambert Yeah @R2rho I agree with this, it's strange to see memtest errors at all.
It may be another component causing the failures though, and not the RAM itself. Possibly the board or the memory controller on the CPU.
You don't by chance have another AM4 CPU you can swap in do you?
Yeah wish I had a better response here but this is indeed odd.
Do you by chance have a PCIe ethernet card you can swap in to use for connectivity (and just not use the X550 ports)? That would let you test whether the X550 is causing the crashes.
It's a longshot though if I'm honest.
@Mefosheez I would try it on that other host you have and see if you run into the same issues, just for good measure.
Are they managed by the same XO? Maybe you can just migrate it to the other host so it's the exact same VM.
@Mefosheez Hmmm ok, yeah I would check a few of the logs on XCP-ng itself. https://docs.xcp-ng.org/troubleshooting/log-files/
Also, has this host been rebooted fairly recently?
@Mefosheez Hmmm let's double check all the advanced settings then. If this doesn't do it, then it might be worth checking the logs.
I'll give a reference point for an Ubuntu VM I have running, this is on an AMD Threadripper 1920X for what it's worth.
CPU Mask: none
CPU Weight: Default
CPU Cap: Default
Citrix PV drivers: disabled
HA: disabled
Affinity: none
GPUs: none
NIC: Realtek RTL8139
VGA: enabled
Video RAM: 8 MiB (might be worth bumping this to 16MiB to test, IIRC I had some VMs dislike 8MiB, don't recall the specifics)
Boot Firmware: UEFI
Secure Boot: enabled (probably leave disabled for now)
Viridian: disabled
CPU Limits: 4/4
Topology: Default
Memory is 2GiB
No VUSBs and no PCIs attached. Misc is all left default.
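To make that comparison less error-prone, here is a minimal Python sketch that diffs a VM's advanced settings against the reference values above. The setting names are just the labels from the list, and `candidate` is a made-up example of a problem VM, not data from this thread. Nothing here talks to XO or XCP-ng; you would fill in the values by hand from the Advanced tab.

```python
# Reference values copied from the settings list above (a subset).
REFERENCE = {
    "CPU Mask": "none",
    "Citrix PV drivers": "disabled",
    "HA": "disabled",
    "NIC": "Realtek RTL8139",
    "VGA": "enabled",
    "Video RAM": "8 MiB",
    "Boot Firmware": "UEFI",
    "Secure Boot": "enabled",
    "Viridian": "disabled",
}


def diff_settings(reference, candidate):
    """Return {setting: (reference_value, candidate_value)} for every mismatch."""
    keys = sorted(set(reference) | set(candidate))
    return {
        key: (reference.get(key), candidate.get(key))
        for key in keys
        if reference.get(key) != candidate.get(key)
    }


# Hypothetical problem VM: identical except for two settings.
candidate = dict(REFERENCE)
candidate["Video RAM"] = "16 MiB"
candidate["Secure Boot"] = "disabled"

for setting, (ref_value, other_value) in diff_settings(REFERENCE, candidate).items():
    print(f"{setting}: reference={ref_value!r}, this VM={other_value!r}")
```

Anything it prints is a setting worth looking at first. A setting missing on one side shows up as `None`.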
@Mefosheez Do you have secure boot enabled on these?
I haven't had any issues with NUMA node balancing, do you need this for a specific high performance application? Generally speaking it should "just work".
There are some things you can do to optimize but I'd only go down that road if you are running into issues with very wide VMs.
Also, are we talking cross socket NUMA nodes (e.g. multiple CPUs) or just nodes within an EPYC CPU?
@R2rho Do you have a way to view the console output when this happens? If you had a display attached, you may see the remnants of the crash.
And I presume you can't get ping responses from it right? The other thought is maybe it's lost network connectivity but isn't actually a fully locked up host.
There is also info here on the log files you can check.
@Mefosheez You definitely should be able to get console with UEFI, so this does sound like a configuration issue.
You aren't passing through any GPUs or anything like that right?
What OS are you trying to boot to?
I have a feeling it's hanging and isn't really something to do with XO or XCP-ng itself.
@florent Finally getting back to this post, I know it's been months, just haven't had a lot of time, sorry!
I think I am still a bit confused, but I will do some additional testing to see if I can confirm my suspicions.
My confusion is that you can't merge the deltas into the key if the key is locked behind Object Lock. The file isn't writable, so you can't do the merge operation, right?
So that being said, it sounds to me like maybe the retention of the object lock/immutability needs to be set to be less than the retention period in XOA, right?
This way the original key is not immutable and can be written to when the merge happens?
Or does XOA just "wait" until the key isn't locked and then do the merge operation?
@olivierlambert @florent Just following up on this again, I am going to be doing some more testing this week to see if I can discover any issues. Still very curious about how this should be handled though.
Been doing some testing recently and hoping to get some feedback or a better understanding of how this works.
I am using Backblaze (S3 compatible) for some backup testing. I have the bucket set up with immutable storage (Object Lock as they call it) with a 30 day retention period for testing.
I created a backup job to test with this remote with the below settings:
What I am confused about is, what happens when a merge needs to happen? If it's a locked object, you can't merge data into the full backup.
So once the 30 backup retention period is hit, how does the delta backup merge blocks into the full backup of the chain? That should be impossible if the object is write restricted.
However, I am not getting any errors when running this backup job beyond the 30 backup retention setting. Shouldn't it error out since it can't write to the full backup VHD?
Or is there maybe something I'm not understanding here?
I'm also wondering how this is managed long term. Since the objects can't be deleted, XOA will (I presume) try to delete them and that will fail. Then they are just there in the bucket forever, since XOA isn't aware of the immutable retention period and won't go clean up later.
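To put rough dates on the timing question, here is a minimal sketch in Python. It only does the date arithmetic and says nothing about what XOA or Backblaze actually do. The assumptions are mine: one backup per day, the lock clock starts when the full (key) backup is uploaded, and the first merge would happen on the run that pushes the chain past the retention count. All function names are made up for illustration.

```python
from datetime import date, timedelta


def lock_expiry(uploaded, lock_days):
    """Date the Object Lock on an object uploaded on `uploaded` runs out."""
    return uploaded + timedelta(days=lock_days)


def first_merge_date(first_backup, retention_count, interval_days=1):
    """Date of the first run that exceeds `retention_count` backups.

    Assumption: that run is when the oldest delta would be merged into the key.
    """
    return first_backup + timedelta(days=retention_count * interval_days)


def merge_hits_locked_key(first_backup, retention_count, lock_days, interval_days=1):
    """True if the first merge would land while the key is still locked."""
    merge_day = first_merge_date(first_backup, retention_count, interval_days)
    return merge_day < lock_expiry(first_backup, lock_days)


start = date(2024, 1, 1)  # arbitrary example date
for retention in (7, 30, 45):
    blocked = merge_hits_locked_key(start, retention, lock_days=30)
    print(f"retention={retention}, lock=30 days: merge hits locked key = {blocked}")
```

With these assumptions, a retention count shorter than the lock period means the merge lands on a still-locked key, while an equal or longer one does not. It ignores the deltas, which are uploaded later and carry their own lock periods, so it is only a starting point for the question above.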
@FTSSupport Yeah that's really annoying, oof, sorry to hear that.
Maybe the only option is going to be getting a new physical host and then using Hyper-V on that? Hate to say it, but sounds like it could be the final result.
I did that with ESXi for this one vendor since I didn't have a choice at the time, it was lame, but the company understood the need for the expense, though it helped that we were going to need another (albeit not as powerful) host anyway.
@arc1 Think you can give it a try with a Windows VM just to see if the problem goes away (not SQL but just pinging)? Would help diagnose if it's your infrastructure somehow or an XCP-ng specific thing with just certain Linux VMs.
I so far haven't seen behavior like this though.
@FTSSupport That's an interesting way to do it instead of just having an OVA file lol.
I'm honestly a little surprised any vendors require Hyper-V too, like, if you're going to require something, why not use the industry standard that is ESXi?
And if that was the case, it would be an OVA which would be something you could natively import to most hypervisors anyway.
What an interesting situation lol.
@FTSSupport Got it! I'm not sure there is a way around Hyper-V complaining on that front unfortunately.
@FTSSupport Should be easier to move back to 8.2 than Proxmox, yes. I'm still not sure it's going to work that well though, either in 8.2 or in Proxmox. I've just never had a good experience with nested virt.
I got it working well on a Hyper-V setup once, but the nested VM still had some odd issues, bad latency, and a few other things, and that was all Windows based stuff so it was kinda a best case scenario.
Good luck with the vendor, hopefully they can be convinced that it will work just fine on other hypervisors.
One vendor I worked with that required ESXi for their Windows VM finally changed their minds and worked w/ me to do some validation on XCP-ng. After we ran through a ton of testing (this was a very high bandwidth/data usage platform with strict requirements) the engineers were flabbergasted because XCP-ng performed so much better lol. I was like "you're really surprised something is faster than bloated ESXi??"
@stormi Excited to see more progress on this for sure.
Still should never be done in a production setup though so I don't think there should be any rush ha!
Would be very cool to have in lab environments though.