Application on VM causing BSOD
-
Hey guys,
Running into what we feel is a very unique situation we have never seen with any hypervisor before.
What we have is an application that, when installed on a VM running on an XCP-ng host, causes the VM to BSOD with EXCEPTION_ON_INVALID_STACK every time the application is launched.
Doing the exact same setup on an identical physical hardware host running VMware with a brand-new Server 2022 VM the application installs just fine. (We are in the process of migrating from VMware to XCP-ng which is how we have these two hosts with different hypervisors at the moment)
We have done a bunch of testing and have narrowed it down to something about XCP the application does not like or something unique about how the VM appears when installed on XCP-ng compared to VMware.
The application is being installed on a Server 2022 OS.
We have tested with PV tools installed and without PV tools installed.
Looking for any ideas on what else we could try as possible fixes, or on how to identify what about the VM could be presenting differently.
The application, I am guessing, is a unique one that I would not expect others to know. It is called RemitServer. It is software used for processing checks and online deposits. I think the actual software package is called RemitPlus, which is owned by a banking software company called Jack Henry.
Thanks
-
For anyone who is smarter than I am, here is the stack text from the crash dump. From what I can decipher, it has something to do with the CPU debug registers.
STACK_TEXT:
ffffde00fd2bc128 fffff806500b449e : 00000000000001aa 000000000009ed00 0000000000000003 ffffde00fd2bc900 : nt!KeBugCheckEx
ffffde00fd2bc130 fffff8064fee3d88 : 000000000009ed00 fffff8065003f0d2 0000000000000000 0000000000000000 : nt!RtlpGetStackLimitsEx+0x1d0cfe
ffffde00fd2bc180 fffff8064fee8294 : fffffb02e3ca2300 ffffde00fd2bce00 fffffb02e3ca2300 0000000000000000 : nt!RtlDispatchException+0x508
ffffde00fd2bc8d0 fffff8065002f442 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiDispatchException+0x304
ffffde00fd2bcfb0 fffff8065002f410 : fffff80650043d47 0000000000000000 0000000000000000 ffff9508e2b7b080 : nt!KxExceptionDispatchOnExceptionStack+0x12
fffffb02e3ca2118 fffff80650043d47 : 0000000000000000 0000000000000000 ffff9508e2b7b080 fffffb02e3ca1930 : nt!KiExceptionDispatchOnExceptionStackContinue
fffffb02e3ca2120 fffff8065003edf0 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiExceptionDispatch+0x107
fffffb02e3ca2300 fffff8065002e847 : fffff8065003f0d2 ffff9508e2b70000 0000000000000006 000000000009fda0 : nt!KiGeneralProtectionFault+0x330
fffffb02e3ca2498 fffff8065003f0d2 : ffff9508e2b70000 0000000000000006 000000000009fda0 ffff950800000000 : nt!KiSaveDebugRegisterState+0xc7
fffffb02e3ca24a0 00000000771416e5 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiPageFault+0x2d2
000000000009ed00 0000000000000000 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : 0x771416e5
-
@tsukraw Disable VIRIDIAN in the Advanced tab of the VM? And set the network card to E1000 (I think this one is only taken into account if PV tools are not present).
Restart the VM and relaunch the app?
You could also try to keep the debug registers from being used in the VM:
bcdedit /enum
to check the current config
bcdedit /debug off
shutdown /r /t 0
to disable debug
-
@tsukraw If it's working on VMware with the same OS and the same app, I guess there is some CPU flag that Xen/XCP-ng does not present to the app, and that provokes the crash.
It's above my level of competency to know for sure how to mask CPU flags in the VM params.
Does bcdedit /enum show Auto for hypervisorlaunchtype?
If yes, try:
bcdedit /set hypervisorlaunchtype off
shutdown /r /t 0
Summoning some pro support guru ... @danp ?
-
After the VM crashes check for output from "xl dmesg" on the hypervisor via ssh. It may provide some information on why the VM crashed.
I ran into an issue with Blue Iris crashing on recent Intel CPUs and the fix was to relax MSR enforcement on the VM by running:
xe vm-param-add uuid=VM_UUID param-name=platform msr-relaxed=true
However this was after determining this was the issue via xl dmesg.
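A small helper along these lines can make the "xl dmesg" output easier to scan for MSR-related messages. This is only a sketch: the function name msr_lines is made up for illustration, and the assumption that the relevant lines contain "RDMSR" or "WRMSR" may not hold for every Xen version, so adjust the pattern to whatever your log actually shows.

```shell
#!/bin/sh
# Keep only the log lines that mention an MSR read or write.
# Assumption: such lines contain "RDMSR" or "WRMSR" (matched case-insensitively).
# msr_lines is a hypothetical helper name, not part of Xen or XCP-ng.
msr_lines() {
    grep -iE 'rdmsr|wrmsr'
}

# Intended use on the XCP-ng host, right after the VM has crashed:
#   xl dmesg | msr_lines
```

If nothing is printed, the crash may have a different cause, or the messages may simply be worded differently in your Xen version.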
-
I tried bcdedit /set hypervisorlaunchtype off
This did not make any difference.
Attached is the output from "xl dmesg".
Not 100% sure what I would expect to see in here.
One thing I thought was odd, and maybe it isn't, is that it says VIRIDIAN even though I turned that off under the advanced settings for the VM in question.
xl dmesg.txt
-
Hot damn!!
@flakpyro
For kicks and giggles I tried what you had sent:
xe vm-param-add uuid=VM_UUID param-name=platform msr-relaxed=true
And sure enough it worked!!
Not exactly sure the technical details on what setting msr-relaxed=true does but hey if it works it works
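For what it's worth, as far as I understand the Xen documentation, newer Xen versions raise a general protection fault in the guest when it touches an MSR the hypervisor does not handle, and msr-relaxed makes those accesses be ignored instead (reads return 0, writes are dropped). That would fit the KiGeneralProtectionFault frame in the stack text earlier in the thread, though that link is a guess. A minimal sketch for reading the setting back or removing it again is below; the two function names are hypothetical, and VM_UUID is a placeholder just like in the command above.

```shell
#!/bin/sh
# Hypothetical helpers around the xe CLI; run them on the XCP-ng host.
# $1 is the UUID of the VM (placeholder, look it up with "xe vm-list").

# Print the current value of the msr-relaxed key in the VM's platform map.
show_msr_relaxed() {
    xe vm-param-get uuid="$1" param-name=platform param-key=msr-relaxed
}

# Remove the key again, for example to confirm the crash comes back without it.
clear_msr_relaxed() {
    xe vm-param-remove uuid="$1" param-name=platform param-key=msr-relaxed
}

# Intended use:
#   show_msr_relaxed VM_UUID
#   clear_msr_relaxed VM_UUID
```

As far as I know, a platform change like this is picked up the next time the VM is started, so a full shutdown and start may be needed before testing.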
-
@tsukraw Another thing I remember from my time troubleshooting Blue Iris was capturing a trace using xentrace:
xentrace -D -e 0x0008f000 xentrace.dmp
From there I was able to determine the MSR-related issue. Not at all saying that's the issue you are having, but it may shed some light or be useful for those more knowledgeable with Xen than myself.