Windows Server 2025 on XCP-ng
-
@TS79 said in Windows Server 2025 on XCP-ng:
Also, sadly, conhost process issues have been a sporadic problem in Windows for decades...
What hasn't?
-
I have now found time to run a test VM on another hypervisor (Hyper-V).
It is also freshly installed and an ADDS role has been set up on it.
My observations show that the VM has now been running for 3 days and so far no hanging conhost processes have occurred.
I will continue to monitor this for a while.
-
I'll have to try installing this on my lab which is XCP-NG 8.3.0 and see what I see (downloading now at very slow speeds).
Did you try the latest management tools and drivers from https://www.xenserver.com/downloads ?
This is version 9.3.3 and latest drivers.
-
-
I will give you a brief interim report.
Nothing has changed so far. After about 6 days I have almost 600 conhost processes again.
I have currently activated weekly reboots as a workaround.
On my Hyper-V test VM, this has still not occurred after 16 days.
-
@Chemikant784 I haven't had time to check this, I don't even remember where I put the 2025 eval, but I'll get to it soon. If I put the eval on my VMware system, then I'll likely try it sooner and remember to put it on my XCP-NG system too.
-
I have gained new insights. First of all, I have to add that I have found another symptom.
Task Scheduler tasks are not executed correctly.
The tasks are started, but do not execute the action and then run into the task timeout.
This could be related to conhost processes.
Right after a fresh installation of the Citrix tools, these problems do not occur even once.
At this point the Citrix Management Agent Service starts as intended.
But at the next reboot the problem seems to start.
I also noticed that after the reboot the Citrix Tools Management Agent Service gets stuck in the “Starting” state.
As soon as this process hangs there, the Conhost problems also start.
If I then set the Management Agent to “Disabled” so that it is not started when booting, the scheduled tasks work again.
Interestingly, no more Conhost processes are generated.
It seems that this hanging XenTools management service is blocking the Task Scheduler and the functions of other services.
This is why, for example, it is no longer possible to install an MSI package without deactivating the Management Agent Service.
I hope I have expressed myself reasonably clearly.
EDIT:
As said in one of my first posts, this behaviour only applies to Server 2025 with the AD DS role installed on it and promoted to a DC.
Member servers and standalone servers do not show this behaviour.
-
Hmmm... That's interesting that it has some kind of tie-in with the management agent and ADDS. I still haven't even thought about this, but I like to stay on top of the server versions so I can figure out what I need for when I need it.
Even Win11 24h2 has some oddities, and I'm running it on metal. This is the LTSC version, but that shouldn't matter. I turned off hyperthreading on one machine and it was not happy compared to HT left on. Might have to do with being an older HP Elitedesk 800 G6 (Intel 10500).
Since Server 2025 is based on 24h2, I might have to load up the LTSC eval or Enterprise eval as well as Server 2025 eval when I get some time.
-
Hello Greg_E
I would be interested to know if you have had the opportunity to take a look at the problem I described? Were you able to reproduce the behavior?
The server has been released in the meantime (some may even have been accidentally updated to 2025), so the problem should soon become more noticeable among early adopters.
-
I haven't, but I just grabbed the latest eval version which seems to be refreshed for release. I need to set up an AD with DHCP and DNS in my lab to move forward on some VMware learning, so I'll get to this soon. I just finished harvesting the VMware products from my VMUG Advantage account since Broadcom is doing their thing and killing these lab licenses (just announced a few days ago)!!! Broadcom leaving no stone unturned in their quest to really make people hate them!
Any gotchas I should look out for aside from the issue you are having? I'll run through the GUI role/feature installer and of course will install the OS with desktop.
-
I created a VM using the RTM release ISO and installed the latest Citrix tools (9.4). I've left this VM running for 5 days now and I don't see any runaway conhost.exe processes. Granted, the VM is just sitting there doing nothing.
-
Thanks, I didn't know there was a 9.4 yet, just grabbed that too.
-
Going to take a bit longer, need to get secure boot keys installed, reading through the directions now. This will be the first time I'm testing something that needs vTPM so should be an adventure!
[edit] Wow... VERY slow on my old hardware with secure boot and vTPM active. This is running on an HP DL360p Gen8 which doesn't support TPM2 modules and doesn't support UEFI natively, not sure if that stuff passes through. It's brutally slow and not from a storage point of view either.
Going to throw another few processor cores at it once the updates finish installing and I can shut it down. Hope that makes it work better.
This is with the 9.3.3 agent and drivers to the same NFS storage that I've been using for years, running on a 10GbE connection. Reached around 3 Gbps during install, which is funny because that seemed really slow too.
I have never seen my processor running this high.
-
It is still slow, everything lags, even the command prompt to set w32tm to use my gps ntp source and set it reliable was laggy. I think this must be a combination of my old hardware that doesn't support some of the new features, vTPM, vSecureboot, and vUEFI, mostly I think it's the old hardware I'm running this on. Using RDP there is less lag than the XCP-NG Console from XO, so maybe not a huge issue after all.
I just set up ADDS, DHCP, and DNS. After the reboot I see 5 conhost and will monitor from here. If I see it grow, I'm going to install the 9.4 agent and drivers to see if it fixes anything.
Nice to see a new Functional Level added, been a long time since the 2016 version.
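For anyone who wants to track the conhost count mentioned above without eyeballing Task Manager, here is a minimal Python sketch. It assumes the text comes from `tasklist /FO CSV /NH`, where the first column is the image name. The function names are mine and purely illustrative, and I have only thought this through against that output format.

```python
import csv
import io
import subprocess


def count_processes(tasklist_csv: str, image_name: str = "conhost.exe") -> int:
    """Count rows of `tasklist /FO CSV /NH` output whose image name matches."""
    wanted = image_name.lower()
    reader = csv.reader(io.StringIO(tasklist_csv))
    # The first CSV column is the image name; compare case-insensitively.
    return sum(1 for row in reader if row and row[0].strip().lower() == wanted)


def sample_conhost_count() -> int:
    """Run tasklist on the local Windows machine and count conhost.exe."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_processes(result.stdout)
```

Calling `sample_conhost_count()` from a scheduled script and logging the number with a timestamp should be enough to see whether the count keeps climbing. Keep in mind that scheduled tasks themselves were reported as unreliable on affected DCs in this thread, so running it by hand may be safer.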
-
In the few minutes between the last post and now, I was up to 8 of these and growing. XO (XCP) also didn't detect the management agent, no RAM use was showing. I'm wondering if this is part of the issue. Rebooted to see if the management agent gets straightened out and going to let it sit for a couple hours and check after dinner (almost time to go home), it will probably be 4 or 5 hours of run time before I get back to this tonight.
[edit] still no management agent, I think this might be part of the problem. It was working "fine" before installing ADDS, DHCP, and DNS. Going to reinstall 9.3.3 and see what happens later tonight or tomorrow.
-
OK, I've reached the end of my testing, it is still broke! There is something wrong with the Management Agent after you install and configure AD DS. It is in a constant state of "starting" and I'm pretty sure it is generating all the console host processes. I've now tried both the 9.3.3 and the 9.4 versions, the 9.4 was downloaded about half an hour ago, can't get much more fresh than that.
So where do we go? We don't run Server 2025 until the Management Agent is updated to work with it. The problem I have is that I can't think of how we alert them to the problem. Point someone to this thread maybe?
For now, I'm going back to 2022 because this isn't really worth messing with until things work. Kind of a show stopper for me. If you disable that stuck service and find a way to stop it, then maybe for messing around. I had to restart each time I disabled that service. And once disabled, all the repeating console hosts stopped in their tracks.
All of this testing was done on the latest version of XCP-NG 8.3 with either NFS or SMB for my SR and going over a 10GbE network. TrueNAS 24.10 is the storage host, just to be complete.
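If someone wants to check for the stuck "starting" state from a script instead of the Services console, a rough Python sketch follows. It parses the `STATE` line of `sc query` output and assumes English-language output. The service name is deliberately left as a parameter because I have not confirmed the agent's internal service name; look it up in the service's properties first.

```python
import re
import subprocess
from typing import Optional


def parse_service_state(sc_query_output: str) -> Optional[str]:
    """Return the state name (e.g. RUNNING, START_PENDING) from `sc query` output."""
    # A typical line looks like: "        STATE              : 2  START_PENDING"
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_query_output, re.MULTILINE)
    return match.group(1) if match else None


def is_stuck_starting(sc_query_output: str) -> bool:
    """True if the service reports START_PENDING, i.e. it is still 'starting'."""
    return parse_service_state(sc_query_output) == "START_PENDING"


def query_service(service_name: str) -> str:
    """Run `sc query <service_name>`; the caller supplies the real service name."""
    result = subprocess.run(
        ["sc", "query", service_name],
        capture_output=True,
        text=True,
    )
    return result.stdout
```

A single START_PENDING reading right after boot is normal, so this only means something if the state is still the same several minutes later.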
-
Hi Greg_E
Yes, those are exactly the same findings as mine.
I think disabling the service could be a workaround for the moment. 9.4.0 did not make a difference compared to 9.3.0.
We will start our new project with 2025 with the workaround in place and monitor the situation until a new version becomes available in which the issue may have been fixed.
I'd be glad for any further comments on this topic when new information becomes available from any of the community members or the devs.
Have a nice evening!
-
I'm trying to make this happen on my test VM. Are you saying you need to promote the VM to a domain controller to kick off this bug? I created my VM using the Server 2022 template, which did not set up a vTPM or enable secure boot. I'm wondering if that's at all related?
-
Yes, you need to promote this to at least AD DS and go through the setup phases of this task. It was fine until I finished the AD set up and rebooted. I'm thinking there is a port that the new 2025 functional level uses that may be conflicting with the management agent. I didn't go too much farther because I need to move on with some other things in my lab.
If you disable that service, will the VM still be able to live migrate from one host to another during things like rolling pool reboot or rolling pool upgrade? Again, didn't have time to test right now, but have had issues in the past from this. Wish I remembered to try this while it was still built.
There may be one other way around this that might be worth testing... Build a Server 2022, set up AD DS and make sure everything is working. Then do an in-place upgrade to Server 2025. This will keep the functional level at 2016, so check to see if everything is working. Then upgrade the functional level to 2025 and see what happens. Depending on where my other tests go, I might give this a try because I'm more likely to do an in-place upgrade on my production machines than to do a fresh install and migrate FSMO roles. But not right now.
-
@Greg_E
For the heck of it, I set up a new VM with 2025 and made it a domain controller complete with AD DS. It seems to work fine but, like yours, there are thirty something conhost processes running.
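To tell whether a count like that is a steady state or the start of the runaway growth described earlier in the thread, a few samples over time are needed. Below is a small Python sketch of that comparison. The tolerance value and function names are my own assumptions, not anything measured here.

```python
from typing import Sequence


def is_still_growing(counts: Sequence[int], tolerance: int = 2) -> bool:
    """True if the newest sample exceeds the oldest by more than `tolerance`."""
    if len(counts) < 2:
        return False  # a single sample says nothing about a trend
    return counts[-1] - counts[0] > tolerance


def growth_per_day(first_count: int, last_count: int, days: float) -> float:
    """Average number of new processes per day between two samples."""
    if days <= 0:
        raise ValueError("days must be positive")
    return (last_count - first_count) / days
```

As a rough yardstick, the earlier report of almost 600 conhost processes after about 6 days works out to something on the order of 100 new processes per day.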