Hardware Health Monitoring
-
I'm looking at switching from ESXi. As far as I can tell there is no hardware health monitoring in XCP-ng/XO. I'm hoping that I'm just not finding it. How is this handled? This is a requirement for switching.
In ESXi I get notifications for a failed hard drive in a RAID array, fan failures, power supply failures, etc. I don't feel comfortable switching without these features. How is everybody handling this?
The only thing I found was something coming in 8.3 for RAID arrays. But I don't see anything else.
-
Hi,
What do you need exactly, and on which server model? Because there's no universal answer. In the short term, we are working with 2CRSi to provide a range of servers with all the hardware details. It might come with other manufacturers.
-
@hawk223
There is no "default-monitoring" for hardware, but this should not be a big problem.
If you are using any big vendor, there should be tools like storcli, which can be used for monitoring.
--> ipmi-sensors is installed
--> storcli is available as rpm (and working fine)
-
@olivierlambert What I'm hoping for is to get an email if there is a hardware failure. Mostly on the RAID array, power supplies and fans; ECC errors, temperature and other stuff would be nice as well.
I have some older Intel R2224gz models with the s1600gz board. I'll be looking at getting some HP gen 9 stuff as well.
-
@KPS I have storcli running already. Needed it to set up the RAID. I've been using it on ESXi to manage the array when a drive needs replacing. I'll have to try out the ipmi stuff. I tried detect sensor and it didn't find anything on the ipmi, but it did find ipmi.
While I can manually check with storcli is there a way to automate the check and email issues?
-
@hawk223 said in Hardware Health Monitoring:
While I can manually check with storcli is there a way to automate the check and email issues?
I am running a script from my monitoring software (PRTG) every 5 minutes that checks the array through storcli.
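For anyone wanting to script something similar, a minimal Python sketch of such a check could look like the one below. It is not the PRTG script mentioned above. It assumes Python 3, that the binary is on the PATH as `storcli` (it is often installed as `storcli64`), that the controller is `/c0`, and that the JSON output (the trailing `J`) uses the `Controllers` / `Response Data` / `Virtual Drives` layout with `Optl` marking a healthy virtual drive. Check all of that against the output of your own storcli version before relying on it.

```python
import json
import subprocess

# Assumptions: binary name and controller number. Adjust to your install.
STORCLI_CMD = ["storcli", "/c0", "/vall", "show", "J"]


def degraded_virtual_drives(storcli_json):
    """Return (controller, DG/VD, state) for every virtual drive not in state Optl.

    The key names used here are an assumption about storcli's JSON layout.
    """
    problems = []
    report = json.loads(storcli_json)
    for controller in report.get("Controllers", []):
        number = controller.get("Command Status", {}).get("Controller", "?")
        response = controller.get("Response Data", {})
        for vd in response.get("Virtual Drives", []):
            state = vd.get("State", "unknown")
            if state != "Optl":
                problems.append((number, vd.get("DG/VD", "?"), state))
    return problems


def main():
    """Print one line per problem; return 0 (ok), 1 (degraded) or 2 (check failed)."""
    try:
        output = subprocess.check_output(STORCLI_CMD, universal_newlines=True)
    except (OSError, subprocess.CalledProcessError) as error:
        print("storcli check failed to run: {}".format(error))
        return 2
    problems = degraded_virtual_drives(output)
    for controller, vd, state in problems:
        print("controller {} virtual drive {} is in state {}".format(controller, vd, state))
    if not problems:
        print("all virtual drives optimal")
    return 1 if problems else 0
```

To use it as a standalone script, add `import sys` at the top and end the file with `sys.exit(main())`, so that cron or a monitoring system can act on the exit code.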
-
We'll work on better hardware reporting/integration in XO; that's one of our targets.
-
I have three Dell PowerEdge R620s; one of them was obtained in December 2023 to become the storage for the other two.
Their upgrade potential is massive (for my needs): up to 768 GB of RAM, up to 96 TB of storage, and dual sockets supporting 12-core CPUs.
The maximum CPU speed with 12 cores is 2.7 GHz.
Anyway, I would like to see better hardware health monitoring; it could be aided by interfacing in both directions with Dell's iDRAC.
@hawk223 In the meantime, do you have permission to configure yourself as an email alert recipient on the out-of-band management controller (BMC) on those servers?
I ask because the BMC mainly sends hardware health alerts. If you don't have permission, and whoever does isn't able or willing to add you, then you may need to wait for this addition.
-
@john-c I have access to configure myself as an email recipient however, I don’t recall the reason I never did that. There is some complication with doing so. If I recall it needs a hack to configure an email server. I will investigate.
I have also been looking into something like nagios or zabbix. But then that’s something extra required. I would prefer a simple solution that doesn’t require a lot of effort and hacking.
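To give a sense of what a simple route without Nagios or Zabbix might look like, here is a rough Python sketch that could be run from cron. It reads `ipmi-sensors` and emails any sensor reported as Warning or Critical. The SMTP host and addresses are hypothetical placeholders. It assumes Python 3.6+, an SMTP relay that accepts unauthenticated mail from the host, and that the comma-separated output of `ipmi-sensors --output-sensor-state` has the columns `ID,Name,Type,State,Reading,Units,Event`, so compare that with what your own host prints.

```python
import csv
import io
import smtplib
import socket
import subprocess
from email.message import EmailMessage

# Hypothetical placeholders, replace with your own relay and addresses.
SMTP_HOST = "smtp.example.com"
MAIL_FROM = "xcp-host@example.com"
MAIL_TO = "admin@example.com"

IPMI_CMD = [
    "ipmi-sensors",
    "--output-sensor-state",
    "--comma-separated-output",
    "--ignore-not-available-sensors",
]


def failing_sensors(csv_text):
    """Return the sensor rows whose State column is Warning or Critical."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row.get("State") in ("Warning", "Critical")]


def build_alert(rows, hostname):
    """Build a plain-text email listing the failing sensors."""
    msg = EmailMessage()
    msg["Subject"] = "Hardware alert on {}: {} sensor(s)".format(hostname, len(rows))
    msg["From"] = MAIL_FROM
    msg["To"] = MAIL_TO
    lines = [
        "{} ({}): {}, reading {} {}, event {}".format(
            row.get("Name"), row.get("Type"), row.get("State"),
            row.get("Reading"), row.get("Units"), row.get("Event"))
        for row in rows
    ]
    msg.set_content("\n".join(lines))
    return msg


def main():
    """Run ipmi-sensors once and send one email if anything is failing."""
    output = subprocess.check_output(IPMI_CMD, universal_newlines=True)
    rows = failing_sensors(output)
    if rows:
        with smtplib.SMTP(SMTP_HOST) as smtp:
            smtp.send_message(build_alert(rows, socket.gethostname()))
```

Calling `main()` from a cron job every few minutes would send a mail on every run while a fault persists, so some form of "already reported" state would be worth adding before using this for real.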
-
@hawk223 said in Hardware Health Monitoring:
@john-c I have access to configure myself as an email recipient however, I don’t recall the reason I never did that. There is some complication with doing so. If I recall it needs a hack to configure an email server. I will investigate.
I have also been looking into something like nagios or zabbix. But then that’s something extra required. I would prefer a simple solution that doesn’t require a lot of effort and hacking.
Most modern BMCs include a web interface, so you can navigate to it and log in via its form. From there it's fairly simple to set up, though the account for sending the email may be a challenge, as some implementations require the sending account to not have a password!
The exact host name to navigate to depends on how the BMC's IP address and/or FQDN is configured on your network.