christianreiss

christianreiss

Accidentally solved it myself by loading the ipmi_watchdog module. Sigh.
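
For anyone landing here with the same symptom, here is a minimal sketch of how one might check both conditions at once: whether /dev/watchdog exists and whether ipmi_watchdog shows up in /proc/modules. The function name and its parameters are my own illustration, not part of any watchdog tooling, and a driver built directly into the kernel would not be listed in /proc/modules, so treat a "not loaded" result as a hint only.

```python
from pathlib import Path


def watchdog_status(dev="/dev/watchdog",
                    modules_file="/proc/modules",
                    module="ipmi_watchdog"):
    """Return (device_present, module_loaded) for the given paths.

    Hypothetical helper: the paths are parameters only so the check
    can be pointed at test files instead of the real system.
    """
    device_present = Path(dev).exists()

    module_loaded = False
    modules = Path(modules_file)
    if modules.exists():
        # Each line of /proc/modules starts with the module name.
        for line in modules.read_text().splitlines():
            fields = line.split()
            if fields and fields[0] == module:
                module_loaded = True
                break

    return device_present, module_loaded


if __name__ == "__main__":
    present, loaded = watchdog_status()
    print(f"/dev/watchdog present: {present}")
    print(f"ipmi_watchdog loaded:  {loaded}")
    if not loaded:
        print("Try (as root): modprobe ipmi_watchdog")
```

If the module is reported as not loaded, running modprobe ipmi_watchdog as root is what made the device appear in my case.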

christianreiss

Hey folks,

I have a Supermicro X11SCL-F server. I enabled the watchdog functionality in the BIOS and, lo and behold, it reboots every 5 minutes (duh).

I installed the watchdog service and then noticed that there is no /dev/watchdog device.

Do I need to add some grub command flags or anything specific for the watchdog device to pop up?

I'm sure it's only a tiny issue, but I am stuck. Help.

Thanks in advance,
Christian

christianreiss

This version is not bundled with any support nor updates. Use it with caution. Why do I see this message?

When you click the link, you get a lengthy explanation of why enterprise users should pay and how they cannot be told apart from homelab users. I get it.

But it does not answer the question: why do I see this message? And it's not only there. It's also an annoying pop-up. Then there are warning signs next to all servers, and a "No Support" in the left-hand menu. I could not push in any more annoyances if I tried.

Please understand me: I do get it. But consider these points:

Customers who have their own IT and can self-host are not scared of this. They either pay because they need the support, or they can fix their issues themselves.
It's your open source product and you need to make money, so you need to raise awareness. But imagine if all the open source projects that you rely on showed banners: from the Linux kernel to the Apache web server (or nginx) to all the GNU/Linux stuff. They don't, and yet they survive.

Even if you are still convinced that you need this because it brings you the customers who generate more revenue: okay. But I disagree with:

"However, there's no way to discriminate if this "from the sources" version is used by a company or an individual." No? There is no technical way to do this? You supply a whole supervisor with an entire web GUI and you find no way of doing this? What about an "I have read and understood this. Don't show this message anymore." checkbox? What about, in Docker, setting an ENV to homeuse=true? Or give out home server licenses. This is just off the top of my head.
"So as a home user, just ignore it." It is too annoying to ignore. Hence this lengthy post. This many warnings all over the place are just plain annoying.

Don't get me wrong. I love XCP-ng. I have been using XenServer since 5.x, when you needed a free license. I went with you when you forked, and I am playing missionary everywhere. XCP-ng is all around me: at my friends' and at some companies (that I do not run).

Thank you for everything.
-Chris.

christianreiss

This issue is closed but unresolved.
I moved to different hardware.

christianreiss

No, it hangs.

Once it stops, nothing works. To be able to see the other ttys you have to reboot the host, switch to the desired tty, then run the tests again. If it crashes, you're stuck there.

christianreiss

@olivierlambert Thanks for replying. It is totally dead. I can see the xsconsole, but it's dead. No extra lines or printouts. I even tried switching to tty2 (3?) with the system messages, which remained empty.

christianreiss

Hey folks,

I am at a loss here. For some years I was happily running XCP on my Intel NUC 7 with 32 GB of RAM and an SSD. For some time now, every once in a while, it has simply stopped working. So I went all out into analysis mode.

Here are my findings:

I can see no entries in any of the logs (/var/log/*). They just stop, and then the boot messages appear.
There are no entries in the event log of the BIOS (no thermal shutdown or the like).

So I moved all the VMs (16 GB used, nearly no CPU in use) to a full-sized rack server (running XCP, too). So I am kind of up and running, but that hardware can't stay here forever.

So I debugged the little NUC as best as I could:

I swapped the RAM for two new modules.
I ran memtest for a day without an issue.
I installed AlmaLinux 9 and let cpuburn run for half a day without an issue.
I let fio really work the SSD. (Read only, though...)

I was unable to break the system, so it's not memory or cooling.

So I installed a clean XCP 8.2 and let it update to current.

I ran a single VM (1 CPU, 8 GB RAM) and let memtest run in it for 10 minutes. No issues.
I cloned the VM and let it run in parallel. 10 minutes, no issues.
Same for VM 3.

I was able to let this run for hours. Now came VM 4. I reduced its memory size to leave room for the dom0 memory, and with ~5 GB RAM I let it run, too.

It crashed almost right away, after 2 minutes.

I restarted the host and started all four VMs again in parallel: dead within a few moments.

At this point I wiped the host clean again and installed a fresh 8.0 (downgrade from 8.2.1) and, after patching, let it run again. Same results.

Currently there is a single VM utilizing ALL the RAM (minus the dom0 memory) and all the CPUs. No issues found yet (17 minutes in).

So to sum up:

It's probably not a memory issue (tested & swapped).
It's probably not a cpu or heat issue.
Stress tests outside of XCP pass, even in similar OS environments.
Allocating all CPUs and memory to a single memtest VM seems to work.
Splitting the above up across 4 VMs makes it crash.
No logs, just hanging.

Oh, I updated the BIOS to version 88 (that's from March-ish this year).

Can anyone help me debug this further, or does anyone have an idea?
