Netdata package is now available in XCP-ng
-
No, I don't know where the problem resides exactly.
-
Hello!
We have recently added some ML capabilities to Netdata, from Netdata agent v1.32.0 onwards. Basically, it's fairly lightweight unsupervised anomaly detection, so in addition to collecting raw metrics each second, for about 1% CPU of one core (typically) and no extra storage, the agent can also produce an "anomaly bit" that's 1 if recent data "looks" anomalous or 0 for normal.
Some of our users have asked whether these ML capabilities would work on Netdata within XCP-ng.
I'm trying to figure out if/how I might go about answering that. I work on the ML part of all this, so I am a bit (very) naive on the packaging side of things.
I'm wondering if anyone would be able to help me figure out whether the ML features of the Netdata agent would be available via XCP-ng?
-
Hello @andrewm4894
Sure, what do you need to know? Let me add @stormi in the conversation
-
@andrewm4894 I am really looking forward to seeing how this update goes, because using Netdata inside of XCP-ng is something that I want to do an updated video on to discuss performance and testing.
-
@olivierlambert basically whether, if I was using XCP-ng and I set enabled = yes in the [ml] section of the netdata.conf, it would just work as normal.
Once a user makes that config change they would get the new "Anomaly Detection" menu.
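As a rough illustration of the config change described above, here is a minimal Python sketch that checks whether a netdata.conf text sets enabled = yes in its [ml] section. The helper name `ml_enabled` and the sample config are hypothetical and only for illustration; the agent's own config parsing may differ.

```python
def ml_enabled(conf_text: str) -> bool:
    """Return True only if the [ml] section explicitly sets enabled = yes."""
    section = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        if section == "ml" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "enabled":
                return value.lower() == "yes"
    return False  # no explicit setting found in [ml]


# Hypothetical sample config, not taken from a real XCP-ng host.
SAMPLE = """
[global]
    hostname = example-host

[ml]
    enabled = yes
"""

if __name__ == "__main__":
    print(ml_enabled(SAMPLE))
```

The lines are parsed by hand rather than with `configparser`, because netdata.conf indents its keys and `configparser` can treat indented lines as continuations of the previous value.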
What this means then is that within each chart there will also be the new anomaly-bit corresponding to each raw metric value.
For example, if you add options=anomaly-bit then you get the anomaly bit instead of the raw value, for example (will probably be all 0 since all is normal on that server): https://london.my-netdata.io/api/v1/data?chart=system.cpu&options=anomaly-bit
For users who then claim their nodes into Netdata Cloud this powers the anomaly advisor feature.
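A small Python sketch of the query above: it builds the same kind of URL with options=anomaly-bit and counts how many returned rows have any non-zero anomaly value. The function names are hypothetical, and the payload shape (a "labels" list plus "data" rows whose first column is the timestamp) is an assumption about the default JSON output that should be checked against a real agent.

```python
from urllib.parse import urlencode


def anomaly_bit_url(base_url: str, chart: str) -> str:
    """Build a /api/v1/data URL that asks for anomaly bits instead of raw values."""
    query = urlencode({"chart": chart, "options": "anomaly-bit"})
    return f"{base_url.rstrip('/')}/api/v1/data?{query}"


def anomalous_rows(payload: dict) -> int:
    """Count rows where at least one dimension has a non-zero anomaly value.

    Assumption: each row is [timestamp, dim1, dim2, ...].
    """
    return sum(
        1
        for row in payload.get("data", [])
        if any(value not in (None, 0) for value in row[1:])
    )


# Hypothetical response, made up for illustration only.
SAMPLE_PAYLOAD = {
    "labels": ["time", "user", "system"],
    "data": [
        [1650000002, 0, 0],
        [1650000001, 100, 0],
        [1650000000, 0, 0],
    ],
}

if __name__ == "__main__":
    print(anomaly_bit_url("https://london.my-netdata.io", "system.cpu"))
    print(anomalous_rows(SAMPLE_PAYLOAD))
```

Fetching the URL (for instance with `urllib.request.urlopen`) and passing the decoded JSON to `anomalous_rows` would give a quick check of whether anything on the node currently looks anomalous.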
Here is a quick example of me doing a k6 load test against our demo servers.
So I guess (apologies for rambling a bit) my questions are:
- What netdata agent version is available in XCP-ng by default?
- Do you know if there are any custom build flags passed to it as part of the packaging for XCP-ng? E.g. sometimes package maintainers might pass things like --disable-cloud or --disable-ml for various reasons, so I'm just trying to double-check whether there is anything there that might preclude things.
I think it's mainly down to the netdata version that's included by default and whether any custom flags are passed as part of that packaging (if that makes sense).
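The version half of that question can be sketched in Python as a simple comparison against v1.32.0, the release mentioned above as the first one with ML. The names `parse_version` and `has_ml_support` are hypothetical, and the check says nothing about build flags: a new enough agent built with --disable-ml would still lack the feature.

```python
import re

# First agent version with the ML feature, per the post above.
ML_MIN_VERSION = (1, 32, 0)


def parse_version(text: str) -> tuple:
    """Extract (major, minor, patch) from strings like 'v1.32.0' or 'netdata 1.34.1'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(group) for group in match.groups())


def has_ml_support(version_text: str) -> bool:
    """True if the version is at least ML_MIN_VERSION (build flags not considered)."""
    return parse_version(version_text) >= ML_MIN_VERSION


if __name__ == "__main__":
    for candidate in ("v1.19.0", "v1.32.0", "1.34.1"):
        print(candidate, has_ml_support(candidate))
```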
-
The netdata version is rather old because XCP-ng 8.2 is an LTS and netdata sadly doesn't have long-term maintenance branches that would guarantee an absence of regressions. And I've had serious issues with netdata on XCP-ng in the past (disk full due to a bug that I reported at the time, fixed since, but now I completely disable disk writes and let netdata use only a limited RAM database, just in case), so I'm not eager to update often as long as it is working and no unpatchable major security issues are found.
And given the pace at which netdata evolves, each update is a significant amount of packaging work for me.
I will provide an update at some point, in the testing repository at least, but I don't know when exactly.
-
Is there a way to reduce both the risk and the packaging burden? Asking @andrewm4894
edit: thing is, the dom0 is a very sensitive piece in XCP-ng, so if we do something bad in there, this can affect all VMs.
In the longer run, we are working on being able to get metrics out of the dom0 (we can imagine a dedicated domain or a central OpenMetrics-compatible database). I wonder if Netdata is capable of working on top of an OpenMetrics source, by the way.
-
Additional notes:
- my last update attempt ended in "curl: (6) Could not resolve host: github.com; Unknown error" because the spec file started defaulting to downloading stuff from github when a distro must build everything using only the RPM sources and our build environment has no internet access on purpose. So this means I must re-bundle sources that were unbundled upstream. I had already done such work previously for other components that are downloaded during the RPM build.
- EPEL now has a netdata RPM that seems to be updated regularly (currently at 1.34.1) and which I could probably light-fork. It will probably be more suited to the needs of a distro maintainer than the upstream spec file which is tailored for the needs of the netdata project itself (you often see this dichotomy between upstream developers and downstream packagers, as in the example I gave earlier of a spec file that does what no distro would allow: download stuff dynamically at build time).
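To illustrate the build-time download problem, here is a rough Python sketch that flags lines in a spec file's build sections which look like network fetches. It is a heuristic under stated assumptions, not a real packaging tool: the function name, the section lists, the regular expression, and the sample spec are all made up for illustration.

```python
import re

# Heuristic: common download commands or a literal URL.
NETWORK_PATTERN = re.compile(r"\b(curl|wget|git\s+clone)\b|https?://")

# Sections whose commands run during the RPM build.
BUILD_SECTIONS = ("%prep", "%build", "%install", "%check")
# Other section markers, listed so the scan knows when a build section ends.
OTHER_SECTIONS = ("%description", "%files", "%changelog", "%package",
                  "%pre", "%post", "%preun", "%postun")


def find_network_fetches(spec_text: str) -> list:
    """Return (line_number, section, line) for suspicious lines in build sections."""
    hits = []
    current = None
    for number, raw in enumerate(spec_text.splitlines(), start=1):
        line = raw.strip()
        token = line.split()[0] if line else ""
        if token in BUILD_SECTIONS or token in OTHER_SECTIONS:
            current = token
            continue
        if (current in BUILD_SECTIONS
                and not line.startswith("#")
                and NETWORK_PATTERN.search(line)):
            hits.append((number, current, line))
    return hits


# Hypothetical spec fragment, not the real netdata spec file.
SAMPLE_SPEC = """
Name: example
Source0: https://example.org/example-1.0.tar.gz

%prep
%setup -q

%build
curl -sSL https://github.com/example/dep/archive/v1.tar.gz -o dep.tar.gz
make

%install
make install DESTDIR=%{buildroot}

%changelog
* see https://example.org
"""

if __name__ == "__main__":
    for number, section, line in find_network_fetches(SAMPLE_SPEC):
        print(f"line {number} in {section}: {line}")
```

The Source0 URL in the preamble is deliberately not flagged, since declared sources are fetched ahead of time and shipped with the source RPM; only commands that would run inside the offline build environment are reported.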
-
@olivierlambert @stormi I see, I see. In terms of the ML stuff, it's fairly stable (admittedly I'm being subjective here: we have been dogfooding it for the last few months and had it under beta launch since March) and is disabled by default. At some stage we will have it enabled by default, but that's much further down the road.
Of course it is still evolving at the same time - for example here is a big discussion about how we can extend the training window beyond the default 4 hours etc which obviously will involve more work and changes.
So in terms of where we are now the base ML functionality that's there I consider fairly stable and would be enough to be a useful feature some users might like.
But I totally get that it's a lot of work for you guys too and indeed we are building a lot of new features, especially around ML and related features and so the agent is for sure still evolving a lot.
I'm not really sure where that leaves us. I will tag in some of our SRE folks to see if there is anything we can do to help ease some of the pain.
-
@olivierlambert yep - OpenMetrics is a big deal and totally a standard we want to handle.
https://www.netdata.cloud/blog/release-1-24/
Mainly via the Prometheus collector at the moment, but IIRC there are still one or two OpenMetrics types we need to do some work to cover, e.g. Histograms is one we are working on at the moment.
-
Nice! It's not for tomorrow in XCP-ng, but knowing that it will be supported in the future for Netdata is great news!
Thanks a lot for coming here and helping to build some bridges, I personally love Netdata (as a sysadmin, and I'm not the only one around!) and I'll be happy to get closer collaboration between our 2 projects!
-
We are actually planning the v1.35 stable release for next week, so maybe, if it's not too much hassle (but I can imagine it's certainly non-trivial, so defo some hassle), we could try to see about updating to that on your end, with the understanding that it's a good base version for the ML stuff, and then maybe leave it at that for a while if it ends up being not too painful on your end.
Totally up to you guys, but I can try help any way I can.
Qq, what would be the best way for me to try spin up a sort of test or dev XCP-ng env for me to try things out on? Or is there sort of hardware involved such that this might not be so easy. In my mind I'm imagining spinning up a VM lol which probably shows my level of naivety
-
@andrewm4894 said in Netdata package is now available in XCP-ng:
Qq, what would be the best way for me to try spin up a sort of test or dev XCP-ng env for me to try things out on? Or is there sort of hardware involved such that this might not be so easy. In my mind I'm imagining spinning up a VM lol which probably shows my level of naivety
You can run XCP-ng inside a VM, as long as the hypervisor underneath exposes nested virtualisation. The actual installation of XCP-ng is very easy. Mostly click and run.
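One rough way to check from inside a Linux guest whether hardware virtualisation is exposed is to look for the vmx (Intel) or svm (AMD) CPU flags in /proc/cpuinfo. The Python sketch below only parses the text, uses a made-up sample, and applies to x86 guests; the function name is hypothetical.

```python
def virtualization_flags(cpuinfo_text: str) -> set:
    """Return the subset of {'vmx', 'svm'} found in the x86 'flags' lines."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found |= {"vmx", "svm"} & set(value.split())
    return found


# Hypothetical excerpt in the style of /proc/cpuinfo.
SAMPLE_CPUINFO = """\
processor\t: 0
model name\t: Example CPU
flags\t\t: fpu vme de pse tsc msr vmx sse sse2
"""

if __name__ == "__main__":
    print(virtualization_flags(SAMPLE_CPUINFO))
```

On a real guest the text would come from reading /proc/cpuinfo; an empty result suggests the outer hypervisor is not exposing nested virtualisation to that VM.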
-
Yes it's pretty simple, in a VM it's fine
https://xcp-ng.org/docs/install.html
Then, about our dev process, a recap: https://xcp-ng.org/docs/develprocess.html#development-process-tour
But @stormi can guide you
-
@olivierlambert cool, going to give it a go. Do you know of any GCP or AWS based guides or material to help?
I don't actually have any physical hardware available, so all the USB and boot stuff kinda throws me off (maybe I'm overthinking it). Maybe there is a way to boot a GCP VM, or some image I could find in a GCP marketplace or anything?
Going to do more googling and reading, but just mentioning it in case you know of any specific stuff. I see @lawrencesystems also has some cool YouTube videos which have been really useful in learning about the concepts and ideas, but I'm kinda looking for some GCP or AWS based tutorial or steps.
-
I think it's a netinstall like this that maybe I need to start with
https://xcp-ng.org/docs/install.html#iso-installation
And then I "have" the iso on the VM and so can go as normal from there.
If so, any tips on which GCP VM, if any, might be easiest to start with?
Cheers!
-
Ah that's a good question, I think I never tried to install it in an instance.
The easiest path would be likely to rent a cheap dedicated server. We also have a partnership with Equinix on their Cloud Metal offer.
Alternatively, if you like, we can provide you access to a test machine in there (or in our lab) so you can explore a bit
-
Apologies if this was covered. Is it possible to link an XCP-ng host to netdata.cloud?
-
@stormi Does XCP-ng support v1.37.1 of netdata now?
yum install netdata-ui installs v1.19.0, which doesn't look compatible with netdata cloud anymore. When attempting to add the node to netdata cloud, it fails with the error: Unable to find usable claiming script. Reinstalling Netdata may resolve this.
-
@wawa Not yet, but it's planned, hoping that their latest releases remain compatible with our base system.