Netdata package is now available in XCP-ng
-
@olivierlambert Any idea where the netdata config file is located?
-
/etc/netdata/streaming.conf, IIRC.
-
# do not edit, managed by XCP-ng
[stream]
# Enable this on slaves, to have them send metrics.
enabled = yes
destination = tcp:*********.****.****:19999
api key = 0b607150-79c6-11eb-9575-c210359af93e
timeout seconds = 60
default port = 19999
send charts matching = *
buffer size bytes = 1048576
reconnect delay seconds = 5
initial clock resync iterations = 60
I found these in stream.conf
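For anyone wanting to double-check a file like this, here is a minimal Python sketch that reads the [stream] section and pulls out the destination host and port. It assumes the file parses as plain INI and that the destination looks like tcp:HOST:PORT, HOST:PORT or HOST; the helper names `parse_stream_destination` and `can_connect` are made up for illustration and are not part of netdata.

```python
import configparser
import socket


def parse_stream_destination(conf_text):
    """Return (enabled, host, port) from the [stream] section of a stream.conf.

    Assumption: the destination is "tcp:HOST:PORT", "HOST:PORT" or "HOST".
    When no port is given, the "default port" key is used.
    """
    # Only "=" is treated as a delimiter so values containing ":" stay intact.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(conf_text)
    stream = parser["stream"]

    enabled = stream.get("enabled", "no").strip().lower() == "yes"
    default_port = int(stream.get("default port", "19999"))

    destination = stream.get("destination", "").strip()
    if destination.startswith("tcp:"):
        destination = destination[len("tcp:"):]

    host, sep, tail = destination.rpartition(":")
    if sep and tail.isdigit():
        port = int(tail)
    else:
        host, port = destination, default_port
    return enabled, host, port


def can_connect(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running `can_connect(host, port)` from the XCP-ng host with the parsed values only shows that the port is reachable; it does not prove that the receiving netdata accepts the API key.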
-
Does the destination match the XO IP address?
-
@olivierlambert Yup, it does match XOA's IP address.
-
So check the netdata logs in /var/log to see if there's any error.
-
I see this, which looks good at first sight:
2021-05-05 14:53:06: netdata INFO : STREAM_SENDER[localhost] : STREAM localhost [send to tcp:XXX:19999]: connecting...
2021-05-05 14:53:06: netdata INFO : STREAM_SENDER[localhost] : STREAM localhost [send to tcp:XXX:19999]: initializing communication...
2021-05-05 14:53:06: netdata INFO : STREAM_SENDER[localhost] : STREAM localhost [send to tcp:XXX:19999]: waiting response from remote netdata...
2021-05-05 14:53:06: netdata INFO : STREAM_SENDER[localhost] : STREAM localhost [send to tcp:XXX:19999]: established communication - ready to send metrics...
2021-05-05 14:53:07: netdata INFO : PLUGIN[tc] : STREAM localhost [send]: sending metrics..
(caution: your log contains the IP address you masked earlier)
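To avoid leaking addresses when pasting logs in a forum, something like the following Python sketch can mask IPv4 addresses before posting. It is only an illustration: the function name `mask_ipv4` is hypothetical, the pattern is deliberately loose (it does not validate octet ranges), and hostnames or IPv6 addresses are not touched.

```python
import re

# Loose IPv4 pattern: four dot-separated groups of 1 to 3 digits.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def mask_ipv4(text, placeholder="XXX"):
    """Replace every IPv4-looking token in text with the placeholder."""
    return IPV4_RE.sub(placeholder, text)
```

For example, `mask_ipv4(open("/var/log/netdata/error.log").read())` would return the log text with addresses replaced; the log path here is an assumption and may differ on your system.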
-
@stormi Oh no, do you have any idea what the problem might be?
-
No, I don't know where the problem resides exactly.
-
Hello!
We have recently added some ML capabilities to netdata, from netdata agent v1.32.0 onwards. Basically, it's fairly lightweight unsupervised anomaly detection, so in addition to collecting raw metrics each second, for about 1% CPU of one core (typically) and no extra storage the agent can also produce an "anomaly bit" that's 1 if recent data "looks" anomalous or 0 for normal.
Some of our users have asked whether these ML capabilities would work on netdata within XCP-ng.
I'm trying to figure out if/how I might go about answering that. I work on the ML part of all this, so I am a bit (very) naive on the packaging side of things.
I'm wondering if anyone would be able to help me figure out whether the ML features of the netdata agent would be available via XCP-ng?
-
Hello @andrewm4894
Sure, what do you need to know? Let me add @stormi in the conversation
-
@andrewm4894 I am really looking forward to seeing how this update goes, because using Netdata inside of XCP-ng is something that I want to do an updated video on to discuss performance and testing.
-
@olivierlambert basically whether, if I was using XCP-ng and I set enabled = yes in the [ml] section of netdata.conf, it would just work as normal. Once a user makes that config change they would get the new "Anomaly Detection" menu.
What this means then is that within each chart there will also be the new anomaly-bit corresponding to each raw metric value. For example, if you add options=anomaly-bit then you get the anomaly bit instead of the raw value, like this (it will probably be all 0 since everything is normal on that server): https://london.my-netdata.io/api/v1/data?chart=system.cpu&options=anomaly-bit
For users who then claim their nodes into netdata cloud this powers the anomaly advisor feature.
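To make the API call above a bit more concrete, here is a rough Python sketch that builds the same kind of URL and counts how many returned rows contain a non-zero anomaly value. The helper names are made up for illustration, and the response layout (a JSON object with a "data" list of rows whose first element is the timestamp) is an assumption about the default JSON format that should be checked against your agent version.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def anomaly_bit_url(base_url, chart, after=-60):
    """Build an /api/v1/data URL asking for anomaly bits instead of raw values."""
    query = urlencode({
        "chart": chart,
        "options": "anomaly-bit",
        "after": after,       # negative value: seconds relative to now
        "format": "json",
    })
    return f"{base_url.rstrip('/')}/api/v1/data?{query}"


def count_anomalous_rows(payload):
    """Count rows where at least one dimension has a non-zero anomaly value.

    Assumes payload["data"] is a list of [timestamp, dim1, dim2, ...] rows.
    """
    return sum(1 for row in payload["data"] if any(row[1:]))


def fetch_anomalous_rows(base_url, chart, after=-60, timeout=10):
    """Query an agent and return the number of anomalous rows (needs network)."""
    with urlopen(anomaly_bit_url(base_url, chart, after), timeout=timeout) as resp:
        return count_anomalous_rows(json.load(resp))
```

Calling `fetch_anomalous_rows("https://london.my-netdata.io", "system.cpu")` would query the demo server mentioned above; on an agent without ML enabled the option may simply have no effect.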
Here is a quick example of me doing a k6 load test against our demo servers.
So I guess (apologies for rambling a bit) my questions are:
- What netdata agent version is available in XCP-ng by default?
- Do you know if there are any custom build flags passed to it as part of the packaging for XCP-ng? E.g. sometimes package maintainers might pass things like --disable-cloud or --disable-ml for various reasons, so I'm just trying to double check whether there is anything there that might preclude things.
I think it's mainly down to the netdata version that's included by default and whether any custom flags are passed as part of that packaging (if that makes sense).
-
The netdata version is rather old because XCP-ng 8.2 is an LTS and netdata sadly doesn't have long term maintenance branches that would guarantee an absence of regressions. And I've had serious issues with netdata on XCP-ng in the past (disk full due to a bug that I reported at the time, fixed since, but now I completely disable disk writes and let netdata use only a limited RAM database, just in case), so I'm not eager to update often as long as it is working and no unpatchable major security issues are found.
And given the pace at which netdata evolves, each update is a significant amount of packaging work for me.
I will provide an update at some point, in the testing repository at least, but I don't know when exactly.
-
Is there a way to reduce both the risk and the packaging burden? Asking @andrewm4894
edit: thing is, the dom0 is a very sensitive piece in XCP-ng, so if we do something bad in there, this can affect all VMs.
In the longer run, we are working on being able to get metrics out of the dom0 (we can imagine a dedicated domain or a central OpenMetrics-compatible database). I wonder whether Netdata is able to work on top of OpenMetrics, by the way.
-
Additional notes:
- my last update attempt ended in
  curl: (6) Could not resolve host: github.com; Unknown error
  because the spec file started defaulting to downloading stuff from github, when a distro must build everything using only the RPM sources, and our build environment has no internet access on purpose. So this means I must re-bundle sources that were unbundled upstream. I had already done such work previously for other components that are downloaded during the RPM build.
- EPEL now has a netdata RPM that seems to be updated regularly (currently at 1.34.1) and which I could probably light-fork. It will probably be more suited to the needs of a distro maintainer than the upstream spec file, which is tailored for the needs of the netdata project itself (you often see this dichotomy between upstream developers and downstream packagers, as in the example I gave earlier of a spec file that does what no distro would allow: download stuff dynamically at build time).
-
@olivierlambert @stormi I see, I see. In terms of the ML stuff, it's fairly stable (admittedly I'm being subjective here: we have been dogfooding it for the last few months and have had it under beta launch since March) and it is disabled by default. At some stage we will have it enabled by default, but that's much further down the road.
Of course it is still evolving at the same time - for example here is a big discussion about how we can extend the training window beyond the default 4 hours etc which obviously will involve more work and changes.
So in terms of where we are now the base ML functionality that's there I consider fairly stable and would be enough to be a useful feature some users might like.
But I totally get that it's a lot of work for you guys too and indeed we are building a lot of new features, especially around ML and related features and so the agent is for sure still evolving a lot.
I'm not really sure where that leaves us. I will tag in some of our SRE folks to see if there is anything we can do to help ease some of the pain.
-
@olivierlambert yep - OpenMetrics is a big deal and totally a standard we want to handle.
https://www.netdata.cloud/blog/release-1-24/
Mainly via the Prometheus collector at the moment, but IIRC there are still one or two OpenMetrics types we need to do some work to cover, e.g. Histograms is one we are working on at the moment.
-
Nice! It's not for tomorrow in XCP-ng, but knowing that it will be supported in the future for Netdata is great news!
Thanks a lot for coming here and helping to build some bridges, I personally love Netdata (as a sysadmin, and I'm not the only one around!) and I'll be happy to get closer collaboration between our 2 projects!