Graphite

Using Seyren/Graphite to Monitor Cluster

Asked by RafaelRP on 2015-10-13

Hi,

In our environment, Graphite receives metrics from various nodes running collectd. The metrics sent by collectd have the following format: collectd.service_<name>.cluster_<name>.<hostname>.<metric-name>
For example, system CPU percentage: collectd.service_helloworld.cluster_hello.host1.cpu-0.cpu-system

I'm using Seyren as the alerting dashboard and would like to set up an alert when the average CPU usage for a machine exceeds a certain threshold. One way of accomplishing this is to define a Seyren check for each host as follows:

averageSeries(collectd.service_A.cluster_A.hostname_1.cpu-*.cpu-system)
averageSeries(collectd.service_A.cluster_A.hostname_2.cpu-*.cpu-system)
averageSeries(collectd.service_A.cluster_A.hostname_3.cpu-*.cpu-system)
averageSeries(collectd.service_A.cluster_A.hostname_4.cpu-*.cpu-system)

However, as you can see, if you have a large cluster then defining and maintaining such checks becomes unwieldy.

So my question is, is it possible to formulate one graphite query that I could feed in as a target to a Seyren check that would accomplish what I described above? Has anyone else using Seyren/Graphite faced something similar? If so, how did you resolve it?

Question information

Language: English
Status: Solved
For: Graphite
Assignee: No assignee
Solved by: Anatoliy Dobrosynets
Solved: 2015-10-14
Last query: 2015-10-14
Last reply: 2015-10-13

Anatoliy Dobrosynets (anatolijd) said on 2015-10-13:

It's not very clear what exactly you want, but have you tried using

averageSeriesWithWildcards(collectd.service_A.cluster_A.*.cpu-*.cpu-system,4)

averageSeriesWithWildcards(nonNegativeDerivative(collectd.service_A.cluster_A.*.cpu-*.cpu-system),4)
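To illustrate what the position argument does, here is a minimal Python sketch of the grouping. It is an illustration only, not Graphite's actual implementation: it drops path node 4 (the cpu-* node, counting from 0) from each series name and averages the series that end up sharing a name. The helper name and the sample values are hypothetical, and null datapoints are not handled.

```python
from collections import defaultdict

def average_with_wildcards(series, position):
    """Average series whose names match once the node at `position` is removed.

    `series` maps a dotted metric name to a list of equally spaced values.
    Assumes no missing (None) datapoints, unlike real Graphite data.
    """
    groups = defaultdict(list)
    for name, values in series.items():
        nodes = name.split(".")
        # Drop the wildcarded node so all cores of one host share a key.
        key = ".".join(nodes[:position] + nodes[position + 1:])
        groups[key].append(values)
    # Average point by point across the members of each group.
    return {
        key: [sum(points) / len(points) for points in zip(*members)]
        for key, members in groups.items()
    }

# Hypothetical sample data: two hosts with two cores each.
sample = {
    "collectd.service_A.cluster_A.host1.cpu-0.cpu-system": [10.0, 20.0],
    "collectd.service_A.cluster_A.host1.cpu-1.cpu-system": [30.0, 40.0],
    "collectd.service_A.cluster_A.host2.cpu-0.cpu-system": [50.0, 50.0],
    "collectd.service_A.cluster_A.host2.cpu-1.cpu-system": [70.0, 90.0],
}
per_host = average_with_wildcards(sample, 4)
```

The result holds one averaged series per host, which is why a single target can stand in for the per-host averageSeries() checks.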

RafaelRP (rpolan01) said on 2015-10-14:

Hi Anatoliy,

Essentially, I would like a query that evaluates into multiple targets. In the example I gave above, each averageSeries function would be the target of one Seyren check for a given host. So in this case, I'd have 4 Seyren checks corresponding to 4 machines.

I just experimented with your query, averageSeriesWithWildcards(collectd.service_A.cluster_A.*.cpu-*.cpu-system,4), and this is exactly what I was looking for. Thanks.
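One way to preview which series such a target expands to, before using it in a check, is to ask Graphite's render API for JSON. The following Python sketch assumes a Graphite web instance at the placeholder address http://graphite.example.com; the helper names are hypothetical, and fetch_series() is defined but not called here because it needs a reachable server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def render_url(base_url, target, since="-10min"):
    """Build a render API URL that returns the target's series as JSON."""
    query = urlencode({"target": target, "from": since, "format": "json"})
    return f"{base_url.rstrip('/')}/render?{query}"

def fetch_series(url):
    """Return the names of the series the target expanded to."""
    with urlopen(url) as response:
        # Each entry looks like {"target": name, "datapoints": [[value, ts], ...]}.
        return [entry["target"] for entry in json.load(response)]

TARGET = "averageSeriesWithWildcards(collectd.service_A.cluster_A.*.cpu-*.cpu-system,4)"
# Placeholder host: replace with the address of your own Graphite web instance.
url = render_url("http://graphite.example.com", TARGET)
```

If the target works as intended, fetch_series(url) should list one series name per host rather than one per core.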

RafaelRP (rpolan01) said on 2015-10-14:

Thanks Anatoliy Dobrosynets, that solved my question.

Jason Dixon (jason-dixongroup) said on 2015-10-14:

Are you sure you don't want something like maxSeries() instead? It makes me sad to think you're alerting on the average of anything.

RafaelRP (rpolan01) said on 2015-10-14:

Hi Jason, I'm looking to monitor a cluster that runs a business-critical service. As a first step, I decided to set up some very basic alerts for each machine, such as CPU utilization (across all cores). My thinking was that an alert firing when the average system CPU usage is above a certain threshold (~90%) would warrant taking a look at the workload and taking some action (e.g. adding more resources).

What's wrong with alerting on the average?

RafaelRP (rpolan01) said on 2015-10-14:

Moreover, I don't want to be alerted when there is a spike in utilization on one of the cores of the machine, only when there is a sustained increase.
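The trade-off being discussed here can be sketched in Python. This is a toy illustration with made-up numbers and a hypothetical helper, not how Seyren evaluates checks: it compares the per-host average and maximum across cores, and only flags a series when every one of the last few samples is above the threshold.

```python
def sustained_above(values, threshold, min_points):
    """True if the last `min_points` non-null values all exceed `threshold`."""
    recent = [v for v in values if v is not None][-min_points:]
    return len(recent) == min_points and all(v > threshold for v in recent)

# Hypothetical per-minute samples for one host (percent system CPU per core).
cores = {
    "cpu-0": [95.0, 96.0, 97.0, 98.0, 99.0],  # one core pegged the whole time
    "cpu-1": [5.0, 6.0, 4.0, 5.0, 6.0],       # the other core nearly idle
}
average = [sum(p) / len(p) for p in zip(*cores.values())]
maximum = [max(p) for p in zip(*cores.values())]

avg_alert = sustained_above(average, 90.0, 5)  # averaging hides the pegged core
max_alert = sustained_above(maximum, 90.0, 5)  # the maximum exposes it

# A brief one-sample spike does not count as sustained.
spike_alert = sustained_above([10.0, 10.0, 99.0, 10.0, 10.0], 90.0, 5)
```

With these numbers the average never gets near 90%, so a check on the average stays quiet while one core is saturated; requiring several consecutive samples over the threshold is what filters out short spikes, whichever aggregate is used.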

Jason Dixon (jason-dixongroup) said on 2015-10-15:

http://lmgtfy.com/?q=why+to+alert+on+percentile

RafaelRP (rpolan01) said on 2015-10-19:

Thanks. I could have Googled it myself, but I was looking forward to reading your explanation/opinion on the subject (blog post suggestion?) ;)
