intelligent scaling for derivatives

Asked by Kevin Blackham on 2011-03-11

Is there a function (existing or planned) for a more intelligent scale()? For example, I want "events per minute". My near-term storage resolution is 300 seconds, so I scale by 0.2. If I expand that graph to 30 days and go past my near-term storage, I have to recalculate the scale based on the mid-term storage interval. The problem is I don't really know which bucket it's going to use, as storage policies and requested start/end times vary.

Something like scaleSeconds(nonNegativeDerivative(some.counter),60) would examine which storage bucket it's using, and result in a factor of 0.2 for 300-second resolution, 1 for 60-second resolution, and 1/30 (about 0.033) for 1800-second resolution. Does this make sense?
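
To make the requested behaviour concrete, here is a minimal sketch in Python. The name `scale_to_seconds` and its arguments are hypothetical and only illustrate the idea; the sketch assumes every datapoint covers `step` seconds and that missing datapoints are `None`.

```python
def scale_to_seconds(values, step, seconds):
    """Rescale per-step values to per-`seconds` values.

    values  -- datapoints, each covering `step` seconds (None = missing)
    step    -- storage resolution of the series, in seconds
    seconds -- desired unit of time, e.g. 60 for "per minute"
    """
    # float() keeps the division exact under Python 2 as well as Python 3.
    factor = float(seconds) / step
    return [None if v is None else v * factor for v in values]


# The same "events per minute" request against three storage resolutions.
per_min_300 = scale_to_seconds([50], 300, 60)     # factor 0.2
per_min_60 = scale_to_seconds([10], 60, 60)       # factor 1
per_min_1800 = scale_to_seconds([300], 1800, 60)  # factor 1/30
```

The caller only states the unit wanted (60 seconds); the factor follows from whichever resolution the series was actually fetched at.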

I am clueful enough to contribute patches.

Question information

Language: English
Status: Answered
For: Graphite
Assignee: No assignee

chrismd (chrismd) said :
#1

This almost exists already in the form of the summarize() function. Given a metric of, say, 5-minute precision and a summarization interval of, say, "1h", it sums the 12 datapoints in each hour into coarser hourly datapoints. It sounds like what you need is a generalization of this that can also apply to finer intervals. It seems we could simply augment summarize() to check the precision of the given metric against the desired summarization interval: if the interval is coarser, apply the current logic, but if it is finer, divide each datapoint by the ratio of the interval sizes.

I think this approach would work in cases where you're graphing a large enough time period to force datapoint consolidation (i.e. number of datapoints > number of pixels in width of graph area). The default consolidation method is averaging, so your graph would end up showing the average minutely rates.
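
The consolidation step mentioned above can be sketched roughly as follows. This is an illustration only, not Graphite's actual rendering code; the name `consolidate_avg` and the `max_points` parameter (standing in for the pixel width of the graph area) are assumptions.

```python
def consolidate_avg(values, max_points):
    """Average consecutive datapoints so at most `max_points` remain."""
    if max_points <= 0 or len(values) <= max_points:
        return list(values)
    # Ceiling division: how many raw points fall into one drawn point.
    per_point = -(-len(values) // max_points)
    out = []
    for i in range(0, len(values), per_point):
        # Missing datapoints (None) are left out of the average.
        chunk = [v for v in values[i:i + per_point] if v is not None]
        out.append(sum(chunk) / float(len(chunk)) if chunk else None)
    return out
```

Because the buckets are averaged rather than summed, a series that already holds per-minute rates still reads as per-minute rates after consolidation.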

Clueful patch-contributors are always welcome! :)

If you put in a request to join the graphite-dev team I can give you commit permission on trunk. I'd be happy to help if you want to take a stab at implementing this functionality.

chrismd (chrismd) said :
#2

If you want to take a peek at the current summarize() implementation or any other rendering functions, they are in webapp/graphite/render/functions.py

Shane Hathaway (shane-hathawaymix) said :
#3

I created a function called hitcount() that might do just what Kevin wants. See:

https://bugs.launchpad.net/graphite/+bug/731894
https://code.launchpad.net/~shane-hathawaymix/graphite/hitcount

chrismd (chrismd) said :
#4

Thanks for reminding me of that, Shane. I hadn't had time to take a look at it yet; I'll take a look at it tomorrow. Out of curiosity, is there anything in the code that makes it specific to metrics that can be interpreted as per-second counters?

Shane Hathaway (shane-hathawaymix) said :
#5

It's probably not actually specific to hit counts. I've used it successfully to monitor other kinds of quantities. I called it hitcount() because I thought it might be specific to my case, but the more I think about it, the more I think maybe hitcount() is really just an improved version of summarize().

redbaron (ivanov-maxim) said :
#6

The question doesn't look answered to me. Indeed, we need an intelligent scaling function that can detect the current data resolution.

Just to make things clearer: let's say we have 10s resolution for 1 day and 1-minute resolution for 7 days, and a counter metric, say "net.eth0.tx.host1", the same number you see in ifconfig output.

Then we want to plot a bytes/second graph. How do we do it? First we calculate "derivative(net.eth0.tx.host1)", but it gives us bytes/<resolution in seconds>, which is not what we want. To get bytes/sec we need to scale the derivative by 1/10 (to get a 1-second value out of a 10-second one), so we plot "scale(derivative(net.eth0.tx.host1),0.1)". It works perfectly but has 2 problems:

- If we decide to change retention params we need to redo all our graphs
- If we try to display this graph for a time period longer than the "10s" retention covers, the resolution drops to 60 seconds and derivative returns "bytes/60seconds", so we need to change the scale from 0.1 to 1/60

What is needed is an intelligent scale function to be able to define graphing options once and get correct values for all resolutions.
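
As a worked example of the arithmetic above, the following Python sketch divides each counter difference by the resolution of whichever archive the data came from. The function name `per_second_rate` is hypothetical, and the sample values assume an interface sending a steady 1000 bytes/second.

```python
def per_second_rate(samples, step):
    """Turn consecutive counter samples into a per-second rate.

    samples -- counter readings taken every `step` seconds (None = missing)
    step    -- resolution of the archive the samples came from, in seconds
    """
    rates = []
    prev = None
    for cur in samples:
        if prev is None or cur is None:
            rates.append(None)  # no rate without two adjacent readings
        else:
            rates.append((cur - prev) / float(step))
        prev = cur
    return rates


# The same traffic read from the 10s archive and from the 60s archive.
from_10s = per_second_rate([0, 10000, 20000], 10)
from_60s = per_second_rate([0, 60000, 120000], 60)
```

Both calls yield the same bytes/sec figures, so the graph definition no longer depends on the retention settings or on the requested time range.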

redbaron (ivanov-maxim) said :
#7

Submitted patch in Bug #937744

Nicholas Leskiw (nleskiw) said :
#8

This request doesn't make much sense to me. If you want bytes/sec, why aren't you sending that data to Graphite?

I think a much better solution would be to send the actual data you want to graph to graphite, rather than trying to calculate it after the fact.

-Nick

Dave Rawks (drawks) said :
#9

Just wanted to chime in to say that I think this is a useful addition. Not everyone has the luxury of getting their metrics as "gauges", i.e. first-order derivatives, especially if you are already getting data via something counter-based (OpenNMS, SNMP, etc.). And even those who would like to convert to using "gauge" instead of counter may have a large backlog of existing metrics and no desire to go back, calculate a derivative, and reinject it into carbon. Plus, the conversion from counter to gauge is inherently lossy, since you can always calculate the rate from the counter, but not vice versa.

Michael Leinartas (mleinartas) said :
#10

Yeah, I agree there's a gap here. A lot of us really don't have the luxury of deciding how our metrics come in, as drawks points out. There's also something to be said for storing the data in the least modified form (e.g. raw counter values). I think this is related to this question: https://answers.launchpad.net/graphite/+question/184877 and solving this perhaps solves that one too.

I think what Chris proposed a while back at the top of this thread with the summarize() function makes sense (at least to me).

Say you sample counters every 10 seconds. A nonNegativeDerivative() gives you a rate of events/10sec. Using summarize() with a '1min' interval in 'sum' mode gives events/minute. What's proposed is that if you give summarize() a '1s' interval, an events/second rate would be imputed.

This only makes sense for 'sum' mode. The other modes ('avg', 'last', 'min', 'max') would have to either error out or behave as keepLastValue().
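
A rough Python sketch of that proposal, for 'sum' mode only. This is not the existing summarize() or smartSummarize() code; the name `summarize_sum` is hypothetical, and the sketch assumes the coarser interval is a whole multiple of the step.

```python
def summarize_sum(values, step, interval):
    """Re-bucket per-step counts into per-interval counts ('sum' mode only).

    values   -- datapoints, each covering `step` seconds (None = missing)
    step     -- resolution of the stored series, in seconds
    interval -- requested summarization interval, in seconds
    """
    if interval >= step:
        # Coarser: add up the datapoints that fall into each interval.
        n = interval // step
        out = []
        for i in range(0, len(values), n):
            chunk = [v for v in values[i:i + n] if v is not None]
            out.append(sum(chunk) if chunk else None)
        return out
    # Finer: scale each datapoint down to an imputed rate per interval.
    ratio = float(interval) / step
    return [None if v is None else v * ratio for v in values]


# Events per 10 seconds, as produced by a derivative of 10-second samples.
per_minute = summarize_sum([5, 5, 5, 5, 5, 5], 10, 60)  # coarser: sums
per_second = summarize_sum([50, None], 10, 1)           # finer: imputes
```

In the finer direction the number of datapoints stays the same; only the values are rescaled, which is why modes other than 'sum' have no obvious meaning there.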

Aman Gupta and Chris Davis did some work on the summarize() function a few weeks back. The new version (which is intended to replace the original) is called smartSummarize(). I propose that we add this functionality to it rather than creating a new function.

Can anyone spot problems with this approach? The patch from redbaron isn't off the table; it just feels to me that summarize() could gain a lot by having this. You then really don't have to know what your interval is at all to get the resolution of rates you want in either direction.

magec (magec) said :
#11

I think the summarize() approach is not enough. I'm now experiencing another issue, and I think there would be no way of solving it. When carbon writes points into whisper, it uses the timestamp only to determine which "metric slot" the data should be stored in and, as far as I know, the timestamp is then discarded.

If a proper derivative function is to be implemented, it should take into account the time interval between every two points. The summarize() approach will still have this problem.
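
To illustrate the concern, here is a small Python sketch. It assumes that a timestamp is mapped to its slot by rounding down to a multiple of the step, which matches the description above; the helper name `slot_for` and the sample readings are made up for the example.

```python
def slot_for(timestamp, step):
    """Start of the storage slot a timestamp falls into."""
    return timestamp - (timestamp % step)


# Two counter readings that actually arrived 14 seconds apart...
t1, v1 = 1003, 5000
t2, v2 = 1017, 6400

# ...land in adjacent 10-second slots, so they appear 10 seconds apart.
stored_gap = slot_for(t2, 10) - slot_for(t1, 10)

true_rate = (v2 - v1) / float(t2 - t1)       # uses the real timestamps
stored_rate = (v2 - v1) / float(stored_gap)  # uses the slot spacing
```

With these numbers the real rate is 100 per second, while a derivative computed from the stored slots reports 140 per second, because the 4 seconds of drift are lost once the timestamp is discarded.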

My question is: is the timestamp discarded as I say? There is another backend to store data; does it discard the timestamp information as well? Wouldn't it be possible to store an offset for every point in the whisper file, holding the "drift" of the metric, so that a better derivative function can be implemented?

I think this issue should be properly resolved because this case is so common: counters in network appliances, CPU, memory, etc.

Nicholas Leskiw (nleskiw) said :
#12

All facets of Graphite, Carbon and Whisper assume that you're using data in
buckets, not (time, value) pairs. There are lots of good reasons for this as
well. It's one of those things that looks simple at first ("oh, sure, just
make 'em time-value pairs") but becomes deviously complicated in the
implementation. Changing this would practically be a complete rewrite of
the whole application. Rendering, storage, and collection would all have
to change.

Why don't you calculate bytesPerTenSeconds, bytesPerMin (and maybe
megabytesPerHour) with no rollup (i.e. only one retention rate) and store
them in 3 separate .wsp files? If your data source can't produce that data,
I feel for you, but it's probably a lot easier to do some simple math
before sending the data than re-writing the entire Graphite storage system
for an edge case like this.
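
A minimal sketch of that "simple math before sending", in Python. The function name `rate_line` and the metric name in the example are hypothetical; the output is one line in carbon's plaintext format ("metric value timestamp"), and the sketch only builds the line, leaving the actual sending to the collector.

```python
def rate_line(metric, prev, cur):
    """Format one plaintext-protocol line carrying a per-second rate.

    prev, cur -- (timestamp, counter_value) tuples from the data source
    """
    (t0, v0), (t1, v1) = prev, cur
    if t1 <= t0 or v1 < v0:
        return None  # clock went backwards or the counter was reset
    # The collector still knows the real timestamps, so the rate is exact.
    rate = (v1 - v0) / float(t1 - t0)
    return "%s %s %d\n" % (metric, rate, t1)


line = rate_line("net.eth0.tx_bytes_per_sec.host1", (1003, 5000), (1017, 6400))
```

The trade-off is that only the rate is stored, so the raw counter value can no longer be recovered from Graphite afterwards.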

-Nick
