Best practice for counter data

Asked by Brian Candler on 2012-08-16

What is the recommended best practice for handling "counter" data (e.g. the total number of messages received) that needs to be displayed as a rate (e.g. messages per second)?

Should I feed the raw counter values directly into Graphite, or would I be better off converting them into rates first, using something like statsd?

I understand from https://answers.launchpad.net/graphite/+question/70553 that storing raw counters can work.

However I have some additional questions:

1. What is the maximum value for a counter value? Is it stored as an integer or as floating-point? (Single-precision floating point only has 24 bits of precision and double-precision 53 bits, counting the implicit leading bit; 64-bit counters could therefore not be stored exactly)

2. What happens when a counter wraps around? Does the derivative() function take care of this? Does it assume that values are monotonically increasing, or would I get a huge negative spike when it wraps?

3. For a series of counter data, I note that the aggregation function would have to be set to "last" instead of the default "average".
http://graphite.readthedocs.org/en/1.0/config-carbon.html
Is there a best practice for this? For example, should I add ".counter" to the end of all metrics which are counters?

4. When plotting a graph, you have to ask for "derivative" in the UI. Can I configure it so that all metrics matching a particular pattern automatically use the derivative function? Otherwise there's an extra UI step involved every time you draw one of these, which would be a good reason for storing the rate in the first place.

5. As each counter value is submitted to graphite, it has a timestamp value. Does graphite/whisper store the exact timestamp against each data point, and use it when deriving rates? Or does it just use the timestamp to assign the data point into the nearest sample bin?

If the exact timestamps were stored, Graphite could compute more accurate rates when samples are not taken at even intervals, although I don't know whether that benefit would be significant in practice.

Question information

Language:
English
Status:
Solved
For:
Graphite
Assignee:
No assignee
Solved by:
Michael Leinartas
Solved:
2012-08-17
Last query:
2012-08-17
Last reply:
2012-08-17
Best answer: Michael Leinartas (mleinartas) said: #1

Whether or not you should store rates is really a matter of preference. Many people do, since rates are the most readily useful for charting; integral() can be used to get back to a running total if necessary. When storing rates there are (at least) two common approaches. One is to do what derivative() does and store the raw difference between consecutive sample values. The other (the way statsd does it) is to store the average per-second rate, that is, the difference between sample values divided by the number of seconds between samples. The latter makes the storage-aggregation config simpler (the default of average works well) and doesn't require the user to know the specific data precisions, but the former can be more intuitive to some.
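To make the two conventions concrete, here is a small Python sketch (not Graphite or statsd code) that turns hypothetical raw counter samples, given as (timestamp, value) pairs, into raw per-interval differences and into per-second rates, plus a running total as a rough analogue of integral():

```python
from typing import List, Optional, Tuple

# Hypothetical raw counter samples: (unix_timestamp, counter_value).
Sample = Tuple[int, float]


def raw_differences(samples: List[Sample]) -> List[Optional[float]]:
    """Per-interval increase, like derivative(): value[i] - value[i-1]."""
    out: List[Optional[float]] = [None]  # no previous point for the first sample
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        out.append(cur - prev)
    return out


def per_second_rates(samples: List[Sample]) -> List[Optional[float]]:
    """statsd-style rate: increase divided by the seconds between samples."""
    out: List[Optional[float]] = [None]
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        out.append((v1 - v0) / (t1 - t0))
    return out


def running_total(deltas: List[Optional[float]]) -> List[Optional[float]]:
    """Cumulative sum of per-interval deltas; gaps (None) stay gaps."""
    total = 0.0
    out: List[Optional[float]] = []
    for d in deltas:
        if d is None:
            out.append(None)
        else:
            total += d
            out.append(total)
    return out


samples = [(0, 100.0), (60, 700.0), (120, 1900.0)]
print(raw_differences(samples))                 # [None, 600.0, 1200.0]
print(per_second_rates(samples))                # [None, 10.0, 20.0]
print(running_total(raw_differences(samples)))  # [None, 600.0, 1800.0]
```

With 60-second samples the two outputs differ only by a factor of 60, but they behave differently when Whisper rolls points up into coarser archives: averaging per-second rates still gives a per-second rate, whereas raw differences would need to be summed (aggregationMethod = sum) to keep the totals right.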

As for your questions:
1. What is the maximum value for a counter value? Is it stored as an integer or as floating-point? (Single-precision floating point only has 24 bits of precision and double-precision 53 bits, counting the implicit leading bit; 64-bit counters could therefore not be stored exactly)
All values are stored as doubles on disk and are Python floats internally. With Python 2.6+ you can run python -c 'import sys; print(sys.float_info)' for details on the limits and precision.
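The practical limit this implies for counters can be checked directly in Python 3: a double has a 53-bit significand, so integers are exact only up to 2**53, and larger values (such as a 64-bit counter near its maximum) are rounded when stored.

```python
import sys

# Doubles carry 53 significant bits (52 stored + 1 implicit).
print(sys.float_info.mant_dig)  # 53

# Every integer up to 2**53 is represented exactly...
exact_limit = 2 ** 53
print(float(exact_limit - 1) == exact_limit - 1)  # True

# ...but above that, neighbouring integers collapse to the same double.
print(float(exact_limit + 1) == float(exact_limit))  # True

# A 64-bit counter near its maximum loses its low-order bits.
counter = 2 ** 64 - 1
print(int(float(counter)) - counter)  # 1
```

For rate calculations this means that small increments between two very large samples may be lost or distorted by rounding; counters that stay below 2**53 (about 9 x 10^15) are stored exactly.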

2. What happens when a counter wraps around? Does the derivative() function take care of this? Does it assume that values are monotonically increasing, or would I get a huge negative spike when it wraps?
The derivative() function will give you a huge negative spike on wrap, but nonNegativeDerivative() handles a wrap and allows specifying a max value: http://graphite.readthedocs.org/en/0.9.x/functions.html#graphite.render.functions.nonNegativeDerivative
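The wrap handling can be illustrated with a short Python function that models the documented behaviour of nonNegativeDerivative() on a plain list of values. It is a sketch for explanation, not Graphite's source; the function name and the 32-bit example series are hypothetical:

```python
from typing import List, Optional


def non_negative_derivative(values: List[Optional[float]],
                            max_value: Optional[float] = None) -> List[Optional[float]]:
    """Sketch of the documented nonNegativeDerivative() behaviour.

    A drop in the counter is treated as a wrap/reset. Without max_value the
    point becomes None (a gap); with max_value the increase is computed as if
    the counter rolled over from max_value to 0.
    """
    out: List[Optional[float]] = []
    prev: Optional[float] = None
    for val in values:
        if prev is None or val is None:
            out.append(None)  # no previous value to compare against
        elif val >= prev:
            out.append(val - prev)  # normal increase
        elif max_value is not None and val <= max_value:
            # prev -> max_value, one step to roll over to 0, then 0 -> val.
            out.append((max_value - prev) + 1 + val)
        else:
            out.append(None)  # drop with no max_value: leave a gap
        prev = val
    return out


MAX_32 = 2 ** 32 - 1  # hypothetical 32-bit counter (e.g. an SNMP Counter32)
series = [MAX_32 - 10, MAX_32 - 5, 4, 20]
print(non_negative_derivative(series))          # [None, 5, None, 16]
print(non_negative_derivative(series, MAX_32))  # [None, 5, 10, 16]
```

In a render request the same idea would be written as something like target=nonNegativeDerivative(servers.web1.messages.counter,4294967295), where the metric name is made up. One caveat of supplying a max value: a counter that resets to zero (for example when the process restarts) is indistinguishable from a wrap, so it shows up as a very large increase instead of a gap.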

3. For a series of counter data, I note that the aggregation function would have to be set to "last" instead of the default "average".
http://graphite.readthedocs.org/en/1.0/config-carbon.html
Is there a best practice for this? For example, should I add ".counter" to the end of all metrics which are counters?
In my opinion this (or some variation) is the only manageable way to do policies for storage-aggregation schemas. Without a clear convention like this it can be hard to keep up with new categories of metrics.
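As an illustration of the suffix convention, the Python sketch below embeds a hypothetical storage-aggregation.conf in which metrics ending in .counter use last, and everything else falls through to average. It mimics carbon's first-match-wins lookup with the standard library; it is not carbon's own code, and the section names and xFilesFactor values are assumptions:

```python
import configparser
import re

# Hypothetical storage-aggregation.conf following the ".counter" suffix
# convention. Sections are checked top to bottom; the first match wins.
STORAGE_AGGREGATION_CONF = r"""
[counters]
pattern = \.counter$
xFilesFactor = 0
aggregationMethod = last

[default_average]
pattern = .*
xFilesFactor = 0.5
aggregationMethod = average
"""


def aggregation_for(metric: str, conf_text: str = STORAGE_AGGREGATION_CONF) -> str:
    """Return the aggregationMethod of the first section whose pattern matches."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    for name in parser.sections():  # sections() preserves file order
        section = parser[name]
        if re.search(section["pattern"], metric):
            return section["aggregationMethod"]
    return "average"  # assumed fallback when no rule matches


print(aggregation_for("servers.web1.messages_received.counter"))  # last
print(aggregation_for("servers.web1.load_avg"))                    # average
```

Keep in mind that carbon applies these rules when it creates a new Whisper file, so existing files keep their old aggregation method unless changed (whisper ships a whisper-set-aggregation-method.py script for that).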

4. When plotting a graph, you have to ask for "derivative" in the UI. Can I configure it so that all metrics matching a particular pattern automatically use the derivative function? Otherwise there's an extra UI step involved every time you draw one of these, which would be a good reason for storing the rate in the first place.
No, there's currently no way to create a template like this for it to be automatically applied. Avoiding the extra step is indeed the reason many of us store rates.
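Since Graphite has no such template, one workaround is to apply the convention on the client side, in whatever builds the graph URLs. The sketch below is hypothetical (the helper names and the graphite.example.com host are made up); it only relies on the /render endpoint accepting repeated target= parameters and a from= parameter:

```python
from typing import Iterable
from urllib.parse import urlencode

# Hypothetical client-side convention: any target ending in ".counter" is a raw
# counter and should be plotted as a rate. Graphite itself has no such setting.
COUNTER_SUFFIX = ".counter"


def render_target(metric: str) -> str:
    """Wrap raw counters in nonNegativeDerivative(); leave other metrics alone."""
    if metric.endswith(COUNTER_SUFFIX):
        return "nonNegativeDerivative(%s)" % metric
    return metric


def render_url(base: str, metrics: Iterable[str], time_from: str = "-1h") -> str:
    """Build a /render URL with one target= parameter per metric."""
    params = [("target", render_target(m)) for m in metrics]
    params.append(("from", time_from))
    return base.rstrip("/") + "/render?" + urlencode(params)


print(render_url("http://graphite.example.com",
                 ["servers.web1.messages.counter", "servers.web1.load"]))
```

The same suffix convention used to pick the storage aggregation method can therefore also decide how a metric is plotted, so nobody has to remember to add the derivative step by hand.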

5. As each counter value is submitted to graphite, it has a timestamp value. Does graphite/whisper store the exact timestamp against each data point, and use it when deriving rates? Or does it just use the timestamp to assign the data point into the nearest sample bin?
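No, only the sample bin is stored: Whisper rounds each timestamp down to the start of its interval (a multiple of the archive's resolution) and discards the exact value. This does introduce the possibility of some jitter, but in practice it doesn't seem to be significant enough to worry about (for my own cases at least).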
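A simplified Python model of that binning shows where the jitter comes from. It assumes a hypothetical 60-second archive and a counter that really increases at exactly 10 per second but is sampled at slightly uneven times; the rounding-down step mirrors how Whisper aligns timestamps, while the overwrite-on-same-bin behaviour is a simplification:

```python
from typing import Dict, List, Tuple

STEP = 60  # hypothetical archive resolution: one point per 60 seconds


def bin_start(timestamp: int, step: int = STEP) -> int:
    """Round a timestamp down to the start of its interval."""
    return timestamp - (timestamp % step)


def store(samples: List[Tuple[int, float]]) -> Dict[int, float]:
    """Keep only the bin start per point; a later write to a bin replaces an earlier one."""
    archive: Dict[int, float] = {}
    for ts, value in samples:
        archive[bin_start(ts)] = value
    return archive


def per_second(points: List[Tuple[int, float]]) -> List[float]:
    """(v1 - v0) / (t1 - t0) for each consecutive pair of points."""
    return [(v1 - v0) / (t1 - t0) for (t0, v0), (t1, v1) in zip(points, points[1:])]


# Counter incrementing at exactly 10/s, but sampled at slightly uneven times.
samples = [(1190, 0.0), (1255, 650.0), (1310, 1200.0)]
archive = store(samples)
print(archive)  # {1140: 0.0, 1200: 650.0, 1260: 1200.0}

print(per_second(samples))  # [10.0, 10.0] using the true sample times
print([round(r, 2) for r in per_second(sorted(archive.items()))])  # [10.83, 9.17]

# Two samples in the same bin: only the last one survives in this model.
print(store([(1201, 5.0), (1259, 7.0)]))  # {1200: 7.0}
```

Over several intervals the errors cancel out (the counter still advanced by 1200 over the stored 120 seconds, i.e. 10 per second), which is consistent with the jitter rarely mattering in practice.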

Hope this helps

Brian Candler (b-candler) said: #2

Crystal clear and exactly what I was looking for. Many thanks!

Francois Mikus (fmikus) said: #3

You might want to distill this knowledge into the Graphite documentation. ;-)

Thanks for taking the time to clarify.