Default "sum" aggregation function for *.count metrics

Asked by Nikolai Grigoriev

I am wondering about the example storage-aggregation.conf. Specifically, about the counters.

[sum]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

I have counters that increase monotonically over time: absolute counters, such as the number of bytes going through a network interface. Occasionally they reset to zero. I use the nonNegativeDerivative function to display them on graphs.

The problem is that "sum" aggregation does not work for these values. I even ran a simple test to prove to myself that I am not crazy :) Consider this example: a value is reported once a minute and increases by 100 every minute. I have configured retention so that it keeps a few values at 60-second precision and then aggregates them to 5 minutes. Here is what I see in the whisper file:

Archive #0 (at 60 sec):
T1: 100
T1+60: 200
T1+120: 300
T1+180: 400
T1+240: 500
T1+300: 600
T1+360: 700
etc

Archive #1 (at 300 sec):
T1+300: 1500
etc

This is wrong. In fact, "max" should be used, and the value for T1+300 should be 500 (or 600; I saw some alignment-related fluctuations). But the main reason I am puzzled is that "sum" is the default in the example file. It looks like "sum" only works when the metric represents a rate (events/minute), so that it correctly aggregates to events/5-minutes. Or is it meant for a counter that gets reset after each report?
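To make the arithmetic concrete, here is a minimal Python sketch of collapsing one 300-second bucket of the 60-second values above. It is only an illustration of the idea, not Whisper's actual code: the `aggregate` helper and its names are hypothetical, and it ignores xFilesFactor (which is 0 in the example, so any known point is enough).

```python
# Sketch: collapse five 60-second points into one 300-second point.
# The helper below is hypothetical and only mimics the named methods.

def aggregate(points, method):
    """Collapse a list of fine-grained values into one coarse value."""
    known = [p for p in points if p is not None]
    if not known:
        return None
    if method == "sum":
        return sum(known)
    if method == "average":
        return sum(known) / len(known)
    if method == "max":
        return max(known)
    if method == "min":
        return min(known)
    if method == "last":
        return known[-1]
    raise ValueError(f"unknown aggregation method: {method}")

# Archive #0 values from T1 through T1+240 (one 300-second bucket).
bucket = [100, 200, 300, 400, 500]

results = {m: aggregate(bucket, m) for m in ("sum", "average", "max", "last")}
for method, value in results.items():
    print(f"{method:8s} {value}")
```

For an ever-increasing counter, only "max" and "last" return a value the counter actually reached at the end of the bucket; "sum" returns 1500, which the counter never held.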

Either I am missing something very obvious or, more likely, this example aggregation file should have a comment about the kind of .*.count$ values it handles correctly.

Question information

Language: English
Status: Solved
For: Graphite
Assignee: No assignee
Solved by: Nikolai Grigoriev

Denis Zhdanov (deniszhdanov) said :
#1

Agreed, not sure why we have that in the example. max or even average looks much more natural for counters IMO.

Nikolai Grigoriev (ngrigoriev) said :
#2

Yes, I think any of "max", "average" or even "last" would be OK for most aggregated counters where precision is not that important. I have switched mine to "max" and now the data in archive #1 makes more sense.
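A rough Python sketch of why this matters once nonNegativeDerivative is applied to the 5-minute archive. The `non_negative_derivative` function here is a simplified stand-in written for this illustration (no maxValue wrap-around handling), not Graphite's implementation, and the 15 sample points simply extend the +100-per-minute example from the question.

```python
# Sketch: derivative of a 5-minute archive built with "sum" versus "max".
# All names are hypothetical; the data extends the +100/minute example.

def non_negative_derivative(values):
    """Delta between consecutive points; None at the start or on a decrease."""
    out = []
    prev = None
    for v in values:
        if prev is None or v is None or v < prev:
            out.append(None)
        else:
            out.append(v - prev)
        prev = v
    return out

# Fifteen one-minute samples of a counter growing by 100 per minute.
raw = [100 * i for i in range(1, 16)]

# Three 5-minute buckets of five samples each.
buckets = [raw[i:i + 5] for i in range(0, len(raw), 5)]

summed = [sum(b) for b in buckets]   # what "sum" stores
maxed = [max(b) for b in buckets]    # what "max" stores

print("sum archive:", summed, "->", non_negative_derivative(summed))
print("max archive:", maxed, "->", non_negative_derivative(maxed))
```

The counter really grows by 500 per 5 minutes. Under these assumptions the "max" archive yields that 500, while the "sum" archive yields 2500, five times too high.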

Denis Zhdanov (deniszhdanov) said :
#3

Yep, forgot about "last" - it fits well too.