integral-functions for the long period are incorrect

Asked by Ivan Glushkov on 2011-10-05

Hi.

We have a counter for the number of function calls, which we send to Graphite.
The rate is almost the same all the time, so we can treat it as a constant.

We have the following retention rules:

retentions = 60:10080,300:8640,900:34560,1800:86400
So we have one week of 1m data, then a month of 5m data, and so on.
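
As a quick check of that arithmetic, here is a minimal sketch that works out how much history each archive covers. It assumes each entry is secondsPerPoint:numberOfPoints; the helper name `coverage_days` is made up for illustration.

```python
# Sketch: how much history each archive in the retentions line covers.
# Assumes each entry is secondsPerPoint:numberOfPoints.
RETENTIONS = "60:10080,300:8640,900:34560,1800:86400"

def coverage_days(retentions):
    """Return (seconds_per_point, days_covered) for each archive."""
    result = []
    for archive in retentions.split(","):
        seconds_per_point, points = (int(x) for x in archive.split(":"))
        result.append((seconds_per_point, seconds_per_point * points / 86400))
    return result

for step, days in coverage_days(RETENTIONS):
    print(f"{step:>5}s per point -> {days:g} days")
```

The first two archives come out at 7 and 30 days, which matches the "one week, then a month" reading above.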

If I use any integral function ("integral" or "summarize"), it works like this:

integral(series)&from=-1d -> calculate number of events for the last day
integral(series)&from=-6d -> calculate number of events for the last 6 days

When I use 7d (or 1w), it starts using the 5m data. If I'm not mistaken, it just sums all the values, and since there are 5 times fewer values, the sum is 5 times less than the real value.

I've made screenshots with examples:
http://imageshack.us/g/812/51145588.png/

The "6d" integral is about 7.2M, so "1w" should be about 7.2M/6 *7 = 8.4M, but it shows 1.7M (=8.4M/5).

I suppose the same problem occurs at the "1m" and "1y" boundaries.
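
The numbers above can be reproduced with a few lines of arithmetic. This is only a sketch under the stated assumption of a constant rate, and it assumes each stored 5m point holds the average of five 1m points; the variable names are illustrative.

```python
# Assumption: constant rate, with the 6-day total taken from the figures above.
six_day_total = 7.2e6
per_minute = six_day_total / (6 * 24 * 60)   # events per 1-minute point

minutes_in_week = 7 * 24 * 60
expected_week = per_minute * minutes_in_week

# If each 5-minute point stores the average of five 1-minute points, its value
# still equals the per-minute rate, but there are 5 times fewer points to add up.
averaged_points = minutes_in_week // 5
naive_sum = per_minute * averaged_points

print(round(expected_week), round(naive_sum))
```

The two results come out at roughly 8.4M and 1.68M, a factor of 5 apart, which is in line with the 1.7M on the screenshot.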

What am I doing wrong? Should I change something in the config files?

My usual use case is to check the following graphs:
summarize(series,"1d")&from = -1w ( and also from=-1m)
summarize(series,"1m")&from = -3m ( and also from=-1y)
etc.

How should I do this?

Best,
Ivan.

Question information

Language: English
Status: Answered
For: Graphite
Assignee: No assignee
Last query: 2011-10-07
Last reply: 2011-10-07

chrismd (chrismd) said : #1

I think the problem is that your older data in the lower-precision archives is averaged when you probably want it summed. If you are on a recent trunk checkout or 0.9.9, you can configure the aggregation method whisper uses to roll up datapoints into lower-precision archives. To do this, GET the URL http://graphite/metrics/set-metadata?metric=foo.bar.baz&key=aggregationMethod&value=sum

You can verify it with:

http://graphite/metrics/get-metadata?metric=foo.bar.baz&key=aggregationMethod

Note that this will only affect future aggregations; it cannot recompute the current aggregate values. You can also change it en masse by POSTing application/json data like this to the set-metadata URL:

{
  "operations": [
    {"metric": "foo.bar.baz", "key": "aggregationMethod", "value": "sum"},
    {"metric": "foo.bar.baz2", "key": "aggregationMethod", "value": "sum"},
    ...
  ]
}
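
For a longer list of metrics, the payload can be generated rather than typed by hand. The following is only a sketch: the hostname "graphite" is the placeholder from the URLs above, the helper name is made up, and the Content-Type header is an assumption about what the server expects. The request is built but not sent.

```python
import json
import urllib.request

# "graphite" is the placeholder hostname used in the URLs above.
SET_METADATA_URL = "http://graphite/metrics/set-metadata"

def build_set_aggregation_request(metrics, method="sum"):
    """Build (but do not send) a POST request setting aggregationMethod."""
    payload = {
        "operations": [
            {"metric": m, "key": "aggregationMethod", "value": method}
            for m in metrics
        ]
    }
    return urllib.request.Request(
        SET_METADATA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )

request = build_set_aggregation_request(["foo.bar.baz", "foo.bar.baz2"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```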

To make sure metrics created in the future have the proper aggregationMethod, create a /opt/graphite/conf/storage-aggregation.conf file. It works exactly the same way storage-schemas.conf does, except that instead of specifying 'retentions' for a set of metrics, you define 'aggregationmethod' and/or 'xfilesfactor'. For example:

[latency-metrics]
pattern = .*responseTime
aggregationmethod = average

[everything-else]
match-all = true
aggregationmethod = sum

Again, this only works with 0.9.9 or recent trunk. I hope that helps.
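
To see which rule a given metric name would pick up from the example config, here is a rough sketch. It assumes sections are tried in file order, that the first match wins, and that the pattern is searched as a regular expression. This is a stand-alone illustration, not carbon's own code, and `aggregation_for` is a made-up helper.

```python
import configparser
import re

CONF = """
[latency-metrics]
pattern = .*responseTime
aggregationmethod = average

[everything-else]
match-all = true
aggregationmethod = sum
"""

def aggregation_for(metric, conf_text=CONF):
    """Return the aggregation method of the first section matching `metric`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    for section in parser.sections():  # sections are returned in file order
        options = parser[section]
        if options.get("match-all") == "true":
            return options.get("aggregationmethod")
        pattern = options.get("pattern")
        # Assumption: the pattern is applied as a regex search on the name.
        if pattern and re.search(pattern, metric):
            return options.get("aggregationmethod")
    return None

print(aggregation_for("app.api.responseTime"))
print(aggregation_for("app.api.calls"))
```

Under those assumptions the latency metric gets "average" and everything else falls through to "sum", so the order of the sections matters.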

Ivan Glushkov (gli-passw) said : #2

Hi Chris,
Thanks for your answer.

If I understand correctly, with this method I would no longer be able to check the "usual" graphs (I mean "target=series" without "integral" or "summarize")?

I forgot to mention that we also inspect the graphs themselves (not the integral values),
so in addition to the use cases above we also do:

target=series1&from=-1month
target=series2&from=-1year

etc.

Could I use your method without breaking something in my data?

Ivan.

Ivan Glushkov (gli-passw) said : #3

I haven't checked the code thoroughly, but it seems to me that it might be done on-the-fly:

webapp/graphite/render/functions.py
def integral(requestContext, seriesList):
...
current += val

If we could check that "val" comes not from the 1m data but from the 5m data (or 15m data), we could multiply it by the appropriate factor and get the correct answer.

I mean something like this (pseudocode):
...
current += val * multiplier(timeseries)
...

def multiplier(timeseries):
    if timeseries == "5m":
        return 5
    elif timeseries == "15m":
        return 15

    ....
    elif timeseries == "1y":
        return 60 * 24 * 365

Surely it can be done more accurately; I made it this rough just to make my idea clear.

chrismd (chrismd) said : #4

The methods I outlined change the way whisper aggregates data as it rolls into lower-precision archives, so they change what aggregate values get stored. The summarize() function lets you change how the rendering buckets datapoints, and as long as your data is stored correctly (i.e. you probably want to aggregate your counters with sum instead of average), it should give you what you want. But again, changing aggregationMethod will only affect future aggregations and thus won't affect any existing aggregated data. So yes, I would change the aggregationMethods and use summarize. The rendering code isn't aware of aggregation configurations, so we can't do what you suggest of multiplying the value based on the aggregation config.
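
The difference between the two aggregation methods can be seen in a toy model. This is a simplified stand-in for the rollup, not whisper's actual code; the function name and the constant rate of 100 calls per minute are made up for illustration.

```python
def rollup(points, factor, method):
    """Collapse each run of `factor` points into one, by sum or average."""
    out = []
    for i in range(0, len(points), factor):
        chunk = points[i:i + factor]
        total = sum(chunk)
        out.append(total if method == "sum" else total / len(chunk))
    return out

one_minute = [100] * 60  # an hour of a constant 100 calls per minute

averaged = rollup(one_minute, 5, "average")
summed = rollup(one_minute, 5, "sum")

# Totals over the hour: original, average rollup, sum rollup.
print(sum(one_minute), sum(averaged), sum(summed))
# Height of a single 5-minute point under each method.
print(averaged[0], summed[0])
```

In this model the sum rollup keeps the hourly total intact while the average rollup loses a factor of 5. The flip side is that each 5m point stored with sum is 5 times higher than a 1m point, so a plain "target=series" graph would change scale where the archives meet.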
