carbon-aggregator latest partial datapoint

Asked by Krzysztof

Hi,

I have an SSD-based node with carbon-aggregator in front of 5 carbon-caches. I have a problem with aggregated metrics: the latest value stored in the whisper file is always roughly half of the usual metric value.

In the following case I'm aggregating data from 80 nodes (cpu usage).

The whisper file for cpu-idle looks like this:

1425610260 99681.524708
1425610320 99755.657654
1425610380 51996.786829
1425610440 None

As you can see, the latest non-None value is smaller than the first 2 values (which are regular ones). After the next datapoint comes in:

1425610260 99681.524708
1425610320 99755.657654
1425610380 100871.700813
1425610440 52114.747810
1425610500 None

As you can see, the latest value is again about half the usual size. This is very confusing because there is always a fake drop at the end, and when a real drop appears we won't notice it and won't be able to react quickly.
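For completeness, the dumps above come straight from the .wsp file. Assuming the stock whisper-fetch.py utility and the default storage path (both are assumptions, not confirmed from this setup), something like this prints the same timestamp/value pairs:

  whisper-fetch.py --from=1425610200 --until=1425610500 /opt/graphite/storage/whisper/Aggregated/hosting/cpu/cpu-idle.wsp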

I'm feeding Graphite from collectd with an interval of 60 seconds. The aggregation rule looks like:

  * Aggregated.hosting.cpu.<cpumetric> (60) = sum Hosting.node.*.cpu-*.<cpumetric>

And my config: http://pastebin.com/Ljtrwnm7

Playing with MAX_UPDATES_PER_SECOND didn't help. I also tried adjusting the aggregation buffering time, but that didn't do the trick either. When I compose a graph from the aggregated metrics, the end of the graph (newest data) always shows a big drop (about half the value). However, when I compose the same graph from the non-aggregated data (aggregating client-side in Grafana using Graphite queries with "*"), the graph looks fine (no drops).
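To put rough numbers on the drop (the per-node figure below is only illustrative, not measured):

  80 nodes x ~1,250 cpu-idle each  ~= 100,000   <- the normal aggregated value
  ~40 nodes x ~1,250               ~=  52,000   <- what the newest datapoint looks like

So the newest point looks as if only about half of the node values made it into the sum.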

What can cause this behaviour?

Question information

Language: English
Status: Answered
For: Graphite
Assignee: No assignee
Jason Dixon (jason-dixongroup) said :
#1

As discussed in IRC, I think this is related to the fact that you're on a very old version (0.9.10), or to using FORWARD_ALL (which causes the aggregator to send both the raw and the aggregated metrics), or both. That sounds very much like what you're seeing. Try upgrading to the newest stable release (0.9.13-pre1) and disabling FORWARD_ALL.
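For reference, FORWARD_ALL lives in the [aggregator] section of carbon.conf. A minimal sketch of the relevant block (the interface, port and destination values here are illustrative defaults, not taken from the pastebin config):

  [aggregator]
  LINE_RECEIVER_INTERFACE = 0.0.0.0
  LINE_RECEIVER_PORT = 2023
  PICKLE_RECEIVER_INTERFACE = 0.0.0.0
  PICKLE_RECEIVER_PORT = 2024
  DESTINATIONS = 127.0.0.1:2004
  # With FORWARD_ALL = True the aggregator relays every raw datapoint to the
  # caches in addition to the aggregates it produces.
  FORWARD_ALL = False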

Krzysztof (kszarlej94) said :
#2

I upgraded to 0.9.12 and set FORWARD_ALL to False, but unfortunately the problem still occurs.

Krzysztof (kszarlej94) said :
#3

Here is an excerpt from updates.log:

06/03/2015 16:37:19 :: wrote 2 datapoints for Aggregated.hosting.cpu.cpu-system in 0.00020 seconds
06/03/2015 16:38:19 :: wrote 2 datapoints for Aggregated.hosting.cpu.cpu-system in 0.00010 seconds
06/03/2015 16:39:19 :: wrote 2 datapoints for Aggregated.hosting.cpu.cpu-system in 0.00023 seconds
06/03/2015 16:40:19 :: wrote 2 datapoints for Aggregated.hosting.cpu.cpu-system in 0.00012 seconds
06/03/2015 16:41:19 :: wrote 3 datapoints for Aggregated.hosting.cpu.cpu-system in 0.00014 seconds
06/03/2015 16:42:19 :: wrote 2 datapoints for Aggregated.hosting.cpu.cpu-system in 0.00011 seconds
06/03/2015 16:43:19 :: wrote 2 datapoints for Aggregated.hosting.cpu.cpu-system in 0.00009 seconds
06/03/2015 16:44:18 :: wrote 2 datapoints for Aggregated.hosting.cpu.cpu-system in 0.00024 seconds
06/03/2015 16:45:47 :: wrote 2 datapoints for Aggregated.hosting.cpu.cpu-system in 0.00008 seconds
06/03/2015 16:46:47 :: wrote 2 datapoints for Aggregated.hosting.cpu.cpu-system in 0.00006 seconds
06/03/2015 16:47:48 :: wrote 2 datapoints for Aggregated.hosting.cpu.cpu-system in 0.00008 seconds

The caches are writing 2 datapoints per update for the aggregated metrics, while for other metrics they write just 1:

06/03/2015 16:49:12 :: wrote 1 datapoints for Bazy.db.db23.cpu-6.cpu-system in 0.00011 seconds
06/03/2015 16:49:19 :: wrote 1 datapoints for Hosting.node.n8.cpu-15.cpu-system in 0.00007 seconds
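For what it's worth, here is a toy sketch (plain Python, not carbon's actual code) of the mechanism those two writes would be consistent with: the first write lands while only part of the 80 inputs for the newest interval have arrived, and a later write carries the completed sum for the same timestamp. All values below are made up.

  # Toy model only: a 60s aggregation bucket flushed once before all 80 node
  # values have arrived, and again once the interval is complete.
  import random

  NODES = 80
  random.seed(1)
  per_node = [random.uniform(1200, 1300) for _ in range(NODES)]  # cpu-idle per node

  arrived_at_first_flush = per_node[:NODES // 2]     # only ~half reported so far
  first_write = sum(arrived_at_first_flush)          # partial sum -> the "half" datapoint
  second_write = sum(per_node)                       # completed sum, overwrites the first

  print("first write  (partial) :", round(first_write, 2))   # ~50,000
  print("second write (complete):", round(second_write, 2))  # ~100,000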

Can you help with this problem?
