Aggregation rule and non aggregated data rules

Asked by Regnoult on 2016-04-13

Hi,

I'm testing the aggregator and seeing behaviour I did not expect.
Here is my aggregation rule:
<env>.<system>.<host>.<app>.<metrics>.sum_10 (10) = sum <env>.<system>.<host>.<app>.<<metrics>>.raw$
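
For readers unfamiliar with the rule syntax, here is a minimal sketch of how such a rule could be matched against a metric name. It assumes that <name> captures a single dot-separated component and <<name>> may span several, and that the input pattern is written without the trailing "$". The helper names (rule_to_regex, aggregate_name) are made up for illustration and are not carbon's API.

```python
import re

# The rule from the post, split into its two patterns (trailing "$" omitted).
INPUT_PATTERN = "<env>.<system>.<host>.<app>.<<metrics>>.raw"
OUTPUT_TEMPLATE = "<env>.<system>.<host>.<app>.<metrics>.sum_10"


def rule_to_regex(input_pattern):
    """Translate a rule input pattern into a compiled regex (illustrative only)."""
    parts = []
    for part in input_pattern.split("."):
        if part.startswith("<<") and part.endswith(">>"):
            # Assumption: <<name>> may span several dot-separated components.
            parts.append("(?P<%s>.+?)" % part[2:-2])
        elif part.startswith("<") and part.endswith(">"):
            # Assumption: <name> matches exactly one component.
            parts.append("(?P<%s>[^.]+?)" % part[1:-1])
        else:
            parts.append(re.escape(part))
    return re.compile(r"\.".join(parts) + "$")


def aggregate_name(input_pattern, output_template, metric):
    """Return the aggregate metric name, or None if the rule does not match."""
    match = rule_to_regex(input_pattern).match(metric)
    if match is None:
        return None
    # Turn <name> placeholders into %(name)s and fill in the captured fields.
    template = output_template.replace("<", "%(").replace(">", ")s")
    return template % match.groupdict()
```

Under these assumptions a metric ending in .raw maps to the corresponding .sum_10 name, while test.bash.stats and the generated .sum_10 metric itself match no rule, which is consistent with the "Couldn't match metric" lines in the logs below.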

Now I've added a log line (marked "# added") in carbon/clients.py:321:
def sendDatapoint(self, metric, datapoint):
    for destination in self.router.getDestinations(metric):
        log.clients("Sending metric to destination %s" % (metric))  # added
        self.client_factories[destination].sendDatapoint(metric, datapoint)
and another in carbon/aggregator/buffer.py:69:
        state.events.metricGenerated(self.metric_path, datapoint)
        log.aggregator("Metric generated %s" % self.metric_path)  # added
        state.instrumentation.increment('aggregateDatapointsSent')

I'm sending these two metrics every 5 seconds (from Python):
"test.bash.stats", random.randint(0,100)
"TEST.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.raw", random.randint(0,100)

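The sending script itself is not shown in the post. As a rough sketch only, a sender for these two metrics over carbon's pickle protocol might look like the following. It assumes the usual framing (a pickled list of (path, (timestamp, value)) tuples preceded by a 4-byte big-endian length) and that port 2024, seen in the logs, is the aggregator's pickle listener. The function names and the host argument are hypothetical.

```python
import pickle
import random
import socket
import struct
import time


def sample_datapoints(now=None):
    """Build one random datapoint for each of the two test metrics."""
    now = int(time.time()) if now is None else now
    return [
        ("test.bash.stats", (now, random.randint(0, 100))),
        ("TEST.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.raw",
         (now, random.randint(0, 100))),
    ]


def build_pickle_message(datapoints):
    """Frame datapoints for the pickle receiver: length header plus payload."""
    payload = pickle.dumps(datapoints, protocol=2)
    header = struct.pack("!L", len(payload))  # 4-byte big-endian length prefix
    return header + payload


def send_once(host, port=2024):
    """Send one batch; host is a placeholder, port 2024 is assumed from the logs."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_pickle_message(sample_datapoints()))
```

Calling send_once(...) in a loop with time.sleep(5) would reproduce the 5-second cadence described above.
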
And here are the logs when starting my aggregator in DEBUG:
-------------------------------------------------
13/04/2016 15:11:36 :: [console] Log opened.
13/04/2016 15:11:36 :: [console] twistd 16.0.0 (/usr/bin/python 2.7.6) starting up.
13/04/2016 15:11:36 :: [console] reactor class: twisted.internet.epollreactor.EPollReactor.
13/04/2016 15:11:36 :: [console] CarbonReceiverFactory starting on 2023
13/04/2016 15:11:36 :: [console] Starting factory <carbon.service.CarbonReceiverFactory instance at 0x7f1533a7c908>
13/04/2016 15:11:36 :: [console] CarbonReceiverFactory starting on 2024
13/04/2016 15:11:36 :: [console] Starting factory <carbon.service.CarbonReceiverFactory instance at 0x7f1533a7c7e8>
13/04/2016 15:11:36 :: [console] Starting factory CarbonClientFactory(10.33.21.38:2014:None)
13/04/2016 15:11:36 :: [clients] CarbonClientFactory(XX.XX.XX.XX:2014:None)::startedConnecting (XX.XX.XX.XX:2014)
13/04/2016 15:11:36 :: [clients] CarbonClientProtocol(XX.XX.XX.XX::2014:None)::connectionMade
13/04/2016 15:12:23 :: [listener] MetricPickleReceiver connection with YY.YY.YY.YY:55986 established
13/04/2016 15:12:23 :: [console] Couldn't match metric test.bash.stats with any aggregation rule. Passing on un-aggregated.
13/04/2016 15:12:23 :: [clients] Sending metric to destination test.bash.stats
13/04/2016 15:12:23 :: [aggregator] Allocating new metric buffer for LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10
13/04/2016 15:12:23 :: [clients] Sending metric to destination LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.raw
13/04/2016 15:12:28 :: [console] Couldn't match metric test.bash.stats with any aggregation rule. Passing on un-aggregated.
13/04/2016 15:12:28 :: [clients] Sending metric to destination test.bash.stats
13/04/2016 15:12:28 :: [clients] Sending metric to destination LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.raw
13/04/2016 15:12:33 :: [console] Couldn't match metric test.bash.stats with any aggregation rule. Passing on un-aggregated.
13/04/2016 15:12:33 :: [clients] Sending metric to destination test.bash.stats
13/04/2016 15:12:33 :: [console] Couldn't match metric LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10 with any aggregation rule. Passing on un-aggregated.
13/04/2016 15:12:33 :: [clients] Sending metric to destination LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10
13/04/2016 15:12:33 :: [aggregator] Metric generated LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10
13/04/2016 15:12:33 :: [clients] Sending metric to destination LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.raw
13/04/2016 15:12:38 :: [console] Couldn't match metric test.bash.stats with any aggregation rule. Passing on un-aggregated.
13/04/2016 15:12:38 :: [clients] Sending metric to destination test.bash.stats
13/04/2016 15:12:38 :: [clients] Sending metric to destination LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.raw
13/04/2016 15:12:43 :: [console] Couldn't match metric LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10 with any aggregation rule. Passing on un-aggregated.
13/04/2016 15:12:43 :: [clients] Sending metric to destination LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10
13/04/2016 15:12:43 :: [aggregator] Metric generated LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10
13/04/2016 15:12:43 :: [console] Couldn't match metric test.bash.stats with any aggregation rule. Passing on un-aggregated.
13/04/2016 15:12:43 :: [clients] Sending metric to destination test.bash.stats
13/04/2016 15:12:43 :: [clients] Sending metric to destination LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.raw
13/04/2016 15:12:48 :: [console] Couldn't match metric test.bash.stats with any aggregation rule. Passing on un-aggregated.
13/04/2016 15:12:48 :: [clients] Sending metric to destination test.bash.stats
13/04/2016 15:12:48 :: [clients] Sending metric to destination LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.raw
13/04/2016 15:12:53 :: [console] Couldn't match metric LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10 with any aggregation rule. Passing on un-aggregated.
13/04/2016 15:12:53 :: [clients] Sending metric to destination LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10
13/04/2016 15:12:53 :: [aggregator] Metric generated LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10
13/04/2016 15:12:53 :: [console] Couldn't match metric test.bash.stats with any aggregation rule. Passing on un-aggregated.
13/04/2016 15:12:53 :: [clients] Sending metric to destination test.bash.stats
13/04/2016 15:12:53 :: [clients] Sending metric to destination LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.raw
13/04/2016 15:12:58 :: [console] Couldn't match metric test.bash.stats with any aggregation rule. Passing on un-aggregated.
13/04/2016 15:12:58 :: [clients] Sending metric to destination test.bash.stats
13/04/2016 15:12:58 :: [clients] Sending metric to destination LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.raw
-------------------------------------------------

1. raw data is being passed over. Isn't the aggregator supposed to retain these values and pass them?
2. This line shows that the output of an aggregation is being re-aggregated:
13/04/2016 15:12:53 :: [console] Couldn't match metric LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10 with any aggregation rule. Passing on un-aggregated.

I want to send as little data as possible, i.e. only the aggregated output computed every X seconds. I've seen the config item FORWARD_ALL = True,
but when I switch it to False, these are the logs:
-------------------------------------------------
13/04/2016 15:26:46 :: [console] twistd 16.0.0 (/usr/bin/python 2.7.6) starting up.
13/04/2016 15:26:46 :: [console] reactor class: twisted.internet.epollreactor.EPollReactor.
13/04/2016 15:26:46 :: [console] CarbonReceiverFactory starting on 2023
13/04/2016 15:26:46 :: [console] Starting factory <carbon.service.CarbonReceiverFactory instance at 0x7f74fdf1f758>
13/04/2016 15:26:46 :: [console] CarbonReceiverFactory starting on 2024
13/04/2016 15:26:46 :: [console] Starting factory <carbon.service.CarbonReceiverFactory instance at 0x7f74fdf1f638>
13/04/2016 15:26:46 :: [console] Starting factory CarbonClientFactory(10.33.21.38:2014:None)
13/04/2016 15:26:46 :: [clients] CarbonClientFactory(XX.XX.XX.XX::2014:None)::startedConnecting (XX.XX.XX.XX::2014)
13/04/2016 15:26:46 :: [clients] CarbonClientProtocol(XX.XX.XX.XX::2014:None)::connectionMade
13/04/2016 15:26:52 :: [listener] MetricPickleReceiver connection with YY.YY.YY.YY:57884 established
13/04/2016 15:26:52 :: [aggregator] Allocating new metric buffer for LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10
13/04/2016 15:27:02 :: [aggregator] Metric generated LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10
13/04/2016 15:27:12 :: [aggregator] Metric generated LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10
13/04/2016 15:27:22 :: [aggregator] Metric generated LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10
13/04/2016 15:27:32 :: [aggregator] Metric generated LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10
13/04/2016 15:27:42 :: [aggregator] Metric generated LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10
13/04/2016 15:27:52 :: [aggregator] Metric generated LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10
13/04/2016 15:28:02 :: [aggregator] Metric generated LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10
-------------------------------------------------

As you can see nothing is being sent. Not test.bash.stats, nor .raw, nor .sum_10.
Am I doing something wrong, or have I misunderstood the purpose of the aggregator?

Regards
Francois

Question information

Language: English
Status: Solved
For: Graphite
Assignee: No assignee
Solved by: Regnoult
Solved: 2016-04-19
Last query: 2016-04-19

Denis Zhdanov (deniszhdanov) said : #1

Not sure if I understand your question completely...
>1. raw data is being passed over. Isn't the aggregator supposed to retain these values and pass them?
That's what FORWARD_ALL does. If FORWARD_ALL=False, the aggregator filters out the raw values; otherwise it passes them through.
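
The behaviour described here can be illustrated with a toy model. This is not carbon's code: the function run_aggregator and its parameters are invented for the sketch, the rule is reduced to a suffix match, and all buckets are flushed at the end rather than every 10 seconds. It also assumes that metrics matching no rule are dropped along with the raw inputs when forwarding is off.

```python
from collections import defaultdict


def run_aggregator(datapoints, forward_all, frequency=10,
                   suffix_in=".raw", suffix_out=".sum_10"):
    """Toy model: sum '.raw' metrics per time bucket, optionally forwarding input.

    datapoints is a list of (metric, timestamp, value) tuples; the return value
    is the list of tuples that would be passed downstream.
    """
    emitted = []
    buckets = defaultdict(float)
    for metric, timestamp, value in datapoints:
        if metric.endswith(suffix_in):
            # Matching datapoints are buffered under the aggregate name.
            out = metric[:-len(suffix_in)] + suffix_out
            bucket = timestamp - timestamp % frequency
            buckets[(out, bucket)] += value
        if forward_all:
            # Assumption: with forwarding on, every incoming datapoint passes.
            emitted.append((metric, timestamp, value))
    # Aggregates are emitted regardless of the forward_all setting.
    for (out, bucket), total in sorted(buckets.items()):
        emitted.append((out, bucket, total))
    return emitted
```

In this model, forward_all=False leaves only the .sum_10 datapoints in the output, which is the result the question is after.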

>2. This line shows that the output of an aggregation is being re-aggregated
Yes, that's intended, IIRC

>As you can see nothing is being sent.
Why nothing? Raw data is filtered, but aggregates should still be coming out.

Regnoult (regnoultf) said : #2

Hi Denis,

>>As you can see nothing is being sent.
>Why nothing? Raw data is filtered, but aggregates should still be coming out.

I've added a log line (marked "# added") that fires each time something is sent, in class CarbonClientManager:
def sendDatapoint(self, metric, datapoint):
    for destination in self.router.getDestinations(metric):
        log.clients("Sending metric to destination %s" % (metric))  # added
        self.client_factories[destination].sendDatapoint(metric, datapoint)

So I'm expecting to see a line "Sending metric to destination LIVE.boxA.something.applicationXX.QUERY_TIMES.ODB.CURSSQL.sum_10" in my second example.

Regnoult (regnoultf) said : #3

Hi Denis,

Apparently the problem came from the source I was using. I pulled master and used it instead of the 0.9.x branch; with the latest code it works as expected.