Running multiple carbon caches and RabbitMQ

Asked by Krzysztof

Hi guys,

I want to run a monitoring stack based on collectd + carbon + RabbitMQ; however, I have a few questions.

I will be pulling quite a lot of metrics: about 500 hosts to monitor at an interval of 1 minute (some important metrics will be collected more often, at intervals no shorter than 10 seconds), so I assume about 200k metrics/minute. Do I need multiple carbon caches to handle this amount of traffic, or should a single carbon cache be enough? (carbon will be on a quite powerful server with 6 SSD disks).

My second question is about RabbitMQ. By default carbon creates exclusive queues. I want to use Rabbit for zero downtime: when the server with carbon dies, RabbitMQ will keep collecting the metrics. But when carbon disconnects, the exclusive queue disappears. Why does carbon create an exclusive queue by default? I could change the code of the AMQP section in carbon's .py files, but is that a good idea?

For now I have the idea of running a single queue in Rabbit and using carbon-relay to spread the metrics over multiple carbon caches. Is that a good idea?

There were also performance problems with the txamqp plugin in earlier versions of carbon; do those problems still occur? And is the pickle protocol still the best option for scalability?
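For reference, my current understanding of the pickle protocol is that the receiver (port 2004 by default) expects a 4-byte length header followed by a pickled list of (path, (timestamp, value)) tuples, roughly like this sketch:

    # a sketch of carbon's pickle framing as I understand it
    import pickle
    import socket
    import struct
    import time

    def send_pickle(metrics, host="localhost", port=2004):
        # metrics: a list of (path, (timestamp, value)) tuples
        payload = pickle.dumps(metrics, protocol=2)
        header = struct.pack("!L", len(payload))  # big-endian length prefix
        with socket.create_connection((host, port)) as sock:
            sock.sendall(header + payload)

    send_pickle([("test.metric.one", (int(time.time()), 42.0))])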

Question information

Language: English
Status: Answered
For: Graphite
Assignee: No assignee
Denis Zhdanov (denis-zhdanov) said:
#1

Hi Krzysztof,

> Do I need multiple carbon caches to handle this amount of traffic, or
> should a single carbon cache be enough? (carbon will be on a quite
> powerful server with 6 SSD disks).

I would recommend running 4-6 carbon caches (behind a relay with
consistent-hashing routing) to utilize your disk horsepower wisely. IMO a
single carbon instance is not able to do so.
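Something like this in carbon.conf, purely as a sketch (the instance names
and ports are examples, adjust to your setup):

    # carbon.conf - one [cache] section per instance; ports are examples
    [cache]
    LINE_RECEIVER_PORT = 2003
    PICKLE_RECEIVER_PORT = 2004
    CACHE_QUERY_PORT = 7002

    [cache:b]
    LINE_RECEIVER_PORT = 2103
    PICKLE_RECEIVER_PORT = 2104
    CACHE_QUERY_PORT = 7102

    [cache:c]
    LINE_RECEIVER_PORT = 2203
    PICKLE_RECEIVER_PORT = 2204
    CACHE_QUERY_PORT = 7202

Each instance is started separately, e.g. "carbon-cache.py --instance=b
start"; the plain [cache] section is instance "a".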

> Why does carbon create an exclusive queue by default? [...] Are the
> txamqp performance problems from earlier carbon versions still
> occurring? And is the pickle protocol still the best option for
> scalability?
I doubt that many people are still running Graphite with the AMQP
transport... I have no such experience unfortunately, but maybe someone
has...

WBR,
   Denys

Denis Zhdanov (denis-zhdanov) said:
#2

> For now I have the idea of running a single queue in Rabbit and using
> carbon-relay to spread the metrics over multiple carbon caches. Is that
> a good idea?
Ah, you want to use the AMQP transport only for metrics balancing?
I doubt that you need that. Just use a normal carbon relay (on the same
host) with REPLICATION_FACTOR=1 and consistent-hashing routing; it will
distribute the load across your carbon caches.
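As a sketch, assuming the three example cache instances from my previous
answer (the relay ports are carbon's defaults; the destinations must match
your instances' pickle ports):

    # carbon.conf - relay section; DESTINATIONS entries are host:port:instance
    [relay]
    LINE_RECEIVER_PORT = 2013
    PICKLE_RECEIVER_PORT = 2014
    RELAY_METHOD = consistent-hashing
    REPLICATION_FACTOR = 1
    DESTINATIONS = 127.0.0.1:2004:a, 127.0.0.1:2104:b, 127.0.0.1:2204:c

Then point collectd (or whatever sends your metrics) at the relay's ports
instead of at a cache directly.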

Krzysztof (kszarlej94) said:
#3

Denis,

And what about HA when my carbon machine dies? How do I avoid "breaks" in my graphs while the machine is down? Graphite clustering?

Denis Zhdanov (denis-zhdanov) said:
#4

Hi Krzysztof,

> And what about HA when my carbon machine dies? How do I avoid "breaks"
> in my graphs while the machine is down? Graphite clustering?

That is hard to do with AMQP. If your server with carbon caches dies, it
will lose all datapoints which still reside in memory and have not yet
been flushed to disk (on SSD this is not that critical, but you can lose
1-2 minutes of datapoints).
You can set up a cluster with another tier of relays which have
REPLICATION_FACTOR=2 and distribute your load across 2 (better 3)
backends. But if one of your nodes experiences downtime, you need to
properly sync up the whisper files across the nodes before re-enabling
it; otherwise you will have gaps in your graphs too.
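As a sketch of that second tier, assuming three backend hosts (the host
names are placeholders, not from your setup):

    # carbon.conf on the front relay tier - every datapoint is written
    # to 2 of the 3 backends
    [relay]
    RELAY_METHOD = consistent-hashing
    REPLICATION_FACTOR = 2
    DESTINATIONS = graphite-a:2014, graphite-b:2014, graphite-c:2014

For the sync-up after downtime, plain rsync of the whisper files will
overwrite datapoints the recovered node collected in the meantime; a
backfilling tool such as whisper-fill.py (or the carbonate utilities) is
the safer route, as far as I know.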

WBR,
   Denys
