Process for scaling horizontally

Asked by Bruce Lsyik

I have a single machine running Graphite. I'd like to add a second host to scale horizontally.

Can someone explain at a high level what the process for doing this would look like, and maybe point me in the right direction?

Michael Leinartas (mleinartas) said:
#1

Generally, the first step is to make sure you're fully utilizing your first host. A Python process can only use a single CPU, so it's common for a lone carbon-cache instance to be bottlenecked at 100% of one core. In that case you can configure a second (or more) carbon-cache instance and distribute load across them via carbon-relay in consistent-hashing mode, which ensures they're loaded equally but never share processing of the same metrics (which would cause the same whisper files to be updated by multiple processes).
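
As a rough sketch, the carbon.conf for that single-box setup might look something like this (instance names and ports here are just examples):

    [cache]
    # default instance "a" - started with: carbon-cache.py --instance=a start
    LINE_RECEIVER_PORT = 2003
    PICKLE_RECEIVER_PORT = 2004
    CACHE_QUERY_PORT = 7002

    [cache:b]
    # second instance - started with: carbon-cache.py --instance=b start
    LINE_RECEIVER_PORT = 2103
    PICKLE_RECEIVER_PORT = 2104
    CACHE_QUERY_PORT = 7102

    [relay]
    # point your clients at the relay instead of at a cache
    LINE_RECEIVER_PORT = 2013
    PICKLE_RECEIVER_PORT = 2014
    RELAY_METHOD = consistent-hashing
    # each destination is ip:pickle_port:instance
    DESTINATIONS = 127.0.0.1:2004:a, 127.0.0.1:2104:b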

To go to multiple boxes you do the same thing, but with the carbon-cache instances on separate hosts. Each host also needs an instance of the webapp running to serve up its locally stored metrics. The difficult part of the setup is dealing with existing metrics: after cutting over to a carbon-relay in consistent-hashing mode feeding two hosts, half of your metrics will start going to the new host while their historical data is still on the first. In that case you may find it simpler to use the relay-rules method and shard the data yourself based on metric names.
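
If you do go the manual route, the rules live in relay-rules.conf (with RELAY_METHOD = rules in carbon.conf). Something along these lines, assuming hypothetical hosts graphite-a and graphite-b:

    # relay-rules.conf - rules are evaluated top to bottom;
    # pattern is a regex matched against the metric name
    [webservers]
    pattern = ^servers\.www.+
    destinations = graphite-b:2004:a

    # exactly one rule must be marked as the default catch-all
    [default]
    default = true
    destinations = graphite-a:2004:a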

On the webapp side, each webapp lists the other webapps in its CLUSTER_SERVERS setting. When browsing or rendering metrics, every server is queried and the first one found to have a given metric is used. This makes it important to keep the metrics completely separated: the same metric must not live on multiple machines.
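
Concretely, in local_settings.py each webapp lists every other webapp (hostnames here are hypothetical), e.g.:

    # on graphite-a - list the other webapps, never the local host itself
    CLUSTER_SERVERS = ["graphite-b:80"]

    # and on graphite-b:
    CLUSTER_SERVERS = ["graphite-a:80"]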

Unfortunately, this stuff isn't very well documented outside of the config files:
https://github.com/graphite-project/graphite-web/blob/master/webapp/graphite/local_settings.py.example#L161-194
https://github.com/graphite-project/carbon/blob/master/conf/carbon.conf.example#L173-199

Gerhard Lazu (gerhard-p) said:
#2

Michael, thank you for a great reply.

I'm in the process of setting up a Graphite cluster on EC2; our bottleneck is EBS I/O. My plan is to provision 2**n 10GB EBS volumes (one per carbon-cache instance) and attach them all to an m1.large.

How many carbon-cache instances would you say are optimal for an m1.large instance?

Would it be straightforward to add extra carbon-caches later to expand total storage? I'd like to go with consistent-hashing, as I'm not a big fan of manual sharding.

Cheers!
