Scaling graphite on AWS
I have a rapidly growing, evolving AWS deployment. My largest Graphite cluster is currently one carbon-relay in front of six carbon-cache nodes using consistent hashing, with memcached on each cache node. 450 EC2 instances send data to the carbon-relay via Joe Miller's collectd-graphite plugin, and each cache node shows between 35k and 50k metricsReceived.
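For anyone unfamiliar with how consistent hashing spreads metrics across the cache nodes, here is a minimal sketch of the idea (not carbon-relay's actual implementation — the node names and replica count are illustrative): each node gets many virtual points on a hash ring, and a metric goes to the first node point clockwise from the metric's own hash.

```python
import hashlib
from bisect import bisect


class ConsistentHashRing:
    """Minimal sketch of a consistent-hash ring, the scheme carbon-relay
    uses to pick a carbon-cache destination for each metric name."""

    def __init__(self, nodes, replicas=100):
        # Place several virtual points per node on the ring to even out load.
        self.ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, metric):
        # Walk clockwise to the first ring point at or after the metric's hash;
        # wrap around to the start of the ring if we fall off the end.
        idx = bisect(self.keys, self._hash(metric)) % len(self.ring)
        return self.ring[idx][1]


ring = ConsistentHashRing([f"cache{i}" for i in range(1, 7)])
print(ring.get_node("collectd.web01.cpu.idle"))
```

The upside of this scheme is also the rebalancing pain described below: adding a seventh node reassigns only roughly 1/7 of the metrics, but those metrics' existing whisper files then live on the wrong node until you move them.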
It's clear from that data that the cluster is I/O bound, which is no surprise, since I/O on AWS instances is notoriously poor (unless you go with the pricey SSD instance). The data volumes are RAID0 across the two ephemeral disks on an m1.large. Rebalancing the data files whenever I add new instances is becoming painful, and three more instances would cost the same as one SSD instance. FWIW, each cache node is doing around 600-700 IOPS.
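As a rough sanity check, the observed IOPS line up with the metric rate, assuming the 35k-50k metricsReceived figure is per minute and each metric costs roughly one whisper write per flush (both are assumptions on my part):

```python
# Back-of-the-envelope: does the metric rate explain the observed 600-700 IOPS?
# Assumption: metricsReceived is per minute, and each metric triggers about
# one write op per minute (before carbon-cache coalesces updates).
metrics_received_per_min = 40_000  # midpoint of the observed 35k-50k per cache node
writes_per_sec = metrics_received_per_min / 60
print(round(writes_per_sec))  # → 667, right in the observed 600-700 IOPS band
```

That suggests the disks are doing close to one op per metric per interval, i.e. carbon-cache isn't getting much coalescing benefit at this load.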
What is the best way to scale this cluster? Should I bite the bullet and fork out cash for an SSD, or is there something else I can do that I haven't thought of?
Question information
- Language: English
- Status: Solved
- For: Graphite
- Assignee: No assignee
- Solved by: Ben Whaley