Clustered view with a different file retention policy

Asked by Pete Emerson

A lot of the time we look at sumSeries() across many servers. Keeping server-granularity metrics around for a long time is costly in terms of disk utilization, so we're also storing a cluster view with longer retention policies.

Today, our metrics are loaded into Graphite en masse, and we manually calculate a sumSeries() and load that into the cluster view.

Soon, I'm planning on going direct-to-carbon.

In order to generate the cluster view, I'm envisioning a cron job that loads in all of the server metrics and calculates and sends back the cluster metrics.

Is there a slick way you can think of to make this process as easy as possible, or perhaps a totally different approach that might be better?

This ties in with my question about SCSI disk performance vs. RAM disk (https://answers.launchpad.net/graphite/+question/136096); if it's plausible to run off of SCSI with many nodes, then disk usage is much less of a concern than our current 15 GB RAM disk.

Thanks,
Pete

chrismd (chrismd) said:
#1

In general I send my data to a central processing app before forwarding it on to Graphite, precisely because I like to calculate aggregate metrics in addition to the per-server metrics. Doing sumSeries() on a large set of metrics at render time will be much slower than using a pre-calculated aggregate metric.

Doing a cron job to pull the per-server metrics and then compute and store an aggregate will certainly work, but it will be less real-time than the per-server metrics. It will also put additional strain on Graphite, which has to read all of your per-server metrics repeatedly to compute the aggregates. Whether that is acceptable depends on your requirements and resources; it is probably the easiest solution, though.
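
To make that concrete, here is a minimal sketch of such a cron job, assuming hypothetical host names and metric paths (servers.*.requests summed into cluster.requests). It pulls the per-server series out of the render API as JSON and writes the sum back through carbon's plaintext protocol on port 2003:

    # Hypothetical cron job: pull per-server metrics out of the Graphite
    # render API, sum them, and feed the result back to carbon as a
    # cluster-level metric. Host names and metric paths are placeholders.
    import json
    import socket
    from urllib.request import urlopen

    GRAPHITE_URL = "http://graphite.example.com/render"
    CARBON = ("graphite.example.com", 2003)  # carbon's plaintext listener

    # The render API returns JSON shaped like:
    # [{"target": "servers.web01.requests", "datapoints": [[value, ts], ...]}, ...]
    query = "?target=servers.*.requests&from=-5min&format=json"
    series = json.loads(urlopen(GRAPHITE_URL + query).read())

    # Sum across servers at each timestamp, skipping None gaps.
    totals = {}
    for s in series:
        for value, ts in s["datapoints"]:
            if value is not None:
                totals[ts] = totals.get(ts, 0) + value

    # Plaintext protocol: "<metric path> <value> <timestamp>\n"
    sock = socket.create_connection(CARBON)
    for ts in sorted(totals):
        sock.sendall(("cluster.requests %s %d\n" % (totals[ts], ts)).encode())
    sock.close()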

If you want to calculate the aggregate metrics in real-time but don't want to implement a big scalable processing app yourself, then I have another idea that might work for you. I've recently been toying around with an incredibly awesome app called Redis (http://code.google.com/p/redis/). It is like memcached except it supports real data structures instead of just key=value pairs. Here's what you could do...

Have your clients send their metrics to carbon directly so you get your per-server metrics, but also have them push some of the metrics into Redis, which serves as an aggregation buffer. A simple cron job can then pull the big pile of metrics out of Redis, compute the aggregate metrics, and send them into Graphite. This way you only have to write Redis clients (very simple) instead of a big scalable server.
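
As a rough sketch of the client half (assuming the redis-py library plus placeholder host, metric, and key names), each server could report its own metric to carbon and buffer the raw observations in Redis, using the INCR/LPUSH commands described in the next paragraph:

    # Client-side sketch: report a per-server metric straight to carbon,
    # and also buffer the raw observations in Redis for later aggregation.
    # Hosts, metric names, and key names are all placeholders.
    import socket
    import time

    import redis

    CARBON = ("graphite.example.com", 2003)
    r = redis.Redis(host="redis.example.com")

    def report(server, request_count, response_time_ms):
        now = int(time.time())
        # Per-server metric goes directly into Graphite via carbon.
        line = "servers.%s.requests %d %d\n" % (server, request_count, now)
        sock = socket.create_connection(CARBON)
        sock.sendall(line.encode())
        sock.close()
        # The same observations also land in Redis: the counter sums the
        # request counts, the list collects individual response times.
        r.incrby("cluster:requests", request_count)
        r.lpush("cluster:response_times", response_time_ms)

    report("web01", 42, 113.5)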

Say you've got web server request metrics like request count and response times. You could use Redis's INCR command to sum up the request counts and Redis's LPUSH to build a list of the response times. Then your cron job pulls these values out, resets the counters/lists, computes the averages, etc., and sends the results to Graphite. It would probably only be a few dozen lines of code in all.
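
The cron half could then look roughly like this (again a sketch, assuming redis-py and the same placeholder names). GETSET and a MULTI/EXEC pipeline reset the counter and the list atomically, so no samples are lost between the read and the reset:

    # Cron-side sketch: drain the Redis buffer, compute aggregates, and
    # send them to Graphite. Hosts and key names are placeholders.
    import socket
    import time

    import redis

    r = redis.Redis(host="redis.example.com")
    now = int(time.time())

    # GETSET returns the old count and resets it to 0 in one atomic step.
    request_total = int(r.getset("cluster:requests", 0) or 0)

    # Read and clear the response-time list in one MULTI/EXEC transaction.
    pipe = r.pipeline(transaction=True)
    pipe.lrange("cluster:response_times", 0, -1)
    pipe.delete("cluster:response_times")
    times, _ = pipe.execute()
    avg = sum(float(t) for t in times) / len(times) if times else 0.0

    # Forward the aggregates through carbon's plaintext protocol.
    sock = socket.create_connection(("graphite.example.com", 2003))
    sock.sendall(("cluster.requests %d %d\n" % (request_total, now)).encode())
    sock.sendall(("cluster.response_time_avg %.3f %d\n" % (avg, now)).encode())
    sock.close()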
