Good generic storage schema?

Asked by Bruce Lsyik

I'm starting to grab metrics and was wondering whether there's a storage schema that would work for a wide variety of polling intervals, basically something sensible. Some of my data points come in every minute, others only every 20 minutes.

I'd like to be able to see detailed data for at least a few days, not losing any data points.

Any ideas?

Thanks!

Question information

Language:
English
Status:
Solved
For:
Graphite
Assignee:
No assignee
Solved by:
chrismd
Best answer: chrismd (chrismd) said:
#1

You always want your finest-precision archive to match the polling interval of that dataset. This typically means you'll have different storage schemas for datasets with different polling intervals. As a rule of thumb, I try to collect everything once a minute, as I've found that to be a reasonable default unit of time. It's frequent enough for active monitoring and infrequent enough to represent a meaningful amount of data. Of course, it all depends on your data and your use cases.
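
A minimal sketch of that matching rule, for illustration only. The function `check_precision` is hypothetical, and it assumes whisper-style storage where each archive slot holds a single value, so a later point written into the same slot replaces the earlier one.

```python
def check_precision(polling_interval, seconds_per_point):
    """Hypothetical helper: compare a dataset's polling interval (seconds)
    with the precision (seconds per point) of its finest archive."""
    if seconds_per_point > polling_interval:
        # Several datapoints land in one slot; only the last one written is kept.
        return "too coarse"
    if seconds_per_point < polling_interval:
        # Slots between polls stay empty but are still allocated on disk.
        return "too fine"
    return "match"


# The two kinds of data from the question, each checked against a 1-minute archive.
print(check_precision(60, 60))    # minutely data
print(check_precision(1200, 60))  # data polled every 20 minutes
```

This is why the 20-minute data would usually get its own schema, with a 1200-second finest archive, rather than sharing the minutely one.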

As far as retention goes, it's really just a matter of:
a) how old can datapoints get before you no longer care about them?
b) how much disk space do you have?
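
Question a) translates directly into the number of points an archive needs: the retention period divided by the archive's precision. A small sketch with a hypothetical helper `points_needed`, using the "few days" of full-resolution data from the question (3 days here, as an example):

```python
SECONDS_PER_DAY = 86400


def points_needed(seconds_per_point, days):
    """Hypothetical helper: number of slots an archive needs to cover `days`
    at the given precision."""
    total_seconds = days * SECONDS_PER_DAY
    # Integer ceiling division, so the archive never covers less than requested.
    return -(-total_seconds // seconds_per_point)


print(points_needed(60, 3))    # minutely data, 3 days
print(points_needed(1200, 3))  # 20-minute data, 3 days
```

Each (seconds per point, points) pair corresponds to one archive in a schema.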

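You can calculate how much disk space a schema will cost with this Python logic (the archive list and metric count are example values):

datapoint_size = 12  # 12 bytes per datapoint with whisper (only 8 with ceres)
configured_archives = [(60, 4320), (600, 4320)]  # example (seconds per point, number of points) pairs
number_of_metrics = 1000  # example value
schema_size = sum(datapoint_size * retention for precision, retention in configured_archives)
disk_space_required = number_of_metrics * schema_size  # bytes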
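
That formula counts only the datapoints. As far as I understand whisper's file layout, each file also carries a small header: a 16-byte metadata block plus 12 bytes of info per archive, and each point is a 4-byte timestamp followed by an 8-byte double. The sketch below uses `struct.calcsize` on those assumed layouts; the format strings and the function name `whisper_file_size` are my assumptions, not a documented whisper API.

```python
import struct

# Assumed big-endian layouts mirroring whisper's on-disk format.
METADATA_SIZE = struct.calcsize("!2LfL")    # aggregation type, max retention, xFilesFactor, archive count
ARCHIVE_INFO_SIZE = struct.calcsize("!3L")  # offset, seconds per point, points
POINT_SIZE = struct.calcsize("!Ld")         # timestamp, value


def whisper_file_size(archives):
    """Hypothetical helper: estimated bytes for one metric file.
    `archives` is a list of (seconds_per_point, points) pairs."""
    header = METADATA_SIZE + ARCHIVE_INFO_SIZE * len(archives)
    data = sum(POINT_SIZE * points for _seconds_per_point, points in archives)
    return header + data


minutely = [(60, 4320)]      # 3 days at 1-minute precision
twenty_min = [(1200, 216)]   # 3 days at 20-minute precision
print(whisper_file_size(minutely))
print(whisper_file_size(twenty_min))
```

The header is tiny next to the data, so the simpler formula is a good estimate for most schemas.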

Bruce Lsyik (blysik) said:
#2

Thanks chrismd, that solved my question.