Distributed database for Graphite

Asked by Steve Keller

Hi - I don't use Graphite (yet), but it looks as if it can solve a big problem for me. We currently use Opsview as our monitoring system, collecting about 1.5 million data points per day. Opsview has a data warehouse capability which I plugged into our existing graphing application (an internally developed tool). However, we have latency problems that keep the database from being updated regularly (the code that the Opsview team wrote didn't help here either) because of table locks when insertions are happening.

This data is also stored locally in RRD databases - it looks to me as if I could bypass the data warehouse thing and just store the data in Whisper databases located near the monitoring systems. We have 7 monitoring systems in each of two data centers (it's the remote data center that causes the DB locks mentioned above).

We would have either 2 or 14 Whisper databases: Opsview currently has 1 "runtime" DB for each monitoring system, but the RRDs are stored locally on the Opsview system itself. I don't know whether it would be better to use one Whisper DB for each monitoring system (approximately 100K data points daily) or one for each data center (approximately 750K data points daily).

So, the million-data-point question is this - can a Graphite front end successfully integrate data from multiple remote locations? That is, can I look at data from multiple Whisper databases simultaneously, and has anyone ever implemented a system where one or more databases is located several hundred milliseconds away? (Our data centers are in San Jose and Virginia.)

And if that is possible, what is the maximum latency that is supported by this system? For various reasons, it would be better for us to have monitoring systems in multiple remote locations, some of which are far away, latency-wise.

Thanks so much

Question information

Language: English
Status: Solved
For: Graphite
Assignee: No assignee
Solved by: chrismd
Best chrismd (chrismd) said :
#1

Hi Steven, sorry for the delay in getting back to you; it's been a crazy weekend.

The good news is that I think Graphite can help you. You can set up as many Graphite installations on different servers as you need, each with its own carbon-cache (the storage backend you send your data to) and webapp instance. You can then configure the webapps to share data with one another completely transparently by defining a CLUSTER_SERVERS list in your local_settings.py. Multiple remote data centers should work fine; that's what I'm deploying myself in the next few weeks. The latency shouldn't stop you from doing anything, but it can affect how responsive browsing the metric hierarchy is and how long graphs take to render, since both involve sharing data between servers.
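As a rough sketch of the CLUSTER_SERVERS setting mentioned above, this is what local_settings.py on the San Jose webapp might contain. The hostname is hypothetical, and the exact address format accepted (with or without a port) can differ between Graphite versions, so check the local_settings.py.example shipped with your install.

```python
# local_settings.py on the San Jose webapp.
# The hostname below is made up for illustration.

# Other Graphite webapps that this instance should query for metrics
# it does not store locally. Each entry is a "host" or "host:port" string.
CLUSTER_SERVERS = [
    "graphite-va.example.com:80",  # hypothetical Virginia webapp
]
```

The Virginia webapp would carry the mirror-image setting, listing the San Jose webapp instead.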

You generally don't need to manage Whisper databases yourself; carbon-cache does that for you. If you want to do custom rendering, you can still get at the raw data through the webapp.
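For the custom-rendering case, here is a minimal sketch of pulling raw data from the webapp's render URL. It assumes the `format=raw` parameter and the `name,start,end,step|v1,v2,...` response layout described in the current Graphite render API documentation; older releases may differ. The base URL and metric name are hypothetical.

```python
from urllib.parse import urlencode


def render_url(base_url, target, time_from="-1h"):
    """Build a render URL that asks the webapp for raw datapoints."""
    query = urlencode({"target": target, "from": time_from, "format": "raw"})
    return f"{base_url.rstrip('/')}/render?{query}"


def parse_raw_line(line):
    """Parse one line of raw output: 'name,start,end,step|v1,v2,...'."""
    header, _, values = line.strip().partition("|")
    # Split from the right so commas inside the target name are preserved.
    name, start, end, step = header.rsplit(",", 3)
    # Missing datapoints are reported as the literal string "None".
    points = [None if v == "None" else float(v)
              for v in values.split(",") if v]
    return name, int(start), int(end), int(step), points


# Hypothetical webapp address and metric name.
url = render_url("http://graphite.example.com", "opsview.sjc.host1.load")
sample = "opsview.sjc.host1.load,1311836008,1311836011,1|1.0,2.5,None"
parsed = parse_raw_line(sample)
```

Fetching `url` with any HTTP client and passing each response line to `parse_raw_line` would give you the series name, time range, step, and values to feed into your own graphing tool.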

As for the volume of data, I think you could get by with as many or as few Graphite servers as you'd like. My current production system is a 2-node cluster, with each server handling 1 datapoint per minute for approximately 300,000 metrics.
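To put those numbers side by side, here is a quick back-of-the-envelope calculation using only the figures quoted in this thread. It assumes the 300,000-metric figure applies to each server, as stated, and that datapoints arrive evenly through the day.

```python
MINUTES_PER_DAY = 24 * 60  # 1440

# One production server: 300,000 metrics at 1 datapoint per minute.
metrics_per_server = 300_000
server_points_per_day = metrics_per_server * MINUTES_PER_DAY

# The Opsview installation in the question: about 1.5 million datapoints per day.
opsview_points_per_day = 1_500_000
opsview_points_per_minute = opsview_points_per_day / MINUTES_PER_DAY

# How many times larger the single-server load is than the Opsview load.
ratio = server_points_per_day / opsview_points_per_day

print(f"One server: {server_points_per_day:,} datapoints/day")
print(f"Opsview:    {opsview_points_per_minute:,.0f} datapoints/minute")
print(f"Ratio:      {ratio:.0f}x")
```

On these assumptions, a single server in that cluster takes in 432,000,000 datapoints per day, roughly 288 times the whole Opsview volume, so the choice between 2 and 14 installations is unlikely to be driven by write load.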

Hope that helps. If you do decide to try Graphite out, feel free to ask questions; I'd like to know if you run into any problems.

Steve Keller (skeller-ea) said :
#2

Hi Chris,

Thanks for the reply. I'm installing it on a test VM right now; if all goes well, I will install it on our dev Nagios environment.

The answer to the installation problem is timely for me; I ran into the exact same issues. New documentation would be very helpful...

In any case, I'm sure you will be hearing from me :)

Thanks again,
Steve Keller

Steve Keller (skeller-ea) said :
#3

Thanks chrismd, that solved my question.