Carbon and the web interface, multiple-server setup

Asked by Milan

Hello,

I came across Graphite while looking for monitoring tools written in Python, and it looks quite interesting. I need to collect data from several servers and generate various graphs from it.

My first question: on http://graphite.wikidot.com/carbon it says that before graphs are generated, the carbon cache is checked for data that has not been written to disk yet. Is this already implemented? I found the file webapp/web/render/carbonlink.py and the associated settings (CARBONLINK_HOSTS etc.), but it doesn't seem to be used anywhere in the webif.

The second question: how does running Graphite on more than one server work? I found no documentation, and couldn't quite figure out what the scripts in misc/carbon-relay do.
Can I run the carbon backend on multiple hosts and provide access to all backends via a single webif? As far as I could see, the webif accesses the whisper data files directly.

Regards,
Milan

Question information

Language: English
Status: Solved
For: Graphite
Assignee: No assignee
Solved by: Milan
chrismd (chrismd) said:
#1

Geez... my apathy knows no bounds :)

So yes, the carbonlink feature *was* implemented (in the 0.9 release), but when I refactored the codebase for the 0.9.3 release it apparently got left out :(
The code is there, but the snippet that actually *uses* it is not. I hadn't noticed all this time because we're still running a fairly old version ourselves. Thank you for finding this; I'm pretty embarrassed to have missed it myself. I am fixing the code right now and will publish a new release as soon as I verify it works again, hopefully later this afternoon.
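
To make the intent concrete, here's a rough sketch of the idea. CARBONLINK_HOSTS is the real setting; the merge function and sample data below are purely illustrative, not the actual webapp code:

    # CARBONLINK_HOSTS is a real webapp setting; the host is a placeholder.
    CARBONLINK_HOSTS = ["127.0.0.1:7002"]  # query port(s) of the carbon-cache daemon(s)

    def merge_cache_and_disk(disk_points, cached_points):
        """disk_points / cached_points: dicts of timestamp -> value.

        Points still buffered in carbon-cache are newer than anything in the
        whisper file, so they win when both sources cover the same timestamp.
        """
        merged = dict(disk_points)
        merged.update(cached_points)
        return sorted(merged.items())

    # Example: the last datapoint exists only in carbon-cache, not yet on disk.
    disk = {1000: 1.0, 1060: 2.0}
    cache = {1060: 2.5, 1120: 3.0}
    print(merge_cache_and_disk(disk, cache))  # [(1000, 1.0), (1060, 2.5), (1120, 3.0)]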

As for your second question, the way we run Graphite on two servers is by using a clustered filesystem (VxFS with clustering extensions). This is because, as you noticed, the webapp reads directly from the filesystem. The plan was to do this in the short term (because network filesystems really can't scale well performance-wise) and eventually federate the webapp code so that it fetches data from other webapps running on other servers with independent storage volumes. I haven't had time to start working on that yet, so for now the solution would be to use a network filesystem (preferably something with better performance than NFS; intelligent caching and low latency are important). Then again, if you are a Python hacker I don't think federating the webapp would take too much work, and if you're interested I'd be happy to give you some pointers.

Milan (public-mjh) said:
#2

Great to see that there's a new release with the CarbonLink thing fixed :)

Linking several webapps together sounds interesting, but I don't think I'll be able to spend real time on Graphite for another 2 to 4 weeks or so, so I'll check back then.

One last question though: which approach would fit best into Graphite? Fetching the data from the remote site and rendering locally?

Thanks for your work and feedback.

Regards,
Milan

chrismd (chrismd) said:
#3

Yes, basically we'd just need to add a Django view for fetching local data for a given set of metrics. When a webapp receives a rendering request from a user, it looks for the data locally, and any metrics it does not have it requests from the appropriate neighbors (using the same algorithm carbon-cache does). The rendering would then be done locally, and the final aggregated data set and rendered image would be cached in memcached.
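
If anyone wants to experiment, here's a very rough sketch of what the webapp-to-webapp fetch could look like (Python 2, like Graphite itself). The /fetch view, its parameters, and the pickle wire format are all assumptions on my part; none of this is shipped code:

    # Hypothetical federation sketch -- the /fetch view, its parameters, and
    # the pickle wire format are assumptions; none of this is shipped code.
    import pickle
    import urllib
    import urllib2

    def fetch_from_neighbors(neighbors, missing_metrics, start, end):
        """Collect datapoints for metrics this webapp does not store locally."""
        results = {}
        for neighbor in neighbors:
            query = urllib.urlencode(
                [('target', m) for m in missing_metrics] +
                [('from', start), ('until', end)])
            response = urllib2.urlopen('http://%s/fetch?%s' % (neighbor, query))
            # Each neighbor would return a pickled {metric: [(timestamp, value), ...]}
            # covering whichever of the requested metrics it actually stores.
            results.update(pickle.loads(response.read()))
        return results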

Mark Lin (wedney2004-craig1) said:
#4

Sorry for being lame, but what do we have in Graphite today as far as multiple backends go?

This is my understanding:

It works now by connecting multiple hosts via a clustered filesystem, so they all write to the same location. When the webapp needs a graph, it reads off that filesystem and caches the data locally.

Let's say we don't run a clustered filesystem and metrics come in round-robin fashion to multiple backends. Would Graphite be able to query every server in CARBONLINK_HOSTS for data and aggregate it together? I looked at carbon-cache.py, but I don't know whether "cache.get(query,[])" only looks at cached data or also at local files.

And a few more questions, hope you don't mind, regarding MEMCACHE_HOSTS and REMOTE_RENDERING. If we use multiple entries in MEMCACHE_HOSTS, will the webapp just do the hashing and spread cached data across the listed memcache servers? And what do we need to install on a server for it to be used in REMOTE_RENDERING? Just Graphite? The comment mentions that it's useful when you can't share a storage mount, so I assume the server receiving the request will fetch the data and pass it along to the rendering host?

Thanks for your time,
Mark

chrismd (chrismd) said:
#5

Don't worry, you're not being lame; you've actually got impeccable timing! I just made a post about this on the Graphite wiki earlier today: see http://graphite.wikidot.com/

Essentially, your assessment of how using multiple backends works is correct (with the current stable release). As mentioned in the wiki post, I am working on a federated storage model so that you can split your data across multiple machines and have them share data at the application level rather than at the filesystem level. One thing I want to clarify, though: you will still need to decide which servers store which metrics. You can't send a data point to just any server; it has to go to the server(s) that are supposed to have all the data for that metric. This will be facilitated by a new daemon I'm writing as part of the upcoming release, called carbon-data-router.py. It will actually be the daemon your clients send all of their data to, and it will route the data points to the appropriate backends based on your configuration.
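
Just to illustrate the routing idea: the rule could be as simple as a static hash over the metric name. carbon-data-router.py isn't released yet, so the hashing scheme and the host list here are made up:

    # Illustrative only -- carbon-data-router.py is not released yet, so the
    # hashing scheme and the backend list below are made-up examples.
    import hashlib

    BACKENDS = ['carbon-a:2003', 'carbon-b:2003', 'carbon-c:2003']

    def backend_for(metric):
        """Deterministically map a metric name to one backend, so every
        datapoint for a given metric always lands on the same server."""
        digest = int(hashlib.md5(metric.encode('utf-8')).hexdigest(), 16)
        return BACKENDS[digest % len(BACKENDS)]

    print(backend_for('servers.web01.cpu.load'))  # always the same backend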

Regarding MEMCACHE_HOSTS, that is actually passed right along to Django, which does the caching for the webapp. The cached data does get spread across all the hosts listed, but the list must be the same on all servers (the order has to be the same too).
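
For example (the hostnames are placeholders), every webapp would carry the identical, identically ordered list:

    # Hostnames are placeholders; the same list, in the same order, must
    # appear in every webapp's settings, since the client hashes cache keys
    # across the hosts by position.
    MEMCACHE_HOSTS = ['10.0.0.1:11211', '10.0.0.2:11211', '10.0.0.3:11211']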

As for the REMOTE_RENDERING feature, I consider this one to be of questionable value for most people. I implemented it to solve a specific problem at Orbitz because of restrictions on the hardware we had available. Here is how it works:

Say you have server A running carbon, so it has all the actual data. But perhaps this server is an old Sun machine with lots of cores but a really low clock speed and a shared FPU, so it is really, really slow at rendering (this was the case at Orbitz). Now imagine you have server B, a fast x86 Linux machine that renders very quickly but for some silly reason isn't allowed to be connected to the fast storage array that the Sun machine is (ahem, Orbitz). This is where REMOTE_RENDERING is useful.

On server A you put REMOTE_RENDERING = ["serverB"], and this will cause server A to proxy the requests it receives on to serverB instead of rendering them locally. The key thing is that the proxied requests have the data bundled along with them, so serverB does not actually need access to the data itself. This may sound weird, and it is. There is really no *good* reason to be stuck in this situation; what should have happened was that we simply connect the fast x86 Linux machine to the fast disk array, but that was impossible for political reasons.

Note that when using REMOTE_RENDERING, *all* rendering requests get proxied. Graphs are only rendered locally if the remote servers become unavailable or the remote request times out. While it might be useful to modify this functionality to delegate only some requests to the remote servers (to spread out load), I have never actually run into a situation in which a fast modern machine couldn't keep up with the rendering (assuming memcached is in use).
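
To make the data-bundling point concrete, here's a rough sketch of the proxying idea. Only the REMOTE_RENDERING setting comes from Graphite; the endpoint name and payload format below are invented for illustration:

    # Only REMOTE_RENDERING is a real setting; the /render_bundled endpoint
    # and the pickled payload format are invented for illustration.
    import pickle
    import urllib2

    REMOTE_RENDERING = ['serverB']  # server A proxies all rendering here

    def proxy_render(graph_options, datapoints):
        """Ship the graph parameters *and* the data to the rendering host,
        so it never needs filesystem access to the whisper files."""
        payload = pickle.dumps({'options': graph_options, 'data': datapoints})
        request = urllib2.Request(
            'http://%s/render_bundled' % REMOTE_RENDERING[0], payload,
            {'Content-Type': 'application/pickle'})
        return urllib2.urlopen(request).read()  # the finished PNG bytes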

That said, REMOTE_RENDERING will become pretty much useless once federated storage is finished because you will be able to scale both the frontend and the backend horizontally by adding servers.

Mark Lin (wedney2004-craig1) said:
#6

I didn't expect to get such a detailed explanation. Thank you!

btw, I've implemented the rabbitmq carbon agent mentioned in http://somic.org/2009/05/21/graphite-rabbitmq-integration/, and with Graphite's kickass UI, message-bus metric gathering, and federated storage once it's implemented, this feels like a next-generation ops tool. Keep up the good work!

Thanks,
Mark