How to handle high-volume data?

Asked by Daniel Lawrence

I am currently playing with Graphite to get information about some of the devices that I look after, and I am running into some trouble with Graphite dealing with high-volume data.

Is anyone able to point me at some documentation about dealing with high-volume data? This is not high-frequency data (only once every 10 minutes); the problem is that each device has 5,000 different metrics that I would like to capture (every 10 minutes).

With only a handful of devices this turns into ~50,000-75,000 metrics very quickly.

chrismd (chrismd) said :
#1

Hi Daniel, our docs are a work in progress. You can always find the latest at http://graphite.readthedocs.org/

We do not currently have a single doc focussed on performance tuning but I would be glad to help you out. I've worked on Graphite systems that sustain close to a million datapoints per minute on a single machine so I'm sure there are some simple tweaks we can make to get you better performance. First a few questions:

1) What behavior are you seeing now that you consider to be poor performance?
2) What is your performance target?
3) How are you feeding your metrics to carbon? Please give as much connection-level detail as you can about how your clients send data.

Daniel Lawrence (dannyla) said :
#2

I think the answer is going to be around the caching for bulk updates, or how I am sending the data to carbon, as I haven't come across anything that made it obvious which connection type to use.

1) The poor performance presents itself as high iowait on the system (about 80% in some cases); this leads to a host of other performance issues because of the I/O blocking.

2) I don't really have a performance target at the moment. If I can get more of the appliance data into Graphite I'll be happy; say 200,000-250,000 points over 10 minutes.

3) I am using the bulk-update code from https://github.com/daniellawrence/carbonclient

To summarize:

I am using a basic Python socket connection to talk to the plain-text listener. I cut the ~10,000 datapoints into chunks of 500 lines each and send them over a single socket connection:

import socket
import time

sock = socket.socket()
sock.connect(('carbon.example.com', 2003))  # plain-text listener
sock.sendall(batch)       # batch: a string of up to 500 "metric value timestamp" lines
time.sleep(0.1)
sock.sendall(next_batch)  # the next 500-line batch on the same connection
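
For context, here is a rough sketch of how such a 500-line batch could be assembled in the carbon plain-text format (one "metric value timestamp" line per datapoint); the datapoints list and the chunk() helper below are illustrative names, not part of the carbonclient code:

# Illustrative sketch: datapoints is assumed to be a list of (metric, value, timestamp) tuples.
def chunk(seq, size=500):
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

for group in chunk(datapoints):
    batch = "".join("%s %s %s\n" % (metric, value, timestamp)
                    for metric, value, timestamp in group)
    sock.sendall(batch)  # reuse the single open connection from above
    time.sleep(0.1)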

chrismd (chrismd) said :
#3

There is an easy fix for #1. In the [cache] section of carbon.conf there is a setting called MAX_UPDATES_PER_SECOND, which I think is set too high by default and is what is causing your I/O wait issue; I'll make sure to lower it in the next release. What this setting does is rate-limit the write operations performed by carbon. That may seem counter-intuitive, as you'd think faster would be better, but writing as fast as possible creates an excessive number of non-sequential I/O requests, which slows everything down: your disks constantly seek in order to write one datapoint to every single wsp file.

Try a value of 500 (the unit is write operations per second) and see how that goes. Note that this does not mean carbon will lag behind; it just means it will rely more on carbon's caching and bulk-writing behavior. Ideally you want the write rate high enough that a huge cache doesn't eat up all your memory, but low enough that the disks aren't going nuts.
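
For reference, a minimal sketch of that change; the rest of your carbon.conf will differ, but MAX_UPDATES_PER_SECOND lives in the [cache] section:

[cache]
# Rate-limit whisper write operations (unit: write operations per second)
# to avoid flooding the disks with non-sequential I/O.
MAX_UPDATES_PER_SECOND = 500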

#2 is definitely doable.

For #3, that looks fine; using a single persistent connection instead of spawning many short-lived connections is definitely the way to go. If you find that carbon-cache is CPU-bound, then you can probably reduce CPU load by switching to the pickle protocol. For that, just use port 2004 and do the following:

# Assume data = [(metric, datapoint), (metric, datapoint), ...]
# where metric is a string metric name
# and datapoint is (timestamp, value), both floats.
# my_socket is a socket already connected to carbon's pickle listener on port 2004.
import struct
import cPickle

serialized_data = cPickle.dumps(data, protocol=-1)  # serialize the whole batch
header = struct.pack("!L", len(serialized_data))    # 4-byte length prefix, network byte order
my_socket.sendall(header + serialized_data)

Note that the performance gain here depends somewhat on sending reasonably large lists of datapoints rather than a bunch of small lists sent separately. I'd suggest aiming for 500 or so datapoints per message.
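
As a rough sketch of that batching advice (illustrative only; data and my_socket are as described above, and CHUNK_SIZE is just an assumed name):

import struct
import cPickle

CHUNK_SIZE = 500  # aim for roughly 500 datapoints per message

for i in range(0, len(data), CHUNK_SIZE):
    chunk = data[i:i + CHUNK_SIZE]
    payload = cPickle.dumps(chunk, protocol=-1)  # pickle just this chunk
    header = struct.pack("!L", len(payload))     # 4-byte length prefix
    my_socket.sendall(header + payload)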
