carbon-cache memory endlessly increases

Asked by David Gillies

We've got the following setup on one server:

* 1x carbon-relay
* 3x carbon-caches behind the relay with consistent-hashing set

When we start the relay and the 3 caches, memory usage is reasonable: all the caches use roughly the same amount of memory and hold roughly the same number of cached points. After a while, though, one or two (but not all) of the carbon-caches balloon in memory usage and the system starts tanking.
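For context on how the relay splits the work: with consistent hashing, every datapoint for a given metric name goes to the same carbon-cache. Each cache's share of metrics, and therefore of the write load, is only approximately even. The sketch below is a simplified, hypothetical hash ring written for illustration. It is not carbon's own implementation, and the cache and metric names are made up.

```python
import bisect
import hashlib
from collections import Counter


class HashRing:
    """Simplified consistent-hash ring (illustration only, not carbon's code).

    Each node is placed at `replicas` points on the ring; a metric goes to the
    first node point at or after the metric's own position.
    """

    def __init__(self, nodes, replicas=100):
        self.ring = []  # sorted list of (position, node)
        for node in nodes:
            for i in range(replicas):
                self.ring.append((self._position(f"{node}:{i}"), node))
        self.ring.sort()
        self.positions = [pos for pos, _ in self.ring]

    @staticmethod
    def _position(key):
        # First 4 bytes of the MD5 digest, read as an unsigned integer.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

    def get_node(self, metric):
        # Walk clockwise; wrap around to the first point past the end.
        idx = bisect.bisect_left(self.positions, self._position(metric))
        return self.ring[idx % len(self.ring)][1]


# Hypothetical cache and metric names.
ring = HashRing(["cache-a", "cache-b", "cache-c"])
metrics = [f"servers.web{i:03d}.cpu.user" for i in range(3000)]
print(Counter(ring.get_node(m) for m in metrics))
```

The counts show how far each cache is from a one-third split by metric count. Real load also depends on how often each metric is sent, which this sketch ignores.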

http://imgur.com/ZgGtT
http://imgur.com/Y8qbP

The weird part is that when I watch IO for each carbon-cache, one of the caches (the one with the lowest memory usage) seems to be starving the other two of IO. Is there any way to tune things so that one carbon-cache doesn't starve the others of IO?

Currently I have these settings in carbon.conf:

MAX_UPDATES_PER_SECOND = 500
MAX_CREATES_PER_MINUTE = 400
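These limits are applied by each carbon-cache process separately (an assumption worth checking against your carbon version). If all three caches inherit the [cache] values, they can together attempt up to three times the per-cache numbers. The sketch below assumes the usual carbon.conf layout, in which a [cache:<instance>] section overrides [cache]. The instance names and port values are hypothetical.

```python
import configparser

# Hypothetical carbon.conf excerpt: instances b and c only override ports.
CARBON_CONF = """
[cache]
MAX_UPDATES_PER_SECOND = 500
MAX_CREATES_PER_MINUTE = 400

[cache:b]
LINE_RECEIVER_PORT = 2103

[cache:c]
LINE_RECEIVER_PORT = 2203
"""


def effective_limits(text, instances=("a", "b", "c")):
    """Return {instance: (updates_per_sec, creates_per_min)}.

    Assumes [cache:<name>] overrides [cache]; an instance without its own
    section simply uses the [cache] values.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the upper-case setting names as written
    parser.read_string(text)
    limits = {}
    for name in instances:
        settings = dict(parser["cache"])
        if parser.has_section(f"cache:{name}"):
            settings.update(parser[f"cache:{name}"])
        limits[name] = (int(settings["MAX_UPDATES_PER_SECOND"]),
                        int(settings["MAX_CREATES_PER_MINUTE"]))
    return limits


limits = effective_limits(CARBON_CONF)
total_updates = sum(u for u, _ in limits.values())
total_creates = sum(c for _, c in limits.values())
print(total_updates, total_creates)  # 1500 1200
```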

Would tuning those values down help? How can I make sure IO is shared evenly between the caches? Is that possible at all?
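One way to picture a per-second update cap is a token bucket, a common rate-limiting technique. Whether your carbon version throttles writes exactly this way is an assumption. The point of the sketch is that each cache would hold its own bucket and the buckets never coordinate. Lowering the limit caps the busy cache, which may leave more disk bandwidth for the others, but nothing in the limit itself balances IO between caches. The class and the fake clock are hypothetical.

```python
import time


class TokenBucket:
    """Hypothetical per-process throttle in the spirit of MAX_UPDATES_PER_SECOND.

    Tokens refill at `rate_per_sec`; at most one second's worth can accumulate.
    """

    def __init__(self, rate_per_sec, clock=time.monotonic):
        self.rate = float(rate_per_sec)
        self.capacity = float(rate_per_sec)
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()

    def try_consume(self, n=1):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


# Deterministic usage with a fake clock.
fake_now = [0.0]
bucket = TokenBucket(500, clock=lambda: fake_now[0])
first_burst = sum(bucket.try_consume() for _ in range(1000))
fake_now[0] = 0.5
after_half_second = sum(bucket.try_consume() for _ in range(1000))
print(first_burst, after_half_second)  # 500 250
```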

Question information

Language: English
Status: Answered
For: Graphite
Assignee: No assignee
David Gillies (daveg) said:
#1

And here's a screenshot of the carbon-caches and the IO that one of them is gobbling up:

http://imgur.com/R32Yd

Brian Hatfield (bmhatfield) said:
#2

David,

Some bullet-point thoughts:

   - You can try adjusting the IO scheduler on your boxes to see whether
   that helps prevent one cache from starving the others.
   - You can try reducing the size of your caches so that each cache
   writes smaller chunks more frequently.
   - You can turn on the "WHISPER_AUTOFLUSH" option to see whether you're
   hitting pdflush barriers that are causing your system to have spiky IO
   and never recover.
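On Linux, the active IO scheduler for each block device appears in brackets in /sys/block/<device>/queue/scheduler (for example "noop deadline [cfq]"). Switching schedulers is done as root by writing one of the listed names to that same file. The helper below only reads and parses those files. It is a hypothetical convenience for checking what each disk uses before experimenting, and which scheduler suits your disks is something to test, not assume.

```python
from pathlib import Path


def parse_scheduler(text):
    """Parse a queue/scheduler file; the active entry is in brackets."""
    options, active = [], None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        options.append(token)
    return active, options


def current_schedulers(sys_block=Path("/sys/block")):
    """Map device name -> (active, available) for every device that exposes one."""
    result = {}
    for sched_file in sorted(sys_block.glob("*/queue/scheduler")):
        device = sched_file.parent.parent.name  # /sys/block/<device>/queue/scheduler
        result[device] = parse_scheduler(sched_file.read_text())
    return result


if __name__ == "__main__":
    for device, (active, options) in current_schedulers().items():
        print(f"{device}: active={active} available={options}")
```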

But ultimately,

   - You probably have more IO than your system can keep up with.
      - SSDs would help, but be aware that Graphite's write pattern is close
      to the worst case for SSD durability.
      - Perhaps consider scaling out further across more hosts.
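Having more IO than the system can keep up with is what makes a cache's memory grow without bound. Datapoints that arrive faster than the cache can flush them to disk pile up in memory. The toy model below uses hypothetical constant rates measured in datapoints per second. In carbon, one write can flush several queued points for the same metric, so "written" here means datapoints flushed, not update calls.

```python
def simulate_cache(incoming_per_sec, written_per_sec, seconds, start=0):
    """Datapoints queued in one cache after each second (toy model, constant rates)."""
    size = start
    history = []
    for _ in range(seconds):
        # The queue cannot go negative: an idle writer just has nothing to flush.
        size = max(0, size + incoming_per_sec - written_per_sec)
        history.append(size)
    return history


print(simulate_cache(1000, 800, 5))             # [200, 400, 600, 800, 1000]
print(simulate_cache(800, 1000, 3, start=300))  # [100, 0, 0]
```

While flushing lags arrivals, the backlog grows linearly. Once the disk keeps up, an existing backlog drains.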

On Thu, Dec 13, 2012 at 1:20 AM, David Gillies wrote:

> Question #216675 on Graphite changed:
> https://answers.launchpad.net/graphite/+question/216675
>
> David Gillies gave more information on the question:
> And here's a screenshot of the carbon-caches and the IO that one of them
> is gobbling up:
>
> http://imgur.com/R32Yd
>
> --
> You received this question notification because you are a member of
> graphite-dev, which is an answer contact for Graphite.
>
> _______________________________________________
> Mailing list: https://launchpad.net/~graphite-dev
> Post to : <email address hidden>
> Unsubscribe : https://launchpad.net/~graphite-dev
> More help : https://help.launchpad.net/ListHelp
>
