Limitation of Whisper sharding

Asked by Jason Dixon

Per the documentation, Whisper metrics must be "colocated" to specific carbon storage servers. Is this a limitation of carbon-relay or the web application? In other words, would the web application be able to collate the same metric from different servers if we use our own custom relay to distribute across storage servers?

Question information

Language: English
Status: Answered
For: Graphite
Assignee: No assignee

Nicholas Leskiw (nleskiw) said :
#1

Each file must exist wholly on one carbon server. The files can be spread across many carbon servers, which your webapp frontend is configured to query for data.

Does this answer your question?


Jason Dixon (jason-dixongroup) said :
#2

No, it didn't really answer my question. However, I spoke with someone else who claims that the web app will "defer" to the first carbon that responds, rather than collating results from both. If true, this answers my question.

chrismd (chrismd) said :
#3

Here's how it works for whisper. When the webapp gets a request to render metric foo, it checks whether foo exists locally, and if it does, that's all it uses (plus a cache query, but that's not really relevant). If metric foo doesn't exist locally, it broadcasts a 'find request' to all webapps in the cluster (the results are memcached). Once it knows which servers have metric foo, it requests the data from one webapp. The webapp cannot effectively collate whisper data from disparate sources: whenever one server has a missing datapoint, it has no way to know whether the other servers have that datapoint, so it would have to broadcast every such request. Caching only helps to a point; this would lead to a lot of shared (as opposed to distributed) load.
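To make the whisper flow concrete, here is a rough sketch in Python. It is not Graphite's actual code: the `Webapp` class, `render_whisper`, and the plain dict standing in for memcached are all made-up names for illustration. It only mirrors the steps described above: local first, otherwise a cached find, then data from exactly one server.

```python
from typing import Dict, List, Optional


class Webapp:
    """Hypothetical stand-in for one webapp and the whisper files next to it."""

    def __init__(self, name: str, series: Dict[str, List[Optional[float]]]):
        self.name = name
        self.series = series  # metric name -> datapoints (None = missing)

    def has_metric(self, metric: str) -> bool:
        return metric in self.series

    def fetch(self, metric: str) -> List[Optional[float]]:
        return self.series[metric]


def render_whisper(local: Webapp, cluster: List[Webapp], metric: str,
                   find_cache: Dict[str, List[str]]) -> Optional[List[Optional[float]]]:
    # Step 1: if the metric exists locally, that is all that gets used,
    # even when the local series has gaps another server could fill.
    if local.has_metric(metric):
        return local.fetch(metric)

    # Step 2: otherwise broadcast a find request; the answer is cached
    # (a dict here, standing in for memcached).
    if metric not in find_cache:
        find_cache[metric] = [w.name for w in cluster if w.has_metric(metric)]

    holders = find_cache[metric]
    if not holders:
        return None

    # Step 3: ask exactly one of the servers that has the metric.
    by_name = {w.name: w for w in cluster}
    return by_name[holders[0]].fetch(metric)
```

With the same metric split across two servers, the caller gets back whichever single copy is chosen, gaps included, which is why a custom relay that spreads one metric over several whisper servers does not work well.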

Here's how it works for ceres. When the webapp gets a request to render metric foo, it checks which intervals it has datapoints for. If it doesn't have all the data locally, it broadcasts a find request just as before, but the results also include which intervals each server has datapoints for, per metric, not just which metrics it has. Whisper doesn't make this data cheaply available; ceres does, and that is really the main difference between the two.
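The ceres case can be sketched the same way. Again, this is not the real implementation: `CeresNode`, `covers`, `render_ceres`, and the fixed 60-second step are assumptions made up for the example, and the merge step (local datapoints win, other servers fill the gaps) is my guess at what collating would look like, not something stated above. The point is only that once each server can report its intervals cheaply, the webapp can tell whether it needs anyone else's data.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), in seconds


class CeresNode:
    """Hypothetical server that can report which intervals it holds."""

    def __init__(self, name: str, data: Dict[str, Dict[int, float]]):
        self.name = name
        self.data = data  # metric name -> {timestamp: value}
        self.fetch_calls = 0  # only here so the sketch can be inspected

    def intervals(self, metric: str, step: int = 60) -> List[Interval]:
        # Collapse consecutive timestamps into contiguous intervals.
        out: List[Interval] = []
        for t in sorted(self.data.get(metric, {})):
            if out and out[-1][1] == t:
                out[-1] = (out[-1][0], t + step)
            else:
                out.append((t, t + step))
        return out

    def fetch(self, metric: str, start: int, end: int) -> Dict[int, float]:
        self.fetch_calls += 1
        points = self.data.get(metric, {})
        return {t: v for t, v in points.items() if start <= t < end}


def covers(intervals: List[Interval], start: int, end: int) -> bool:
    # True if the sorted intervals cover [start, end) without a gap.
    cursor = start
    for s, e in intervals:
        if s > cursor:
            break
        cursor = max(cursor, e)
    return cursor >= end


def render_ceres(local: CeresNode, cluster: List[CeresNode], metric: str,
                 start: int, end: int) -> Dict[int, float]:
    result = local.fetch(metric, start, end)
    if covers(local.intervals(metric), start, end):
        return result  # everything is local, no broadcast needed

    # The find results include intervals, so only servers whose intervals
    # overlap the requested range are asked for data.
    for node in cluster:
        if node is local:
            continue
        if any(s < end and e > start for s, e in node.intervals(metric)):
            for t, v in node.fetch(metric, start, end).items():
                result.setdefault(t, v)  # local datapoints take precedence
    return result
```

If the local server already holds the whole requested range, no other server is touched; if it holds only part of it, the remaining datapoints come from whichever servers report overlapping intervals.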
