random metrics not showing up in the graphite browser or whisper storage

Asked by Ahmed EL Zein

I am trying to keep track of the capacity usage as well as the number of files in each of the file shares I manage.

I have set up Graphite with the following storage schema:
--------------
[carbon]
pattern = ^carbon\.
retentions = 60:90d

[default_1min_for_1day]
pattern = .*
retentions = 60s:1d

[hnas_stuff]
pattern = ^share\.hnas\.
retentions = 1d:3650d
---------------

I then have some Perl code that connects to the carbon server via:
------
use IO::Socket::INET;

my $sock = IO::Socket::INET->new(
        PeerAddr => $carbon_server,
        PeerPort => 2003,
        Proto => 'tcp'
    ) or die "Cannot connect to carbon: $!";
----

I build up the stats in a hash and then iterate over the hash:
$sock->send("$key " . $stats->{$key}. " $time\n") ;

So each share has two keys. The keys look like:
share.hnas.fsact00.AAHL-SHARE.used
share.hnas.fsact00.AAHL-SHARE.files

My problem is that it is almost random which shares actually end up with both metrics in whisper. The number of shares that end up with all their metrics increases if I sleep for a second between socket send() calls. Some shares do not get any metrics at all.

I have been removing the whisper directory and re-running the script (with no sleep), and it seems to create exactly 49 files each run. With a one-second sleep I get 139 files (but I should get 160).

What am I doing wrong? I would really appreciate any help with this.

Question information

Language: English
Status: Answered
For: Graphite
Assignee: No assignee

Aleksey Marin (asmadews) said :
#1

I faced the same problem (with some differences) using a Python sender. It looks like carbon-cache cannot cope with the amount of data arriving in a short time period and simply drops some values.

Some possible workarounds:
- Try inserting a short pause between updates.
- MAX_QUEUE_SIZE, MAX_DATAPOINTS_PER_MESSAGE and USE_FLOW_CONTROL in carbon.conf may be tuned for better performance.
- You may decide to run several carbon-cache instances behind a carbon-relay with appropriate settings.

I set up a pause between updates and increased MAX_DATAPOINTS_PER_MESSAGE in the config. So far it looks like no data has been lost since the reconfiguration.
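
For readers who want to try the "pause between updates" workaround, here is a minimal Python sketch. It is not the sender used above; the function names, batch size and pause length are all assumptions chosen for illustration. It formats each metric as a carbon plaintext line ("path value timestamp") and sleeps between batches instead of after every single line.

```python
import socket
import time


def format_lines(stats, timestamp):
    """Render {metric_path: value} as carbon plaintext lines."""
    return ["%s %s %d\n" % (path, value, timestamp)
            for path, value in sorted(stats.items())]


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_paced(stats, host, port=2003, batch_size=20, pause=1.0, timestamp=None):
    """Send all metrics over one TCP connection, sleeping between batches.

    batch_size and pause are illustrative defaults, not recommended values.
    """
    if timestamp is None:
        timestamp = int(time.time())
    lines = format_lines(stats, timestamp)
    with socket.create_connection((host, port)) as sock:
        for index, batch in enumerate(batches(lines, batch_size)):
            if index:
                time.sleep(pause)  # give carbon-cache time to catch up
            # sendall() keeps writing until the whole batch has been sent
            sock.sendall("".join(batch).encode("ascii"))
    return len(lines)
```

Calling send_paced(stats, "carbon.example.com") with a dict of metric paths and values (the hostname is a placeholder) would send everything in batches of 20 with a one-second pause between them. Whether this is enough depends on the carbon.conf limits on the receiving side.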

Ben Bonfil (bonfil) said :
#2

I know this is an old question, but I'm posting here for people arriving from Google (like me).

carbon.conf has a setting called MAX_CREATES_PER_MINUTE which in my case was set to 50 by default.

This can prevent some of your metrics from being created after you remove the whisper files.
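
A limit of 50 creates per minute would be consistent with the roughly 49 files per run reported in the question, although that link is an inference and not something confirmed in this thread. A sketch of the relevant carbon.conf fragment, assuming the setting lives in the [cache] section as in the stock example config; the value 600 is only an illustration, not a recommendation:

```ini
[cache]
# Upper bound on new whisper files created per minute.
# The shipped default mentioned above is 50; 600 is an arbitrary example value.
MAX_CREATES_PER_MINUTE = 600
```

carbon-cache needs to be restarted to pick up the change. Raising the limit trades slower metric creation for more disk I/O when many new metrics arrive at once.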
