Graphite shows “None” for all data points even though I send it data [metric created, data ignored]

Asked by Jakub Holy

I have installed Graphite via Puppet (https://forge.puppetlabs.com/dwerder/graphite) with nginx and PostgreSQL. When I send it data manually, it creates the metric, but all of its data points are "None". This also happens when I run the example-client.py shipped with Graphite.

$ echo "jakub.test 42 $(date +%s)" | nc 0.0.0.0 2003 # Carbon listens at 2003
# A minute or so later:
$ whisper-fetch.py --pretty /opt/graphite/storage/whisper/jakub/test.wsp | head -n1
Sun May 4 12:19:00 2014 None
$ whisper-fetch.py --pretty /opt/graphite/storage/whisper/jakub/test.wsp | tail -n1
Mon May 5 12:09:00 2014 None
$ whisper-fetch.py --pretty /opt/graphite/storage/whisper/jakub/test.wsp | grep -v None | wc -l
0
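
For anyone who would rather send the test point from a script than from nc, here is a minimal Python sketch of the same plaintext line ("<path> <value> <timestamp>" followed by a newline). The helper names `format_metric` and `send_metric`, and the 127.0.0.1:2003 defaults, are assumptions for illustration and not part of Graphite itself.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = time.time()
    # Carbon expects whole seconds since the epoch.
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(line, host="127.0.0.1", port=2003):
    """Send one formatted line to Carbon over TCP (host and port are assumed defaults)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

Calling `send_metric(format_metric("jakub.test", 42))` should be equivalent to the echo/nc command above, provided Carbon really listens on that host and port.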

And:

$ python /opt/graphite/examples/example-client.py
# Wait until it sends two batches of data ...
$ whisper-fetch.py /opt/graphite/storage/whisper/system/loadavg_15min.wsp | grep -v None | wc -l
0

According to ngrep, this is the data that arrives at the port (from a later attempt; see line 3 of the output):

####
T 127.0.0.1:34696 -> 127.0.0.1:2003 [AP]
  jakub.test 45 1399362193.
####^Cexit
23 received, 0 dropped

Any idea what is wrong? Carbon's own metrics and data are displayed in the UI. Thank you!

Environment: Ubuntu 13.10 Saucy, graphite 0.9.12 (via pip).

Question information

Language: English
Status: Solved
For: Graphite
Assignee: No assignee
Solved by: Jakub Holy

Jakub Holy (maly-velky) said :
#1

So changing the default retention from "1s:30m,1m:1d,5m:2y" to "1m:14d" does "fix" it, though this is not really an acceptable workaround: I want to keep the 1s granularity too. I would still be happy to get tips for troubleshooting problems like this. A solution is good, but knowing how to troubleshoot is better.

Jakub Holy (maly-velky) said :
#2

There seems to be a problem with sub-minute precisions: while `1m:1d,5m:2y` works (data is recorded), `10s:30m,1m:1d,5m:2y` does not. In fact, judging from the .wsp file, granularity below 1m seems to be ignored, since the timestamps for the 10s:... config are still at 1-minute intervals:

Wed May 7 08:17:00 2014 None
Wed May 7 08:18:00 2014 None
...

Jakub Holy (maly-velky) said :
#3

OK, so the problem is related to the aggregation policy and xFilesFactor. The default that applies here is aggregation method average with xFilesFactor=0.5 (see /opt/graphite/conf/storage-aggregation.conf).

When I switch to sum and xFilesFactor=0.1 by changing the metric name, the data gets stored (though the points are still at 1m frequency):

$ echo -e "jakub.test.10s30m+1m1d+5m2y.count 42 $(date +%s)" | nc 0.0.0.0 2003
$ whisper-fetch.py --pretty jakub/test/10s30m+1m1d+5m2y/count.wsp | tail -n7
Wed May 7 08:43:00 2014 None
Wed May 7 08:44:00 2014 42.000000
Wed May 7 08:45:00 2014 None
Wed May 7 08:46:00 2014 None
Wed May 7 08:47:00 2014 None
Wed May 7 08:48:00 2014 None
Wed May 7 08:49:00 2014 None

Why is this still stored only at the 1m precision even though the schema asks for 10s? Why is the data ignored when using average and factor 0.5 but is stored with sum and factor 0.1?

Jakub Holy (maly-velky) said :
#4

Update: with the schema `10s:30m,1m:1d,5m:2y`, these combinations of aggregation method and xFilesFactor behave as follows:

agg = average, factor = 0.1 => data stored
agg = last, factor = 0.1 => data stored
agg = average, factor = 0.5 => data ignored

So xFilesFactor seems to be the important thing here.
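
These results are consistent with a simple threshold rule: a coarser slot only gets a value if the fraction of non-empty finer slots feeding it reaches xFilesFactor. The following Python sketch models that rule; the function `aggregate` is a hypothetical stand-in written for illustration, not Whisper's actual code, and it assumes the comparison is "known fraction >= xFilesFactor".

```python
def aggregate(slots, method="average", x_files_factor=0.5):
    """Roll finer-resolution slots up into one coarser value, or None.

    slots: values from the finer archive, with None for empty slots.
    """
    known = [v for v in slots if v is not None]
    if not slots or not known:
        return None
    # Too few known points: the whole coarser slot becomes None.
    if len(known) / len(slots) < x_files_factor:
        return None
    if method == "average":
        return sum(known) / len(known)
    if method == "sum":
        return sum(known)
    if method == "last":
        return known[-1]
    raise ValueError(f"unsupported aggregation method: {method}")


# One point sent into a 10s archive: 1 of the 6 slots in a minute is filled.
one_minute = [42, None, None, None, None, None]
```

With `one_minute`, 1/6 of the slots are known, which is above 0.1 but below 0.5, so the sketch returns 42 for factor 0.1 and None for factor 0.5, matching the three cases listed above.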

Jakub Holy (maly-velky) said :
#5

I guess it is related to this info from the docs:

"""
xFilesFactor should be a floating point number between 0 and 1, and specifies what fraction of the previous retention level’s slots must have non-null values in order to aggregate to a non-null value. The default is 0.5.
"""

So it seems that the main remaining question is: why are there only ever data points at 1m intervals, even though the highest precision in storage-schemas.conf is 1s?

For reference, this is the storage-schemas.conf:

    [carbon]
    pattern = ^carbon\.
    retentions = 1m:90d

    [default]
    pattern = .*
    retentions = 1s:30m,1m:1d,5m:2y
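
To make the retention strings easier to reason about, here is a small Python sketch that expands one into (seconds per point, number of points) pairs and shows how many finer slots feed each coarser one. The helpers `parse_retentions` and `slots_per_rollup` are hypothetical names, and the parser is a simplification: it assumes every field carries a unit suffix (s, m, h, d, w, y) and treats a year as 365 days.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def parse_duration(text):
    """Convert a string such as '30m' or '2y' to seconds (unit suffix required)."""
    text = text.strip()
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


def parse_retentions(spec):
    """Expand 'precision:retention,...' into [(seconds_per_point, points), ...]."""
    archives = []
    for part in spec.split(","):
        precision, retention = part.split(":")
        step = parse_duration(precision)
        archives.append((step, parse_duration(retention) // step))
    return archives


def slots_per_rollup(archives):
    """For each pair of adjacent archives, count the finer slots in one coarser slot."""
    return [coarse[0] // fine[0] for fine, coarse in zip(archives, archives[1:])]
```

For the [default] rule above, `parse_retentions("1s:30m,1m:1d,5m:2y")` gives 1800 one-second points, 1440 one-minute points and 210240 five-minute points, and `slots_per_rollup` gives 60 and 5: each 1m point is built from 60 one-second slots.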

Dave Rawks (drawks.) said :
#6

Are you creating new wsp files after changing your configs? The schema of the wsp is not dynamic...

Jakub Holy (maly-velky) said :
#7

@drawks Yes, I am using a completely new metric (path) so it gets new wsp files. Anyway, my original config was "1s:30m,1m:1d,5m:2y" and yet data points were only recorded at 1m intervals. So there is some problem somewhere that is not related to how I later played with the settings.

Jakub Holy (maly-velky) said :
#8

SOLUTION

So @jlawrie at StackOverflow led me to the solution. It turns out the data are actually there but are aggregated to nothing, for two reasons:

1. Both the UI and whisper-fetch show data at the highest precision whose retention spans the whole query period, which defaults to 24h. That is, any archive with retention < 1d will never show up in the UI or in whisper-fetch unless you select a shorter period. Since my retention for 1s was 30 min, I would need to select a period within the last 30 min to actually see the raw data being collected at the highest precision.
2. When aggregating data (from 1s to 1min in my case), Graphite by default requires that 50% (xFilesFactor = 0.5) of the data points in the period have a value. If not, it ignores the existing values and aggregates the period to None. So in my case I would need to send data at least 30 times within a minute (30 is 50% of the 60 one-second slots in 1 min) for it to show up in the aggregated 1-min value. But my app only sends data every 10s, so I only have 6 of the possible 60 values.

=> The solution is to change the first precision from 1s to 10s and to remember to select a shorter period when I want to see the raw data (or extend its retention to 24h so that it shows by default).
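
As a quick way to check a schema against these two rules before deploying it, here is a rough Python sketch. Both functions are hypothetical helpers, not Graphite APIs: `select_archive` assumes a query is served from the first (finest) archive whose retention covers the requested span, and `known_fraction` assumes a client that sends at a steady interval.

```python
def select_archive(archives, query_span):
    """Pick the finest archive whose retention covers the query span.

    archives: [(seconds_per_point, retention_seconds), ...], finest first.
    """
    for step, retention in archives:
        if retention >= query_span:
            return step, retention
    # Nothing covers the span: fall back to the coarsest archive.
    return archives[-1]


def known_fraction(send_interval, fine_step, coarse_step):
    """Fraction of finer slots filled per coarser slot at a steady send rate."""
    slots = coarse_step // fine_step
    filled = min(slots, coarse_step // max(send_interval, fine_step))
    return filled / slots


# The original schema 1s:30m,1m:1d,5m:2y, with retentions in seconds.
ORIGINAL = [(1, 1800), (60, 86400), (300, 63072000)]
```

With `ORIGINAL`, a 24h query (86400 s) selects the 1m archive and a 30-minute query selects the 1s archive. Sending every 10s fills 6 of the 60 slots (a fraction of about 0.1, below 0.5) with a 1s first precision, but all 6 of 6 slots once the first precision is changed to 10s.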