Gaps in graphs when viewing small time frames? Bug or user error?

Asked by AcidTonic

I have a weird issue where my graphs have gaps but only when viewing small time frames.

My retention configuration is the following:

[system]
priority = 100
pattern = ^system
retentions = 2:86400,60:43200,900:350400
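
For reference, each retentions entry is secondsPerPoint:points, so the total span of an archive is their product (2 days at 2-second precision, 30 days at 1-minute, and 10 years at 15-minute). A quick sanity check:

```python
# Each retentions entry is secondsPerPoint:points; the archive's total
# time span is their product.
retentions = [(2, 86400), (60, 43200), (900, 350400)]

for seconds_per_point, points in retentions:
    span = seconds_per_point * points
    print("%4ds x %6d points = %9d seconds (%.0f days)"
          % (seconds_per_point, points, span, span / 86400.0))
```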

I'm running an agent which is nearly identical to the example. It collects data twice a second and sends the list every 10 seconds. The agent is local to carbon, so there is no network delay. I deleted the data from Graphite and made the schema change before recreating the datapoints, so it's not using a pre-created file's schema.

After letting it run for a few days and coming back to build some graphs, I find gaps when viewing small time frames, and the behavior seems to change depending on my graph's datapoints. For some graphs it happens when displaying the last 30 minutes; some gap at anything below 150 minutes. They look like dotted lines, and the more resolution I add, the larger the distance between the points gets. Likewise when looking at smaller amounts of time, where the datapoints are further apart.

All of these graphs work great, though, if viewing even a single minute above a certain number. For example, 167 minutes shows ugly gaps, but anything 168 or higher is perfectly smooth. I am not running any transforms (well, I am, but it still happens after removing them), and I simply find this annoying, since I have such precise data yet I can only view 2+ hours' worth in a graph, which smooths out my precision.

I'm running the stable release from just a few days ago (less than 2 weeks old).

Question information

Language: English
Status: Answered
For: Graphite
Assignee: No assignee
Revision history for this message
chrismd (chrismd) said :
#1

Are the gaps regular or sporadic? If they are regular it is almost certainly a mismatch between the schema configuration and the frequency of datapoints being sent. I would recommend logging the data you are sending to graphite so you can check that against the data showing up in your graphs.

The reason you see smooth graphs when viewing certain time spans is that when you try to view more datapoints than can fit in the pixels of your graph, Graphite aggregates datapoints by averaging them together, which conceals the missing datapoints.
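
To illustrate, a simplified sketch of that consolidation (this is not Graphite's actual rendering code): bucket the series and average only the known values, so the gaps (None) in a bucket are concealed by their neighbors.

```python
# Average each bucket of `bucket_size` points, ignoring missing values.
def consolidate(values, bucket_size):
    out = []
    for i in range(0, len(values), bucket_size):
        bucket = [v for v in values[i:i + bucket_size] if v is not None]
        out.append(sum(bucket) / len(bucket) if bucket else None)
    return out

# Every other point is missing.
series = [1.0, None, 3.0, None, 5.0, None, 7.0, None]
print(consolidate(series, 1))  # full resolution: the gaps show
print(consolidate(series, 2))  # zoomed out: averaging hides them
```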

Revision history for this message
AcidTonic (peoriaguy87) said :
#2

I'm positive I'm sending data too fast; no way is it coming in slow. I basically keep adding data to a list, which is only cleared when it sends out data. Now correct me if I'm wrong, but I'm allowed to send data whenever I like as long as it contains enough entries for the proper time frequency, correct?

Below is some output. I have a datapoint for every second. I even made it raise an exception if the current time is more than 1 second later than the last time. My schema requires 2-second intervals, and I'm proving at least a 1-second interval on timing.

Data is saved in the list every 1 second and the full list is sent every 10 seconds. I can't explain why there are still gaps.

system.total_phymem 799375360 1264975477
system.avail_phymem 14131200 1264975477
system.used_phymem 785244160 1264975477
system.total_virtmem 279617536 1264975477
system.avail_virtmem 278466560 1264975477
system.used_virtmem 1150976 1264975477
system.cpu_util_percent 0.00 1264975477

system.total_phymem 799375360 1264975478
system.avail_phymem 14131200 1264975478
system.used_phymem 785244160 1264975478
system.total_virtmem 279617536 1264975478
system.avail_virtmem 278466560 1264975478
system.used_virtmem 1150976 1264975478
system.cpu_util_percent 0.00 1264975478

system.total_phymem 799375360 1264975479
system.avail_phymem 14131200 1264975479
system.used_phymem 785244160 1264975479
system.total_virtmem 279617536 1264975479
system.avail_virtmem 278466560 1264975479
system.used_virtmem 1150976 1264975479
system.cpu_util_percent 0.00 1264975479

system.total_phymem 799375360 1264975480
system.avail_phymem 14131200 1264975480
system.used_phymem 785244160 1264975480
system.total_virtmem 279617536 1264975480
system.avail_virtmem 278466560 1264975480
system.used_virtmem 1150976 1264975480
system.cpu_util_percent 0.00 1264975480

sending :147 updates
system.total_phymem 799375360 1264975481
system.avail_phymem 14135296 1264975481
system.used_phymem 785240064 1264975481
system.total_virtmem 279617536 1264975481
system.avail_virtmem 278466560 1264975481
system.used_virtmem 1150976 1264975481
system.cpu_util_percent 0.00 1264975481

system.total_phymem 799375360 1264975482
system.avail_phymem 14135296 1264975482
system.used_phymem 785240064 1264975482
system.total_virtmem 279617536 1264975482
system.avail_virtmem 278466560 1264975482
system.used_virtmem 1150976 1264975482
system.cpu_util_percent 0.00 1264975482

system.total_phymem 799375360 1264975483
system.avail_phymem 14139392 1264975483
system.used_phymem 785235968 1264975483
system.total_virtmem 279617536 1264975483
system.avail_virtmem 278466560 1264975483
system.used_virtmem 1150976 1264975483
system.cpu_util_percent 0.00 1264975483

system.total_phymem 799375360 1264975484
system.avail_phymem 14139392 1264975484
system.used_phymem 785235968 1264975484
system.total_virtmem 279617536 1264975484
system.avail_virtmem 278466560 1264975484
system.used_virtmem 1150976 1264975484
system.cpu_util_percent 0.00 1264975484

system.total_phymem 799375360 1264975485
system.avail_phymem 14086144 1264975485
system.used_phymem 785289216 1264975485
system.total_virtmem 279617536 1264975485
system.avail_virtmem 278466560 1264975485
system.used_virtmem 1150976 1264975485
system.cpu_util_percent 2.00 1264975485

system.total_phymem 799375360 1264975486
system.avail_phymem 14077952 1264975486
system.used_phymem 785297408 1264975486
system.total_virtmem 279617536 1264975486
system.avail_virtmem 278466560 1264975486
system.used_virtmem 1150976 1264975486
system.cpu_util_percent 2.00 1264975486

system.total_phymem 799375360 1264975487
system.avail_phymem 14086144 1264975487
system.used_phymem 785289216 1264975487
system.total_virtmem 279617536 1264975487
system.avail_virtmem 278466560 1264975487
system.used_virtmem 1150976 1264975487
system.cpu_util_percent 0.00 1264975487

system.total_phymem 799375360 1264975488
system.avail_phymem 14086144 1264975488
system.used_phymem 785289216 1264975488
system.total_virtmem 279617536 1264975488
system.avail_virtmem 278466560 1264975488
system.used_virtmem 1150976 1264975488
system.cpu_util_percent 0.00 1264975488

system.total_phymem 799375360 1264975489
system.avail_phymem 14086144 1264975489
system.used_phymem 785289216 1264975489
system.total_virtmem 279617536 1264975489
system.avail_virtmem 278466560 1264975489
system.used_virtmem 1150976 1264975489
system.cpu_util_percent 0.01 1264975489

Revision history for this message
chrismd (chrismd) said :
#3

Hm... I think it would be beneficial to look at the raw data in the database.

Can you run the following:

whisper-fetch.py --from 1264975477 --until=1264975489 $GRAPHITE_ROOT/storage/whisper/system/cpu_util_percent.wsp

The output should reflect the data below (derived from your output above).

Value Timestamp
---------------------------
0.00 1264975477
0.00 1264975478
0.00 1264975479
0.00 1264975480
0.00 1264975481
0.00 1264975482
0.00 1264975483
0.00 1264975484
2.00 1264975485
2.00 1264975486
0.00 1264975487
0.00 1264975488
0.01 1264975489

If the output matches this data, then the data is being stored properly and it must be a rendering issue (assuming graphs of this particular time span show gaps). If the output does not match, then we know something is going wrong in the storage process, and we have an approximate timestamp for the error.

Revision history for this message
AcidTonic (peoriaguy87) said :
#4

It doesn't look as expected, but I'm confused how it could be doing this...

1265224860 0.000500
1265224920 0.000500
1265224980 0.000000
1265225040 0.001500
1265225100 0.000000
1265225160 0.001500
1265225220 0.101500
1265225280 0.000500
1265225340 0.001000
1265225400 0.002000
1265225460 0.001000
1265225520 0.000000
1265225580 0.002500
1265225640 0.001500
1265225700 0.079000
1265225760 0.001500
1265225820 0.001500
1265225880 0.102500
1265225940 0.101500
1265226000 0.002000
1265226060 0.300500
1265226120 0.079000
1265226180 0.002500
1265226240 0.001000
1265226300 0.001000
1265226360 0.301000
1265226420 0.001500
1265226480 0.000000
1265226540 0.001500
1265226600 0.000500
1265226660 0.400500
1265226720 0.003000
1265226780 0.000000
1265226840 0.402000
1265226900 0.001000
1265226960 0.402500
1265227020 0.040500
1265227080 0.000000
1265227140 0.000500
1265227200 0.100500
1265227260 0.001000
1265227320 0.000000
1265227380 0.000500
1265227440 0.201000
1265227500 0.001500
1265227560 0.322500
1265227620 0.000000
1265227680 0.000500
1265227740 0.000000
1265227800 0.000000
1265227860 0.001500
1265227920 0.001500
1265227980 0.000000
1265228040 0.000000
1265228100 0.000500
1265228160 0.000000
1265228220 0.001000
1265228280 0.000500
1265228340 0.399000
1265228400 0.000500
1265228460 0.000000
1265228520 0.001000
1265228580 0.600500
1265228640 0.001000
1265228700 0.001500
1265228760 0.121500
1265228820 0.101500
1265228880 0.400000
1265228940 0.000500
1265229000 0.100000
1265229060 0.000500
1265229120 0.000000
1265229180 0.302000
1265229240 0.000000
1265229300 0.000500
1265229360 0.000000
1265229420 0.000000
1265229480 0.602500
1265229540 0.000000
1265229600 0.100000
1265229660 0.200000
1265229720 0.200500
1265229780 0.000000
1265229840 0.200000
1265229900 0.003000
1265229960 0.006000
1265230020 0.000500
1265230080 0.001000
1265230140 0.000000
1265230200 0.000000
1265230260 0.000000
1265230320 0.101500
1265230380 0.000000
1265230440 0.000000
1265230500 0.499500
1265230560 0.000000
1265230620 0.101000
1265230680 0.000000
1265230740 0.001500
1265230800 0.275000
1265230860 0.000000
1265230920 0.000000
1265230980 0.201500
1265231040 0.201500
1265231100 0.500500
1265231160 0.003500
1265231220 0.000500
1265231280 0.001000
1265231340 0.000000
1265231400 0.600500
1265231460 0.000000
1265231520 0.201000
1265231580 0.000000
1265231640 0.000000
1265231700 0.001000
1265231760 0.000000
1265231820 0.100500
1265231880 0.000000
1265231940 0.000500
1265232000 0.000000
1265232060 0.002000
1265232120 0.000000
1265232180 0.000500
1265232240 0.306500
1265232300 0.103500
1265232360 0.000000
1265232420 0.604000
1265232480 0.503000
1265232540 0.200500
1265232600 0.100000
1265232660 0.078500
1265232720 0.101500
1265232780 0.000000
1265232840 0.101500
1265232900 0.100500
1265232960 0.099000
1265233020 0.501000
1265233080 0.000500
1265233140 0.001000
1265233200 0.200500
1265233260 0.000000
1265233320 0.600500
1265233380 0.200000
1265233440 0.102000
1265233500 0.199500
1265233560 0.001000
1265233620 0.000000
1265233680 0.000000
1265233740 0.104000
1265233800 0.007000
1265233860 0.102000
1265233920 0.099500
1265233980 0.100000
1265234040 0.001500
1265234100 0.199500
1265234160 0.000000

That isn't all of it, just a section... OK, also here is my program that is capturing data. This was just something to play around with and get to know Graphite, so pardon the ugliness :)

#!/usr/bin/python

import sys
import time
import types
from socket import socket

import psutil

CARBON_SERVER = '127.0.0.1'
CARBON_PORT = 2003

data_delay = 0.5
transmit_delay = 5
last_transmit = 0


class Metrics():
    def __init__(self):
        self.setup_mongodb()

    def get_loadavg(self, lines, now):
        # We're gonna report all three loadavg values
        (loadavg_1, loadavg_5, loadavg_15) = open('/proc/loadavg').read().strip().split()[:3]
        lines.append("system.loadavg_1min %s %d" % (loadavg_1, now))
        lines.append("system.loadavg_5min %s %d" % (loadavg_5, now))
        lines.append("system.loadavg_15min %s %d" % (loadavg_15, now))

    def get_memory(self, lines, now):
        # Physical
        lines.append("system.total_phymem %d %d" % (psutil.TOTAL_PHYMEM, now))
        lines.append("system.avail_phymem %d %d" % (psutil.avail_phymem(), now))
        lines.append("system.used_phymem %d %d" % (psutil.used_phymem(), now))

        # Virtual
        lines.append("system.total_virtmem %d %d" % (psutil.total_virtmem(), now))
        lines.append("system.avail_virtmem %d %d" % (psutil.avail_virtmem(), now))
        lines.append("system.used_virtmem %d %d" % (psutil.used_virtmem(), now))

    def get_cpu(self, lines, now):
        lines.append("system.cpu_util_percent %.2f %d" % (psutil.cpu_percent(), now))

    def setup_mongodb(self):
        from pymongo import Connection
        self.con = Connection('127.0.0.1', 27017)
        self.db = self.con.devices

    def add_list(self, lines, now, results, node):
        for key in results:
            value = results[key]
            if type(value) == types.DictType or type(value) == types.ListType:
                self.add_list(lines, now, value, '%s.%s' % (node, key))
            else:
                lines.append("%s.%s %d %d" % (node, key, results[key], now))

    def get_mongodb(self, lines, now):
        results = self.db.command({'serverStatus': 1})
        self.add_list(lines, now, results, 'system.db.mongo')
        # e.g. {u'mem': {u'resident': 3, u'supported': True, u'virtual': 153, u'mapped': 80},
        #       u'uptime': 77237.0, u'ok': 1.0, u'globalLock': {u'totalTime': 77236380176.0,
        #       u'lockTime': 849270.0, u'ratio': 1.0995725046471008e-05}}


sock = socket()
try:
    sock.connect((CARBON_SERVER, CARBON_PORT))
except:
    print "Couldn't connect to localhost on port %d, is carbon-agent.py running?" % CARBON_PORT
    sys.exit(1)

linecount = 0
metrics = Metrics()
while True:
    now = int(time.time())
    lines = []

    metrics.get_loadavg(lines, now)
    metrics.get_memory(lines, now)
    metrics.get_cpu(lines, now)
    metrics.get_mongodb(lines, now)

    linecount += len(lines)
    message = '\n'.join(lines) + '\n'  # all lines must end in a newline
    print message
    if now - last_transmit > transmit_delay:
        print 'sending :%s updates' % linecount
        sock.sendall(message)
        linecount = 0
        last_transmit = now
    time.sleep(data_delay)

Revision history for this message
chrismd (chrismd) said :
#5

Interesting... the output of the whisper-fetch command shows that the data is being stored in 1-minute intervals, so maybe your intended storage configuration was not in effect when the wsp file was created. You can run whisper-info.py on the same wsp file to verify. Then again, it could just be that a lower-precision archive covered the time range you queried.

Basically the approach I would take in debugging this is as follows:

First, log the output of your script for a period of time to see what data it is sending.
Then run whisper-fetch on the wsp file to see what data was actually stored for that period of time (you should specify the time range explicitly). I would recommend doing this in as close to real time as possible, to ensure only the highest-precision archive in the database is involved.

Your script looks like it should work, collecting data every half second and sending everything every 5 seconds. Since your archive is only configured with 2-second precision, you will actually be storing only 1/4 of your datapoints.

Revision history for this message
AcidTonic (peoriaguy87) said :
#6

Thanks for the support advice...

I ran the following command to double-check that I deleted everything when making the schema change.

whisper-info.py storage/whisper/system/cpu_util_percent.wsp
maxRetention: 315360000
lastUpdate: 1264983378
xFilesFactor: 0.5
fileSize: 5760052

Archive 0
retention: 172800
secondsPerPoint: 2
points: 86400
size: 1036800
offset: 52

Archive 1
retention: 2592000
secondsPerPoint: 60
points: 43200
size: 518400
offset: 1036852

Archive 2
retention: 315360000
secondsPerPoint: 900
points: 350400
size: 4204800
offset: 1555252

So I really only want 2-second precision, but I'm gathering every 0.5 seconds to really make sure I'm getting data there quickly enough. The script truly is sending data fast enough; any ideas?

I actually hit another bug trying to save graphs with CPU information. They won't show up like my other graphs that I've saved. I can open a separate issue if you'd like.

For now I've exposed my graphite page to the net so you can fiddle with it. It's not production, so I really don't care about the data... Just don't delete any other graphs I've saved, and feel free to build your own for testing.

My graphite url.... http://216.150.226.11:81/

Cheers,

Revision history for this message
chrismd (chrismd) said :
#7

So the problem is with the client script, I believe. I ran it on my system and saw that it was outputting metrics every 0.5 seconds (to my terminal), and also that my data had the same gaps as yours. Then I commented out the "print message" line near the bottom and saw that while data is being collected every 0.5 seconds, it is only sent once every 5 seconds (because of the transmit_delay setting), which explains the gaps in the graphs. If you set transmit_delay to 0.5, then the graphs are smooth.

As for your CPU info graphs, they did not load for me either; it is because of a JavaScript exception caused by the presence of the % character in your graph's title. I can add some code to search user input and replace % with the proper escape sequence to avoid this issue in the future, but for now the quick fix is to just not use %.

Revision history for this message
AcidTonic (peoriaguy87) said :
#8

Wait though... I thought that's why carbon needed the timestamp? The graphite page even somewhat advertises that as a feature for quickly importing old data.

So I can't send data whenever I'd like, as long as the timestamps match up?

My understanding was the graphs would have a slight delay until the remaining data got there, but the order and delay didn't matter.

Am I confused, or is this a bug of some sort? If data needs to come over right away, then why do I need the time? The receiving side should just stamp the current time on it and store it then?

That is less functional than what I was expecting. What are your thoughts? I planned to gather data from machines like laptops not currently on the corporate network and send it all once they reconnect, filling in the gaps. Sounds like that won't work now.

Revision history for this message
chrismd (chrismd) said :
#9

No, I think you misunderstood what I was trying to say. Graphite does not care when data is received; only the timestamp you actually send with the value matters. The problem with the script is that if you look at the while loop at the bottom, it starts with this:

lines = []

So each time the loop iterates, the lines list is initialized as an empty list. Then data gets put into the lines list. Then it checks to see if it's been 5 seconds since the last transmission; if it hasn't, the loop repeats. The next time around, that lines = [] line means the previously collected data is lost, and only the data collected for that loop iteration can be transmitted. So the data is being collected every 0.5 seconds, but all of it is discarded except one batch of data every 5 seconds. You can see this more clearly if you add a print statement stating when data is collected and another stating when data is sent (and if you look at the contents of what is sent, the timestamps will be 5 seconds apart).
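
A minimal sketch of the fix (names here are illustrative, and real metric collection is stubbed out): accumulate into one buffer that is cleared only after a send, instead of re-initializing "lines = []" on every pass through the loop.

```python
# Simulate the collection loop over pre-recorded (timestamp, line) samples,
# buffering across iterations and clearing only after a send.
def run(samples, transmit_delay=5):
    buffer, sent_batches = [], []
    last_transmit = 0
    for now, line in samples:
        buffer.append(line)                    # keep everything collected so far
        if now - last_transmit > transmit_delay:
            sent_batches.append(list(buffer))  # send the whole buffer...
            buffer = []                        # ...and only then clear it
            last_transmit = now
    return sent_batches, buffer

# Ten samples at 1-second spacing: every sample either lands in a sent batch
# or is still buffered for the next send; none are silently dropped.
batches, pending = run([(t, "system.example 1 %d" % t) for t in range(1, 11)])
print(len(batches[0]), len(pending))
```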

Revision history for this message
AcidTonic (peoriaguy87) said :
#10

From your last reply regarding my script, I thought you said it was sending correctly, just not often enough.

Now that I look at the script, I indeed spotted my mistake. That one is embarrassing. Somehow it was overlooked, and now I feel bad for wasting your time on it.

So now that is fixed, and you also answered my other question, so I'm moving along quite nicely.

I also had a few other questions regarding the best way to retrieve a listing from graphite. I want to dynamically fetch graphs in my application to display on dashboard and reporting views. I wish, from a Django view and also a Python daemon, to somehow *ask* carbon/whisper what datapoints it has, similar to the tree view in your webui.

I was looking at your code and it seemed like the directory was crawled, but I want to make sure I use the correct method, one which can show all datapoints across all connected carbon relays. I basically require this functionality for detecting new devices sending datapoints, so I can generate alerts and make some Django database rows. Besides walking the directory, is there a way to go through the Python API?

Once again, thanks for such a direct human-to-human troubleshooting process. I'm ranking your support very highly compared to other projects.

Cheers,
Zach Davis

Revision history for this message
chrismd (chrismd) said :
#11

Glad to help. If you want to search or navigate the metric hierarchy (especially when you have a clustered setup), you can do it locally on the graphite server using the graphite.storage module (I really need to document this API, so here's a first stab at it).

bash$ export PYTHONPATH=/opt/graphite/webapp
bash$ export DJANGO_SETTINGS_MODULE=graphite.settings
bash$ python
>>> from django.conf import settings
>>> import time
>>> for metric in settings.STORE.find("servers.*.cpuUsage"):
...     print metric.metric_path   # servers.foo.cpuUsage
...     print metric.fs_path       # /opt/graphite/storage/whisper/servers/foo/cpuUsage.wsp
...     print metric.name          # cpuUsage
...     print metric.isLeaf()      # True, which means we can fetch() data
...     (timeInfo, values) = metric.fetch(startTime, endTime)  # both unix epoch times
...     (start, end, step) = timeInfo
...     t = start
...     for value in values:
...         print time.ctime(t), value
...         t += step

This will work even in a clustered configuration.

If you need to look up metrics from a remote machine, however, there is an HTTP API as well. Basically you can GET /metrics/?query=servers.*.cpuUsage&format=pickle (or format=treejson for a JSON object that can be used by an Ext tree). Hope that helps.
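
For example, a rough sketch of issuing that query from Python (the host is an assumption for a local install, and metrics_url is an illustrative helper, not part of Graphite):

```python
# Build the /metrics/ query URL described above; urlencode handles quoting
# of the wildcard characters in the query expression.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def metrics_url(host, query, fmt="treejson"):
    return "http://%s/metrics/?%s" % (host, urlencode({"query": query, "format": fmt}))

url = metrics_url("127.0.0.1", "servers.*.cpuUsage")
print(url)
# nodes = json.load(urlopen(url))  # uncomment to fetch against a live server
```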

Revision history for this message
AcidTonic (peoriaguy87) said :
#12

Perfect! This is exactly what I wanted. Now I'm glad I asked, since I would have wasted a lot of time hacking around directories.

A few other questions come from your post.

How do I do a STORE.find which will match anything and everything? I tried using some regex, but I kept getting 0 results. *.* would show me the 2nd level for everything; I just want everything, similar to an ls. I found find_all, but it too acts weird and won't show me everything.

I also wanted to know if there is a similar option for metric.fetch? I want a way to grab everything. I tried really small/large values hoping to get everything, but it doesn't work. Is there an easy way to get the last X datapoints? (With a way to get everything, I'll just write one if you haven't already.)

This little undocumented API is stunning. Exactly what I was going to reproduce on my own; now I can get back to my side of the project :)

I am also using ExtJS for my app, and I've wanted to allow my users to craft their own graphs. Any tips on getting the composer switched to a fixed-size ExtJS panel instead of the current window design? I plan to tweak it and embed it inside my webapp. I'm picking up ExtJS for this project and have some basic JSON data grids working, and panels which load fixed-size graphite graphs.

Of course you'll hear back from me once I'm ready to release any changes I made to graphite back to you guys.

Cheers,
Zach

On Thu, Feb 11, 2010 at 5:07 PM, chrismd <
<email address hidden>> wrote:

> Your question #99926 on Graphite changed:
> https://answers.launchpad.net/graphite/+question/99926
>
> Status: Open => Answered
>
> chrismd proposed the following answer:
> Glad to help. If you want to search or navigate the metric hierarchy
> (especially when you have a clustered setup) you could do it locally on
> the graphite server by using the graphite.storage module (I really need
> to document this API, so here's a first stab at it).
>
> bash$ export PYTHONPATH=/opt/graphite/webapp
> bash$ export DJANGO_SETTINGS_MODULE=graphite.settings
> bash$ python
> >>> from django.conf import settings
> >>> import time
> >>> for metric in settings.STORE.find("servers.*.cpuUsage"):
> ...   print metric.metric_path  # servers.foo.cpuUsage
> ...   print metric.fs_path      # /opt/graphite/storage/whisper/servers/foo/cpuUsage.wsp
> ...   print metric.name         # cpuUsage
> ...   print metric.isLeaf()     # True, which means we can fetch() data
> ...   (timeInfo, values) = metric.fetch(startTime, endTime)  # both unix epoch times
> ...   (start, end, step) = timeInfo
> ...   t = start
> ...   for value in values:
> ...     print time.ctime(t), value
> ...     t += step
>
> This will work even in a clustered configuration.
>
> If you need to lookup metrics from a remote machine however there is a
> HTTP API as well. Basically you can GET
> /metrics/?query=servers.*.cpuUsage&format=pickle (or format=treejson for
> a JSON object that can be used by an Ext tree). Hope that helps.
>
> --
> If this answers your question, please go to the following page to let us
> know that it is solved:
>
> https://answers.launchpad.net/graphite/+question/99926/+confirm?answer_id=10
>
> If you still need help, you can reply to this email or go to the
> following page to enter your feedback:
> https://answers.launchpad.net/graphite/+question/99926
>
> You received this question notification because you are a direct
> subscriber of the question.
>
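
The HTTP lookup described in the answer above boils down to constructing one URL. Here is a minimal Python 3 sketch of building it; the hostname is a placeholder, and only the /metrics/ path and the query/format parameters come from the answer itself:

```python
from urllib.parse import urlencode

def metrics_query_url(host, query, fmt="treejson"):
    """Build the /metrics/ lookup URL described in the answer above.

    format=pickle returns pickled results for programmatic use;
    format=treejson returns a JSON object an Ext tree can consume.
    """
    params = urlencode({"query": query, "format": fmt})
    return "http://%s/metrics/?%s" % (host, params)

# The hostname here is a placeholder for your graphite-web server.
url = metrics_query_url("graphite.example.com", "servers.*.cpuUsage")
print(url)
```

From there any HTTP client can issue the GET and unpickle or JSON-decode the response.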

Revision history for this message
chrismd (chrismd) said :
#13

Unfortunately, no, the Store API doesn't let you iterate everything currently. There are a few ways you could go about doing that: os.walk, Store.find('*') + Store.find('*.*') + ..., or, better yet, just use the index file. Yet another undocumented feature is that there is a script in graphite trunk, misc/rebuild_index.sh. Edit it to specify your Graphite installation location, then run it, and it will generate a file $GRAPHITE_ROOT/storage/index that is a simple text listing of all of your metrics. That is probably the cheapest way to iterate everything, assuming your data doesn't change often or you rebuild the index often. You can easily look up the index file in your code by using the INDEX_FILE setting.

With metric.fetch(), however, there is a simple way to fetch everything: simply specify a startTime of 0 and an endTime of time.time(). You can also use the whisper.fetch() library call directly in the same way if you don't end up using the Store API.
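
Once that index file exists, consuming it is plain line-by-line text processing. A sketch against a simulated index follows; in a real install the path would come from the INDEX_FILE setting (typically $GRAPHITE_ROOT/storage/index), and the sample metric names here are made up:

```python
import os
import tempfile

# Simulate the index file: one dotted metric path per line.
sample = "servers.foo.cpuUsage\nservers.bar.cpuUsage\nsystem.load\n"
index_path = os.path.join(tempfile.mkdtemp(), "index")
with open(index_path, "w") as f:
    f.write(sample)

# In a real install, index_path would be settings.INDEX_FILE.
with open(index_path) as f:
    metrics = [line.strip() for line in f if line.strip()]

print(len(metrics))   # 3
print(metrics[0])     # servers.foo.cpuUsage
```

Each entry can then be fed to Store.find() or mapped to a .wsp path for fetching.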

As for your new Ext UI, I've made many different UIs for Graphite using Ext, and it is pretty straightforward. One bit of code you might find useful to borrow from Graphite is the ParameterizedURL prototype defined in webapp/content/js/composer.js. It gives you simple methods for manipulating your graph's URL parameters.

I've been thinking about redoing Graphite's Composer UI myself lately; I've made some other UIs on another project that I think work better. So I would definitely be interested in seeing what you come up with, if you are at liberty to release it.

Revision history for this message
AcidTonic (peoriaguy87) said :
#14

I would be happy to release it. Even though I'm writing something that will
be closed-source, with a possible open source foundation, I plan to release all
additions to 3rd-party tools/libraries back to their owners under their
license terms.

It's the least I can do for such a generous license on your part, and the
time it saves me from wasting :)

I'm no ExtJS guru; I spent about a week getting my first JSON grid to work. I
still have many issues picking it up, and I can't seem to figure out an easy
way to debug it. If I mess up, the whole widget just never renders, which
makes ExtJS a little tough. I'm using Firebug and whatnot, but it doesn't
always generate errors.

Revision history for this message
AcidTonic (peoriaguy87) said :
#16

Can I reorganize .wsp files by moving them and sending data to the new
location?

Is there anything already developed around sorting incoming data in some
way, such as handling machines that move or change hostnames? I'd like
to move the data to the new spot when the hostname or IP changes, honoring
the new hostname's regex in schemas.conf. The data would be moved, but reduced
to the precision of the new name's schema match; then new data would go there
as usual, as if it had never moved.

I also wish to trap the create event so I can make Django database rows for
the new device and trigger any alerts. I see I could use the API you sent
previously to recurse through the datapoints looking for new ones, but
resource-wise it would be much more efficient to be notified.

Any thoughts?

Revision history for this message
chrismd (chrismd) said :
#17

Yes, you can move .wsp files around all you want. Since you want to change their retention schema at the same time, what you should actually do is resize the files first (using the whisper-resize.py script under whisper/bin/ in trunk), then move them.
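
The resize-then-move workflow amounts to filesystem surgery once whisper-resize.py has run. Here is a Python sketch using a stand-in empty file; all paths are hypothetical, and the resize step is left as a comment because its exact invocation belongs to the script in trunk:

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()  # stand-in for .../storage/whisper
old = os.path.join(root, "oldhost", "cpuUsage.wsp")
new = os.path.join(root, "newhost", "cpuUsage.wsp")

os.makedirs(os.path.dirname(old))
open(old, "wb").close()    # stand-in for a real whisper file

# First resize to the retentions the new name's schema would match, e.g.:
#   python whisper-resize.py <old-path> <retentions...>   (see whisper/bin/ in trunk)
# Then move the file into place under the new metric path:
os.makedirs(os.path.dirname(new))
shutil.move(old, new)

print(os.path.exists(new))   # True
```

New datapoints sent under the new metric name will then land in the relocated file.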

As for sending events when new metrics are created, that is an *excellent* idea. Carbon now supports AMQP for receiving data, and I think it would also serve as a very natural mechanism for events like this. Unfortunately, the only "event-driven" way to do this now would be to tail the storage/log/carbon-cache/creates.log file.
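
Tailing creates.log is a simple follow-from-offset loop. A sketch against a simulated log file follows; the real file lives at storage/log/carbon-cache/creates.log, and the line format here is made up for illustration:

```python
import os
import tempfile

# Simulate creates.log with one create event (line format is illustrative).
log_path = os.path.join(tempfile.mkdtemp(), "creates.log")
with open(log_path, "w") as f:
    f.write("creating database metric servers.foo.cpuUsage\n")

def follow_new_lines(path, offset=0):
    """Yield (new_offset, line) for each line appended past `offset`."""
    with open(path) as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line:
                break
            yield f.tell(), line.rstrip("\n")

# First poll picks up the existing event and remembers where we stopped.
offset = 0
for offset, line in follow_new_lines(log_path, offset):
    print("new metric event:", line)

# A later poll from the saved offset sees only newly appended events.
with open(log_path, "a") as f:
    f.write("creating database metric servers.bar.cpuUsage\n")
events = [line for _, line in follow_new_lines(log_path, offset)]
print(events)
```

A cron job or daemon polling this way could create Django rows for each new metric, as discussed above.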

Revision history for this message
AcidTonic (peoriaguy87) said :
#18

I'd propose a hook mechanism through the Python Graphite API: import the
API, then call some function to register my script as a listener. I could
then basically define a callback to be invoked with the first datapoint and
its schema object, first value, and time as arguments. My callback could
then connect to my Django DB, create my models for the new data, and make
the graph models show up on my users' dashboards.

Thoughts? I might be interested in assisting with this feature should you
move forward.

Revision history for this message
chrismd (chrismd) said :
#19

I've created Bug #526872 to track this feature request.

Can you help with this problem?

Provide an answer of your own, or ask AcidTonic for more information if necessary.

To post a message you must log in.