Determine why a line is invalid

Asked by Barry Morrison

I _think_ my data is in the correct format, in fact I have had one successful import.

but in /listener.log

I have this:

30/12/2011 15:04:10 :: invalid line received from client 127.0.0.1:40247, ignoring
30/12/2011 15:04:10 :: invalid line: 'us.production.mind.requests.total 50301 1301209200\n'
30/12/2011 15:04:20 :: invalid line received from client 127.0.0.1:40247, ignoring
30/12/2011 15:04:20 :: invalid line: 'us.production.mind.requests.total 50301 1301209200\n'
30/12/2011 15:04:30 :: invalid line received from client 127.0.0.1:40247, ignoring
30/12/2011 15:04:30 :: invalid line: 'us.production.mind.requests.total 50301 1301209200\n'
30/12/2011 15:04:40 :: invalid line received from client 127.0.0.1:40247, ignoring
30/12/2011 15:04:40 :: invalid line: 'us.production.mind.requests.total 50301 1301209200\n'
30/12/2011 15:04:50 :: invalid line received from client 127.0.0.1:40247, ignoring
30/12/2011 15:04:50 :: invalid line: 'us.production.mind.requests.total 50301 1301209200\n'

Here is what I'm sending it:

['us.production.mind.requests.total 50301 1301209200\n', 'us.production.mind.requests.total 50348 1301209260\n', 'us.production.mind.requests.total 48798 1301209320\n', 'us.production.mind.requests.total 49425 1301209380\n', 'us.production.mind.requests.total 47898 1301209440\n', 'us.production.mind.requests.total 47486 1301209500\n', 'us.production.mind.requests.total 47378 1301209560\n', 'us.production.mind.requests.total 49055 1301209620\n', 'us.production.mind.requests.total 48428 1301209680\n', 'us.production.mind.requests.total 48662 1301209740\n', 'us.production.mind.requests.total 48693 1301209800\n', 'us.production.mind.requests.total 48561 1301209860\n', 'us.production.mind.requests.total 43759 1301209920\n', 'us.production.mind.requests.total 48766 1301209980\n', 'us.production.mind.requests.total 48537 1301210040\n', 'us.production.mind.requests.total 47781 1301210100\n', 'us.production.mind.requests.total 47507 1301210160\n', 'us.production.mind.requests.total 48723 1301210220\n', 'us.production.mind.requests.total 48498 1301210280\n', 'us.production.mind.requests.total 48767 1301210340\n', 'us.production.mind.requests.total 47721 1301210400\n', 'us.production.mind.requests.total 48451 1301210460\n', 'us.production.mind.requests.total 47160 1301210520\n', 'us.production.mind.requests.total 48488 1301210580\n', 'us.production.mind.requests.total 49108 1301210640\n', 'us.production.mind.requests.total 49271 1301210700\n', 'us.production.mind.requests.total 48328 1301210760\n', 'us.production.mind.requests.total 49832 1301210820\n', 'us.production.mind.requests.total 48314 1301210880\n', 'us.production.mind.requests.total 48750 1301210940\n', 'us.production.mind.requests.total 49387 1301211000\n', 'us.production.mind.requests.total 49573 1301211060\n', 'us.production.mind.requests.total 48919 1301211120\n', 'us.production.mind.requests.total 48267 1301211180\n', 'us.production.mind.requests.total 49387 1301211240\n', 'us.production.mind.requests.total 47475 1301211300\n', 'us.production.mind.requests.total 47814 1301211360\n', 'us.production.mind.requests.total 48651 1301211420\n', 'us.production.mind.requests.total 48456 1301211480\n', 'us.production.mind.requests.total 48831 1301211540\n', 'us.production.mind.requests.total 47082 1301211600\n', 'us.production.mind.requests.total 47483 1301211660\n', 'us.production.mind.requests.total 47864 1301211720\n', 'us.production.mind.requests.total 48184 1301211780\n', 'us.production.mind.requests.total 47350 1301211840\n', 'us.production.mind.requests.total 47480 1301211900\n', 'us.production.mind.requests.total 47509 1301211960\n', 'us.production.mind.requests.total 48019 1301212020\n', 'us.production.mind.requests.total 47866 1301212080\n', 'us.production.mind.requests.total 48035 1301212140\n']

Here is my import/collection script:
#!/usr/bin/python
import fileinput
import glob
import sys
import time
import os
import platform
import subprocess
from socket import socket

CARBON_SERVER = '127.0.0.1'
CARBON_PORT = 2003
delay = 10
sock = socket()
try:
  sock.connect( (CARBON_SERVER,CARBON_PORT) )
except:
  print "Couldn't connect to %(server)s on port %(port)d, is carbon-agent.py running?" % { 'server':CARBON_SERVER, 'port':CARBON_PORT }
  sys.exit(1)

files = glob.glob('/opt/graphite/storage/import_data/*')

while True:
        for file in files:
                with open(file) as f:
                        message = str(f.readlines())
                        print '-' * 80
                        print message
                        print
                        sock.sendall(message)
                        time.sleep(delay)

Python 2.6.6
[root@dhcp-10-100-46-73 bin]# pip freeze
Django==1.3.1
Twisted==11.1.0
distribute==0.6.24
django-tagging==0.3.1
iniparse==0.3.1
mod-python==3.3.1
pycurl==7.19.0
pygpgme==0.1
python-ldap==2.3.10
python-memcached==1.43
simplejson==2.3.1
urlgrabber==3.9.1
virtualenv==1.7
virtualenvwrapper==2.11
whisper==0.9.9
yum-metadata-parser==1.1.2
yum-presto==0.4.4
zope.interface==3.8.0

Any help would be greatly appreciated!!!

Question information

Language:
English Edit question
Status:
Solved
For:
Graphite Edit question
Assignee:
No assignee Edit question
Solved by:
Barry Morrison
Solved:
Last query:
Last reply:
Revision history for this message
Brian Hatfield (bmhatfield) said :
#1

Just looking at it, perhaps your "\n" is not being resolved as a newline
character, and instead is being sent/received as an actual "\n"?

On Fri, Dec 30, 2011 at 6:30 PM, basketcase <
<email address hidden>> wrote:

> New question #183344 on Graphite:
> https://answers.launchpad.net/graphite/+question/183344
>
> I _think_ my data is in the correct format, in fact I have had one
> successful import.
>
> but in /listener.log
>
> I have this:
>
> 30/12/2011 15:04:10 :: invalid line received from client 127.0.0.1:40247,
> ignoring
> 30/12/2011 15:04:10 :: invalid line: 'us.production.mind.requests.total
> 50301 1301209200\n'
> 30/12/2011 15:04:20 :: invalid line received from client 127.0.0.1:40247,
> ignoring
> 30/12/2011 15:04:20 :: invalid line: 'us.production.mind.requests.total
> 50301 1301209200\n'
> 30/12/2011 15:04:30 :: invalid line received from client 127.0.0.1:40247,
> ignoring
> 30/12/2011 15:04:30 :: invalid line: 'us.production.mind.requests.total
> 50301 1301209200\n'
> 30/12/2011 15:04:40 :: invalid line received from client 127.0.0.1:40247,
> ignoring
> 30/12/2011 15:04:40 :: invalid line: 'us.production.mind.requests.total
> 50301 1301209200\n'
> 30/12/2011 15:04:50 :: invalid line received from client 127.0.0.1:40247,
> ignoring
> 30/12/2011 15:04:50 :: invalid line: 'us.production.mind.requests.total
> 50301 1301209200\n'
>
> Here is what I'm sending it:
>
> ['us.production.mind.requests.total 50301 1301209200\n',
> 'us.production.mind.requests.total 50348 1301209260\n',
> 'us.production.mind.requests.total 48798 1301209320\n',
> 'us.production.mind.requests.total 49425 1301209380\n',
> 'us.production.mind.requests.total 47898 1301209440\n',
> 'us.production.mind.requests.total 47486 1301209500\n',
> 'us.production.mind.requests.total 47378 1301209560\n',
> 'us.production.mind.requests.total 49055 1301209620\n',
> 'us.production.mind.requests.total 48428 1301209680\n',
> 'us.production.mind.requests.total 48662 1301209740\n',
> 'us.production.mind.requests.total 48693 1301209800\n',
> 'us.production.mind.requests.total 48561 1301209860\n',
> 'us.production.mind.requests.total 43759 1301209920\n',
> 'us.production.mind.requests.total 48766 1301209980\n',
> 'us.production.mind.requests.total 48537 1301210040\n',
> 'us.production.mind.requests.total 47781 1301210100\n',
> 'us.production.mind.requests.total 47507 1301210160\n',
> 'us.production.mind.requests.total 48723 1301210220\n',
> 'us.production.mind.requests.total 48498 1301210280\n',
> 'us.production.mind.requests.total 48767 1301210340\n',
> 'us.production.mind.requests.total 47721 1301210400\n',
> 'us.production.mind.requests.total 48451 1301210460\n',
> 'us.production.mind.requests.total 47160 1301210520\n',
> 'us.production.mind.requests.total 48488 1301210580\n',
> 'us.production.mind.requests.total 49108 1301210640\n',
> 'us.production.mind.requests.total 49271 1301210700\n',
> 'us.production.mind.requests.total 48328 1301210760\n',
> 'us.production.mind.requests.total 49832 1301210820\n',
> 'us.production.mind.requests.total 48314 1301210880\n',
> 'us.production.mind.requests.total 48750 1301210940\n',
> 'us.production.mind.requests.total 49387 1301211000\n',
> 'us.production.mind.requests.total 49573 1301211060\n',
> 'us.production.mind.requests.total 48919 1301211120\n',
> 'us.production.mind.requests.total 48267 1301211180\n',
> 'us.production.mind.requests.total 49387 1301211240\n',
> 'us.production.mind.requests.total 47475 1301211300\n',
> 'us.production.mind.requests.total 47814 1301211360\n',
> 'us.production.mind.requests.total 48651 1301211420\n',
> 'us.production.mind.requests.total 48456 1301211480\n',
> 'us.production.mind.requests.total 48831 1301211540\n',
> 'us.production.mind.requests.total 47082 1301211600\n',
> 'us.production.mind.requests.total 47483 1301211660\n',
> 'us.production.mind.requests.total 47864 1301211720\n',
> 'us.production.mind.requests.total 48184 1301211780\n',
> 'us.production.mind.requests.total 47350 1301211840\n',
> 'us.production.mind.requests.total 47480 1301211900\n',
> 'us.production.mind.requests.total 47509 1301211960\n',
> 'us.production.mind.requests.total 48019 1301212020\n',
> 'us.production.mind.requests.total 47866 1301212080\n',
> 'us.production.mind.requests.total 48035 1301212140\n']
>
> Here is my import/collection script:
> #!/usr/bin/python
> import fileinput
> import glob
> import sys
> import time
> import os
> import platform
> import subprocess
> from socket import socket
>
> CARBON_SERVER = '127.0.0.1'
> CARBON_PORT = 2003
> delay = 10
> sock = socket()
> try:
> sock.connect( (CARBON_SERVER,CARBON_PORT) )
> except:
> print "Couldn't connect to %(server)s on port %(port)d, is
> carbon-agent.py running?" % { 'server':CARBON_SERVER, 'port':CARBON_PORT }
> sys.exit(1)
>
> files = glob.glob('/opt/graphite/storage/import_data/*')
>
> while True:
> for file in files:
> with open(file) as f:
> message = str(f.readlines())
> print '-' * 80
> print message
> print
> sock.sendall(message)
> time.sleep(delay)
>
> Python 2.6.6
> [root@dhcp-10-100-46-73 bin]# pip freeze
> Django==1.3.1
> Twisted==11.1.0
> distribute==0.6.24
> django-tagging==0.3.1
> iniparse==0.3.1
> mod-python==3.3.1
> pycurl==7.19.0
> pygpgme==0.1
> python-ldap==2.3.10
> python-memcached==1.43
> simplejson==2.3.1
> urlgrabber==3.9.1
> virtualenv==1.7
> virtualenvwrapper==2.11
> whisper==0.9.9
> yum-metadata-parser==1.1.2
> yum-presto==0.4.4
> zope.interface==3.8.0
>
> Any help would be greatly appreciated!!!
>
>
> --
> You received this question notification because you are a member of
> graphite-dev, which is an answer contact for Graphite.
>
> _______________________________________________
> Mailing list: https://launchpad.net/~graphite-dev
> Post to : <email address hidden>
> Unsubscribe : https://launchpad.net/~graphite-dev
> More help : https://help.launchpad.net/ListHelp
>

Revision history for this message
Barry Morrison (basketcase) said :
#2

Brian,
That is the output on the console when I run my pyparser2.py file (the one above). The actual file it is parsing does not have a \n in it.

[root@dhcp-10-100-46-73 bin]# python pyparser2.py
--------------------------------------------------------------------------------
['us.production.mind.requests.total 50301 1301209200\n', 'us.production.mind.requests.total 50348 1301209260\n', 'us.production.mind.requests.total 48798 1301209320\n', 'us.production.mind.requests.total 49425 1301209380\n', 'us.production.mind.requests.total 47898 1301209440\n', 'us.production.mind.requests.total 47486 1301209500\n', 'us.production.mind.requests.total 47378 1301209560\n', 'us.production.mind.requests.total 49055 1301209620\n', 'us.production.mind.requests.total 48428 1301209680\n', 'us.production.mind.requests.total 48662 1301209740\n', 'us.production.mind.requests.total 48693 1301209800\n', 'us.production.mind.requests.total 48561 1301209860\n', 'us.production.mind.requests.total 43759 1301209920\n', 'us.production.mind.requests.total 48766 1301209980\n', 'us.production.mind.requests.total 48537 1301210040\n', 'us.production.mind.requests.total 47781 1301210100\n', 'us.production.mind.requests.total 47507 1301210160\n', 'us.production.mind.requests.total 48723 1301210220\n', 'us.production.mind.requests.total 48498 1301210280\n', 'us.production.mind.requests.total 48767 1301210340\n', 'us.production.mind.requests.total 47721 1301210400\n', 'us.production.mind.requests.total 48451 1301210460\n', 'us.production.mind.requests.total 47160 1301210520\n', 'us.production.mind.requests.total 48488 1301210580\n', 'us.production.mind.requests.total 49108 1301210640\n', 'us.production.mind.requests.total 49271 1301210700\n', 'us.production.mind.requests.total 48328 1301210760\n', 'us.production.mind.requests.total 49832 1301210820\n', 'us.production.mind.requests.total 48314 1301210880\n', 'us.production.mind.requests.total 48750 1301210940\n', 'us.production.mind.requests.total 49387 1301211000\n', 'us.production.mind.requests.total 49573 1301211060\n', 'us.production.mind.requests.total 48919 1301211120\n', 'us.production.mind.requests.total 48267 1301211180\n', 'us.production.mind.requests.total 49387 1301211240\n', 'us.production.mind.requests.total 47475 1301211300\n', 'us.production.mind.requests.total 47814 1301211360\n', 'us.production.mind.requests.total 48651 1301211420\n', 'us.production.mind.requests.total 48456 1301211480\n', 'us.production.mind.requests.total 48831 1301211540\n', 'us.production.mind.requests.total 47082 1301211600\n', 'us.production.mind.requests.total 47483 1301211660\n', 'us.production.mind.requests.total 47864 1301211720\n', 'us.production.mind.requests.total 48184 1301211780\n', 'us.production.mind.requests.total 47350 1301211840\n', 'us.production.mind.requests.total 47480 1301211900\n', 'us.production.mind.requests.total 47509 1301211960\n', 'us.production.mind.requests.total 48019 1301212020\n', 'us.production.mind.requests.total 47866 1301212080\n', 'us.production.mind.requests.total 48035 1301212140\n']

Also, when I run it and tailing /listener.log this is the only thing that exists in the log:

30/12/2011 15:44:10 :: MetricLineReceiver connection with 127.0.0.1:40266 established
30/12/2011 15:44:23 :: MetricLineReceiver connection with 127.0.0.1:40266 closed cleanly

It's like, it sends creates the connection, but doesn't send anything.

Revision history for this message
Barry Morrison (basketcase) said :
#3

I did a fresh install on a new server.

Still:
CentOS 6.2
Python 2.6.6
Django 1.3.1

Graphite from bzr
Carbon/Whisper from source

Everything else was left alone out of box.

NOTHING was installed with pip.

When I try and run an import, not data is imported. /opt/graphite/storage/log/web-cache-a/listener.log does not exist (wasn't created). The only thing that exists is console.log

I've set DEBUG = True (or uncommented) in local_settings.py

Revision history for this message
Barry Morrison (basketcase) said :
#4

OK. Now listener.log exists, I can attempt to send data to it. it acknowledges that a connection is made, but no data is actually sent:

02/01/2012 16:19:38 :: MetricLineReceiver connection with 127.0.0.1:58674 established
02/01/2012 16:19:38 :: MetricLineReceiver connection with 127.0.0.1:58674 lost: Connection to the other side was lost in a non-clean fashion.
02/01/2012 16:20:36 :: MetricLineReceiver connection with 127.0.0.1:58675 established
02/01/2012 16:21:46 :: MetricLineReceiver connection with 127.0.0.1:58675 closed cleanly
02/01/2012 16:36:48 :: MetricLineReceiver connection with 127.0.0.1:58676 established
02/01/2012 16:37:16 :: MetricLineReceiver connection with 127.0.0.1:58676 closed cleanly
02/01/2012 16:42:03 :: MetricLineReceiver connection with 127.0.0.1:58677 established
02/01/2012 16:42:09 :: MetricLineReceiver connection with 127.0.0.1:58677 closed cleanly

Here is my script to feed it data:

!/usr/bin/python
import fileinput
import glob
import sys
import time
import os
import platform
import subprocess
from socket import socket

CARBON_SERVER = '127.0.0.1'
CARBON_PORT = 2003
delay = 1
sock = socket()
try:
  sock.connect( (CARBON_SERVER,CARBON_PORT) )
except:
  print "Couldn't connect to %(server)s on port %(port)d, is carbon-agent.py running?" % { 'server':CARBON_SERVER, 'port':CARBON_PORT }
  sys.exit(1)

files = glob.glob('/opt/graphite/storage/import_data/*')

while True:
        for file in files:
                for line in open(file):
                        data = line.rstrip('\n')
                        print data
                        print
                        sock.sendall(data)
                        time.sleep(delay)

Revision history for this message
Michael Leinartas (mleinartas) said :
#5

You may try setting LOG_UPDATES = True in your carbon.conf. That will create an update.log in your log directory which might give you some more info.

You might also want to experiment with telnetting to the port or using netcat to ensure you can get a few points recorded successfully before you continue trying to bulk import the data

Revision history for this message
Barry Morrison (basketcase) said :
#6

cat'ing the file and piping to nc 127.0.0.1 2003 worked. I threw 300k records at it in a matter of seconds.

It appears socket is not actually sending anything, though Graphite's logs indicate a connection:

04/01/2012 21:28:35 :: MetricLineReceiver connection with 127.0.0.1:37981 closed cleanly
04/01/2012 21:38:46 :: MetricLineReceiver connection with 127.0.0.1:37982 established
04/01/2012 21:38:46 :: MetricLineReceiver connection with 127.0.0.1:37982 closed cleanly
04/01/2012 21:39:52 :: MetricLineReceiver connection with 127.0.0.1:37983 established
04/01/2012 21:39:53 :: MetricLineReceiver connection with 127.0.0.1:37983 closed cleanly
04/01/2012 21:43:07 :: MetricLineReceiver connection with 127.0.0.1:37984 established
04/01/2012 21:43:07 :: MetricLineReceiver connection with 127.0.0.1:37984 closed cleanly
04/01/2012 21:46:32 :: MetricLineReceiver connection with 127.0.0.1:37985 established
04/01/2012 21:46:32 :: MetricLineReceiver connection with 127.0.0.1:37985 closed cleanly
04/01/2012 21:46:54 :: MetricLineReceiver connection with 127.0.0.1:37986 established
04/01/2012 21:49:47 :: MetricLineReceiver connection with 127.0.0.1:37986 closed cleanly
04/01/2012 21:50:12 :: MetricLineReceiver connection with 127.0.0.1:37987 established
04/01/2012 21:51:25 :: MetricLineReceiver connection with 127.0.0.1:37987 closed cleanly
04/01/2012 21:52:01 :: MetricLineReceiver connection with 127.0.0.1:37988 established
04/01/2012 21:52:01 :: MetricLineReceiver connection with 127.0.0.1:37988 closed cleanly

even if I do a simple sock.send('this.is.my.file 123 timestamp') it doesn't actually send anything.

Revision history for this message
Barry Morrison (basketcase) said :
#7

So it is in fact sending to 2003:

[root@dhcp-10-100-46-73 ~]# nc -l 2003
us.staging.mind.requests.total 55 1325741217

So my script is in fact sending on the socket. Why isn't graphite picking it up?!

Revision history for this message
Michael Leinartas (mleinartas) said :
#8

Hmm so it looks like previously you were counting on the newlines from the file contents to remain in the string when sent. I'm not sure why that didnt work, but now that you're stripping them out you need to add newlines after each one.

Try this in the console first:
import socket
import time
sock = socket.socket()
sock.connect((CARBON_SERVER,CARBON_PORT))
sock.sendall("this.is.my.file 123 %s\n" % int(time.time()))

Revision history for this message
Barry Morrison (basketcase) said :
#9

FWIW, this is what I was able to get working yesterday (or this morning? -- nights mess up my sense of time).

#!/usr/bin/python
import fileinput
import glob
import sys
import time
import os
import platform
import subprocess
from socket import socket

CARBON_SERVER = '127.0.0.1'
CARBON_PORT = 2003
delay = 60
sock = socket()
try:
  sock.connect( (CARBON_SERVER,CARBON_PORT) )
except:
  print "Couldn't connect to %(server)s on port %(port)d, is carbon-agent.py running?" % { 'server':CARBON_SERVER, 'port':CARBON_PORT }
  sys.exit(1)

files = glob.glob('/opt/graphite/storage/import_data/*')

for file in files:
 lines = []
 for line in open(file):
  drtylines = line.rstrip('\n')
  lines.append(drtylines)
  data = '\n'.join(lines) + '\n'
  print "sending message"
  print 80 * '-'
  print data
  sock.sendall(data)

I'll mark this as resolved...but a new problem is introduced -- will research further first. I can import 300k records, logs indicate that it creates data points, but actual graph is incomplete, often only a "window" of time actually graphed.

Much appreciated for all your help!