Clustered Webapp - Not Communicating

Asked by VegHead on 2012-05-03

I have Graphite set up on three EC2 instances:

* carbon-relay - relay1.graphite.prod.example.ec2
* carbon-cache + webapp - cache3.graphite.prod.example.ec2
* carbon-cache + webapp - cache4.graphite.prod.example.ec2

The relay is working perfectly with consistent hashing. The problem is that the two web servers are not communicating with each other, so I only see the metrics from one server.

I spent a lot of time looking at https://answers.launchpad.net/graphite/+question/114206 and I can't figure out what I have set up incorrectly. I can run a wget from cache3 against cache4, get data back, and see the request in the Apache logs, so I don't think it's a firewall issue. I tried enabling "suppressError = False" in remote_storage.py and turned on DEBUG in local_settings.py, but I don't see any errors in Firebug.

cache3 - local_settings.py
CLUSTER_SERVERS = [ 'cache4.graphite.prod.example.ec2', 'localhost' ]

cache4 - local_settings.py
CLUSTER_SERVERS = [ 'cache3.graphite.prod.example.ec2', 'localhost' ]

I have tried using IP addresses as well and that had no impact. Any suggestions?

Question information

Language: English
Status: Solved
For: Graphite
Assignee: No assignee
Solved by: VegHead
Solved: 2012-05-04
Last query: 2012-05-04
Last reply: 2012-05-03

VegHead (organicveggie) said : #1

I did a little more debugging and modified storage.py to hard-code my remote hosts directly:

STORE = Store(settings.DATA_DIRS, remote_hosts=["cache4.graphite.prod.example.ec2", "127.0.0.1"])

That worked. So, somehow my CLUSTER_SERVERS value isn't getting pulled in from local_settings.py correctly.
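One way to test that suspicion is to load the settings file by hand and print the value it defines. The following is only a minimal sketch, not part of Graphite: the helper name load_cluster_servers is made up for illustration, it is written for Python 3 (the Graphite release in this thread ran on Python 2), and it assumes local_settings.py contains only plain assignments like the ones shown in this thread. To be meaningful it has to run as the same user as the web server, since a file that root can read may still be unreadable to that user.

```python
import runpy


def load_cluster_servers(path):
    """Execute a settings file and return its CLUSTER_SERVERS value.

    Returns None if the file cannot be read (missing file, permission
    denied) or if it does not define CLUSTER_SERVERS.
    """
    try:
        # run_path executes the file and returns its global namespace.
        namespace = runpy.run_path(path)
    except OSError as exc:
        # FileNotFoundError and PermissionError are both OSError subclasses.
        print("could not read %s: %s" % (path, exc))
        return None
    return namespace.get("CLUSTER_SERVERS")
```

Calling load_cluster_servers() with the path to local_settings.py (the location depends on the install) should print an error and return None if the file is unreadable, and otherwise return the list the webapp is supposed to see.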

Michael Leinartas (mleinartas) said : #2

Hm, I don't see anything immediately wrong with the config. It's strange to me that setting it directly works but setting it in local_settings.py doesn't.

One note: you don't actually need to specify 'localhost' in the config (it's filtered out automatically).
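To illustrate what "filtered out" means here, the sketch below drops entries that obviously refer to the local machine. It is a simplified stand-in and not Graphite's actual check, which may work differently (for example by testing whether an address belongs to a local interface). The function names are hypothetical, and the sketch does not handle "host:port" entries or hostnames that merely resolve to a local address.

```python
import ipaddress


def looks_local(host):
    """Rough check: the literal name 'localhost' or a loopback IP literal."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal; this sketch treats other hostnames as remote.
        return False


def remote_only(cluster_servers):
    """Return the entries that do not look like the local machine."""
    return [host for host in cluster_servers if not looks_local(host)]
```

With the cache3 configuration from the question, remote_only() would keep only 'cache4.graphite.prod.example.ec2'.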

What version or checkout rev are you running?

VegHead (organicveggie) said : #3

I installed carbon & whisper via pip and it looks like I'm running 0.9.10-pre1.

Michael Leinartas (mleinartas) said : #4

Well, if you installed graphite-web via pip, it's likely 0.9.9, which was released last November. 0.9.10-pre1 was just cut a couple of days ago and isn't yet on PyPI for pip to find. To be sure, you can check settings.py (not local_settings.py) and see what WEBAPP_VERSION is set to.

Can you also post your local_settings.py?

VegHead (organicveggie) said : #5

Believe it or not, settings.py says 0.9.10-pre1. On both instances.

WEBAPP_VERSION = '0.9.10-pre1'

I'm installing everything via a Chef recipe that I wrote, which uses the python_pip LWRP, and I set the servers up within the past two days. That leaves me really puzzled about how I ended up with 0.9.10-pre1. I triple-checked and you're absolutely right: PyPI doesn't have 0.9.10-pre1. Strangely enough, I also have whisper 0.9.10-pre1 and carbon 0.9.10-pre1, neither of which is available on PyPI.

Should I try switching to 0.9.9 instead?

Here are the contents of local_settings.py (only the uncommented items) on cache3.graphite.prod:

---snip---
USE_LDAP_AUTH = True
LDAP_URI = "ldap://ldap.example.ec2:389/"
LDAP_SEARCH_BASE = "ou=People,dc=example,dc=ec2"
LDAP_USER_QUERY = "(uid=%s)" #For Active Directory use "(sAMAccountName=%s)"

CLUSTER_SERVERS = [ 'cache3.graphite.prod.example.ec2','cache4.graphite.prod.example.ec2' ]
---snip---

And here are the contents of local_settings.py on the other server, cache4.graphite.prod:

---snip---
USE_LDAP_AUTH = True
LDAP_URI = "ldap://ldap.example.ec2:389/"
LDAP_SEARCH_BASE = "ou=People,dc=example,dc=ec2"
LDAP_USER_QUERY = "(uid=%s)" #For Active Directory use "(sAMAccountName=%s)"

CLUSTER_SERVERS = [ 'cache3.graphite.prod.example.ec2','localhost' ]
---snip---

VegHead (organicveggie) said : #6

I tried explicitly switching back to 0.9.9 and that did not seem to help.

VegHead (organicveggie) said : #7

And that led me to the answer. :)

The permissions on local_settings.py were too restrictive:

-rw------- 1 root root 4006 May 4 13:40 local_settings.py

Since it was owned by root and chmod 600, Apache wasn't able to read it. :) I changed the permissions to 644 and it all started working. Woo woo!
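For anyone hitting the same symptom, a check along these lines can catch the problem before it reaches the webapp. It is a minimal sketch in Python 3 with hypothetical helper names. It only inspects the "other" read bit, which is the relevant one when the web server user is neither the file's owner nor in its group, as was the case here with a root-owned file.

```python
import os
import stat


def readable_by_others(path):
    """Return True if the 'other' read permission bit is set on path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)


def describe_mode(path):
    """Return the mode in ls -l style, for example '-rw-------'."""
    return stat.filemode(os.stat(path).st_mode)
```

On the file shown above, describe_mode() would report -rw------- and readable_by_others() would return False. After chmod 644 it returns True.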

Thanks for the help.

-Veggie