Find results slow when using CLUSTER_SERVERS in two node setup

Asked by Tobias on 2018-02-09

I am trying to set up two nodes as a high-availability setup. I am using the following local_settings.py:

DEBUG = True
MEMCACHE_HOSTS = ['172.27.174.55:11211', '172.27.172.181:11211']
DEFAULT_CACHE_DURATION = 60
DEFAULT_CACHE_POLICY = [(0, 60), # default is 60 seconds
                        (7200, 120), # >= 2 hour queries are cached 2 minutes
                        (21600, 180), # >= 6 hour queries are cached 3 minutes
                        (43200, 360), # >= 12 hour queries are cached 6 minutes
                        (86400, 600)] # >= 24 hour queries are cached 10 minutes

SECRET_KEY = 'dA"%hBb24,>UhUj%`7VUb3uMFbs9!cCx'
REMOTE_EXCLUDE_LOCAL = False
CLUSTER_SERVERS=["172.27.174.55","172.27.172.181"]

I see slow find results, taking over 3 seconds, in the logs:

2018-02-09,15:03:12.749 :: Got a find result for <FindQuery: carbon.agents.* from Fri Feb 9 08:57:14 2018 until Fri Feb 9 15:05:14 2018> after 3.116738s
2018-02-09,15:03:12.749 :: Got all find results for <FindQuery: carbon.agents.* from Fri Feb 9 08:57:14 2018 until Fri Feb 9 15:05:14 2018> in 3.117328s

After commenting out the following:

#REMOTE_EXCLUDE_LOCAL = False
#CLUSTER_SERVERS=["172.27.174.55","172.27.172.181"]

I get much faster response times:

2018-02-09,15:04:48.227 :: Got a find result for <FindQuery: carbon.agents.*.* from Fri Feb 9 09:00:41 2018 until Fri Feb 9 15:08:41 2018> after 0.002134s
2018-02-09,15:04:48.232 :: Got all find results for <FindQuery: carbon.agents.*.* from Fri Feb 9 09:00:41 2018 until Fri Feb 9 15:08:41 2018> in 0.007380s

graphite-web version 1.1.1

Does anybody know what I'm missing here?

Question information

Language: English
Status: Solved
For: Graphite
Assignee: No assignee
Solved by: Tobias
Solved: 2018-02-12
Last query: 2018-02-12
Last reply: 2018-02-12

Denis Zhdanov (deniszhdanov) said : #1

That's strange. Could you please show `pip freeze` output?

Tobias (lindqt01) said : #2

Thanks. Here is the output:

ansible==2.4.2.0
attrs==17.4.0
Automat==0.6.0
aws-cfn-bootstrap==1.4
Babel==0.9.6
backports.ssl-match-hostname==3.4.0.2
boto==2.45.0
cachetools==2.0.1
cairocffi==0.8.0
carbon==1.1.1
cffi==1.6.0
chardet==2.2.1
Cheetah==2.4.4
cloud-init==0.7.9
configobj==4.7.2
constantly==15.1.0
cryptography==1.7.2
decorator==3.4.0
Django==1.11.10
django-tagging==0.4.3
enum34==1.0.4
fros==1.0
graphite-web==1.1.1
httplib2==0.9.2
hyperlink==17.3.1
idna==2.4
incremental==17.5.0
iniparse==0.4
iotop==0.6
ipaddress==1.0.16
IPy==0.75
javapackages==1.0.0
Jinja2==2.7.2
jmespath==0.9.0
jsonpatch==1.2
jsonpointer==1.9
kitchen==1.1.1
lockfile==0.12.2
lxml==3.2.1
Markdown==2.4.1
MarkupSafe==0.11
paramiko==2.1.1
perf==0.1
Pillow==2.0.0
ply==3.4
policycoreutils-default-encoding==0.1
prettytable==0.7.2
pyasn1==0.1.9
pycparser==2.14
pycrypto==2.6.1
pycurl==7.19.0
Pygments==1.4
pygobject==3.22.0
pygpgme==0.3
pyliblzma==0.5.3
pyparsing==2.2.0
pyserial==2.6
pystache==0.5.4
python-daemon==1.6.1
python-keyczar==0.71rc0
python-linux-procfs==0.4.9
python-memcached==1.48
pytz==2018.3
pyudev==0.15
pyxattr==0.5.1
PyYAML==3.10
requests==2.6.0
rsa==3.4.1
scandir==1.6
schedutils==0.4
seobject==0.1
sepolicy==1.1
setroubleshoot==1.1
six==1.9.0
slip==0.4.0
slip.dbus==0.4.0
Twisted==17.9.0
txAMQP==0.8.2
urlgrabber==3.10
urllib3==1.10.2
whisper==1.1.1
yum-metadata-parser==1.1.4
zope.interface==4.4.3

Denis Zhdanov (deniszhdanov) said : #3

The 3 extra seconds look like the fetch timeout, which is 3 seconds.
Are the CLUSTER_SERVERS really reachable? Do
curl -v http://172.27.174.55//render?format=json and curl -v http://172.27.172.181//render?format=json
return anything?
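
The same reachability check can be scripted so that it also reports how long each node takes to answer. This is only a rough sketch, assuming Python 3 and that the nodes listen on plain HTTP port 80 as in the curl commands; the helper name `probe` is made up for illustration and is not part of graphite-web.

```python
import time
import urllib.error
import urllib.request

# Open URLs directly, ignoring any http_proxy setting, since cluster
# nodes are normally addressed directly (assumption).
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url, timeout=3.0):
    """Return (status, elapsed_seconds) for a GET on url.

    status is the HTTP status code, or a string starting with "error"
    if the connection failed or timed out.
    """
    start = time.monotonic()
    try:
        with _opener.open(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx code.
        status = exc.code
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, no route, timeout, and so on.
        status = "error: %s" % exc
    return status, time.monotonic() - start


# Example call (hosts taken from CLUSTER_SERVERS in the question):
# for host in ["172.27.174.55", "172.27.172.181"]:
#     print(host, probe("http://%s/render?format=json" % host))
```

An elapsed time close to the timeout value, together with an error status, would point at a network or firewall problem rather than at graphite-web itself.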

Tobias (lindqt01) said : #4

I found the problem. The memcached port was not open in the firewall. Sorry for the trouble.
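
For anyone hitting the same symptom: a quick way to confirm that every MEMCACHE_HOSTS entry accepts TCP connections is to run a small check from each node. This is a minimal sketch, assuming Python 3; the function names `port_is_open` and `check_memcache_hosts` are hypothetical helpers, not part of graphite-web or python-memcached.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def check_memcache_hosts(hosts, timeout=3.0):
    """Map each 'host:port' entry to True (reachable) or False."""
    results = {}
    for entry in hosts:
        host, _, port = entry.rpartition(":")
        results[entry] = port_is_open(host, int(port), timeout)
    return results


# Example call (values taken from local_settings.py in the question):
# MEMCACHE_HOSTS = ['172.27.174.55:11211', '172.27.172.181:11211']
# for entry, ok in check_memcache_hosts(MEMCACHE_HOSTS).items():
#     print(entry, "reachable" if ok else "NOT reachable")
```

This only tests that the port accepts connections; it does not speak the memcached protocol. A firewall that silently drops packets shows up as a check that takes the full timeout before returning False.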