I/O utilization goes up when both backends hold the same data

Asked by Tobias

I have the following setup:

Physical load balancer (round robin) -> 2 graphite-web frontends with the same configuration (CLUSTER_SERVERS = ["10.57.72.33:80", "10.57.72.34:80"]), pointing at two graphite backend servers. Both backend servers hold the same data, for redundancy.

The problem is that I/O utilization is very high with this configuration. I also see the following lines in exceptions.log:

Failed to join remote_fetch thread 10.57.72.33:80 within 6s
Failed to join remote_fetch thread 10.57.72.34:80 within 6s
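The 6s in these messages matches graphite-web's default remote-fetch timeout, so each backend request is hitting that limit rather than failing outright. A sketch of raising the relevant timeouts in the frontend's local_settings.py follows; the setting names are my assumption for the 0.9.x series, so verify them against the settings.py shipped with your installation:

```python
# local_settings.py on the frontends -- a sketch, assuming 0.9.x
# setting names; check your installed settings.py for the defaults.

# Seconds to wait for each backend's fetch thread before logging
# "Failed to join remote_fetch thread ... within Ns" (default 6).
REMOTE_STORE_FETCH_TIMEOUT = 30

# Seconds to wait for metric-find requests against each backend.
REMOTE_STORE_FIND_TIMEOUT = 10

# Seconds a backend is skipped after a failed request.
REMOTE_STORE_RETRY_DELAY = 60
```

Raising the fetch timeout only hides the symptom if the backends are genuinely I/O-bound, but it helps confirm whether the threads are slow rather than hung.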

If I remove one server from CLUSTER_SERVERS, everything works well.

I am running graphite-web version 0.9.15.

Here is the complete conf for my frontend:

SECRET_KEY = '?=exBKb/9J~m4B3re@P2Waa,`"H_e"x~'
TIME_ZONE = 'CET'
MEMCACHE_HOSTS = ['10.57.72.31:11211']
DEFAULT_CACHE_DURATION = 600 # Cache images and data for 10 minutes
STORAGE_DIR = '/var/opt/graphite/storage'
LOG_DIR = '/opt/graphite/storage/log/webapp'
CLUSTER_SERVERS = ["10.57.72.33:80", "10.57.72.34:80"]
CARBONLINK_HOSTS = []

Here is the conf for the backends:

SECRET_KEY = '?=exBKb/9J~m4B3re@P2Waa,`"H_e"x~'
TIME_ZONE = 'CET'
WHISPER_DIR = '/var/opt/graphite/storage/whisper'
CARBONLINK_HOSTS = ["127.0.0.1:7102:w1", "127.0.0.1:7103:w2", "127.0.0.1:7104:w3", "127.0.0.1:7105:w4", "127.0.0.1:7106:w5", "127.0.0.1:7107:w6"]
CARBONLINK_QUERY_BULK = True

Does anyone have an idea what could be causing this? It looks like a configuration issue to me.

Question information

Language:
English
Status:
Expired
For:
Graphite
Assignee:
No assignee
Tobias (lindqt01) said :
#1

Is this related to REMOTE_STORE_MERGE_RESULTS?

Tobias (lindqt01) said :
#2
Denis Zhdanov (deniszhdanov) said :
#3

PR #1565 is for the 1.0.x branch, not for 0.9.x.
You can also try disabling REMOTE_STORE_MERGE_RESULTS, but I recommend upgrading Graphite to 1.0.1, or at least to 0.9.16, first and checking whether you still have the issue.
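For reference, a minimal sketch of the suggested change in the frontend's local_settings.py. The comment states my understanding of the setting's effect, not verified against the 0.9.15 source:

```python
# Frontend local_settings.py -- a sketch, assuming 0.9.x behavior.
# With two backends holding identical data, merging results from both
# makes the frontend fetch every metric twice; disabling the merge
# lets it use the first complete answer it receives instead.
REMOTE_STORE_MERGE_RESULTS = False
```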

Tobias (lindqt01) said :
#4

OK thank you I will give that a go.

Launchpad Janitor (janitor) said :
#5

This question was expired because it remained in the 'Open' state without activity for the last 15 days.