I/O utilization goes up when both backends hold the same data

Asked by Tobias

I have the following setup:

Physical load balancer (round robin) -> 2 graphite-web frontends with the same configuration (CLUSTER_SERVERS = ["10.57.72.33:80", "10.57.72.34:80"]), pointing at two graphite backend servers. Both backend servers hold the same data, for redundancy.

The problem is that I/O utilization is very high with this configuration. I also see the following lines in exceptions.log:

Failed to join remote_fetch thread 10.57.72.33:80 within 6s
Failed to join remote_fetch thread 10.57.72.34:80 within 6s
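The 6s in these messages matches graphite-web's default remote-fetch timeout, so each backend request is hitting that limit rather than failing outright. A sketch of raising the relevant timeouts in the frontend's local_settings.py follows; the setting names are my assumption for the 0.9.x series, so verify them against the settings.py shipped with your installation:

```python
# local_settings.py on the frontends -- a sketch, assuming 0.9.x
# setting names; check your installed settings.py for the defaults.

# Seconds to wait for each backend's fetch thread before logging
# "Failed to join remote_fetch thread ... within Ns" (default 6).
REMOTE_STORE_FETCH_TIMEOUT = 30

# Seconds to wait for metric-find requests against each backend.
REMOTE_STORE_FIND_TIMEOUT = 10

# Seconds a backend is skipped after a failed request.
REMOTE_STORE_RETRY_DELAY = 60
```

Raising the fetch timeout only hides the symptom if the backends are genuinely I/O-bound, but it helps confirm whether the threads are slow rather than hung.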

If I remove one server from CLUSTER_SERVERS, everything works well.

I am running graphite-web version 0.9.15.

Here is the complete conf for my frontend:

SECRET_KEY = '?=exBKb/9J~m4B3re@P2Waa,`"H_e"x~'
TIME_ZONE = 'CET'
MEMCACHE_HOSTS = ['10.57.72.31:11211']
DEFAULT_CACHE_DURATION = 600 # Cache images and data for 10 minutes
STORAGE_DIR = '/var/opt/graphite/storage'
LOG_DIR = '/opt/graphite/storage/log/webapp'
CLUSTER_SERVERS = ["10.57.72.33:80", "10.57.72.34:80"]
CARBONLINK_HOSTS = []

Here is the conf for the backends:

SECRET_KEY = '?=exBKb/9J~m4B3re@P2Waa,`"H_e"x~'
TIME_ZONE = 'CET'
WHISPER_DIR = '/var/opt/graphite/storage/whisper'
CARBONLINK_HOSTS = ["127.0.0.1:7102:w1", "127.0.0.1:7103:w2", "127.0.0.1:7104:w3", "127.0.0.1:7105:w4", "127.0.0.1:7106:w5", "127.0.0.1:7107:w6"]
CARBONLINK_QUERY_BULK = True

Does anyone have an idea what could be causing this? It looks like a configuration issue to me.

Question information

Language:
English
Status:
Expired
For:
Graphite
Assignee:
No assignee
Tobias (lindqt01) said :
#1

Is this related to REMOTE_STORE_MERGE_RESULTS?

Tobias (lindqt01) said :
#2
Denis Zhdanov (deniszhdanov) said :
#3

PR #1565 is for the 1.0.x branch, not for 0.9.x.
You can also try disabling REMOTE_STORE_MERGE_RESULTS, but I recommend upgrading Graphite to 1.0.1, or at least to 0.9.16, first and checking whether you still have the issue.
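For reference, a minimal sketch of the suggested change in the frontend's local_settings.py. The comment states my understanding of the setting's effect, not verified against the 0.9.15 source:

```python
# Frontend local_settings.py -- a sketch, assuming 0.9.x behavior.
# With two backends holding identical data, merging results from both
# makes the frontend fetch every metric twice; disabling the merge
# lets it use the first complete answer it receives instead.
REMOTE_STORE_MERGE_RESULTS = False
```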

Tobias (lindqt01) said :
#4

OK thank you I will give that a go.

Launchpad Janitor (janitor) said :
#5

This question was expired because it remained in the 'Open' state without activity for the last 15 days.