Metrics periodically missing in Graphite

Asked by john

I am running a setup of 3 servers behind a single load balancer. Servers A, B, and C each run a carbon-relay with 2 carbon-caches (1 per CPU, as recommended in other documentation I have read). I am seeing an issue where the same metrics are periodically missing and then get written later.

Example:
-rw-r--r-- 1 graphite graphite 224680 Dec 3 03:34 guest.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec 3 03:31 idle.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec 3 03:34 iowait.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec 3 03:34 irq.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec 3 03:31 nice.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec 3 03:34 softirq.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec 3 03:34 steal.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec 3 03:31 system.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec 3 03:34 user.wsp

You can see that the idle, nice, and system CPU metrics are all behind by 3 minutes. These metrics are delivered every 60 seconds, and my storage-schema matches that.
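
The lag visible in the listing above can also be checked programmatically. This is a minimal sketch, assuming the 60-second reporting interval described here; the directory path and the 120-second threshold are hypothetical values chosen for illustration, not taken from this setup:

```python
import os
import time


def find_stale_wsp(directory, max_age_seconds, now=None):
    """Return (filename, age_seconds) for .wsp files older than max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".wsp"):
            continue  # ignore anything that is not a whisper file
        age = now - os.path.getmtime(os.path.join(directory, name))
        if age > max_age_seconds:
            stale.append((name, age))
    return stale


if __name__ == "__main__":
    # Hypothetical path: substitute the real whisper directory for the cpu metrics.
    cpu_dir = "/opt/graphite/storage/whisper/servers/example/cpu"
    if os.path.isdir(cpu_dir):
        # Two missed 60-second intervals is the (assumed) threshold for "stale".
        for name, age in find_stale_wsp(cpu_dir, max_age_seconds=120):
            print(f"{name}: last written {age:.0f}s ago")
```

Running this on each of the three servers at the same moment would show whether the same file names are flagged every time on server A only.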

This happens only on server A; servers B and C both have the metrics. I am running the same configs on all 3 boxes. One really interesting thing I have seen is that the cache-b logs have a lot of queries, while the cache-a logs have none. Also, cache-a never showed a queue increase, whereas cache-b shows its queue increasing to 800. I have been seeing fullqueuedrops but don't understand why.

On the disk side, I am running on SSD and seeing the following from iostat. I can provide more info if needed.

-sh-4.2$ iostat -d 1
Linux 3.10.0-229.14.1.el7.x86_64 (ip-10-110-1-18) 12/03/2015 _x86_64_ (2 CPU)

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 486.49 5.27 2171.49 4232842 1744289310

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 0.00 0.00 0.00 0 0

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 0.00 0.00 0.00 0 0

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 1.00 8.00 0.00 8 0

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 0.00 0.00 0.00 0 0

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 2150.00 0.00 8600.00 0 8600

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 3051.00 0.00 12232.00 0 12232

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 2934.00 0.00 12984.00 0 12984

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 1056.00 0.00 4228.00 0 4228

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 0.00 0.00 0.00
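
The write burst in the iostat output above can be summarized with a little arithmetic. The sketch below uses only the four consecutive busy one-second samples copied from that output and computes the total transfers, total kB written, and average kB per transfer; the function and variable names are illustrative:

```python
# (tps, kB_wrtn/s) pairs from the four consecutive busy 1-second samples above.
burst_samples = [
    (2150.00, 8600.00),
    (3051.00, 12232.00),
    (2934.00, 12984.00),
    (1056.00, 4228.00),
]


def summarize_burst(samples):
    """Aggregate 1-second iostat samples into totals for the whole burst."""
    total_transfers = sum(tps for tps, _ in samples)
    total_kb = sum(kb for _, kb in samples)
    return {
        "seconds": len(samples),
        "transfers": total_transfers,
        "kb_written": total_kb,
        "kb_per_transfer": total_kb / total_transfers,
    }


summary = summarize_burst(burst_samples)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

The samples show writes arriving in a short burst of small transfers with idle seconds in between, rather than as a steady stream.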

I am currently handling about 40k metrics every 60 seconds. I'm really confused about why the same metrics are consistently the ones that go missing. I thought that if this were a queue or caching issue, the affected metrics would be random. Any help and direction would really be appreciated.
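
One possible reason for the same metrics lagging every time, assuming the relay routes by a hash of the metric name (the routing method is not stated in this question): each metric name would always map to the same carbon-cache instance, so a backlog on one instance would delay a fixed subset of metrics rather than a random one. The sketch below is a simplified illustration using an MD5 hash modulo the number of caches. It is not carbon's actual consistent-hashing implementation, and the metric prefix is hypothetical:

```python
import hashlib


def pick_cache(metric_name, caches):
    """Deterministically map a metric name to one cache instance (simplified)."""
    digest = hashlib.md5(metric_name.encode("utf-8")).hexdigest()
    return caches[int(digest, 16) % len(caches)]


caches = ["cache-a", "cache-b"]
cpu_metrics = ["guest", "idle", "iowait", "irq", "nice",
               "softirq", "steal", "system", "user"]

for short_name in cpu_metrics:
    # Hypothetical metric prefix, used only for illustration.
    full_name = f"servers.example.cpu.{short_name}"
    print(f"{full_name} -> {pick_cache(full_name, caches)}")
```

Because the mapping depends only on the name, repeated runs assign each metric to the same instance. If routing works this way here, a persistent split between up-to-date and lagging files would be expected when only one cache falls behind.
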
Thanks.

Question information

Language: English
Status: Expired
For: Graphite
Assignee: No assignee
Launchpad Janitor (janitor) said :
#1

This question was expired because it remained in the 'Open' state without activity for the last 15 days.