carbon-relay send queue full results in dropped metrics
Hi,
I'm running Graphite-web/carbon 0.9.12 on Ubuntu 14.04 with the following setup:
- Load balancer which sends metric traffic to carbon relays
- 2x top-level carbon relays (let's call these TLRs), which send metrics to two downstream carbon nodes
- Each node is self-contained, running carbon-relay, graphite-web and multiple carbon-caches (8x).
The TLRs run on VMs (16 cores) and the carbon nodes are bare-metal servers with SSDs.
This setup worked fine until the TLRs stopped processing new metrics (or at least that is how it looks). I noticed that after we hit 1 million metrics/min, the top-level relays started to log these messages in clients.log:
21/02/2016 06:25:04 :: CarbonClientPro
21/02/2016 06:25:04 :: CarbonClientPro
21/02/2016 06:28:54 :: CarbonClientFac
21/02/2016 06:28:54 :: CarbonClientFac
At this point, all of my graphs stopped plotting, because the relays do not seem to process any new metrics.
My relay settings from carbon.conf:
[relay]
LINE_RECEIVER_
LINE_RECEIVER_PORT = 2003
PICKLE_
PICKLE_
MAX_QUEUE_SIZE = 10000
MAX_DATAPOINTS_
LOG_LISTENER_
RELAY_METHOD = consistent-hashing
REPLICATION_FACTOR = 1
DESTINATIONS = 10.146.
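As a rough sanity check on MAX_QUEUE_SIZE, here is a back-of-the-envelope sketch in Python. It assumes that the 1 million per minute figure is the total across both TLRs, that the load balancer and the hashing split traffic evenly, and that the queue limit applies per destination. None of that is confirmed, and the function name is made up for illustration.

```python
# Back-of-the-envelope estimate: how long can one send queue absorb traffic
# if its destination stops draining entirely?
# Assumptions (not confirmed): even traffic split, one queue per destination.

def seconds_until_full(metrics_per_minute, relays, destinations, max_queue_size):
    """Seconds of buffering per destination queue, assuming an even split."""
    per_queue_rate = metrics_per_minute / 60.0 / relays / destinations
    return max_queue_size / per_queue_rate

buffer_s = seconds_until_full(
    metrics_per_minute=1_000_000,  # rate at which the log messages started
    relays=2,                      # 2x TLRs behind the load balancer
    destinations=2,                # two downstream carbon nodes
    max_queue_size=10_000,         # MAX_QUEUE_SIZE from the [relay] section
)
print(f"queue fills in about {buffer_s:.1f} s")
```

If those assumptions hold, each queue only covers a couple of seconds of traffic, so even a short stall on a downstream node could fill it.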
The relay is running really hot at 100% utilization. Could this be the cause of the issue?
Any help on how to resolve this would be appreciated.
Note: I have an identical setup in another environment with the only difference being that the TLRs use relay rules instead of consistent hashing.
Question information
- Language: English
- Status: Expired
- For: Graphite
- Assignee: No assignee