Missing Metrics in Carbon Cache
We are seeing some missing metrics (null) from the cache when we query from graphite-web but some data is retrieved from the cache and displayed. Once everything is flushed to disk all metrics appear. Both members of the cluster exhibit the same behavior.
We have a new Graphite Cluster of two servers, each server is running carbon-relay, one carbon cache instance per server and graphite-web sitting behind a load balancer. We are using consistent-hashing with a replication-factor of 2 so that both servers have the same data.
Carbon-cache settles between 100k-300k metrics, the fewer metrics in cache results in fewer metrics missing. Restarts of carbon-cache will mask the issue for a few hours as the cache grows. We are writing ~275k metrics per minute.
The server referenced below uploads metrics every 5 seconds, matching what is set in the storage-
Attached below are some of the configs and logs illustrating the issue. If any other info is needed please let me know and any assistance is greatly appreciated.
tail -f query.log
07/11/2017 12:40:43 :: [127.0.0.1:36494] cache query for "web.servers.
Below the oldest ~16 metrics have been persisted to disk.. all showing up correctly in graphite-web. We then have ~21 null metrics that are not being pulled from the cache but I presume to be there since they get written to the disk a few moments later. We then have the newest 7 metrics being pulled from the cache and shown in graphite-web correctly.
curl "http://
[7.0, 1510079840], [6.0, 1510079845], [4.0, 1510079850], [5.0, 1510079855], [5.0, 1510079860], [8.0, 1510079865], [5.0, 1510079870], [6.0, 1510079875], [9.0, 1510079880], [5.0, 1510079885], [4.0, 1510079890], [3.0, 1510079895], [4.0, 1510079900], [3.0, 1510079905], [null, 1510079910], [null, 1510079915], [null, 1510079920], [null, 1510079925], [null, 1510079930], [null, 1510079935], [null, 1510079940], [null, 1510079945], [null, 1510079950], [null, 1510079955], [null, 1510079960], [null, 1510079965], [null, 1510079970], [null, 1510079975], [null, 1510079980], [null, 1510079985], [null, 1510079990], [null, 1510079995], [null, 1510080000], [null, 1510080005], [4.0, 1510080010], [8.0, 1510080015], [6.0, 1510080020], [6.0, 1510080025], [4.0, 1510080030], [5.0, 1510080035], [4.0, 1510080040]],
python ~/whisper-info.py /opt/graphite/
maxRetention: 63072000
xFilesFactor: 0.5
aggregationMethod: average
fileSize: 10172212
Archive 0
retention: 2592000
secondsPerPoint: 5
points: 518400
size: 6220800
offset: 52
Archive 1
retention: 15552000
secondsPerPoint: 60
points: 259200
size: 3110400
offset: 6220852
Archive 2
retention: 63072000
secondsPerPoint: 900
points: 70080
size: 840960
offset: 9331252
carbon.conf
[cache]
DATABASE = whisper
ENABLE_LOGROTATION = True
USER =
MAX_CACHE_SIZE = 5000000
MAX_UPDATES_
MAX_CREATES_
MIN_TIMESTAMP_
LINE_RECEIVER_
LINE_RECEIVER_PORT = 2004
ENABLE_UDP_LISTENER = True
UDP_RECEIVER_
UDP_RECEIVER_PORT = 2004
PICKLE_
PICKLE_
USE_INSECURE_
CACHE_QUERY_
CACHE_QUERY_PORT = 7002
USE_FLOW_CONTROL = True
LOG_UPDATES = False
LOG_CREATES = False
LOG_CACHE_HITS = True
LOG_CACHE_
CACHE_WRITE_
WHISPER_AUTOFLUSH = False
WHISPER_
CARBON_
CARBON_
GRAPHITE_URL = http://
[relay]
LINE_RECEIVER_
LINE_RECEIVER_PORT = 2003
PICKLE_
PICKLE_
ENABLE_UDP_LISTENER = True
UDP_RECEIVER_
UDP_RECEIVER_PORT = 2003
RELAY_METHOD = consistent-hashing
REPLICATION_FACTOR = 2
DESTINATIONS = 127.0.0.1:2005, 10.1.1.12:2005
MAX_QUEUE_SIZE = 100000
MAX_DATAPOINTS_
QUEUE_LOW_
TIME_TO_
USE_FLOW_CONTROL = True
USE_RATIO_
MIN_RESET_
MIN_RESET_RATIO=0.9
MIN_RESET_
[aggregator]
LINE_RECEIVER_
LINE_RECEIVER_PORT = 2023
PICKLE_
PICKLE_
FORWARD_ALL = True
DESTINATIONS = 127.0.0.1:2004
REPLICATION_FACTOR = 1
MAX_QUEUE_SIZE = 10000
USE_FLOW_CONTROL = True
MAX_DATAPOINTS_
MAX_AGGREGATION
local_settings.py
SECRET_KEY = 'Edited'
CLUSTER_SERVERS = ["10.1.1.12:80"]
REMOTE_FIND_TIMEOUT = 3.0 # Timeout for metric find requests
REMOTE_
REMOTE_RETRY_DELAY = 60.0 # Time before retrying a failed remote webapp
Question information
- Language:
- English Edit question
- Status:
- Expired
- For:
- Graphite Edit question
- Assignee:
- No assignee Edit question
- Last query:
- Last reply: