How to debug carbon.agents.*.errors?

Asked by Bryan

I have a Graphite host that has been happily recording 100K metrics per minute for the last month. Today, however, I'm seeing that the internal carbon.agents.*.errors metric is showing between 1 and 20 errors per minute. This loosely correlates with an increase in pointsPerUpdate and a decrease in updateOperations. committedPoints and metricsReceived have remained stable, though.

What is the best way of working out why this is happening?

I'm not seeing anything suspicious in console.log, creates.log or listener.log.

I have however reduced the logging in these files due to the number of metrics we have coming in. In carbon.conf I have:
LOG_LISTENER_CONNECTIONS = False
LOG_UPDATES = False
LOG_CACHE_HITS = False
LOG_CACHE_QUEUE_SORTS = False

Question information

Language: English
Status: Solved
For: Graphite
Assignee: No assignee
Solved by: Bryan
Bryan (bryan-4) said:
#1

For anyone else who finds this, the console.log file is where you need to look.

In my particular case, it seems the filesystem filled up for a period of time without my knowledge, corrupting a number of whisper files.

whisper-info.py can be used to inspect each of your .wsp files.
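With many metrics, checking files one at a time is slow, so here is a rough sketch of how the same check could be scripted. It assumes the whisper Python package is importable and uses whisper.info(), which reads a file's header and raises an exception when the header cannot be parsed. The function names and the example storage path are my own, not part of Graphite. Note that this only reads headers, so a file whose header survived but whose data is damaged would not be flagged.

```python
import os


def find_wsp_files(root):
    """Yield the path of every .wsp file found under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".wsp"):
                yield os.path.join(dirpath, name)


def find_unreadable(root, read_header):
    """Return (path, error message) pairs for files read_header fails on.

    read_header is any callable that raises on a bad file,
    for example whisper.info.
    """
    bad = []
    for path in find_wsp_files(root):
        try:
            read_header(path)
        except Exception as exc:  # caught broadly on purpose: any failure is worth a look
            bad.append((path, str(exc)))
    return bad


def scan_with_whisper(root):
    """Print every .wsp file under root whose header whisper cannot read."""
    import whisper  # imported here so the helpers above work without it

    for path, error in find_unreadable(root, whisper.info):
        print(f"{path}: {error}")
```

To use it, call scan_with_whisper() with your whisper storage directory, e.g. scan_with_whisper("/opt/graphite/storage/whisper") if you are on the default install layout (adjust the path if yours differs).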

Fixing a corrupt file doesn't look easy, so I chose to just delete my data and start again.