All of a sudden, metrics are getting lost

Asked by ak

I'm using Graphite to monitor our systems, especially our Aerospike nodes. Until last week, I was able to see the historical data (3 months, 6 months) for the metrics. Two days ago, when I checked, all the old metrics were gone and only one day of metrics was present.
Today, when I checked, only yesterday's data was present.

I read through the forums and understood that metrics are retained according to the definitions in /etc/carbon/storage-schemas.conf.
It has the default entries:
[default_1min_for_1day]
pattern = .*
retentions = 60s:1d
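Carbon chooses a schema by checking the sections of storage-schemas.conf from top to bottom and using the first one whose pattern matches the metric name, so a `pattern = .*` section catches every metric that no earlier section claims. Below is a minimal sketch of that lookup. It assumes Python's `re.search` approximates Carbon's matching, and the `aerospike_1min_for_180days` section and the `matching_schema` helper are hypothetical, added only for illustration.

```python
import configparser
import re

# Hypothetical storage-schemas.conf: an illustrative section placed above
# the stock catch-all section quoted in the question.
SCHEMAS = r"""
[aerospike_1min_for_180days]
pattern = ^instances\.aerospike\.
retentions = 60s:180d

[default_1min_for_1day]
pattern = .*
retentions = 60s:1d
"""


def matching_schema(metric: str, conf_text: str) -> str:
    """Return the name of the first section whose pattern matches `metric`."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    for section in parser.sections():  # sections() preserves file order
        if re.search(parser[section]["pattern"], metric):
            return section
    raise LookupError(f"no schema matches {metric!r}")
```

With only the default section present, every metric, including `instances.aerospike.dummy-v2-8.dummy.appeals_records_exonerated`, lands in `default_1min_for_1day`; with the hypothetical section above it, the Aerospike metrics match that section first.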

I checked creates.log and see the entries below:

15/10/2018 05:04:18 :: new metric instances.aerospike.dummy-v2-8.dummy.appeals_records_exonerated matched schema default_1min_for_1day
15/10/2018 05:04:18 :: new metric instances.aerospike.dummy-v2-8.dummy.appeals_records_exonerated matched aggregation schema default_average
15/10/2018 05:04:18 :: creating database file /var/lib/graphite/whisper/instances/aerospike/idstore-aerospike-dummy-v2-8/dummy/appeals_records_exonerated.wsp (archive=[(60, 1440)] xff=0.5 agg=average)
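The `archive=[(60, 1440)]` in the log is the retentions value expressed as (seconds per point, number of points). Below is a minimal sketch of that conversion. It assumes the unit suffixes shown in the whisper-resize usage text later in this thread (s, m/min, h, d, w, and y as 365 days), and it treats a bare number on the right-hand side as a datapoint count, as in `60:1440`. `parse_retention` is a hypothetical helper, not whisper's own API.

```python
import re

# Seconds per unit suffix (assumed, modelled on whisper's accepted units).
UNIT_SECONDS = {"s": 1, "m": 60, "min": 60, "h": 3600,
                "d": 86400, "w": 604800, "y": 31536000}


def _to_seconds(token: str) -> int:
    """Convert a token such as '60s', '15m' or '2y' into seconds."""
    match = re.fullmatch(r"(\d+)([a-z]*)", token)
    if match is None:
        raise ValueError(f"bad time spec {token!r}")
    value, unit = int(match.group(1)), match.group(2) or "s"
    return value * UNIT_SECONDS[unit]


def parse_retention(spec: str) -> tuple[int, int]:
    """Turn 'precision:duration' into (seconds_per_point, number_of_points)."""
    precision_token, duration_token = spec.split(":")
    seconds_per_point = _to_seconds(precision_token)
    if duration_token.isdigit():
        # A bare number on the right is already a datapoint count (e.g. 60:1440).
        return seconds_per_point, int(duration_token)
    return seconds_per_point, _to_seconds(duration_token) // seconds_per_point
```

`parse_retention("60s:1d")` gives `(60, 1440)`, the same tuple logged above: 60 seconds × 1440 points = 86,400 seconds, exactly one day.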

So, as I understand it, the metrics are matching the [default_1min_for_1day] section, and so they're being dropped from Whisper storage.

My question is: how are the metrics suddenly matching the [default_1min_for_1day] definition and getting dropped? Why didn't they match until last week?
I can confirm that no changes were made to the configuration or the metrics exporters.

Also, can you explain what this sentence means for 'retentions = 10s:14d': "The retentions line is saying that each datapoint represents 10 seconds, and we want to keep enough datapoints so that they add up to 14 days of data."?
Does it mean that all data will be retained for 14 days? What will happen if I change the 10s to 1s?

Question information

Language: English
Status: Solved
For: Graphite
Assignee: No assignee
Solved by: ak
Denis Zhdanov (deniszhdanov) said:
#1

Whisper (Graphite's data storage format) was designed to have predictable disk usage.
So, retentions = 60s:1d means that Graphite stores one datapoint for every 60 seconds and keeps it for 1 day. It does not drop anything after 1 day; it just overwrites the oldest points with new data, so the file size stays the same. But if you request more than 1 day of data, you will get nothing older than 1 day.
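A toy model of that round-robin behaviour is sketched below. It assumes a single archive and ignores whisper's real on-disk layout and aggregation; `RoundRobinArchive` is a hypothetical class, not part of the whisper module.

```python
from __future__ import annotations


class RoundRobinArchive:
    """Simplified model of one fixed-size Whisper archive (not whisper's API)."""

    def __init__(self, seconds_per_point: int, points: int) -> None:
        self.step = seconds_per_point
        self.points = points
        # Slots are pre-allocated, so the "file" never grows or shrinks.
        self.slots: list[tuple[int, float] | None] = [None] * points

    def update(self, timestamp: int, value: float) -> None:
        interval = timestamp - timestamp % self.step  # align to the step
        slot = (interval // self.step) % self.points
        self.slots[slot] = (interval, value)  # overwrites whatever was there

    def fetch(self, from_time: int, until_time: int) -> dict[int, float]:
        """Return stored points whose interval lies in [from_time, until_time]."""
        result = {}
        for entry in self.slots:
            if entry is not None and from_time <= entry[0] <= until_time:
                result[entry[0]] = entry[1]
        return result
```

Writing two days of one-minute points into `RoundRobinArchive(60, 1440)` still leaves exactly 1440 slots, and a fetch over both days returns only the second day, because the first day's points were overwritten.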
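So, if you were able to see historical data before, the files must have been created with a different retention, not 60s:1d. I don't see any other explanation, so the retention was probably changed recently. Keep in mind that Carbon applies storage-schemas.conf only when it first creates a .wsp file; changing the config later does not alter existing files.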
If you use retentions = 10s:14d, then it will indeed store 1 datapoint every 10 seconds and keep the data for 14 days. If you change 10s to 1s, the file will be 10 times bigger.
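The "10 times bigger" follows from every point occupying a fixed number of bytes. Below is a rough size estimate. It assumes the header and point layouts from the format strings in whisper.py as I understand them (`!2LfL` metadata, `!3L` per archive, `!Ld` per point); the helper names are hypothetical.

```python
from __future__ import annotations

import struct

# Assumed whisper layouts (struct format strings, network byte order):
METADATA_SIZE = struct.calcsize("!2LfL")    # aggregation, max retention, xff, archive count
ARCHIVE_INFO_SIZE = struct.calcsize("!3L")  # offset, seconds per point, points
POINT_SIZE = struct.calcsize("!Ld")         # timestamp + double value


def archive_for(seconds_per_point: int, retention_seconds: int) -> tuple[int, int]:
    """(seconds_per_point, number_of_points) for one archive."""
    return seconds_per_point, retention_seconds // seconds_per_point


def whisper_file_size(archives: list[tuple[int, int]]) -> int:
    """Estimated bytes for a file holding the given archives."""
    header = METADATA_SIZE + ARCHIVE_INFO_SIZE * len(archives)
    return header + POINT_SIZE * sum(points for _, points in archives)
```

Under these assumptions, 10s:14d comes to 1,451,548 bytes per metric and 1s:14d to 14,515,228 bytes; only the small fixed header keeps the ratio from being exactly 10.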
Please check the Whisper documentation for details: https://graphite.readthedocs.io/en/latest/whisper.html
You can use multiple archives if you want to balance file size and data retention, e.g.:
retentions = 10s:1d,60s:7d,5m:1y
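With several archives, a query is answered from the finest archive whose retention still reaches back far enough, so recent data keeps 10-second precision while older data is served at coarser resolution. Below is a simplified sketch of that selection, together with a point count for comparison. It assumes whisper walks its archives from highest to lowest precision and falls back to the coarsest one; `archive_for_query` and `total_points` are hypothetical helpers.

```python
from __future__ import annotations

DAY = 86400
# retentions = 10s:1d,60s:7d,5m:1y as (seconds_per_point, retention_seconds)
ARCHIVES = [(10, 1 * DAY), (60, 7 * DAY), (300, 365 * DAY)]


def archive_for_query(age_seconds: int, archives: list[tuple[int, int]]) -> tuple[int, int]:
    """Pick the finest archive whose retention covers a query reaching age_seconds back."""
    for archive in archives:  # ordered from highest to lowest precision
        if archive[1] >= age_seconds:
            return archive
    return archives[-1]  # nothing is long enough: use the coarsest archive


def total_points(archives: list[tuple[int, int]]) -> int:
    """Number of datapoints stored across all archives."""
    return sum(retention // step for step, retention in archives)
```

A query over the last 3 days is served by the 60s archive. The three archives together hold 123,840 points, versus 3,153,600 points for a single 10s:1y archive.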

ak (ffsak) said:
#2

Thanks for the clarification.

I changed the retention to 60s:180d, i.e., store a datapoint every 1 minute and keep it for 180 days.

I read that the whisper-resize command has to be run for the change to take effect.

Could you explain how to run it? I tried the options mentioned in the manual, but I'm getting errors:

root@ip:/var/lib/graphite/whisper# /usr/bin/whisper-resize default_1min_for_1day 60s:180d
[ERROR] File 'default_1min_for_1day' does not exist!
Usage: whisper-resize path timePerPoint:timeToStore [timePerPoint:timeToStore]*

timePerPoint and timeToStore specify lengths of time, for example:

60:1440 60 seconds per datapoint, 1440 datapoints = 1 day of retention
15m:8 15 minutes per datapoint, 8 datapoints = 2 hours of retention
1h:7d 1 hour per datapoint, 7 days of retention
12h:2y 12 hours per datapoint, 2 years of retention

root@ip:/var/lib/graphite/whisper# /usr/bin/whisper-resize /etc/carbon/storage-schemas.conf
Usage: whisper-resize path timePerPoint:timeToStore [timePerPoint:timeToStore]*

timePerPoint and timeToStore specify lengths of time, for example:

60:1440 60 seconds per datapoint, 1440 datapoints = 1 day of retention
15m:8 15 minutes per datapoint, 8 datapoints = 2 hours of retention
1h:7d 1 hour per datapoint, 7 days of retention
12h:2y 12 hours per datapoint, 2 years of retention

root@ip-dummy:/var/lib/graphite/whisper# /usr/bin/whisper-resize /etc/carbon/storage-schemas.conf 60s:180d
Traceback (most recent call last):
  File "/usr/bin/whisper-resize", line 67, in <module>
    info = whisper.info(path)
  File "/usr/lib/python2.7/dist-packages/whisper.py", line 704, in info
    info = __readHeader(fh)
  File "/usr/lib/python2.7/dist-packages/whisper.py", line 230, in __readHeader
    raise CorruptWhisperFile("Unable to read archive%d metadata" % i, fh.name)
whisper.CorruptWhisperFile: Unable to read archive39 metadata (/etc/carbon/storage-schemas.conf)
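The errors suggest that whisper-resize expects the path of an existing .wsp file as its first argument. The first attempt passed a schema section name, which is not a file, and the last passed storage-schemas.conf, which whisper then tried to parse as a Whisper header. Because each metric lives in its own .wsp file, applying 60s:180d to existing data means calling whisper-resize once per file. Below is a minimal sketch that walks the whisper directory and builds those calls. It assumes /usr/bin/whisper-resize as in the transcript above, the helper names are hypothetical, and it only prints the commands unless `dry_run=False`.

```python
from __future__ import annotations

import os
import subprocess

WHISPER_RESIZE = "/usr/bin/whisper-resize"  # path taken from the transcript


def wsp_files(root: str) -> list[str]:
    """Collect every .wsp file below root (e.g. /var/lib/graphite/whisper)."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".wsp"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def resize_commands(root: str, retentions: list[str]) -> list[list[str]]:
    """One whisper-resize call per file: the .wsp path first, then the retentions."""
    return [[WHISPER_RESIZE, path, *retentions] for path in wsp_files(root)]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
```

For example, `run(resize_commands("/var/lib/graphite/whisper", ["60s:180d"]))` lists one command per metric file, so you can review them before running with `dry_run=False`.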

ak (ffsak) said:
#3

I figured it out. Thanks.