Is a 10 year data retention value safe?

Asked by Nathanael Anderson

I'm planning to use this schema for data retention, and wanted to double-check that a more complicated storage scheme like this is safe to use. My understanding of these settings is:
A. 20:60 - should store data every 20 seconds for 60 minutes
B. 60:10080 - should store data every minute for a week
C. 600:5184000 - should store data every 10 minutes for 10 years

So, is this type of setup:
1. Correct?
2. Safe?
3. Scalable?

[load]
# priority determines the order in which schemas are tested when creating a new file;
# matching is done highest to lowest.
priority = 100
# if this pattern matches a given metric name, this schema will be used
#pattern = "^some.*regex$"
pattern = ^servers\.
# ex: 60:120 is minutely data kept for two hours
retentions = 20:60, 60:10080, 600:5184000

Question information

Language: English
Status: Solved
For: Graphite
Assignee: No assignee
chrismd (chrismd) said:
#1

Almost. A retention configuration of 20:60 will store 60 twenty-second intervals of data, which is 20 minutes, not 60. A good rule of thumb is to multiply the two numbers to find out how many seconds of retention you will have. B is correct, but C has the same problem: 600:5184000 stores 5,184,000 ten-minute points, which is nearly 100 years; 10 years of 10-minute data would be 600:525600.
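
As a quick sanity check, here is a small Python sketch (not from the thread, just illustrative) applying that rule of thumb to the retentions from the question:

# Each retention is seconds_per_point:points; total coverage is their product.
retentions = [(20, 60), (60, 10080), (600, 5184000)]
for seconds_per_point, points in retentions:
    print(seconds_per_point * points)   # total seconds of history covered
# 20:60       -> 1200 s        = 20 minutes (not 60)
# 60:10080    -> 604800 s      = exactly one week
# 600:5184000 -> 3110400000 s  = ~98.6 years (not 10)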

Configurations like this are perfectly safe. The size of your files depends on the number of data points plus a small amount of header data. Each data point is 12 bytes, which you multiply by the number of data points you are storing (the second number in each retention configuration). If you have three retention configs, just add up the second number from each to get the total number of points you will store, multiply by 12 bytes, and that's your file size.
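
Worked out for the schema in the question (header size ignored):

# Sum of the second number from each retention config = total points stored.
total_points = 60 + 10080 + 5184000      # 5,194,140 points
size_bytes = total_points * 12           # 12 bytes per data point
print(size_bytes)                        # 62329680 -> roughly 60 MB per metric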

The syntax of your schemas file sample is correct. Also note that schemas only apply at file creation time: if you have already received data for a particular metric, the file will already have been created with the prior retention configuration, so changing the schemas file will only apply to new files. I plan on making a simple utility to "resize" database files in place, but I have not done this yet. For now you can simply delete existing files and let them get recreated with the new schema, or, if you do not wish to lose data, you must write a script that uses the graphite.whisper module (in /usr/local/graphite/lib/) to fetch() out your data and re-insert it into another file with the appropriate size. This is annoying, I know, so hopefully I will have time to write the resize utility soon.
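
For reference, a rough sketch of that fetch-and-reinsert approach. It assumes the standalone whisper module's create()/fetch()/update_many() calls (in installs from this era the module is graphite.whisper instead), and the file path is purely hypothetical:

import whisper  # older installs: from graphite import whisper

old_path = "/usr/local/graphite/storage/whisper/servers/host1/load.wsp"  # hypothetical
new_path = old_path + ".new"

# Create a new file with the corrected retentions: (seconds_per_point, points).
whisper.create(new_path, [(20, 60), (60, 10080), (600, 525600)])

# Fetch what the old file holds; a fromTime of 0 is clamped to the oldest
# retained point. Note fetch() reads from the single archive covering the
# requested range, so for a faithful copy you may want one fetch per retention.
(start, end, step), values = whisper.fetch(old_path, 0)
points = [(start + i * step, v) for i, v in enumerate(values) if v is not None]
whisper.update_many(new_path, points)
# Finally, move new_path over old_path once the data checks out.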

Nathanael Anderson (wirelessdreamer) said:
#2

I'm still a little confused on one point. I'm able to set data retention values as stated above, but the example script sleeps for 60 seconds between data updates. Wouldn't that make either the sleep of 60 in the example or the 20-second acquire interval ignored? With the value set in both places, won't it only be able to get new values every 60 seconds, because the reporting script is sleeping? How is this intended to work?

chrismd (chrismd) said:
#3

If your retention configuration is set to store data for every 20 second interval then you need to make sure that data is sent every 20 seconds, or else there will be gaps in the database and hence gaps in your graph. The example client script sleeps for 60 seconds because that is compatible with the default retention configuration of 2 hours of minutely data. If you wish to store data (and hence visualize it) with finer granularity, such as every 20 seconds, then you will need to modify the script to send data every 20 seconds.
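
For illustration, a minimal sender along those lines, assuming carbon's plain-text protocol on the default port 2003 (host and metric name are placeholders):

import os
import socket
import time

CARBON = ("localhost", 2003)   # wherever your carbon daemon listens

while True:
    value = os.getloadavg()[0]                     # the metric being reported
    line = "servers.host1.load %f %d\n" % (value, int(time.time()))
    sock = socket.create_connection(CARBON)
    sock.sendall(line.encode())
    sock.close()
    time.sleep(20)   # must match the finest retention interval (20 seconds)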

Think of the retention configuration as a way of creating "slots" in the database. When the graphite backend receives a data point it finds which slot the value belongs in by looking at the timestamp. Your configuration would create 60 slots that each cover a 20 second time range, so every time the client script sends a data point its timestamp will fit into exactly one slot, hence the need to send data points as often as the retention configuration expects (otherwise some slots will be empty).
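
Concretely, the slot is found by rounding the timestamp down to the nearest interval boundary, along these lines:

step = 20                               # seconds per point in the finest archive
timestamp = 1234567894                  # when the client sent the value
slot = timestamp - (timestamp % step)   # 1234567880
# Any value sent with a timestamp from 1234567880 through 1234567899
# overwrites this same slot.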

The values that go into the additional retention configurations, like one week of minutely data and 10 years of 10-minutely data, are calculated automatically based on what was put into the *finest precision* configuration (i.e. the 20 minutes of 20-second data). I hope that helps.
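
And the coarser points are derived from the finer ones they span; assuming the default averaging aggregation, the arithmetic is simply:

fine = [0.8, 1.2, 1.0]                 # three 20-second points within one minute
minute_point = sum(fine) / len(fine)   # 1.0 is stored in the 60-second archive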

Jeff Blaine (jblaine-kickflop) said:
#4

This was answered in full years ago. Marking solved.