Ceres - aggregation and compression methods

Asked by Francois Mikus

Once the Ceres database is available, it is worth looking at one key element that differentiates the most cutting-edge professional storage mechanisms: the aggregation and compression method.

Ceres has the opportunity to implement more efficient algorithms for storing time series data, for example by using fan interpolators or Straight Line Interpolative Methods to store only the relevant data points in a time series. This can achieve compression ratios of 20-35x versus raw data while staying accurate to within a defined maximum deviation. This is particularly interesting for industrial process data or trends, for which typical aggregation methods like min/max/average are too coarse.
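To make the idea concrete, here is a minimal Python sketch of a fan-style interpolator. It is only an illustration under stated assumptions, not Ceres code and not necessarily the exact variant described in the literature: the function name `fan_compress` and the parameter `max_dev` are hypothetical, timestamps are assumed to be strictly increasing, and each stored value is placed on a line inside the fan (so it may differ from the raw sample by up to `max_dev`) in order to keep the error bound provable.

```python
def fan_compress(points, max_dev):
    """Compress [(t, v), ...] with strictly increasing t.

    Linear interpolation between the returned points stays within
    max_dev of every original sample. Hypothetical helper, sketch only.
    """
    if len(points) <= 2:
        return list(points)

    kept = [points[0]]
    t0, v0 = points[0]                     # anchor: last stored point
    lo, hi = float("-inf"), float("inf")   # admissible slopes (the "fan")
    prev_t, prev_v = points[0]

    for t, v in points[1:]:
        dt = t - t0
        new_lo = max(lo, (v - max_dev - v0) / dt)
        new_hi = min(hi, (v + max_dev - v0) / dt)
        if new_lo > new_hi:
            # No single line from the anchor also fits this sample:
            # close the segment at the previous sample, using a slope
            # clamped into the fan so the bound holds for the segment.
            slope = min(max((prev_v - v0) / (prev_t - t0), lo), hi)
            t0, v0 = prev_t, v0 + slope * (prev_t - t0)
            kept.append((t0, v0))
            # Restart the fan from the new anchor with the current sample.
            dt = t - t0
            new_lo = (v - max_dev - v0) / dt
            new_hi = (v + max_dev - v0) / dt
        lo, hi = new_lo, new_hi
        prev_t, prev_v = t, v

    # Close the final segment at the last sample.
    slope = min(max((prev_v - v0) / (prev_t - t0), lo), hi)
    kept.append((prev_t, v0 + slope * (prev_t - t0)))
    return kept


# Usage: a flat signal collapses to its two end points.
flat = [(0, 1.0), (1, 1.0), (2, 1.0), (3, 1.0)]
print(fan_compress(flat, 0.1))  # [(0, 1.0), (3, 1.0)]
```

The compression ratio actually achieved depends entirely on the signal and on the chosen `max_dev`; the 20-35x figure above is the claim from the question, not something this sketch demonstrates.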

Average aggregation has a high compression ratio, but for more accuracy administrators typically also add min and max aggregation, which is still by no means an accurate representation of the time series. They end up with 3 data points + 1 time value in RRDTool, or 3 data points + 3 time values in Whisper, for an approximately 20x compression ratio with high accuracy loss. By contrast, a little upfront processing can make 1 data point + 1 time value much more accurate AND still achieve 20-35x compression versus the raw data. All data points between two stored values can be reconstructed by interpolating along a straight line between them. This eliminates the need to store either the time or the data value of the intermediate points; both are recovered by interpolation.
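The read side of this scheme can be sketched in a few lines of Python. This is a hedged illustration only: `interpolate` is a hypothetical helper name, not part of Ceres, Whisper or RRDTool, and it assumes the stored points are (time, value) pairs sorted by strictly increasing time.

```python
from bisect import bisect_right


def interpolate(kept, t):
    """Estimate the value at time t from sparse (time, value) pairs.

    kept must be sorted by time; t must lie inside the stored range.
    """
    times = [p[0] for p in kept]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the stored range")
    i = bisect_right(times, t)
    if i == len(kept):
        # t is exactly the last stored timestamp.
        return kept[-1][1]
    (t0, v0), (t1, v1) = kept[i - 1], kept[i]
    # Straight line between the two stored neighbours of t.
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Usage: three stored points stand in for 31 raw samples at 1-unit spacing.
kept = [(0, 0.0), (10, 5.0), (30, 5.0)]
print(interpolate(kept, 5))   # 2.5
print(interpolate(kept, 20))  # 5.0
```

Rebuilding the list of times on every call keeps the sketch short; a real reader would keep the timestamps in a separate sorted array and reuse it across lookups.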

Search for "Data Compression for Process Historians" by Peter A. James for a comparison of various algorithms.

This would be a great mini-project for anyone with a little statistical background and a bit of Python knowledge!

Question information

Language: English
Status: Answered
For: Graphite
Assignee: No assignee
chrismd (chrismd) said :
#1

Agreed, this sounds like a great idea. Unfortunately I don't have the bandwidth to approach it right now, but anyone is welcome to contribute. You should file a bug, since questions here typically don't get any follow-up once they have been marked answered, and they eventually expire if they don't get answered.

Francois Mikus (fmikus) said :
#2

I will file it in the bug tracking/feature request system. Hopefully someone with more statistics expertise will be good enough to implement it.
