Consolidate as a SUM and random update

Asked by Sebastien Estienne

Hello,

I'm using whisper as a library to solve the following issue:

I have an HTTP page view logger producing a text log with the ID of each page requested (there are about 5 million different possible pages).

Every minute I parse one minute of log and count/sort the IDs to count views.

I have one whisper db per ID with the following format:
- 60 points at a 60-second interval
- 24 points at a 1-hour interval
- 30 points at a 1-day interval

This allows me to count the views over the last hour, day, and month.

The default consolidation function in whisper is "average"; I'd like to have a sum instead. Is it enough to change
aggregateValue = float(sum(knownValues)) / float(len(knownValues))
to
aggregateValue = float(sum(knownValues))
?
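
(For illustration, here are the two formulas applied to a toy list of known values, outside of whisper:)

knownValues = [3.0, 5.0, 7.0]
average = float(sum(knownValues)) / float(len(knownValues))   # 5.0  (current "average" behaviour)
total = float(sum(knownValues))                               # 15.0 (the "sum" behaviour I want)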

And I'd like to know what happens if the DBs are not updated in a timely way, for example for pages with one view per day.

Should I use xFilesFactor=0?

The average consolidation divides by the number of "knownValues", which means that unknown values are not considered as 0.
This is the behaviour I would expect.

Thanks for writing this nice piece of software and for your answers.

PS:
It would be great to have a "dump" function and a small tutorial:

import whisper
whisper.create('test.wsp', [(60, 60), (60 * 60, 24), (60 * 60 * 24, 30)])
whisper.update('test.wsp', 42)
etc...

Question information

Language: English
Status: Solved
For: Graphite
Assignee: No assignee
chrismd (chrismd) said:
#1

Yes, if you were to make that change to the aggregateValue line it should sum as you expect, and one of my to-dos is to make the consolidation function an optional creation-time parameter. However, when using sum instead of average I would recommend having a high xFilesFactor, probably 1.0. That means that all datapoints in a high-precision archive must be present in order to propagate a value to a lower-precision archive. If you don't set it to 1.0, then a single missing data point would cause your aggregated value to be artificially low.
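
(For reference, a minimal sketch assuming a whisper version where the consolidation function is a creation-time parameter named aggregationMethod; the file name is just an example:)

import whisper

# Sum-consolidated archives matching the layout from the question.
# xFilesFactor=1.0: all 60 minutes of an hour must be known before an hourly
# total is written, so a missing datapoint cannot silently shrink the sum.
whisper.create('pageviews.wsp',
               [(60, 60), (60 * 60, 24), (60 * 60 * 24, 30)],
               xFilesFactor=1.0, aggregationMethod='sum')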

As far as unknown values not being considered the same as zero: I know that initially it may seem intuitive to think that a lack of a value is the same as zero, and in many cases that is quite correct (for instance if your numbers represent a counter of some sort). However, in general this is not always so. Consider the case where the database is storing something like network latency: a value of zero implies that your network has zero latency, which is different from a value of "unknown", which implies that no measurement was taken. One might think that you could simply interpret a zero as no measurement being taken, but then consider the case where you try to calculate an average latency over a larger period of time. The zeros would artificially lower your average latency, while storing the values as "unknown" allows you to leave them out of the calculation, or to decide that not enough data was available to make the calculation. Another example where zero and "unknown" must be distinct is when your numbers represent something where a zero is sometimes normal, like the size of a mail queue or the load on a server.
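
(A tiny plain-Python illustration of that point, not whisper code, assuming three latency measurements were taken and one was missed:)

samples = [None, 20.0, 30.0, 10.0]            # one measurement was missed
known = [s for s in samples if s is not None]
print(sum(known) / len(known))                # 20.0 -- unknowns left out of the average
zeroed = [0.0 if s is None else s for s in samples]
print(sum(zeroed) / len(zeroed))              # 15.0 -- the zero artificially lowers it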

Currently there is no real documentation around whisper itself, which stinks. I need to write some docs and also to create a couple of utilities for working with whisper files to do things like basic create/update/fetch operations from a shell command, dumping or changing the configuration, resizing a file in place, etc.

Sebastien Estienne (sebest) said:
#2

How should I use/modify whisper to consider unknown as zero, and have it consolidate a value even if only one point is known?

Should I use xFilesFactor=0?

For example, with the DB settings I defined in my initial question: if I only enter one value and then query over a period of a month, I'd like whisper to return that value.

import time
import whisper

whisper.create('test.wsp', [(60, 60), (60 * 60, 24), (60 * 60 * 24, 30)])
whisper.update('test.wsp', 42)
whisper.fetch('test.wsp', int(time.time()) - 60 * 60 * 24)
-> should return this 42 in the result list

Is it possible?

chrismd (chrismd) said:
#3

If you set your xFilesFactor to zero then no matter how many data points are unknown, the high-precision values will propagate. This is OK if you expect only random updates instead of receiving updates on a consistent basis. I would recommend against having whisper interpret unknowns as zeros, because an xFilesFactor of zero essentially does this when it comes to propagation, and when you fetch() your own code can simply interpret the None values as zeros. But if you really want fetch() to return real zeros for unknown values, then you can change the following statement (which occurs on lines 454 and 481):

valueList = [None] * points

To look like this:

valueList = [0] * points

This statement basically allocates the value list with all values set as unknown, then later they get changed in place to real values if they are present in the archive.
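
(A minimal caller-side sketch of interpreting None as zero after fetch(), as suggested above, rather than patching whisper; the file name and time range are just examples:)

import time
import whisper

# fetch() returns ((fromTime, untilTime, step), values); unknown slots come back as None
(start, end, step), values = whisper.fetch('test.wsp', int(time.time()) - 60 * 60 * 24)
values = [0 if v is None else v for v in values]  # treat unknown as zero in our own code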

As for the last part of your question, setting xFilesFactor to zero should achieve what you're after. A fetch() call looks for the highest-precision archive in your database that covers the entire requested time interval. In your example, both the daily archive with hour precision and the monthly archive with daily precision cover the past 24 hours you requested, so whisper will look at the values stored in the daily archive with hour precision. But if only one value was inserted into the database, then only 1/60th of that hour will contain data (the highest-precision archive is minutely), so unless your xFilesFactor is 1/60 or less no value will have propagated to the daily archive, and you get the default, unknown. I hope that makes sense.
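
(A minimal end-to-end sketch of that scenario, assuming a whisper version that accepts xFilesFactor and aggregationMethod at creation time; the file name and values are just examples:)

import time
import whisper

# 60 x 1-minute, 24 x 1-hour, 30 x 1-day archives, as in the original question.
# xFilesFactor=0 lets a single known minute propagate into the hourly and daily archives.
whisper.create('test.wsp', [(60, 60), (60 * 60, 24), (60 * 60 * 24, 30)],
               xFilesFactor=0, aggregationMethod='sum')

whisper.update('test.wsp', 42)  # one lonely datapoint

# Querying the past 24 hours is answered from the hour-precision archive (step == 3600),
# and the propagated value should show up among mostly-None slots.
(start, end, step), values = whisper.fetch('test.wsp', int(time.time()) - 60 * 60 * 24)
print(step)                                   # 3600
print([v for v in values if v is not None])   # e.g. [42.0]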

Jeff Blaine (jblaine-kickflop) said:
#4

Answered completely ages ago. Marking as solved.