What is the Zipf distribution?
- Keywords:
- zipf
The Zipf distribution implements a Zipf-like law. According to Zipf's law,
the most frequent object is twice as probable as the second most frequent,
three times as probable as the third most frequent, etc. More
information is available at http://
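As a rough illustration of the law described above, the following sketch builds probabilities proportional to 1/rank for a small set of objects. The function name zipf_probabilities and the choice of five objects are assumptions made for this example only; they are not part of any tool described here.

```python
def zipf_probabilities(n):
    """Probabilities for ranks 1..n, with P(rank) proportional to 1/rank."""
    weights = [1.0 / rank for rank in range(1, n + 1)]
    total = sum(weights)  # normalize so the probabilities sum to 1
    return [w / total for w in weights]


probs = zipf_probabilities(5)
# Ratio of the most frequent object to the second and third most frequent.
print(probs[0] / probs[1])
print(probs[0] / probs[2])
```

The two printed ratios correspond to the "twice" and "three times" relationships in the text.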
The popZipf() distribution emulates Zipf's law for objects in the working
set: the last (newest) object in the working set is the most popular, the
second-to-last object is the second most popular, etc. The first (oldest)
object in the working set has the lowest probability. The skew parameter
of popZipf() affects how the probability changes from one object to the
next.
The formula for popZipf() is:
popZipf(skew) = wss + 1 - floor((wss + 1) ** (x ** skew))
where skew is the popZipf() argument, wss is the working set size, and x
is a uniform random value in the [0, 1) range.
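The formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the actual implementation; the function name pop_zipf, the rng parameter, and the clamping step are assumptions added for this example.

```python
import math
import random


def pop_zipf(skew, wss, rng=random):
    """Draw one oid in [1, wss] using the popZipf() formula quoted above."""
    x = rng.random()  # uniform random value in [0, 1)
    # (wss + 1) ** (x ** skew) lies in [1, wss + 1), so its floor is in [1, wss].
    k = math.floor((wss + 1) ** (x ** skew))
    # Assumed safeguard: floating-point rounding could push k to wss + 1.
    k = min(k, wss)
    return wss + 1 - k


rng = random.Random(1)  # fixed seed so repeated runs give the same draws
draws = [pop_zipf(1.0, 99, rng) for _ in range(10000)]
# Fraction of draws that hit the newest oid (oid == wss).
print(draws.count(99) / len(draws))
```

With skew = 1 and wss = 99, the printed fraction should land near the P(1) value that the probability formula gives for the most recent oid.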
Skew varies the probability "step" between adjacent oids. A high skew
gives more preference to recent oids. A low skew (< 0.3) gives
preference to older oids.
The exact formula for the probability of selecting oid N (where N=1 is
the most recent oid and N=wss is the oldest oid in the working set) is:
P(N) = log(N+1; wss+1)**(1/skew) - log(N; wss+1)**(1/skew)
The first log() argument is the value; the second one is the base.
Examples for wss = 99:
* with skew = 10: P(1) = 83%, P(2) = 4%, P(3) = 2%
* with skew = 1: P(1) = 15%, P(2) = 9%, P(3) = 6%
* with skew = 0.5: P(1) = 2%, P(2) = 3%, P(3) = 3%
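The P(N) formula can be evaluated directly to check examples like these. The sketch below is a plain transcription of the formula; the function name pop_zipf_prob is an assumption made for this example. The printed values should come out close to the listed percentages, up to rounding.

```python
import math


def pop_zipf_prob(n, skew, wss):
    """P(N) for the N-th most recent oid (1 <= n <= wss), per the formula above."""
    base = wss + 1
    # math.log(value, base) matches the log(value; base) notation in the text.
    upper = math.log(n + 1, base) ** (1.0 / skew)
    lower = math.log(n, base) ** (1.0 / skew)
    return upper - lower


for skew in (10, 1, 0.5):
    probs = [pop_zipf_prob(n, skew, 99) for n in (1, 2, 3)]
    print(skew, ["%.1f%%" % (100 * p) for p in probs])
```

Because consecutive terms cancel, summing P(N) over N = 1..wss gives 1, which is a quick sanity check on the formula.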