What is the Zipf distribution?

Created by Dmitry Kurochkin
Keywords:
zipf

  The Zipf distribution implements a Zipf-like law. According to Zipf's
  law, the most frequent object has twice the probability of the second
  most frequent object, three times the probability of the third most
  frequent object, and so on. More information is available at
  http://en.wikipedia.org/wiki/Zipf's_law.
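  As a quick illustration of these ratios, here is a minimal Python
  sketch of the classic form of Zipf's law. It assumes a probability
  proportional to 1/k for the object of rank k (exponent 1). The helper
  name zipf_probabilities() is hypothetical and is not part of popZipf().

```python
def zipf_probabilities(n):
    """Classic Zipf's law (exponent 1) for ranks 1..n.

    The rank-k object gets a weight of 1/k; weights are normalized so the
    probabilities sum to 1. Illustrative helper only, not popZipf().
    """
    weights = [1.0 / k for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


probs = zipf_probabilities(100)
# Rank 1 is twice as likely as rank 2 and three times as likely as rank 3.
print(round(probs[0] / probs[1], 6))
print(round(probs[0] / probs[2], 6))
```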

  The popZipf() distribution emulates Zipf's law for objects in the
  working set: the last (most recent) object in the working set is the
  most popular, the second-to-last object is the second most popular, and
  so on, down to the first (oldest) object, which has the lowest
  probability. The skew parameter of popZipf() controls how the
  probability changes from one object to the next; low skew values shift
  the preference toward older objects, as described below.

  The formula for popZipf() is:

    popZipf(skew) = wss + 1 - floor((wss + 1) ** (x ** skew))

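  where skew is the popZipf() argument, wss is the working set size, and
  x is a uniformly distributed random value in the [0, 1) range. The
  result is a position in the working set, from 1 (the first, oldest
  object) to wss (the last, most recent object).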
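  The formula can be transcribed almost literally. The following Python
  sketch assumes a hypothetical helper named pop_zipf() that takes the
  working set size and the skew, and uses Python's random module as the
  source of x. It illustrates the formula; it is not the original
  implementation.

```python
import math
import random


def pop_zipf(wss, skew, rng=random):
    """Hypothetical transcription of popZipf(skew) for wss objects.

    Returns a position in the working set: 1 is the first (oldest) object
    and wss is the last (most recent) one.
    """
    x = rng.random()  # uniform random value in [0, 1)
    pos = wss + 1 - math.floor((wss + 1) ** (x ** skew))
    # Guard against x ** skew rounding up to 1.0 in floating point,
    # which would otherwise yield position 0.
    return max(1, pos)


rng = random.Random(42)
samples = [pop_zipf(99, 1.0, rng) for _ in range(100_000)]
# Share of picks that hit the last (most recent) object.
print(samples.count(99) / len(samples))
```

  With wss = 99 and skew = 1, the printed share should come out close to
  the 15% quoted for P(1) in the skew = 1 example.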

  Skew controls the probability "step" between oids (object IDs). A high
  skew gives more preference to recent oids, while a low skew (below
  about 0.3) gives preference to older oids.
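  One way to see this effect is to sample the formula and compare how
  many picks land on the newest and on the oldest oids. The Python sketch
  below assumes wss = 99 and uses hypothetical helper names. It works
  with ages directly, where age 1 is the most recent oid (that is,
  wss + 1 minus the popZipf() result).

```python
import math
import random


def sample_age(wss, skew, rng):
    """Hypothetical helper: age of a popZipf()-selected oid (1 = most recent)."""
    x = rng.random()  # uniform random value in [0, 1)
    # This equals wss + 1 - popZipf(skew); min() guards against float rounding.
    return min(wss, math.floor((wss + 1) ** (x ** skew)))


def recent_vs_oldest(wss, skew, samples=50_000, seed=1):
    """Return the shares of picks that hit the 10 newest and the 10 oldest oids."""
    rng = random.Random(seed)
    ages = [sample_age(wss, skew, rng) for _ in range(samples)]
    newest = sum(1 for age in ages if age <= 10) / samples
    oldest = sum(1 for age in ages if age > wss - 10) / samples
    return newest, oldest


for skew in (10, 1, 0.3, 0.1):
    newest, oldest = recent_vs_oldest(99, skew)
    print(f"skew={skew}: 10 newest {newest:.1%}, 10 oldest {oldest:.1%}")
```

  In this sketch, high skew concentrates picks on the newest oids and
  skew = 0.1 moves them toward the oldest ones. The exact point where the
  preference flips depends on wss and on how many oids are counted as
  "newest" and "oldest".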

  The exact probability of selecting oid N (where N = 1 is the most
  recent oid and N = wss is the oldest oid in the working set) is:

    P(N) = log(N+1; wss+1)**(1/skew) - log(N; wss+1)**(1/skew)

  The first log() argument is the value and the second is the base.
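  In Python, log(value; base) corresponds to math.log(value, base). A
  minimal sketch of the formula with a hypothetical function name is
  shown here. Because consecutive terms cancel, the probabilities for
  N = 1..wss add up to log(wss+1; wss+1)**(1/skew), which is 1.

```python
import math


def p_exact(n, wss, skew):
    """P(N) as given above: n = 1 is the most recent oid, n = wss the oldest."""
    base = wss + 1
    return math.log(n + 1, base) ** (1 / skew) - math.log(n, base) ** (1 / skew)


for skew in (10, 1, 0.5):
    total = sum(p_exact(n, 99, skew) for n in range(1, 100))
    print(skew, total)  # the sum telescopes to 1 (up to float rounding)
```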

  Examples for wss = 99:

    * with skew = 10: P(1) = 83%, P(2) = 4%, P(3) = 2%
    * with skew = 1: P(1) = 15%, P(2) = 9%, P(3) = 6%
    * with skew = 0.5: P(1) = 2%, P(2) = 3%, P(3) = 3%
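  These numbers can be reproduced, and cross-checked against the
  popZipf() sampling formula, with a short Python sketch. The function
  names are hypothetical. Each sampled position is converted to N with
  N = wss + 1 - popZipf(skew), so N = 1 is the most recent oid.

```python
import math
import random
from collections import Counter


def p_exact(n, wss, skew):
    """Exact P(N); log(value; base) is written as math.log(value, base)."""
    base = wss + 1
    return math.log(n + 1, base) ** (1 / skew) - math.log(n, base) ** (1 / skew)


def sampled_distribution(wss, skew, samples=100_000, seed=7):
    """Estimate P(N) for every N by sampling the popZipf(skew) formula."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(samples):
        x = rng.random()  # uniform random value in [0, 1)
        pos = max(1, wss + 1 - math.floor((wss + 1) ** (x ** skew)))
        counts[wss + 1 - pos] += 1  # convert position to N (1 = most recent)
    return {n: counts[n] / samples for n in range(1, wss + 1)}


for skew in (10, 1, 0.5):
    sampled = sampled_distribution(99, skew)
    for n in (1, 2, 3):
        print(f"skew = {skew}: P({n}) exact {p_exact(n, 99, skew):.1%}, "
              f"sampled {sampled[n]:.1%}")
```

  The exact values correspond to the percentages listed above, and the
  sampled estimates should land close to them.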