4

I answered a question regarding generating samples with a positive support and known mean and variance using the gamma distribution in NumPy. I thought I'd try the same in Incanter. But unlike the results I got with NumPy, I could not get a sample mean and variance close to the distribution's mean and variance.

(defproject incanter-repl "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.4"]])

(require '[incanter 
           [core] 
           [distributions :refer [gamma-distribution mean variance draw]] 
           [stats :as stats]])

(def dist 
  (let [mean 0.71 
        variance 2.89 
        theta (/ variance mean) 
        k (/ mean theta) ] 
    (gamma-distribution k theta)))

Incanter calculates mean and variance of the distribution

(mean dist) ;=> 0.71
(variance dist) ;=> 2.89

I calculate a sample mean and variance based on draws from that distribution

(def samples (repeatedly 10000 #(draw dist)))

(stats/mean samples) ;=> 0.04595208774029654
(stats/variance samples) ;=> 0.01223348345651905

I expected these stats calculated on the sample to be much closer to the mean and variance of the distribution. What am I missing?

Answer

Incanter had a bug inherited from Parallel Colt. The treatment of parameters is not consistent across methods in Parallel Colt. See issue report https://github.com/incanter/incanter/issues/245.

Community
  • 1
  • 1
A. Webb
  • 26,227
  • 1
  • 63
  • 95

1 Answers1

4

Contrary to numpy.random.gamma which takes shape (k) and scale (theta) as parameters, clojures gamma-distribution takes shape (k) and rate (1/theta) as parameters. See (doc gamma-distribution)and http://en.wikipedia.org/wiki/Gamma_distribution

Thus, to get the desired results, you may define dist as

(def dist 
  (let [mean 0.71 
        variance 2.89 
        r (/ mean variance) 
        k (* mean r) ] 
    (gamma-distribution k r)))

A sample result is then

(def samples (repeatedly 10000 #(draw dist)))
#'incanter-test.core/samples
incanter-test.core=> (stats/mean samples)
0.7163908381930312
incanter-test.core=> (stats/variance samples)
2.940867216122528
Terje D.
  • 6,250
  • 1
  • 22
  • 30
  • Ah, thanks. This answers half the question. If you take `dist` as defined here then Incanter will report `(mean dist) ;=> 0.042852`, `(variance dist)` ;=> 0.010527`. In other words, the authors of the `mean` and `variance` methods also assumed that the parameters were shape (k) and scale (theta). Our demonstrations suggest the doc string is correct and the methods are wrong. Bug. – A. Webb Apr 19 '14 at 11:20