1

Why can't statistics.mode find the mode of a normally distributed (and therefore unimodal) random variable, when it works fine for vectors containing integers?

import numpy as np
from numpy.random import rand,randn
import statistics as st

y = randn(20)
print(st.mode(y))

This returns the following error

StatisticsError: no unique mode; found 20 equally common values
Lawhatre
develarist

2 Answers

1

That's because the mode doesn't exist. The number of unique elements in y equals the total number of elements in y, so every value occurs exactly once and, by definition, no mode exists.

np.size(np.unique(y)) - np.size(y)

>>> 0

That the mode doesn't exist can also be verified by looking at the histogram (flat in the present case). Peaks in this graph represent modes, and since we can't find a peak, the mode is None.

Histogram for y
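The uniqueness argument can also be checked by counting how often each sampled value occurs. This is only a minimal sketch: the seeded generator `np.random.default_rng(0)` and the use of `collections.Counter` are assumptions made for reproducibility and are not part of the original answer.

```python
from collections import Counter

import numpy as np

# Assumption: a seeded generator so the run is reproducible.
rng = np.random.default_rng(0)
y = rng.standard_normal(20)

# Count the occurrences of each sampled value (as plain Python floats).
counts = Counter(y.tolist())
most_common_value, frequency = counts.most_common(1)[0]

print(len(counts), "distinct values out of", y.size)
print("highest frequency:", frequency)
```

If the highest frequency is 1, every value is tied for most common, which is the situation the error message describes.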

Edit: If you really want to find the mode, then

  1. Draw enough samples from the distribution so that the sample reflects the original pdf.
  2. Adjust the precision (I have rounded to 1 decimal place). Consequently, the mode will have an error range matching that precision.
import numpy as np
from numpy.random import rand,randn
import statistics as st

y = randn(10000000)
st.mode(list(np.round(y,1)))

This gives

>>> 0.0 

The histogram now also shows a peak at 0.0:

Hist for large samples
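To see what the rounding step does, one can repeat the calculation at several precisions and look at how often the most common rounded value occurs. The following is a rough sketch under assumptions: the helper name `rounded_mode`, the seed, and the smaller sample size of 100,000 are all illustrative choices, and `collections.Counter` is used instead of `statistics.mode` so the result does not depend on the Python version.

```python
from collections import Counter

import numpy as np

# Assumption: seeded generator and a smaller sample than in the answer.
rng = np.random.default_rng(0)
y = rng.standard_normal(100_000)


def rounded_mode(values, decimals):
    """Return (most common rounded value, its count). Hypothetical helper."""
    rounded = np.round(values, decimals).tolist()
    value, count = Counter(rounded).most_common(1)[0]
    return value, count


# Coarse rounding pools many samples into each value; fine rounding
# leaves almost every value unique again.
for decimals in range(1, 7):
    value, count = rounded_mode(y, decimals)
    print(f"decimals={decimals}: mode={value}, count={count}")
```

With one decimal place the most common value should land near 0, although with a finite sample it will not necessarily be exactly 0.0. As the number of decimals grows, the winning count shrinks towards 1 and the "mode" becomes arbitrary.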

Lawhatre
  • A normal distribution always has a mode, though, so the program is obviously wrong, even the histogram. What is the workaround? – develarist Nov 16 '20 at 17:33
  • The output of the `statistics` library is 100% correct. You can't compute the mode in your case. – Lawhatre Nov 16 '20 at 17:35
  • Although you are sampling from a normal distribution, the sample size matters! The present sample is not representative of the normal distribution; it looks more like a uniform distribution. And that's exactly why you are getting the error from `statistics`. – Lawhatre Nov 16 '20 at 17:39
  • Check out the edit @develarist, this will solve your problem. – Lawhatre Nov 16 '20 at 17:49
  • I've raised the sample size to 2000000 before and the error is still there – develarist Nov 16 '20 at 17:58
  • Can you please copy the code I provided and then run it? It's working fine. – Lawhatre Nov 16 '20 at 18:01
  • What is the `round` for, and what if it is not included? – develarist Nov 17 '20 at 13:50
  • Can you please change the number of decimal places used for rounding, ranging from 1 to, say, 6? For each setting, observe the peak in the histogram and also calculate the mode. You will then understand its purpose. – Lawhatre Nov 17 '20 at 15:40
  • Actually, the pdf will gradually change from normal to uniform. `round` sets the precision of the number measurement. – Lawhatre Nov 17 '20 at 15:43
1

randn returns a third-party ndarray rather than a Python builtin array (i.e. a list). The statistics module was not built to serve numpy explicitly and so unexpected behaviour occurs.

A solution could be converting y to a list (i.e. st.mode(list(y))).
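A minimal sketch of the list conversion is shown below. It assumes a seeded generator for reproducibility and wraps the call in `try`/`except`, because the outcome depends on the Python version: according to the `statistics` documentation, before Python 3.8 `mode` raised `StatisticsError` when there was no unique mode, while from 3.8 onward it returns the first mode encountered.

```python
import statistics as st

import numpy as np

# Assumption: seeded generator so the run is reproducible.
rng = np.random.default_rng(0)
values = rng.standard_normal(20).tolist()  # plain Python floats

try:
    result = st.mode(values)
except st.StatisticsError as exc:
    # Python < 3.8: all 20 values are equally common, so no unique mode.
    result = None
    print("no unique mode:", exc)
else:
    # Python >= 3.8: ties are resolved by returning the first value seen.
    print("mode returned:", result)
```

Either way, with 20 distinct floats the returned number is not a meaningful estimate of the distribution's mode; on Python 3.8+ it is simply the first element of the sample.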

honno
  • Completely disagree with the reasoning!! – Lawhatre Nov 16 '20 at 17:39
  • @Ragnar Ah so when I reproduced OP's example, I got a number which wasn't even in `y`, and so thought that was their problem (and converting to `list` solved it). If it's something else then ignore me. – honno Nov 16 '20 at 17:43
  • Ragnar updated his answer using your `list` idea though – develarist Nov 16 '20 at 17:59
  • Dunno what Ragnar is up to, but try running your example and change the last line, i.e. `print(st.mode(y))` -> `print(st.mode(list(y)))`. Also note that `y` will not have a mode or mean of exactly `0`, as it is just generating numbers that will have modes or means that _tend towards_ `0`. – honno Nov 16 '20 at 18:04
  • I don't understand why someone would say your idea doesn't work and takes it for their own – develarist Nov 16 '20 at 18:06
  • Ragnar's example isn't doing what I suggest (why `round` or change the sample size?). – honno Nov 16 '20 at 18:07
  • @develarist Ah, so what's the specific problem? Ragnar edited your question to say you get a `StatisticsError` raised. Is that true? If so I can't help, because I can't reproduce that problem. I tried this on Python 3.8. – honno Nov 17 '20 at 14:04
  • The error was also in the question title. I have the same Python. Why don't you get the error? – develarist Nov 17 '20 at 14:07
  • @develarist Hmm, could you print the output of `y`? – honno Nov 17 '20 at 15:38
  • `y` is a random number generator, so it's stochastic – develarist Nov 17 '20 at 16:14
  • @develarist That's the thing, I'm guessing something is wrong with your Python environment so that the "OS-level RNG" being used by `randn` is faulty. Just `print(y)` to see the contents of `y`. If you mean there's a different value for `y` every time, then yeah, I know that should be the case. – honno Nov 17 '20 at 17:39