-3

I'm trying to calculate the standard deviation in Python without using numpy or any external library except for math. I want to get better at writing algorithms and am doing this as a bit of "homework" as I improve my Python skills. My goal is to translate this formula into Python, but I am not getting the correct result.

I'm using an array of speeds where speeds = [86,87,88,86,87,85,86]

When I run:

import numpy
std_dev = numpy.std(speeds)
print(std_dev)

I get: 0.903507902905. But I don't want to rely on numpy. So...

My implementation is as follows:

import math

speeds = [86,87,88,86,87,85,86]

def get_mean(array):
    sum = 0
    for i in array:
        sum = sum + i
    mean = sum/len(array)
    return mean

def get_std_dev(array):
    # get mu
    mean = get_mean(array)
    # (x[i] - mu)**2
    for i in array:
        array = (i - mean) ** 2
        return array
    sum_sqr_diff = 0
    # get sigma
    for i in array:
        sum_sqr_diff = sum_sqr_diff + i
        return sum_sqr_diff
    # get mean of squared differences
    variance = 1/len(array)
    mean_sqr_diff = (variance * sum_sqr_diff)
    
    std_dev = math.sqrt(mean_sqr_diff)
    return std_dev

std_dev = get_std_dev(speeds)
print(std_dev)

Now when I run:

std_dev = get_std_dev(speeds)
print(std_dev)

I get: [0] but I am expecting 0.903507902905

What am I missing here?

bkleeman
  • `math` is not an external library. – Kelly Bundy Nov 23 '21 at 20:49
  • what is your input? how we can reproduce that answer? – Zain Ul Abidin Nov 23 '21 at 20:53
  • You are defining speeds and calling it with narrow_speed. Doesn't that give you an error? – cup Nov 23 '21 at 20:54
  • @ZainUlAbidin All the code was available in the question body but it wasn't all in the same block. I have edited the part where I show my implementation to include everything necessary to reproduce. – bkleeman Nov 23 '21 at 21:00
  • Your code is not returning `[0]`! –  Nov 23 '21 at 21:01
  • @cup You're right, thank you for pointing that out! I modified the code I was working with for a minimal example and forgot to change the variable name. I updated my code to call `get_std_dev` with `speeds`. – bkleeman Nov 23 '21 at 21:01
  • @YvesDaoust What are you getting when you run it? – bkleeman Nov 23 '21 at 21:02
  • Your first mistake: `return array`. –  Nov 23 '21 at 21:02
  • @YvesDaoust Can you elaborate? – bkleeman Nov 23 '21 at 21:03
  • You return a number, so the result can't be `[0]`, as that's a list. – Kelly Bundy Nov 23 '21 at 21:05
  • @YvesDaoust `array = (i - mean) ** 2` comes earlier. – Kelly Bundy Nov 23 '21 at 21:07
  • @KellyBundy: why do you tell me ? –  Nov 23 '21 at 21:09
  • @YvesDaoust Just pointing out your comment is mistaken. – Kelly Bundy Nov 23 '21 at 21:14
  • @KellyBundy: my comment is not mistaken. I am pointing out an error in the OP's post. The code is not returning `[0]` as he describes (and I know very well why it doesn't). By the way, the OP made no effort to fix the text. –  Nov 23 '21 at 21:23
  • @YvesDaoust `array = (i - mean) ** 2` is another mistake and it comes earlier than `return array`, so no, `return array` is *not* their first mistake. You're mistaken. – Kelly Bundy Nov 23 '21 at 21:36
  • @KellyBundy: ok, I didn't know you were referring to my second comment. –  Nov 23 '21 at 21:48
  • @KellyBundy: anyway, the assignment to `array` is the first coding mistake in the program text, but at run-time the `return` statement results in a problem *before* the assignment does. At this stage, assigning `array` does no harm. –  Nov 23 '21 at 21:51

4 Answers

2

The problems in your code are that the name array is reused for a single number, and that there is a return in the middle of the loop, so the function exits on the first iteration:

def get_std_dev(array):
    # get mu
    mean = get_mean(array)       # this is about 86.4
    # (x[i] - mu)**2
    for i in array:
        array = (i - mean) ** 2  # rebinds array to a single number, close to 0
        return array             # this value is returned on the first iteration
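
As a rough sketch of how those two problems could be fixed while keeping the structure of the original function: the squared differences go into a separate list (the name squared_diffs is made up for this illustration) and the only return sits at the very end. This assumes Python 3, where / is true division.

```python
import math

def get_mean(array):
    total = 0
    for value in array:
        total = total + value
    return total / len(array)

def get_std_dev(array):
    # get mu
    mean = get_mean(array)
    # (x[i] - mu)**2, collected in a new list so `array` is left untouched
    squared_diffs = []
    for value in array:
        squared_diffs.append((value - mean) ** 2)
    # get sigma: add everything up, with no return inside the loop
    sum_sqr_diff = 0
    for diff in squared_diffs:
        sum_sqr_diff = sum_sqr_diff + diff
    # mean of squared differences (population variance)
    variance = sum_sqr_diff / len(array)
    return math.sqrt(variance)

speeds = [86, 87, 88, 86, 87, 85, 86]
print(get_std_dev(speeds))
```

The printed value should agree with the numpy.std result quoted in the question up to floating-point rounding.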

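Now let us look at the algorithm you are using. Note that two standard deviation formulas are in common use: the population form divides by n, and the sample form divides by n - 1 (Bessel's correction). numpy.std uses the first one by default, so that is the one to match here.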

sqrt(sum((x - mean)^2) / n)

or

sqrt(sum((x - mean)^2) / (n - 1))

For large values of n the two formulas give nearly the same result, since the -1 becomes insignificant. The first formula can be rewritten as

sqrt(sum(x^2) / n - mean^2)

So how would you do this in Python?

def std_dev1(array):
    n = len(array)
    mean = sum(array) / n
    # sum of squares of the raw values
    sumsq = sum(v * v for v in array)
    # population variance is mean(x^2) - mean(x)^2
    return (sumsq / n - mean * mean) ** 0.5
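
One caveat that may be worth checking, sketched here under the assumption of Python 3 floats: the rewritten formula subtracts two large, nearly equal numbers, so it can lose precision when the mean is large compared with the spread. The function names and the shifted data set below are made up for this illustration.

```python
def variance_two_pass(values):
    # subtract the mean first, then square: sqrt of this is the first formula
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def variance_shortcut(values):
    # mean of squares minus square of mean, as in std_dev1
    n = len(values)
    mean = sum(values) / n
    return sum(v * v for v in values) / n - mean * mean

speeds = [86, 87, 88, 86, 87, 85, 86]
# Same spread, but every value moved up by one billion (hypothetical data)
shifted = [v + 10**9 for v in speeds]

print(variance_two_pass(speeds), variance_shortcut(speeds))
print(variance_two_pass(shifted), variance_shortcut(shifted))
```

On the original speeds the two functions should agree closely. On the shifted data the two-pass version should stay near the same variance, while the shortcut result can be far off or even negative, in which case `** 0.5` would no longer give a real number.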
cup
1
speeds = [86,87,88,86,87,85,86]

# Calculate the mean of the values in your list
mean_speeds = sum(speeds) / len(speeds)

# Calculate the variance of the values in your list
# This is 1/N * sum((x - mean(X))^2)
var_speeds = sum((x - mean_speeds) ** 2 for x in speeds) / len(speeds)

# Take the square root of variance to get standard deviation
sd_speeds = var_speeds ** 0.5

>>> sd_speeds
0.9035079029052513
CJR
  • When I run that I get `1.0`. – bkleeman Nov 23 '21 at 21:13
  • Restart your python kernel. Something you've done has screwed with one of the built-in functions. – CJR Nov 23 '21 at 21:14
  • Oh, never mind, you're using Python 2.7, aren't you? Add `from __future__ import division` - the standard division `/` is not true division until Python 3.0 unless you import from future. – CJR Nov 23 '21 at 21:18
  • yes I'm using 2.7. Your solution plus the future division import is working for me now. Thank you very much for the help! – bkleeman Nov 23 '21 at 21:28
  • It's time to move to py3, mate. – CJR Nov 23 '21 at 21:30
  • I'm pretty new to python and I haven't yet figured out the rhyme or reason as to when my machine runs py2 vs py3 to be honest. I'll have to get that sorted out. – bkleeman Nov 23 '21 at 21:32
  • A lot of Linux distros ship with py2.7 and py3 - you probably have python3 (but the binary is `python3` instead of just `python`). You can also consider using something like Anaconda to set up environments. py2.7 is well past end-of-life. – CJR Nov 23 '21 at 21:34
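
To illustrate the division issue discussed in these comments, here is a small sketch that runs under Python 3 and uses `//` to imitate what Python 2's `/` does with two integers. The variable names are illustrative only.

```python
speeds = [86, 87, 88, 86, 87, 85, 86]
n = len(speeds)

# Python 2 style: `/` on two ints floors the result, imitated here with `//`
mean_floor = sum(speeds) // n                                # 605 // 7 == 86
var_floor = sum((x - mean_floor) ** 2 for x in speeds) // n  # 7 // 7 == 1
sd_floor = var_floor ** 0.5

# True division: the Python 3 default, or Python 2 with the __future__ import
mean_true = sum(speeds) / n
var_true = sum((x - mean_true) ** 2 for x in speeds) / n
sd_true = var_true ** 0.5

print(sd_floor)  # 1.0, the value reported in the comments above
print(sd_true)
```

With floored division the mean collapses to 86 and the variance to 1, which would explain the `1.0` result.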
-1

There are several problems in the code; one of them is the return statement inside the for loop. You can try this:

import math

def get_mean(array):
    return sum(array) / len(array)


def get_std_dev(array):
    n = len(array)
    mean = get_mean(array)
    squares_arr = []
    for item in array:
        squares_arr.append((item - mean) ** 2)
    return math.sqrt(sum(squares_arr) / n)
Hadar
-1

If you don't want to use numpy, that's fine; give the statistics module from Python's standard library a try:

import statistics

speeds = [86, 87, 88, 86, 87, 85, 86]
# pstdev is the population standard deviation (divides by n)
st_dev = statistics.pstdev(speeds)
print(st_dev)

Or, if you still want a custom solution, I recommend the following approach, which uses a generator expression inside sum() instead of your more complex, buggy approach:

import math

mean = sum(speeds) / len(speeds)
var = sum((x - mean) ** 2 for x in speeds) / len(speeds)
st_dev = math.sqrt(var)
print(st_dev)
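
As a small cross-check, and only as a sketch: the statistics module offers both pstdev (divides by n) and stdev (divides by n - 1), so the hand-written version can be compared against each. The variable names below are made up for this illustration.

```python
import math
import statistics

speeds = [86, 87, 88, 86, 87, 85, 86]
n = len(speeds)
mean = sum(speeds) / n
sum_sq = sum((x - mean) ** 2 for x in speeds)

population_sd = math.sqrt(sum_sq / n)    # divide by n, like numpy.std by default
sample_sd = math.sqrt(sum_sq / (n - 1))  # divide by n - 1

# Both comparisons should print True, up to floating-point rounding
print(math.isclose(population_sd, statistics.pstdev(speeds)))
print(math.isclose(sample_sd, statistics.stdev(speeds)))
```

If the custom result matches stdev rather than pstdev, the code is dividing by n - 1 somewhere.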
Zain Ul Abidin