5

While doing a coding exercise, I came across this problem:

"Write a function that takes in a list of dictionaries with a key and list of integers and returns a dictionary with standard deviation of each list."

e.g.

input = [
    
    { 
        'key': 'list1',
        'values': [4,5,2,3,4,5,2,3]
        },

    {
        'key': 'list2',
        'values': [1,1,34,12,40,3,9,7],
    }
]

Answer: {'list1': 1.12, 'list2': 14.19}

Note that 'values' is actually the key that maps to the list of values, which is a little deceptive at first!

My attempt:

def stdv(x):
    
    for i in range(len(x)):
        
        for k,v in x[i].items():
            result = {}
            print(result)
        
            if k == 'values':
                mean = sum(v)/len(v)                
                variance = sum([(j - mean)**2 for j in v]) / len(v)        
                
                stdv = variance**0.5
                
                return stdv   # not sure here!!
            
            result = {k, v} # this is where i get stuck

I was able to calculate the standard deviation, but I have no idea how to put the results back into a dictionary as suggested in the answer. Can anyone shed some light on it? Much appreciated!

illusion
stackword_0

4 Answers

3

I would try another implementation of the std calculation, since that one makes two passes over each list: first a loop to get the mean and then another to get the std. It can be done in a single loop as noted here.

Either way, you don't have to write the loop yourself: numpy's std computes the population standard deviation by default, so you can make a function like this one:

from numpy import std
def stdv(list_of_dicts):
    return {d['key'] : std(d['values']) for d in list_of_dicts}

UPDATED:
If you really need to implement the std calculation yourself, you can write another function for that:

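def std(arr):
    # population standard deviation in a single pass over arr
    n = len(arr)
    if n == 0:
        return 0

    _sum, _sq_sum = 0, 0
    for v in arr:
        _sum += v          # running sum of the values
        _sq_sum += v ** 2  # running sum of the squared values
    # variance = mean of squares - square of mean; clamp at 0 because
    # floating-point rounding can make the difference slightly negative
    variance = max(_sq_sum / n - (_sum / n) ** 2, 0)
    return variance ** 0.5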
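One caveat with the sum-of-squares formula: when the values are large and close together, subtracting two nearly equal numbers can lose precision. A commonly used alternative is Welford's method, which is still a single loop. The following is only a minimal sketch; the name welford_std is made up for illustration and is not part of any library.

```python
def welford_std(values):
    """Population standard deviation in one pass (Welford's method)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for v in values:
        n += 1
        delta = v - mean
        mean += delta / n          # update the running mean
        m2 += delta * (v - mean)   # uses both the old and the new mean
    if n == 0:
        return 0.0  # assumption: empty input gives 0, like std() above
    return (m2 / n) ** 0.5
```

It can be dropped into the dictionary comprehension in place of numpy's std, e.g. {d['key']: welford_std(d['values']) for d in list_of_dicts}.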

Since you said that this is a coding exercise, I will try to point out where I think your mistake is.

You loop over x[i].items() to get each key and value, and then check whether the key is 'values' before performing the std calculation. Since you want to store the result in a dictionary, you also need the value of the 'key' field at the same time. With that loop you only get one of them at a time. In addition, the return inside the loop ends the function as soon as the first list has been processed.
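To make that concrete, here is a minimal sketch that reads both fields from each dictionary directly instead of looping over items(). Rounding to two decimals is an assumption on my part, added only to match the expected answer in the question; the variable names are illustrative.

```python
def stdv(list_of_dicts):
    result = {}  # created once, before the loop, so entries accumulate
    for d in list_of_dicts:
        name = d['key']        # e.g. 'list1'
        values = d['values']   # the list of integers
        mean = sum(values) / len(values)
        variance = sum((j - mean) ** 2 for j in values) / len(values)
        # assumption: round to 2 decimals to match the expected output
        result[name] = round(variance ** 0.5, 2)
    return result  # returned once, after every dictionary was processed


data = [
    {'key': 'list1', 'values': [4, 5, 2, 3, 4, 5, 2, 3]},
    {'key': 'list2', 'values': [1, 1, 34, 12, 40, 3, 9, 7]},
]
print(stdv(data))
```

Note that an empty 'values' list would raise ZeroDivisionError here, so guard against that if it can occur in your data.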

Also, not directly related, but if you want to loop over a list to get the values inside and you don't care about the index, it is better to do:

for x_i in x:
    for k,v in x_i.items():

Instead of:

for i in range(len(x)):
    for k,v in x[i].items():

I would recommend this video.

vLabayen
1

Try the following. Note that it does not add the values to the list of dictionaries; instead, it returns a new dictionary (as shown in 'Answer:') whose keys are the 'key' entries from the list of dictionaries:

def stdv(x):
  ret = {}
  for i in range(len(x)):
    v = x[i]['values']
    mean = sum(v)/len(v)
    variance = sum([(j - mean)**2 for j in v]) / len(v)        
    ret[x[i]['key']] = variance**0.5
  return ret  
illusion
  • 1
    I came up with exactly the same solution later with some help. I think I had the wrong idea about appending the value in a (key, value) pair. This method is actually neat and eye-opening to me. Thank you though. – stackword_0 Dec 15 '20 at 17:52
  • Cool... Thanks! lemme know if you need anything else... – illusion Dec 15 '20 at 19:25
1

Using statistics.pstdev and dictionary comprehensions.

from statistics import pstdev

# don't shadow the input builtin!
input_ = [
    
    { 
        'key': 'list1',
        'values': [4,5,2,3,4,5,2,3]
        },

    {
        'key': 'list2',
        'values': [1,1,34,12,40,3,9,7],
    }
]

result = { di["key"] : pstdev(di["values"]) for di in input_}  
print(result)

output:

{'list1': 1.118033988749895, 'list2': 14.185710239533304}
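The choice of pstdev over stdev matters here: pstdev divides by n (population), which is what the expected answer in the question uses, while stdev divides by n - 1 (sample). A small sketch of the difference, with rounding added only to compare against the expected 1.12:

```python
from statistics import pstdev, stdev

values = [4, 5, 2, 3, 4, 5, 2, 3]

population = pstdev(values)  # divides the squared deviations by n
sample = stdev(values)       # divides the squared deviations by n - 1

# round only for display, to compare with the expected answer
print(round(population, 2), round(sample, 2))
```

Only the population version rounds to the 1.12 given in the question.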
JL Peyret
0

You can add the result to each dictionary with the dict update method, like this:

x = [

{ 
    'key': 'list1',
    'values': [4,5,2,3,4,5,2,3]
    },

{
    'key': 'list2',
    'values': [1,1,34,12,40,3,9,7],
}
]
arr = []
for d in x:
    v = d['values']
    mean = sum(v) / len(v)
    variance = sum((j - mean)**2 for j in v) / len(v)
    # population standard deviation of this list
    arr.append(variance**0.5)

# store each result back into its own dictionary
for d, s in zip(x, arr):
    d.update({"stdv": s})
print(x)
Pratik Agrawal