Calculating mean and standard deviation and ignoring 0 values

Question

I have a list of lists of sublists, all of which contain float values. For example, the one below has 2 top-level lists, each containing several sublists:

 mylist =  [[[2.67, 2.67, 0.0, 0.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [0.0, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0]], [[2.67, 2.67, 2.0, 2.0], [0.0, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [0.0, 0.0, 0.0, 0.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0]]]

I want to calculate the standard deviation and the mean of the sublists and what I applied was this:

mean = [statistics.mean(d) for d in mylist]
stdev = [statistics.stdev(d) for d in mylist]

but this also includes the 0.0 values, which I do not want: I only set them to 0 so that the entries would not be empty. Is there a way to ignore these 0s as if they did not exist in the sublist, so that they are not taken into consideration at all? I could not find a way to do this with my current approach.

Why aren't we using appropriate data science libraries like `numpy` or `pandas`? — Parfait, May 15 '20 at 23:42
@Parfait to be honest I am new to python and using so many lists with data so I am learning now. But if it is easier with those libraries I will give it a try — piggy, May 15 '20 at 23:49

nocibambi · Accepted Answer · 2020-05-15T23:54:35.287

3

You can use numpy's nanmean and nanstd functions.

import numpy as np


def zero_to_nan(d):
    array = np.array(d, dtype=float)  # float dtype so NaN can be stored
    array[array == 0] = np.nan  # np.NaN was removed in NumPy 2.0; use np.nan
    return array


mean = [np.nanmean(zero_to_nan(d)) for d in mylist]
stdev = [np.nanstd(zero_to_nan(d)) for d in mylist]
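As a quick check of the approach above, here is a minimal sketch on a small made-up group (the `group` data and the `ddof=1` choice are illustrative assumptions, not part of the answer). One thing to watch: `np.nanstd` defaults to the population standard deviation (`ddof=0`), whereas `statistics.stdev` computes the sample standard deviation, so `ddof=1` may be needed if the two are expected to agree.

```python
import statistics
import numpy as np

# Hypothetical small group of sublists; zeros stand in for missing entries.
group = [[2.67, 2.67, 0.0, 0.0], [0.0, 2.67, 2.0, 2.0]]

arr = np.array(group, dtype=float)
arr[arr == 0] = np.nan  # mask zeros so the nan-aware functions skip them

# Pure-Python reference: the same values with zeros dropped.
nonzero = [x for sub in group for x in sub if x != 0]

mean_np = np.nanmean(arr)
std_np = np.nanstd(arr, ddof=1)  # ddof=1 matches statistics.stdev
```

With `ddof=1`, both routes should give the same mean and standard deviation for the nonzero values.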

edited May 15 '20 at 23:54

answered May 15 '20 at 23:45


like this the 0s will be ignored? – piggy May 15 '20 at 23:46
@piggy Actually I just realized that you are talking about 0s and not missing values. I adjusted the code so now it replaces the zeros with the missing values and then implements the numpy functions. – nocibambi May 15 '20 at 23:56

Peter Schaumann · Answer 2 · 2020-05-16T00:56:21.897

You can do this with a list comprehension.

The following lambda function flattens the nested list into a single list and filters out all zeros:

flatten = lambda nested: [x for sublist in nested for x in sublist if x != 0]

Note that the list comprehension has two for clauses and one if clause, similar to this code snippet, which does essentially the same thing:

flat_list = []

for sublist in nested:
   for x in sublist:
       if x != 0:
           flat_list.append(x)
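To see that the two forms agree, the loop can be wrapped in a function and compared against the comprehension. This is only a sketch: the helper name `flatten_nonzero` and the `sample` data are made up for illustration.

```python
def flatten_nonzero(nested):
    """Flatten one level of nesting and drop zeros (explicit-loop version)."""
    flat_list = []
    for sublist in nested:
        for x in sublist:
            if x != 0:
                flat_list.append(x)
    return flat_list


# Hypothetical sample input.
sample = [[2.67, 0.0], [0.0, 2.0, 2.0]]

from_loop = flatten_nonzero(sample)
from_comprehension = [x for sublist in sample for x in sublist if x != 0]
```

Both `from_loop` and `from_comprehension` hold the nonzero values in their original order.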

To apply this to your list you can use map. The map function will return an iterator. To get a list we need to pass the iterator to list:

flat_list = list(map(flatten, mylist))
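The iterator behaviour of `map` can be illustrated with a short sketch. The `groups` data below is hypothetical, and `flatten` is written as a regular function with the same logic as the lambda.

```python
from collections.abc import Iterator


def flatten(nested):
    # Same logic as the lambda: flatten one level and drop zeros.
    return [x for sublist in nested for x in sublist if x != 0]


# Hypothetical data: two groups, each a list of sublists.
groups = [[[2.67, 0.0], [2.0, 2.0]], [[0.0, 0.0], [2.67, 2.0]]]

lazy = map(flatten, groups)          # nothing is computed yet
is_iterator = isinstance(lazy, Iterator)
flat_groups = list(lazy)             # consumes the iterator
leftover = list(lazy)                # the iterator is now exhausted
```

Because a `map` object can only be consumed once, converting it to a list first is useful when the result is needed for both the mean and the standard deviation.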

Now you can calculate the mean and standard deviation:

import statistics

mean = [statistics.mean(d) for d in flat_list]
stdev = [statistics.stdev(d) for d in flat_list]

print(mean)
print(stdev)

Thank you for providing an explanation of your code and a link to documentation. These are best practices on Stack Overflow. This is a really good first post. — Jeremy Caney, May 16 '20 at 00:52

score 1 · Answer 3 · answered May 16 '20 at 00:15


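# Filter zeros from the values of each group; "d != 0" compares a whole list to 0 and never filters.
mean = [statistics.mean([x for sub in d for x in sub if x != 0]) for d in mylist]
stdev = [statistics.stdev([x for sub in d for x in sub if x != 0]) for d in mylist]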


user13552326


This works as well! I will just accept the answer above because of its efficiency but for what I asked you are correct! Thank you so much! – piggy May 16 '20 at 00:20

Welcome to SO. While this code snippet may be the solution, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. – alan.elkin May 16 '20 at 00:28

score 0 · Answer 4 · answered May 15 '20 at 23:42


Try:

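# Each d holds sublists, so flatten it while dropping falsy (zero) values.
mean = [statistics.mean([k for sub in d for k in sub if k]) for d in mylist]
stdev = [statistics.stdev([k for sub in d for k in sub if k]) for d in mylist]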
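The truthiness filter used above relies on `0` and `0.0` being falsy in Python. One edge case worth guarding against: `statistics.mean` needs at least one value and `statistics.stdev` needs at least two, so a group that is all zeros would raise `StatisticsError` after filtering. The sketch below shows one possible guard; the helper name `nonzero_stats` and the choice of returning `None` are assumptions for illustration.

```python
import statistics


def nonzero_stats(values):
    """Return (mean, stdev) of the nonzero values, or None where undefined."""
    kept = [v for v in values if v]  # 0 and 0.0 are falsy, so they are dropped
    mean = statistics.mean(kept) if kept else None
    stdev = statistics.stdev(kept) if len(kept) >= 2 else None
    return mean, stdev


all_zero = nonzero_stats([0.0, 0.0, 0.0, 0.0])  # nothing left after filtering
mixed = nonzero_stats([2.0, 0.0, 4.0])          # statistics of [2.0, 4.0]
```

Whether `None`, `float("nan")`, or an exception is the right outcome for an empty group depends on how the results are used afterwards.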


Eli Baum

