I recently stumbled upon a strange numpy behavior that I do not understand: I have a list of experiments, and each experiment is itself a list of samples, so I end up with a list of lists. The experiments were conducted under various conditions, so some contain more samples than others, but they all contain well over 100 samples.
Now I wanted to compute the mean and standard deviation of the samples for every experiment. What works for me is
sDevPD = [np.std(x) for x in f0PD]
where I simply iterate over all the lists in my list of experiments f0PD.
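For context, here is a self-contained version of the working approach, computing both the mean and the standard deviation per experiment (the data here is invented stand-in for f0PD):

```python
import numpy as np

# Hypothetical ragged data standing in for f0PD:
# two experiments with different numbers of samples.
f0PD = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]

# Iterate over the outer list; each inner list is reduced on its own.
meanPD = [np.mean(x) for x in f0PD]
sDevPD = [np.std(x) for x in f0PD]

print(meanPD)
print(sDevPD)
```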
Okay, now I tried using numpy directly:
sDevPD = np.std(f0PD, axis=1)
This does not work; numpy throws IndexError: tuple index out of range. I tried to track down the error as best I could, and found that np.std throws it only when the experiments vary in length. If the inner lists are all the same length, everything works fine. The same applies to np.mean.
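A minimal example reproducing the difference (data invented; note that the exact exception may differ by numpy version, since newer releases refuse to build ragged arrays at all and raise a ValueError instead):

```python
import numpy as np

# Equal-length inner lists: numpy builds a proper 2-D array,
# so reducing along axis=1 works.
equal = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(np.std(equal, axis=1))

# Unequal-length inner lists: numpy cannot build a 2-D array,
# and the axis=1 reduction fails.
ragged = [[1.0, 2.0, 3.0], [4.0, 5.0]]
try:
    np.std(ragged, axis=1)
except Exception as e:
    print(type(e).__name__, e)
```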
Can anybody please explain this behavior to me? I think it is absolutely legit to compute standard deviations for differently sized lists.