I have a large number of arrays in pandas, each with 256 rows and 5 columns, and I would like to calculate statistical features (min, max, mean, ...) over every 4 consecutive elements of each column. I wrote the following code, but it is very slow:

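for col in array:
    for j in range(256):
        # Take 4 consecutive rows starting at j (shorter near the end)
        window = array[col].iloc[j:j + 4]
        # Renamed to avoid shadowing the built-ins min() and max()
        col_min = window.min()
        col_max = window.max()
        # (other functions)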

Since I have many arrays and want to do this for each of them, it takes a very long time. Is there a simpler way to write this without loops that reduces the execution time?

Artemis
MSN
  • 1
    Um, *what exactly are you dealing with here*. `iloc` is likely for some `pandas` data structure. – juanpa.arrivillaga Jul 10 '17 at 01:56
  • My data structure is in pandas, but I tried to ask my question in a simple way – MSN Jul 10 '17 at 01:59
  • 2
    Not giving the details does not make it easy. It makes it not well defined. – juanpa.arrivillaga Jul 10 '17 at 02:00
  • 1
    something similar to this https://stackoverflow.com/questions/25479607/pandas-min-of-selected-row-and-columns – user93 Jul 10 '17 at 02:16
  • Yes, it is similar, but I want to do this task without a for loop – MSN Jul 10 '17 at 02:26
  • This looks like windowing: easy for *avg*, challenging for *min*, *max*. `but it is so time-consuming` & `without [loop/for]`: you are faced with problem X (something taking more time than you are willing to invest) and ask for a solution to [problem Y](http://xyproblem.info/) (how to achieve the same effect without one particular language construct). `simpler code without loop that decreases the time` looping isn't *increasing* processing time, but have a look at [itertools](https://docs.python.org/3/library/itertools.html) and [statistics](https://docs.python.org/3/library/statistics.html). – greybeard Jul 10 '17 at 02:41
  • 1
    I am not sure if you can apply a function to each n/4 elements without using up O(n) time. – Unni Jul 10 '17 at 02:41
  • 1
    (@Unni: sadly, it isn't even n/k, but 2n/k (entry&exit window/window size), and constants worsened. Either the link to statistics triggers *affordable approximate results using sampling* - or not.) – greybeard Jul 10 '17 at 07:20

1 Answer

You want to calculate min and max for 4 consecutive elements of a pandas.DataFrame?

This can be done using pandas rolling:

df.rolling(4).agg(['min', 'max']).shift(-3)

The shift is necessary because pandas right-aligns the window by default: each result is stored at the last row of its window, so shifting by -3 moves it to the window's first row.
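
As a rough illustration of the rolling approach, here is a minimal sketch. The random data, the column names `a` to `e`, and the names `df`, `window`, and `stats` are assumptions made up for the example and are not from the question. It computes min, max, and mean over each run of 4 consecutive rows for all columns at once.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one of the 256 x 5 arrays from the question.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(256, 5)), columns=list("abcde"))

window = 4

# rolling(window) labels each result at the LAST row of its window,
# so shift(-(window - 1)) moves it to the window's FIRST row.
stats = df.rolling(window).agg(["min", "max", "mean"]).shift(-(window - 1))

# The result has two column levels: (original column, statistic).
print(stats["a"].head())
```

One difference from the loop in the question: the last 3 rows of `stats` are NaN because no full window of 4 starts there, whereas `iloc[j:j+4]` silently computes over a shorter slice near the end.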

MaxNoe