Improve performance in a Python 'for' loop: How can I decide whether my loop is efficient or not? If it runs X iterations, what is an acceptable running time?

I was trying to write a function that creates a frequency distribution table in Python. I have continuous data in a NumPy array, and I want to build class intervals and count how many elements fall into each one (I use a 'for' loop to do it). The function works, but I'm not convinced it is efficient.

import numpy as np

def maketable(data, bins):
    data = np.array(data)
    # `bins` evenly spaced edges give bins-1 class intervals
    edges = np.linspace(min(data), max(data), bins)
    # {(lower class limit, upper class limit): frequency}
    classes = {(edges[x], edges[x+1]): 0 for x in range(bins-1)}
    # for every value, find the first interval (bin) that contains it
    # and increment that interval's frequency
    for val in data:
        for interval in classes.keys():
            if val >= interval[0] and val <= interval[1]:
                classes[interval] += 1
                break
    return classes

"Finished 'maketable' in 0.17328 secs". The data contains 20,604 values and the function takes 0.17 seconds to complete. I want to know whether that is OK. I appreciate any kind of help.

  • Hint: you should be able to obtain the same result with pandas' [`cut`](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.cut.html) function, it may even be faster – rrobby86 May 14 '19 at 21:29
  • I didn't know that one either. Thanks, buddy. I spent a few hours making it – Kewal Takhellambam May 14 '19 at 21:42
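
For reference, a minimal sketch of the `pandas.cut` route hinted at in the comment above. The name `maketable_cut` is made up for illustration, and the edges are built the same way as in the question (`bins` edges, so `bins - 1` intervals). `pd.cut` uses right-closed intervals by default and `include_lowest=True` keeps the minimum in the first interval, so a value lying exactly on an edge should land in the lower interval, just as it does in the loop.

```python
import numpy as np
import pandas as pd

def maketable_cut(data, bins):
    """Frequency table via pandas.cut (hypothetical helper, not from the question)."""
    data = np.asarray(data, dtype=float)
    # Same edges as the question: `bins` points, hence bins - 1 intervals.
    edges = np.linspace(data.min(), data.max(), bins)
    # Right-closed intervals; include_lowest=True keeps the minimum value.
    binned = pd.cut(data, bins=edges, include_lowest=True)
    # One count per interval, in interval order, including empty intervals.
    return binned.value_counts()

if __name__ == "__main__":
    print(maketable_cut([0, 1, 2, 3, 4], 3))
```

This sketch does not handle the case where all values are equal, since the edges would then be duplicates.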

1 Answer

It looks like what you are actually trying to obtain is a histogram of the data. Your function could then be implemented with NumPy:

classes, bins = np.histogram(data, bins=bins)

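Then you can return `classes`, which holds the count for each bin. Note that `np.histogram` treats an integer `bins` as the number of intervals, whereas the `linspace` call in your function produces `bins` edges and therefore `bins - 1` intervals.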
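
A minimal sketch of how the dictionary from the question could be rebuilt on top of `np.histogram`, assuming the same `linspace` edges are wanted. `maketable_hist` is a hypothetical name. One difference to be aware of: `np.histogram` bins are half-open `[lower, upper)` except the last one, so a value lying exactly on an interior edge is counted in the upper interval, while the loop in the question counts it in the lower one.

```python
import numpy as np

def maketable_hist(data, bins):
    """Frequency table built with np.histogram (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    # Same edges as the question: `bins` points, hence bins - 1 intervals.
    edges = np.linspace(data.min(), data.max(), bins)
    # Passing the edges explicitly keeps the intervals identical to the loop's.
    counts, _ = np.histogram(data, bins=edges)
    # {(lower limit, upper limit): frequency}, as in the original function.
    return {
        (float(edges[i]), float(edges[i + 1])): int(counts[i])
        for i in range(bins - 1)
    }

if __name__ == "__main__":
    print(maketable_hist([0, 1, 2, 3, 4], 3))
```

For example, `maketable_hist([0, 1, 2, 3, 4], 3)` gives `{(0.0, 2.0): 2, (2.0, 4.0): 3}`, whereas the original loop would put the value 2 in the first interval.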

albeksdurf
  • Oh, I didn't know that one. Thanks. I want to know how to decide whether a loop is efficient or not. Is there any relation between the number of items and the time taken, such as "looping over this many items should take about that long"? Python seems to be slow, so I can't go further if I don't know whether my loop is OK. – Kewal Takhellambam May 14 '19 at 21:39
  • @KewalTakhellambam There's no fixed relation between the number of iterations and the loop time beyond the scaling factor. It depends entirely on what's being done in each iteration! – Two-Bit Alchemist May 14 '19 at 21:40
  • @KewalTakhellambam you always want to look for implementations in libraries such as numpy or pandas, as they generally implement this in compiled code (C or Cython), which is much faster. Compare np.histogram and your for loop :) – albeksdurf May 14 '19 at 21:43
  • @albeksdurf you are right. I spent some hours on this when I could have used one of the library functions :) – Kewal Takhellambam May 14 '19 at 21:48
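
On the question of how to judge a loop: there is no universal acceptable time, but one rough way is to measure at a few input sizes and compare against a library call. The sketch below does that with `timeit`. The helpers `time_call` and `compare` are invented for this example, and the sizes are arbitrary. The nested loop performs up to `len(data) * (bins - 1)` interval checks in interpreted Python, so its time should grow roughly in proportion to both.

```python
import timeit
import numpy as np

def maketable(data, bins):
    """Nested-loop version from the question (bins edges, bins - 1 intervals)."""
    data = np.asarray(data, dtype=float)
    edges = np.linspace(data.min(), data.max(), bins)
    classes = {(edges[i], edges[i + 1]): 0 for i in range(bins - 1)}
    for val in data:
        for lower, upper in classes:
            if lower <= val <= upper:
                classes[(lower, upper)] += 1
                break
    return classes

def time_call(func, *args, repeat=3):
    """Best wall-clock time in seconds over `repeat` single runs."""
    return min(timeit.repeat(lambda: func(*args), number=1, repeat=repeat))

def compare(sizes, bins=10, seed=0):
    """Time the loop and np.histogram on random data of each size."""
    rng = np.random.default_rng(seed)
    rows = []
    for n in sizes:
        data = rng.normal(size=n)
        t_loop = time_call(maketable, data, bins)
        # bins - 1 intervals, to match the loop's number of classes
        t_numpy = time_call(lambda d: np.histogram(d, bins=bins - 1), data)
        rows.append((n, t_loop, t_numpy))
    return rows

for n, t_loop, t_numpy in compare((1_000, 2_000, 4_000)):
    print(f"n={n:>5}  loop={t_loop:.5f}s  np.histogram={t_numpy:.5f}s")
```

If doubling `n` roughly doubles the loop time, the loop is scaling as expected; the gap to the `np.histogram` column shows how much of the time is interpreter overhead rather than the work itself.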