Is it possible to include a break in the Pandas apply function?

I have a set of very large dataframes that I need to apply a function to as part of an optimization problem. This seems like the best approach but there's significant daylight between the best-case and worst-case scenarios. Best case, because the dataframe is ordered, the first solution I try works and is the best I'll find in that dataframe. If I could put in a break then I would avoid having to apply the function to the rest of the rows. But worst-case, there's no solution in the dataframe, so I want to run through the whole dataframe as fast as I can and go on to the next one.

Without being able to insert a break in apply, my best-case is terrible. With a lazy iterator, my worst-case is terrible. Is there a way to quickly apply a function to a dataframe but also stop when some criterion is met?
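
One possible middle ground, offered only as a sketch: keep `apply`, but run it over slices of the dataframe and stop after the first slice that produces a solution. The names `evaluate`, `first_hit_chunked`, `chunk_size`, and the columns `a` and `b` are all hypothetical, and the "solution" rule (a score above 10) is a placeholder for the real function.

```python
import numpy as np
import pandas as pd


def evaluate(row):
    """Hypothetical stand-in for the real per-row function.

    Returns a score when the row is a solution, NaN otherwise.
    """
    score = row["a"] * row["b"]
    return score if score > 10 else np.nan


def first_hit_chunked(df, chunk_size=1000):
    """Apply `evaluate` one slice at a time; stop at the first slice with a hit."""
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start:start + chunk_size]
        results = chunk.apply(evaluate, axis=1)
        hits = results.dropna()
        if not hits.empty:
            # The dataframe is ordered, so the first hit is the one to keep.
            return hits.index[0], hits.iloc[0]
    return None  # no solution anywhere in this dataframe


df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 2, 5, 6]})
found = first_hit_chunked(df, chunk_size=2)
```

In the best case only the first `chunk_size` rows are evaluated; in the worst case every row is still evaluated once, plus a small per-slice overhead. The right `chunk_size` would have to be found by timing on the real data.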

  • I think [this](https://stackoverflow.com/questions/42053223/how-to-run-a-function-on-each-row-in-pandas-dataframe-and-have-it-stop-when-a-co) answer could help you. – Logan George Jun 04 '20 at 03:01
  • Why break? Instead, first filter the dataframe based on the condition, then use `apply` on the filtered dataset. – sushanth Jun 04 '20 at 03:02
  • @Sushanth Because the output of the function can't be known before the function is run :( – Kung Fu Howie Jun 04 '20 at 03:11
  • No, you cannot "break", not cleanly. In any case, `.apply` is basically a for-loop under the hood (although recent versions of pandas have optimized it somewhat, I believe). You should just use a loop, which you can break out of, probably using something like `.itertuples` and `.iat`. – juanpa.arrivillaga Jun 04 '20 at 03:11
  • @KungFuHowie no, in general, a for loop is not slower than apply, if you do it correctly. Let's see your loop. – juanpa.arrivillaga Jun 04 '20 at 03:12
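
The loop suggested in the comments above could look roughly like the following. This is a minimal sketch, not the asker's actual code: `evaluate`, `first_solution`, the columns `a` and `b`, and the "score above 10" rule are hypothetical placeholders for the real function and data.

```python
import pandas as pd


def evaluate(row):
    """Hypothetical stand-in for the expensive per-row function.

    Returns a score when the row is a solution, None otherwise.
    """
    score = row.a * row.b
    return score if score > 10 else None


def first_solution(df):
    """Return (index, result) for the first row that yields a solution, else None."""
    for row in df.itertuples(index=True):
        result = evaluate(row)
        if result is not None:
            # Returning here plays the role of `break`: later rows are never evaluated.
            return row.Index, result
    return None  # worst case: every row was tried and none worked


df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 2, 5, 6]})
found = first_solution(df)
```

Because the dataframe is ordered, the first hit is taken as the best one. `itertuples` yields lightweight namedtuples, so columns are read as attributes (`row.a`); this assumes the column names are valid Python identifiers.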

0 Answers