As seen here, I am doing an isin() call which takes only 11126 time units to complete. Then I do boolean indexing with the result of that isin(), but suddenly the time needed to complete that task is ~17x higher at 187088.

Line #      Hits         Time  Per Hit   % Time  Line Contents
    60         2      11126.0   5563.0      0.5  randomness = ~dataframe.certificate_status.isin(
    61         1          4.0      4.0      0.0      [
    62                                                   "tamagotchi",
    63                                                   "nintendo",
    64                                                   "megaman",
    65                                                   "mic_check",
    66                                                   "onetwothree",
    67                                                   "test",
    68                                                   "else",
    69                                                   "something",
    70                                               ]
    71                                           )
    72
    73         1     187088.0 187088.0      8.9  dataframe = dataframe.loc[randomness]

I actually expected boolean indexing to be faster than isin(). Can someone explain why I am getting the results seen here?
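
A minimal reproducible sketch of the two profiled steps, assuming a made-up frame: the row count, the filler columns `col_0` to `col_19`, and the status values `valid` and `expired` are placeholders, not the original data. It times the isin() mask and the .loc selection separately.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: filler float columns plus one status column.
rng = np.random.default_rng(0)
n_rows, n_extra_cols = 200_000, 20
statuses = ["tamagotchi", "nintendo", "megaman", "valid", "expired"]
dataframe = pd.DataFrame(
    rng.random((n_rows, n_extra_cols)),
    columns=[f"col_{i}" for i in range(n_extra_cols)],
)
dataframe["certificate_status"] = rng.choice(statuses, size=n_rows)

excluded = ["tamagotchi", "nintendo", "megaman"]

# Step 1: build the boolean mask; this only reads the status column.
start = time.perf_counter()
mask = ~dataframe["certificate_status"].isin(excluded)
isin_seconds = time.perf_counter() - start

# Step 2: select rows with the mask; this builds a new frame from every column.
start = time.perf_counter()
filtered = dataframe.loc[mask]
loc_seconds = time.perf_counter() - start

print(f"isin: {isin_seconds:.4f}s, loc: {loc_seconds:.4f}s")
```

The absolute numbers depend on the pandas version, dtypes, and hardware; the sketch only shows how the two steps can be timed on their own.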

zacko
  • As an aside, why do you expect boolean indexing to be faster than isin()? I see it the other way: isin() is likely executed with a binary search, which is fast, while I'd suspect boolean indexing is linear and dependent on the size of the data. I have little knowledge of how boolean indexing is implemented; this is just an assumption. – sammywemmy Jun 10 '22 at 06:45
  • You should provide a reproducible example. – mozway Jun 10 '22 at 07:05
  • But isn't boolean indexing just taking a vector, and thus one operation? Why does loc have to search at all? It knows the positions. @sammywemmy I'll provide a reproducible example tomorrow. – zacko Jun 10 '22 at 20:49
  • I also forgot to add that I expected loc to be blazing fast, because I use the [True, False, ...] vector generated from randomness to select the rows that I want. – zacko Jun 10 '22 at 20:51
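
On the point about loc already knowing the positions: the mask does give the positions, but the selected rows still have to be copied out of every column into a new frame, so the work grows with rows times columns rather than being a single operation. A rough sketch to check this, assuming synthetic all-ones frames and a hypothetical helper `time_boolean_loc` (neither is from the original code):

```python
import time

import numpy as np
import pandas as pd


def time_boolean_loc(n_rows, n_cols, repeats=5):
    """Best-of-`repeats` wall time for frame.loc[mask] on an n_rows x n_cols frame."""
    frame = pd.DataFrame(np.ones((n_rows, n_cols)))
    mask = pd.Series(np.arange(n_rows) % 2 == 0)  # keep every other row
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        frame.loc[mask]
        best = min(best, time.perf_counter() - start)
    return best


# Tiny frame to inspect what a boolean selection returns.
small = pd.DataFrame({"a": np.arange(6.0), "b": np.arange(6.0) * 10})
keep = pd.Series([True, False, True, False, True, False])
subset = small.loc[keep]

# Check whether the selection shares memory with the original column.
shares = np.shares_memory(small["a"].to_numpy(), subset["a"].to_numpy())
print("selection shares memory with original:", shares)

# Same number of rows, increasing number of columns.
for n_cols in (1, 10, 100):
    print(n_cols, "columns:", time_boolean_loc(100_000, n_cols))
```

If the timings rise with the column count, that would fit the picture of isin() scanning a single column while the selection rebuilds the whole frame.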

0 Answers