I select rows in pandas with the loc indexer, as below:

pdf.loc[pdf.a>2]

Is this vectorised? Is it better than using NumPy-style boolean indexing directly?

pdf[pdf.a>2]
Henry Ecker
Chogg
  • Following up from https://stackoverflow.com/questions/34426247/vectorized-update-to-pandas-dataframe/34426589?noredirect=1#comment98407630_34426589 – Chogg Apr 26 '19 at 19:54
  • I think `loc[]` is better than a for loop when you do a conditional update based on columns. – anky Apr 26 '19 at 20:03
  • `numpy` will be faster, but then you lose the indices, which are super useful and inherent to pandas. `pdf.to_numpy()[np.where(pdf.a > 2)[0]]` should be faster than `.loc` – ALollz Apr 27 '19 at 02:14
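The trade-off described in the last comment can be seen in a small sketch. The frame contents and the index labels "x", "y", "z" are made up for illustration; only the two selection expressions come from the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with a non-default index so the difference is visible.
pdf = pd.DataFrame({"a": [1, 3, 5], "b": [10, 30, 50]}, index=["x", "y", "z"])

# Label-aware selection: the result is a DataFrame that keeps its index.
via_loc = pdf.loc[pdf.a > 2]

# NumPy route from the comment: the result is a plain ndarray, so labels are gone.
via_numpy = pdf.to_numpy()[np.where(pdf.a > 2)[0]]

print(via_loc.index.tolist())
print(type(via_numpy).__name__)
```

Both routes pick the same rows; whether any speed gain from the NumPy route is worth losing the index depends on what is done with the result afterwards.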

1 Answer

This timing suggests there is no slowdown with loc:

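import numpy as np
import pandas as pd
testa = pd.DataFrame(np.arange(10000000), columns=['q'])
# %timeit is an IPython/Jupyter magic, so run these two lines in a notebook or IPython shell
%timeit testb = testa.loc[testa.q>6]
%timeit testc = testa[testa.q>7]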

1 loop, best of 3: 207 ms per loop
1 loop, best of 3: 208 ms per loop
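For anyone without IPython, a rough equivalent using the standard-library timeit module might look like the sketch below. The smaller frame size and the repeat count of 5 are assumptions chosen so it runs quickly; they are not the settings behind the numbers above, and absolute timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller frame than in the answer (an assumption, to keep the run short).
testa = pd.DataFrame(np.arange(1_000_000), columns=["q"])
mask = testa.q > 6  # the boolean mask is built once, in vectorised form

# Both spellings select the same rows.
via_loc = testa.loc[mask]
via_brackets = testa[mask]
assert via_loc.equals(via_brackets)

# Time only the selection step for each spelling.
t_loc = timeit.timeit(lambda: testa.loc[mask], number=5)
t_brackets = timeit.timeit(lambda: testa[mask], number=5)
print(f".loc: {t_loc:.3f}s  []: {t_brackets:.3f}s")
```

Building the mask outside the timed call isolates the cost of the selection itself from the cost of the comparison.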
cs95
Chogg
  • Reading a bit more into this, vectorisation just means that the for loop is done at the C level. Presumably this can be done in loc. One other thing I'm confused about is why loc uses [ ] rather than ( ). Presumably this implies something about what loc is doing that I have never understood. – Chogg May 01 '19 at 23:17
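On the [ ] versus ( ) point: in pandas, loc is an attribute that returns an indexer object rather than a method, and square brackets on any Python object call that object's `__getitem__`. The toy classes below (`ToyFrame` and `ToyLocIndexer` are hypothetical names, not pandas internals) are a minimal sketch of that pattern, not how pandas actually implements it.

```python
class ToyLocIndexer:
    """Hypothetical stand-in for an indexer object; not pandas' real class."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, mask):
        # Square brackets on the indexer end up here.
        return [value for value, keep in zip(self._data, mask) if keep]


class ToyFrame:
    """Hypothetical container exposing a loc attribute."""

    def __init__(self, data):
        self._data = data

    @property
    def loc(self):
        # Plain attribute access returns the indexer; nothing is called with ( ).
        return ToyLocIndexer(self._data)


frame = ToyFrame([1, 3, 5])
selected = frame.loc[[False, True, True]]
print(selected)  # → [3, 5]
```

One practical consequence of the bracket form is that it can also appear on the left of an assignment (through `__setitem__`), which call syntax cannot.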