Does loc in pandas use vectorised logic or a for loop?

Question

I select rows in pandas with loc, as below:

pdf.loc[pdf.a>2]

Is this vectorised? Is it better than using numpy-style boolean indexing:

pdf[pdf.a>2]

Following up from https://stackoverflow.com/questions/34426247/vectorized-update-to-pandas-dataframe/34426589?noredirect=1#comment98407630_34426589 — Chogg, Apr 26 '19 at 19:54
I think `loc[]` is better than a for loop when you do a conditional update based on columns. — anky, Apr 26 '19 at 20:03
`numpy` will be faster, but then you lose the indices, which are super useful and inherent to pandas. `pdf.to_numpy()[np.where(pdf.a > 2)[0]]` should be faster than `.loc` — ALollz, Apr 27 '19 at 02:14
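To make the trade-off in the comment above concrete, here is a minimal sketch on a small made-up frame (the column values and the index labels "x", "y", "z" are assumptions for illustration only). It shows what survives each kind of selection, not which one is faster:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with string labels so the index is easy to spot.
pdf = pd.DataFrame({"a": [1, 3, 5]}, index=["x", "y", "z"])

# pandas selection: the result is a DataFrame and keeps its index labels.
kept = pdf.loc[pdf.a > 2]

# NumPy selection from the comment: the result is a plain ndarray,
# so the labels "y" and "z" are no longer attached to the rows.
raw = pdf.to_numpy()[np.where(pdf.a > 2)[0]]

print(list(kept.index))
print(type(raw).__name__, raw.tolist())
```

Whether the NumPy route is worth it probably depends on whether the labels are needed afterwards.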

score 1 · Answer 1 · edited Apr 26 '19 at 20:01

This timing suggests that loc is no slower than plain boolean indexing:

import numpy as np
import pandas as pd

testa = pd.DataFrame(np.arange(10000000), columns=['q'])
%timeit testb = testa.loc[testa.q > 6]
%timeit testc = testa[testa.q > 7]

1 loop, best of 3: 207 ms per loop
1 loop, best of 3: 208 ms per loop
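For anyone without IPython, the comparison can be approximated with the standard timeit module. This is only a sketch: the smaller frame size, the shared precomputed mask, and the repeat count of 5 are my assumptions, not part of the original measurement, and absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller frame than the 10,000,000 rows above so the sketch runs quickly.
testa = pd.DataFrame(np.arange(1_000_000), columns=["q"])

# The comparison itself is vectorised and produces a boolean Series.
mask = testa.q > 6

via_loc = testa.loc[mask]
via_brackets = testa[mask]

# Both forms select the same rows and keep the same index labels.
same_result = via_loc.equals(via_brackets)

# Time each form against the same precomputed mask.
runs = 5
t_loc = timeit.timeit(lambda: testa.loc[mask], number=runs)
t_brackets = timeit.timeit(lambda: testa[mask], number=runs)

print(f"same result: {same_result}")
print(f".loc[mask]: {t_loc / runs * 1e3:.1f} ms per call")
print(f"[mask]:     {t_brackets / runs * 1e3:.1f} ms per call")
```

Using one shared mask keeps the cost of building the mask out of the comparison, so any remaining difference comes from the two indexing paths themselves.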

edited Apr 26 '19 at 20:01 by cs95 · answered Apr 26 '19 at 19:52 by Chogg

Reading a bit more into this, vectorisation just means that the for loop is done at the C level. Presumably loc can do this too. One other thing I'm confused about is why loc uses [ ] rather than ( ). Presumably this says something about what loc is doing that I have never understood. – Chogg May 01 '19 at 23:17
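On the [ ] versus ( ) point, a short sketch may help. As far as I understand it, loc is an attribute that returns an indexer object rather than a method, and square brackets on any Python object call its `__getitem__`. The toy frame below is an assumption for illustration:

```python
import pandas as pd

# Hypothetical toy frame with the default integer index 0, 1, 2.
pdf = pd.DataFrame({"a": [1, 3, 5]})
mask = pdf.a > 2

# Plain attribute access, no call: this returns an indexer object.
indexer = pdf.loc
print(type(indexer).__name__)

# Square brackets are sugar for __getitem__, so these two are equivalent.
explicit = indexer.__getitem__(mask)
usual = pdf.loc[mask]
print(explicit.equals(usual))

# Brackets also allow slice syntax, which is not valid inside a call.
# With loc the slice is label based and includes both end points.
first_two = pdf.loc[0:1]
print(len(first_two))
```

So the brackets signal indexing, the same operation as on a list or dict, which is also what lets loc accept slices like `0:1` directly.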
