
Why would I use reset_index(drop=True), when the alternative is much faster? I am sure there is something I am missing. (Or my timings are bad somehow...)

import pandas as pd

l = pd.Series(range(int(1e7)))

%timeit l.reset_index(drop=True)
# 35.9 ms +- 1.29 ms per loop (mean +- std. dev. of 7 runs, 10 loops each)

%timeit l.index = range(int(1e7))
# 13 us +- 455 ns per loop (mean +- std. dev. of 7 runs, 100000 loops each)
jpp
The Unfun Cat
  • Off the top of my head: 1) You are hard-coding the length of the DataFrame (probably negligible) 2) `reset_index(drop=True)` returns a copy (can be an advantage if you are chaining methods). – ayhan May 09 '18 at 09:35
  • Great. Thanks. I guess this question deserves an answer instead of deletion, since I am sure somebody else will wonder the same thing. – The Unfun Cat May 09 '18 at 09:36
  • @user2285236, For (1) using `len(l.index)` instead doesn't add much time, (2) I often hear "chaining methods" as something that is *inherently good* about pandas, but I find it often obfuscates logic. – jpp May 09 '18 at 09:37
  • @jpp I guess it is considered good because the *hardest* thing in programming is naming things and method chaining helps you with that. – ayhan May 09 '18 at 10:04

1 Answer


The costly part of resetting the index is not creating the new index (as you showed, that is very fast) but returning a copy of the series. Compare:

%timeit l.reset_index(drop=True)
# 22.6 ms ± 172 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit l.index = range(int(1e7))
# 14.7 µs ± 348 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit l.reset_index(inplace=True, drop=True)
# 13.7 µs ± 121 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

You can see that the in-place operation (where no copy is returned) is about as fast as your method. However, in-place operations are generally discouraged.
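To make the copy-versus-mutation difference concrete, here is a minimal sketch. The small series `s` and the names `fresh` and `t` are made up for illustration, and whether the data is physically copied may depend on your pandas version and its copy-on-write settings. What the sketch does show is that `reset_index(drop=True)` hands back a new object and leaves the original alone, while assigning to `.index` changes the existing object.

```python
import pandas as pd

# Illustrative series with a non-default index (1000..1999).
s = pd.Series(range(1000), index=range(1000, 2000))

# reset_index(drop=True) returns a new Series with a fresh 0..n-1 index;
# the original series keeps its old index.
fresh = s.reset_index(drop=True)

# Assigning to .index relabels the existing object instead of returning a new one.
t = s.copy()  # copy only so that `s` stays untouched for the comparison
t.index = pd.RangeIndex(len(t))

print(s.index[0], fresh.index[0], t.index[0])
```

Because the first form returns a new object, it can be used in the middle of a method chain; the index assignment is a statement and cannot.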

P.Tillmann