One answer is in terms of DataFrame size. I have a DataFrame with roughly 50 million rows:
df_Usage.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49991484 entries, 0 to 49991483
Data columns (total 7 columns):
BILL_ACCOUNT_NBR int64
MM_ADJ_BILLING_YEARMO int64
BILLING_USAGE_QTY float64
BILLING_DAYS_CNT int64
TARIFF_RATE_TYP object
READ_FROM object
READ_TO object
dtypes: float64(1), int64(3), object(3)
memory usage: 2.6+ GB
Setting the first two columns as the index (converting one of them to a datetime first):
df_Usage['MM_ADJ_BILLING_YEARMO'] = pd.to_datetime(df_Usage['MM_ADJ_BILLING_YEARMO'], format='%Y%m')
df_Usage.set_index(['BILL_ACCOUNT_NBR', 'MM_ADJ_BILLING_YEARMO'], inplace=True)
df_Usage.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 49991484 entries, (5659128163, 2020-09-01 00:00:00) to (7150058108, 2020-01-01 00:00:00)
Data columns (total 5 columns):
BILLING_USAGE_QTY float64
BILLING_DAYS_CNT int64
TARIFF_RATE_TYP object
READ_FROM object
READ_TO object
dtypes: float64(1), int64(1), object(3)
memory usage: 2.1+ GB
That is roughly a 20% reduction in reported memory (2.6 GB down to 2.1 GB). Note the + in "2.6+ GB": info() does not deeply measure the object columns by default, so actual usage is higher; df_Usage.info(memory_usage='deep') reports the exact figure.