Coming from an R background, I find the (very heavy) use of Index objects in pandas a little disconcerting. For example, if train is a pandas DataFrame, is there a particular reason why train.columns should return an Index rather than a list? What additional purpose is served by making it an Index object? According to the definition of pandas.Index, it is the basic object storing axis labels for all pandas objects. While train.index.values does return the row labels (axis=0), how can I get the column labels (column names) from a pandas Index? Unlike in an earlier question, here I have a specific example in mind.

- Possible duplicate of [What is the point of indexing in pandas?](https://stackoverflow.com/questions/27238066/what-is-the-point-of-indexing-in-pandas) – Brad Solomon Sep 14 '17 at 14:17
- The link above has some good info about why all elements of the index being hashable matters. – Brad Solomon Sep 14 '17 at 14:18
- Thanks. It does. I am going through it. – Ashok K Harnal Sep 14 '17 at 14:22
2 Answers
A pd.Index is an array-like container of the column names, so in some sense it doesn't make sense to ask how to get the labels from the index, because the index is the labels.

That said, you can always get the underlying NumPy array with df.columns.values, or convert to a Python list with tolist() as @Mitch showed.
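A quick sketch of those conversions on a small frame with made-up columns. It uses to_numpy(), which the pandas docs recommend over the older .values attribute when you specifically want a NumPy array.

```python
import numpy as np
import pandas as pd

# A small frame with hypothetical column names, for illustration only
df = pd.DataFrame(np.zeros((2, 3)), columns=['a', 'b', 'c'])

cols = df.columns              # the Index *is* the column labels
arr = df.columns.to_numpy()    # NumPy ndarray holding the labels
as_list = df.columns.tolist()  # plain Python list
also_list = list(df.columns)   # iterating an Index yields its labels

print(type(cols).__name__)  # Index
print(as_list)              # ['a', 'b', 'c']
```

In most code you never need the conversion: an Index already supports iteration, len(), membership tests and slicing.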
As for why an Index is used instead of a bare array: an Index provides extra functionality and performance used throughout pandas, the core of which is hash-table-based indexing.
For example, consider the following frame and its columns:
```
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 10),
                  columns=list('abcdefghkm'))
cols = df.columns
cols
Out[16]: Index(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'k', 'm'], dtype='object')
```
Now say you want to select column 'h' out of the frame. With a list or array version of the columns, you would have to loop over the columns to find the position of 'h', which is O(n) in the number of columns, something like this:
```
for i, col in enumerate(cols):
    if col == 'h':
        found_loc = i
        break

found_loc
Out[18]: 7

df.values[:, found_loc]
Out[19]:
array([-0.62916208,  2.04403495,  0.29498066,  1.07939374, -1.49619915,
       -0.54592646, -1.04382192, -0.45934113, -1.02935858,  1.62439231])

df['h']
Out[20]:
0   -0.629162
1    2.044035
2    0.294981
3    1.079394
4   -1.496199
5   -0.545926
6   -1.043822
7   -0.459341
8   -1.029359
9    1.624392
Name: h, dtype: float64
```
With the Index, pandas constructs a hash table of the column labels, so finding the location of 'h' is an amortized O(1) operation, generally much faster than a linear scan, especially when there are many columns.
```
df.columns.get_loc('h')
Out[21]: 7
```
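To make the contrast concrete, here is a minimal sketch comparing a linear scan with a dict-based lookup. The dict only models the idea of a label-to-position hash table; pandas keeps its own hash table inside an internal index engine, which differs in detail and is not shown here. The helper linear_find is hypothetical.

```python
import pandas as pd

# Same column labels as the example above
cols = pd.Index(list('abcdefghkm'))

def linear_find(labels, target):
    """O(n): scan the labels one by one, as a plain list would require."""
    for i, label in enumerate(labels):
        if label == target:
            return i
    raise KeyError(target)

# Conceptual stand-in for the Index's hash table: build the
# label -> position mapping once, then each lookup is O(1) on average.
positions = {label: i for i, label in enumerate(cols)}

print(linear_find(cols, 'h'))  # 7
print(positions['h'])          # 7
print(cols.get_loc('h'))       # 7, via pandas' own hash-based lookup
```

The one-off cost of building the table is paid back once many lookups are made, which is why pandas keeps it alongside the labels rather than scanning each time.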
This example was only selecting a single column, but as @ayhan notes in the comment, this same hash table structure also speeds up many other operations like merging, alignment, filtering, and grouping.
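Alignment is one of those operations. A small sketch with made-up data: when two Series are added, pandas matches values by label through their Index, not by position.

```python
import pandas as pd

# Two Series whose labels are in different orders (made-up data)
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['c', 'a', 'b'])

# Arithmetic aligns on labels via the Index, not on position:
# 'a' -> 1 + 20, 'b' -> 2 + 30, 'c' -> 3 + 10
total = s1 + s2
print(total.sort_index())
```

With bare arrays, the same addition would silently pair 1 with 10, 2 with 20 and 3 with 30.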

- It all comes down to finding the location of 'h', but it might be worth mentioning that this speeds up many other operations like grouping, subsetting, merging etc. – ayhan Sep 14 '17 at 14:28
From the documentation for pandas.Index:

> Immutable ndarray implementing an ordered, sliceable set. The basic object storing axis labels for all pandas objects.
Having a regular list as an index for a DataFrame could cause issues with unorderable or unhashable objects. Since an Index is backed by a hash table, the same principles apply as for why lists can't be dictionary keys in regular Python.
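A small sketch of the two properties involved, using only built-in Python and documented pandas behavior: lists fail as hash keys, and an Index refuses item assignment. The helper is_hashable is hypothetical, written just for this illustration (pandas ships its own pandas.api.types.is_hashable, not used here).

```python
import pandas as pd

def is_hashable(obj):
    """Return True if obj can serve as a dict key (or hash-table label)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable(('a', 1)))  # True: tuples of hashables are hashable
print(is_hashable(['a', 1]))  # False: lists are mutable, hence unhashable

# The same rule in a plain dict: a list key raises TypeError.
try:
    lookup = {['a', 1]: 'value'}
except TypeError as exc:
    print('dict rejected list key:', type(exc).__name__)

# A pandas Index is immutable; item assignment raises TypeError.
idx = pd.Index(['a', 'b', 'c'])
try:
    idx[0] = 'z'
    mutated = True
except TypeError:
    mutated = False
print(mutated)  # False
```

Immutability is what keeps the hash table valid: if a label could change in place, its stored position would no longer be found under its new hash.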
At the same time, because the Index is an explicit object, we can use labels of different types, unlike the implicit integer positions NumPy uses, and still perform fast lookups.
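A minimal sketch of that difference, with made-up values and dates: NumPy addresses elements only by position, while a pandas Index lets the same values be looked up by string or date labels.

```python
import numpy as np
import pandas as pd

values = np.array([10, 20, 30])

# NumPy: elements are addressed only by implicit integer position.
print(values[1])  # 20

# pandas: the explicit Index lets labels be strings ...
s = pd.Series(values, index=['x', 'y', 'z'])
print(s.loc['y'])  # 20

# ... or timestamps (hypothetical dates, for illustration).
dates = pd.Series(values, index=pd.date_range('2017-09-12', periods=3))
print(dates.loc[pd.Timestamp('2017-09-13')])  # 20

# Each lookup goes through the Index; get_loc exposes the position.
print(s.index.get_loc('z'))  # 2
```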
If you want to retrieve a list of column names, the Index object has a tolist method:

```
>>> df.columns.tolist()
['a', 'b', 'c']
```

- Will be grateful if you can please expand upon the statement 'Having a regular list as an index for a DataFrame could cause issues with unorderable or unhashable objects, evidently.' (Maybe there is an example.) Thanks. – Ashok K Harnal Sep 14 '17 at 14:14
- @user3282777 An index is like a mapping to the DataFrame columns, sort of like a Python dict. So the same principles apply as for why you can't have mutable types as dict keys in regular Python, which the [Python wiki](https://wiki.python.org/moin/DictionaryKeys) has a useful bit on. – miradulo Sep 14 '17 at 14:19