I am aware that there are two ways to access a column in a DataFrame: by indexing with square brackets or as an attribute.
For the following DataFrame df:
>>> import pandas as pd
>>> data = {"colA": [0.469112, 0.861849, 0.673690, 0.404705, 0.370647],
...         "colB": [0.282863, 0.173215, 0.113648, 0.577046, 1.157892]}
>>> df = pd.DataFrame(data)
>>> df
       colA      colB
0  0.469112  0.282863
1  0.861849  0.173215
2  0.673690  0.113648
3  0.404705  0.577046
4  0.370647  1.157892
By indexing (square brackets):
>>> df['colA']
0 0.469112
1 0.861849
2 0.673690
3 0.404705
4 0.370647
Name: colA, dtype: float64
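For reference, here is a small sketch (assuming a recent pandas version) of what the bracket form also accepts. The label inside the brackets is an ordinary Python expression, so it can come from a variable (`col_name` below is just an illustrative name), and a list of labels returns a DataFrame instead of a Series:

```python
import pandas as pd

data = {"colA": [0.469112, 0.861849, 0.673690, 0.404705, 0.370647],
        "colB": [0.282863, 0.173215, 0.113648, 0.577046, 1.157892]}
df = pd.DataFrame(data)

# The label can be held in a variable ...
col_name = "colA"
single = df[col_name]            # selects one column -> Series

# ... or be a list of labels, which selects several columns at once.
both = df[["colA", "colB"]]      # selects two columns -> DataFrame

print(type(single).__name__)     # Series
print(type(both).__name__)       # DataFrame
```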
By attribute:
>>> df.colA
0 0.469112
1 0.861849
2 0.673690
3 0.404705
4 0.370647
Name: colA, dtype: float64
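A sketch of cases where the two forms do not behave the same, assuming a recent pandas version. The column labels here ("my col", "count") are hypothetical, picked only to trigger the differences: a label with a space cannot be written as an attribute, a label that matches a DataFrame method name resolves to the method, and assigning to a new attribute does not create a column.

```python
import pandas as pd

# Hypothetical labels chosen only to show where the two styles diverge.
df = pd.DataFrame({"colA": [1, 2, 2], "my col": [3, 4, 5], "count": [6, 7, 8]})

# For a label that is a valid identifier, both forms give equal Series.
print(df["colA"].equals(df.colA))    # True

# "my col" contains a space, so only the bracket form can reach it.
print(df["my col"].tolist())         # [3, 4, 5]

# "count" collides with the DataFrame.count method: the attribute form
# returns the bound method, the bracket form returns the column.
print(callable(df.count))            # True
print(df["count"].tolist())          # [6, 7, 8]

# Assigning to a new attribute does not create a column; pandas emits a
# UserWarning and sets an ordinary Python attribute instead.
df.newcol = [9, 9, 9]
print("newcol" in df.columns)        # False
df["newcol"] = [9, 9, 9]
print("newcol" in df.columns)        # True
```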
I am wondering what the advantages of each way of accessing a column are, and in which situations one might prefer one over the other. For example, in one case I chose df['colA'].unique() to list the unique values in a column, for the sake of clarity, even though the documentation used df.colA.unique(). Is there a difference in access efficiency in this case?
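As far as I can tell from the pandas source, `df.colA` first goes through normal attribute lookup, which fails because there is no attribute called `colA`; Python then calls `NDFrame.__getattr__`, which returns `self["colA"]`. So the attribute form should do slightly more work per access. A rough way to check on a given machine is a `timeit` comparison like the sketch below; the frame size and repetition count are arbitrary assumptions, and the absolute numbers depend on hardware and pandas version:

```python
import timeit

import pandas as pd

# Arbitrary illustrative frame.
df = pd.DataFrame({"colA": range(1_000), "colB": range(1_000)})

n = 10_000  # repetitions per measurement

# Time many repeated single-column selections with each syntax.
t_brackets = timeit.timeit(lambda: df["colA"], number=n)
t_attr = timeit.timeit(lambda: df.colA, number=n)

# Results vary by machine and pandas version; compare them relatively.
print(f"df['colA'] : {t_brackets / n * 1e6:.2f} us per access")
print(f"df.colA    : {t_attr / n * 1e6:.2f} us per access")
```

Either way, the per-access cost is tiny compared with the work done by a method such as `.unique()` on the selected column.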