I am aware that there are two ways to access a column in a pandas DataFrame: with bracket indexing or as an attribute.

For the following DataFrame df:

       colA      colB
0  0.469112  0.282863
1  0.861849  0.173215
2  0.673690  0.113648
3  0.404705  0.577046
4  0.370647  1.157892

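>>> import pandas as pd
>>> # Named `data` rather than `dict` so the built-in type is not shadowed
>>> data = {"colA": [0.469112, 0.861849, 0.673690, 0.404705, 0.370647],
...         "colB": [0.282863, 0.173215, 0.113648, 0.577046, 1.157892]}
>>> df = pd.DataFrame(data)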

By index:

>>> df['colA']
0    0.469112
1    0.861849
2    0.673690
3    0.404705
4    0.370647
Name: colA, dtype: float64 

By attribute:

>>> df.colA
0    0.469112
1    0.861849
2    0.673690
3    0.404705
4    0.370647
Name: colA, dtype: float64

What are the advantages of each approach, and in which situations is one preferable to the other? For example, in one instance I chose df['colA'].unique() to list the unique values in a column for the sake of clarity, even though the documentation used df.colA.unique(). Is there any difference in access efficiency between the two in this case?
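
One way to look at the efficiency part empirically is a minimal sketch along these lines. It rebuilds the same df, checks that both notations return the same column, and times repeated lookups with timeit. The repetition count `n` is an arbitrary choice, and the absolute numbers will depend on the machine and the pandas version, so no particular result is assumed here.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"colA": [0.469112, 0.861849, 0.673690, 0.404705, 0.370647],
                   "colB": [0.282863, 0.173215, 0.113648, 0.577046, 1.157892]})

# Both notations should select the same column.
same_result = df["colA"].equals(df.colA)

# Time repeated lookups; n is an arbitrary repetition count.
n = 10_000
t_bracket = timeit.timeit(lambda: df["colA"], number=n)
t_attr = timeit.timeit(lambda: df.colA, number=n)

print(f"same result: {same_result}")
print(f"bracket:   {t_bracket / n * 1e6:.2f} microseconds per lookup")
print(f"attribute: {t_attr / n * 1e6:.2f} microseconds per lookup")
```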

Tetraquark
  • It's a convenience and one I hate. The pandas API is huge so I find the attribute-access-method confusing. I'd rather not have the ambiguity – roganjosh May 20 '20 at 17:48
  • Long story short, _always_ use the bracket notation. It costs you 3 more characters typing it out, but saves you dozens of unforeseen headaches caused by using the dot notation – G. Anderson May 20 '20 at 17:49
  • About two years ago, I switched to only using the square bracket notation. It is more flexible. The other is just a shortcut that has its limitations. – Scott Boston May 20 '20 at 17:49
  • For ad-hoc analysis in Jupyter Lab I often use the implicit attribute ("syntactic sugar"), but never in operational code. See the "Zen of Python": Explicit is better than implicit. – Peter May 20 '20 at 18:26
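
The limitations and "unforeseen headaches" mentioned in the comments can be illustrated with a small sketch. The column names `count`, `col B`, and `colC` are hypothetical, picked only to trigger each case: a name that collides with a DataFrame method, a name that is not a valid Python identifier, and an attempt to create a column through attribute assignment.

```python
import warnings

import pandas as pd

# Hypothetical column names chosen to trigger each limitation.
df = pd.DataFrame({"colA": [1, 2, 3], "count": [10, 20, 30], "col B": [7, 8, 9]})

# 1. A column named like a DataFrame method is shadowed by that method:
#    df.count is the method DataFrame.count, while df["count"] is the column.
attr_is_method = callable(df.count)
bracket_is_column = isinstance(df["count"], pd.Series)

# 2. A name that is not a valid identifier is only reachable with brackets;
#    `df.col B` would be a syntax error.
col_b = df["col B"]

# 3. Assigning to a new attribute sets a plain Python attribute, not a column.
#    pandas emits a UserWarning here, silenced for this sketch.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.colC = [4, 5, 6]
has_colC_after_attr = "colC" in df.columns

# Bracket assignment does create the column.
df["colC"] = [4, 5, 6]
has_colC_after_bracket = "colC" in df.columns

print(attr_is_method, bracket_is_column)
print(has_colC_after_attr, has_colC_after_bracket)
```

Bracket notation behaves the same way in all three cases, which is presumably why the commenters prefer it outside of interactive use.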
