
On a DataFrame with more than 100 columns, I want pandas (v1.4.2) to convert every column automatically to the "best" dtype. According to the docs, df.convert_dtypes() or df.infer_objects() should do the trick. Consider the following example:

>>> import pandas as pd
>>> df = pd.DataFrame({"A": ["1", "2"], "C": ["abc", "bcd"]})
>>> df
   A    C
0  1  abc
1  2  bcd

>>> df.dtypes
A    object
C    object
dtype: object

>>> df.convert_dtypes().dtypes
A    string
C    string
dtype: object

>>> df.infer_objects().dtypes
A    object
C    object
dtype: object

Why is column A not converted to int? If these are the wrong pandas methods, what is the alternative?

Anoushiravan R
Viktor

1 Answer


Looking at the documentation of convert_dtypes(), the method correctly converts an object column that holds actual Python ints to Int64, but it does not parse strings, so it cannot tell that a string such as "3" is numeric:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         "A": pd.Series([3, 4, 5], dtype=np.dtype("O")),
...         "B": pd.Series(["3", "4", "5"], dtype=np.dtype("O")),
...         "C": pd.Series(["abc", "bcd", "cde"], dtype=np.dtype("O")),
...     }
... )
>>> df.dtypes
A    object
B    object
C    object
dtype: object

>>> df.convert_dtypes().dtypes
A     Int64
B    string
C    string
dtype: object
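The same reasoning covers the infer_objects() result in the question. The sketch below (the variable name ints_as_objects is just illustrative) shows that infer_objects() looks at the Python type of each stored object: an object column holding real ints becomes int64, while strings that merely look like numbers are never parsed.

```python
import pandas as pd

# The question's frame: column "A" holds Python str objects, not ints.
df = pd.DataFrame({"A": ["1", "2"], "C": ["abc", "bcd"]}, dtype=object)
print(type(df.loc[0, "A"]))  # <class 'str'>

# infer_objects() re-infers a dtype from the objects' actual Python types,
# so an object column holding real ints becomes int64 ...
ints_as_objects = pd.DataFrame({"A": pd.Series([1, 2], dtype=object)})
print(ints_as_objects.infer_objects().dtypes)

# ... but numeric-looking strings are left as non-numeric data.
print(df.infer_objects().dtypes)
```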

As a workaround, you can run pd.to_numeric over every column and leave the columns it cannot parse as they are:

>>> df.convert_dtypes().apply(pd.to_numeric, errors="ignore").dtypes
A     Int64
B     int64
C    object
dtype: object
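Two caveats with that workaround: it leaves B as NumPy int64 and turns C back into object, and errors="ignore" in pd.to_numeric was deprecated in pandas 2.2. A minimal sketch that sidesteps both, assuming you want nullable dtypes in the end: parse each column first with a small helper that catches the parse error itself (to_numeric_if_possible is a hypothetical name, not a pandas function), and only then call convert_dtypes().

```python
import numpy as np
import pandas as pd


def to_numeric_if_possible(s: pd.Series) -> pd.Series:
    """Hypothetical helper: parse a column as numbers, else return it unchanged."""
    try:
        return pd.to_numeric(s)
    except (ValueError, TypeError):
        # Column contains values that cannot be parsed (e.g. "abc"); keep it.
        return s


df = pd.DataFrame(
    {
        "A": pd.Series([3, 4, 5], dtype=np.dtype("O")),
        "B": pd.Series(["3", "4", "5"], dtype=np.dtype("O")),
        "C": pd.Series(["abc", "bcd", "cde"], dtype=np.dtype("O")),
    }
)

# Parse numeric-looking columns first, then let convert_dtypes() pick
# nullable dtypes (Int64 for the parsed columns, string for the text one).
result = df.apply(to_numeric_if_possible).convert_dtypes()
print(result.dtypes)
```

Running convert_dtypes() last means it sees real int64 columns for A and B and can upgrade them to Int64, while C, which the helper returned untouched, becomes string. DataFrame.apply works column by column, so the same call scales to a frame with 100+ columns.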
ali bakhtiari