
How can I check if a column is a string, or another type (e.g. int or float), even though the dtype is object?

(Ideally I want this operation vectorised, and not applymap checking every row...)

import io

import pandas as pd

# American postal codes
df1_str = """id,postal
1,12345
2,90210
3,"""
df1 = pd.read_csv(io.StringIO(df1_str))
df1["postal"] = df1["postal"].astype("O")  # dtype is object, but the values are floats (because of the null in row 3)
# British post codes
df2_str = """id,postal
1,EC1
2,SE1
3,W2"""
df2 = pd.read_csv(io.StringIO(df2_str))
df2["postal"] = df2["postal"].astype("O")  # dtype is object, and the values are strings

For both df1 and df2, df["postal"].dtype returns object.

  • However, df2["postal"] supports the .str methods, e.g. df2["postal"].str.lower(), but df1["postal"] doesn't.
  • Similarly, df1["postal"] supports numeric arithmetic, e.g. df1["postal"] * 2.
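The difference between the two columns can be reproduced with a short sketch. It assumes the same sample data as above; the names doubled and str_accessor_works are only illustrative.

```python
import io

import pandas as pd

# Rebuild df1 from the question: numeric postal codes with one missing value.
df1 = pd.read_csv(io.StringIO("id,postal\n1,12345\n2,90210\n3,"))
df1["postal"] = df1["postal"].astype("O")  # object dtype holding floats

# Arithmetic works because the underlying values are floats.
doubled = df1["postal"] * 2

# The .str accessor refuses object columns whose values are not strings.
try:
    df1["postal"].str.lower()
    str_accessor_works = True
except AttributeError:
    str_accessor_works = False

print(df1["postal"].dtype, str_accessor_works)
```

So the dtype alone does not say which of these operations is safe to call.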

This is different from other SO questions, which ask whether there are any strings inside a column (and not whether the WHOLE column is strings), e.g.:

A H
  • Try: https://datascience.stackexchange.com/questions/60955/how-to-check-all-values-in-particular-column-has-same-data-type-or-not However, I usually use pd.to_numeric without huge performance issues – David Erickson Jan 17 '21 at 01:52
  • It would be cool if the pandas devs included an optional argument in df.info() for this in the future. It could show the value counts of each data type within each column and again would be an optional argument. – David Erickson Jan 17 '21 at 02:26

1 Answer


You can use pandas.api.types.infer_dtype:

>>> pd.api.types.infer_dtype(df2["postal"])
'string'
>>> pd.api.types.infer_dtype(df1["postal"])
'floating'

From the docs:

Efficiently infer the type of a passed val, or list-like array of values. Return a string describing the type.
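As a rough sketch of how this could be applied to a whole DataFrame at once, the snippet below labels every column with infer_dtype. The helper name inferred_types and the with_gap series are hypothetical and only for illustration; the data is the same as in the question.

```python
import io

import numpy as np
import pandas as pd

# Same sample data as in the question.
df1 = pd.read_csv(io.StringIO("id,postal\n1,12345\n2,90210\n3,"))
df2 = pd.read_csv(io.StringIO("id,postal\n1,EC1\n2,SE1\n3,W2"))
df1["postal"] = df1["postal"].astype("O")
df2["postal"] = df2["postal"].astype("O")


def inferred_types(df):
    """Hypothetical helper: map each column name to its infer_dtype label."""
    return {col: pd.api.types.infer_dtype(df[col], skipna=True) for col in df.columns}


types1 = inferred_types(df1)
types2 = inferred_types(df2)

# skipna controls whether missing values count towards the label:
# a string column with a NaN is 'string' when they are skipped, 'mixed' otherwise.
with_gap = pd.Series(["EC1", np.nan, "W2"], dtype=object)
label_skipna = pd.api.types.infer_dtype(with_gap, skipna=True)
label_keepna = pd.api.types.infer_dtype(with_gap, skipna=False)

print(types1, types2, label_skipna, label_keepna)
```

A label of 'string' is then a reasonable signal that the .str methods are usable on that column, while 'floating' or 'integer' suggests numeric operations.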

Pablo C