10

This answer contains a very elegant way of setting all the types of your pandas columns in one line:

# convert column "a" to int64 dtype and "b" to complex type
df = df.astype({"a": int, "b": complex})

I am starting to think that, unfortunately, this has limited application, and that sooner or later you have to fall back on various other casting methods spread over many lines. I tested 'category' and that worked, so astype seems to accept actual Python types such as int or complex, as well as pandas dtype names in quotation marks such as 'category'.
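As a minimal sketch of the behaviour described above, the following mixes Python types and a quoted pandas dtype name in a single astype call. The column names and values are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [1, 2, 3], "c": ["x", "y", "x"]})

# Python types (int, complex) and a pandas dtype name given as a string
df = df.astype({"a": int, "b": complex, "c": "category"})
print(df.dtypes)
```

The exact integer width that `int` maps to can depend on the platform and library versions.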

I have a column of dates which looks like this:

25.07.10
08.08.10
07.01.11

I looked at this answer about casting date columns, but none of the approaches there seems to fit the elegant syntax above.

I tried:

from datetime import date
df = df.astype({"date": date})

but it gave an error:

TypeError: dtype '<class 'datetime.date'>' not understood

I also tried pd.Series.dt.date, which didn't work either. Is it possible to cast all columns, including a date or datetime column, in one line like this?

cottontail
cardamom
  • 1
    How are you obtaining this dataframe `df`? If you are reading it from a CSV, you could simply use the `dtype` argument to explicitly set the `dtype` of every column. – tidakdiinginkan Apr 20 '20 at 19:48
  • Yes, am reading it from a csv. Maybe that is what you are supposed to do, put something into `read_csv` as you read it, but still one would think it would still be possible after and in one line. – cardamom Apr 20 '20 at 19:50
  • 1
    I don't think there is a `date` `dtype` in pandas. You could, however, convert it to a `datetime` using the same syntax: `df = df.astype({'date': 'datetime64[ns]'})`. When you convert an `object` column to `date` using `pd.to_datetime(df['date']).dt.date`, the `dtype` is still `object`. – tidakdiinginkan Apr 20 '20 at 19:57
  • 2
    `df = df.astype({'date': 'datetime64[ns]'})` worked by the way. I think that must have considerable built-in ability for different date formats, year first or last, two or four digit year. I just saw 64 ns and thought it wanted the time in nanoseconds. While 'date' types might exist, I get the impression from the docs that that type is perfectly suitable for dates https://numpy.org/doc/1.18/reference/arrays.datetime.html – cardamom Apr 21 '20 at 14:14

2 Answers

25

This has been answered in the comments where it was noted that the following works:

df.astype({'date': 'datetime64[ns]'})

In addition, you can have pandas parse the column as dates when reading in the data:

pd.read_csv('path/to/file.csv', parse_dates=['date'])
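A small self-contained sketch of both approaches, assuming unambiguous ISO-formatted date strings (not the asker's dd.mm.yy data) and an in-memory CSV standing in for a real file:

```python
import io

import pandas as pd

# Hypothetical data with ISO-formatted (unambiguous) date strings
df = pd.DataFrame({"date": ["2010-07-25", "2010-08-08", "2011-01-07"],
                   "a": [1.0, 2.0, 3.0]})

# Cast the date column together with the other columns in one call
df = df.astype({"date": "datetime64[ns]", "a": int})

# Alternatively, parse the dates while reading; StringIO stands in for a file path
csv_text = "date,a\n2010-07-25,1\n2010-08-08,2\n2011-01-07,3\n"
df_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(df.dtypes)
print(df_csv.dtypes)
```

With day-first strings such as 25.07.10, neither call lets you state the format explicitly, so it is worth checking the parsed values.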
joelostblom
2

The pandas datetime dtype is built on NumPy's datetime64, so you can pass the NumPy dtype as well; there is no date dtype (although you can perform vectorized operations on a column that holds datetime.date values). Note that recent pandas versions (2.0 and later) reject a unit-less datetime64, so include the unit:

import numpy as np

df = df.astype({'date': np.dtype('datetime64[ns]')})

# or, using NumPy's type code (on a little endian system)
df = df.astype({'date': '<M8[ns]'})
# (on a big endian system)
df = df.astype({'date': '>M8[ns]'})

That said, since you can't pass a datetime format to astype(), it's a little primitive, and it's usually better to use pd.to_datetime() instead. For example, if the dates are in the format %d/%m/%Y, such as 01/04/2020 (April 1, 2020), a month-first parse by astype() would incorrectly read it as January 4, 2020, whereas pd.to_datetime() lets you pass the correct format.
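A sketch of this approach applied to the dd.mm.yy dates from the question. The extra column "a" is hypothetical, and chaining assign() with astype() is just one way to keep everything in a single statement:

```python
import pandas as pd

# Sample dates from the question (dd.mm.yy) plus a hypothetical numeric column
df = pd.DataFrame({"date": ["25.07.10", "08.08.10", "07.01.11"],
                   "a": [1.0, 2.0, 3.0]})

# Parse the dates with an explicit format, then cast the remaining columns
df = df.assign(date=pd.to_datetime(df["date"], format="%d.%m.%y")).astype({"a": int})

# An explicit format also removes the day/month ambiguity in 01/04/2020
ambiguous = pd.to_datetime("01/04/2020", format="%d/%m/%Y")
print(df)
print(ambiguous)
```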

Even with read_csv, you have some control over the format, e.g.

df = pd.read_csv('file.csv', parse_dates=['date'], dayfirst=True)
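To illustrate the effect of dayfirst, here is a small sketch that reads an in-memory CSV (io.StringIO stands in for 'file.csv', and the values are made up):

```python
import io

import pandas as pd

# Hypothetical CSV content with day-first dates (dd/mm/yyyy)
csv_text = "date,value\n01/04/2020,1\n15/04/2020,2\n"

# dayfirst=True tells the parser to read 01/04/2020 as 1 April, not 4 January
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], dayfirst=True)
print(df)
```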
cottontail