3

For an easy example :

import datatable as dt
import pandas as pd
from datetime import datetime

d_t = dt.Frame(pd.DataFrame({"Date": ["04/05/2020", "04/06/2020"]}))

There is only a column named Date with two values in str32 type.

How could I convert the Date column into a Date Format in datatable frame.

I have tried

dates_list = [datetime.strptime(date, "%m/%d/%Y") for date in d_t["Date"].to_list()[0]]
d_t[:,"NewDate"] = dt.Frame(dates_list)
d_t["NewDate"].max()
# The code can run successfully so far.

But the result was shown like this :

     NewDate
    ▪▪▪▪▪▪▪▪
  0     NA

I think it was still not a date format.

Even I looked up the type of each column, I still have no idea:

d_t.stypes

[Out]: (stype.str32, stype.obj64)

Is there any way to solve the problem or any alternatives?

Thanks for answering.

Pasha
  • 6,298
  • 2
  • 22
  • 34
Denny Chen
  • 489
  • 3
  • 8
  • did you try to convert the column with `pd.to_datetime()` first? – FObersteiner May 17 '20 at 19:19
  • It was a `datatable frame` object, so i can't directly use `pd.to_datetime`. So i extract the column and turned it into list or array and then convert it with `pd.to_datetime`, it had successfully turned into datetime format but when I combind the array or the list back to the datatable frame, it change into str32 automatically. – Denny Chen May 18 '20 at 12:04
  • 2
    Right now this is impossible because datatable doesn't support datetime column type yet. – Pasha May 18 '20 at 17:25
  • OK! Thank you very much. – Denny Chen May 19 '20 at 06:36

1 Answers1

1

datatable 1.0 introduced new column types datatable.Type.date32 and datatable.Type.time64.

In this case conversion from the string column Date to date32 type works like this:

d_t[:, dt.update(Date_date = 
             dt.time.ymd(dt.as_type(dt.str.slice(dt.f.Date, 6, 10), int), 
                         dt.as_type(dt.str.slice(dt.f.Date, 3, 5), int), 
                         dt.as_type(dt.str.slice(dt.f.Date, 0, 2), int)))]

d_t

    Date        Date_date
    ▪▪▪▪        ▪▪▪▪
0   04/05/2020  2020-05-04
1   04/06/2020  2020-06-04

Although a bit verbose it performs all operations inside datatable without overhead and penalty of intermediate lists or pandas frames. Hopefully, future releases will add more functions beyond datatable.time.ymd() to simplify such conversion ops.

Have you picked the ISO 8601 standard to represent dates (i.e. string in the format %Y-%m-%d) then conversion to date32 would have been considerably simpler:

d_t = dt.Frame({"Date": ["2020-04-05", "2020-04-06"]})
d_t[0] = dt.Type.date32
d_t.stypes

> (stype.date32,)
topchef
  • 19,091
  • 9
  • 63
  • 102
  • Thanks for the information of the new features introduced in the latest version issued. This seems great! – Denny Chen Jul 22 '21 at 01:45
  • glad you found this useful. For a bit more info see my latest post where I put together a brief overview of date/time functionality in v1.0: https://novyden.blogspot.com/2021/07/time-travel-with-py-datatable-10.html – topchef Jul 22 '21 at 19:24