39

I have a timestamp column where the timestamp is in the following format

2016-06-16T21:35:17.098+01:00

I want to extract date and time from it. I have done the following:

import datetime as dt

df['timestamp'] = df['timestamp'].apply(lambda x : pd.to_datetime(str(x)))

df['dates'] = df['timestamp'].dt.date

This worked for a while. But suddenly it does not.

If I again do df['dates'] = df['timestamp'].dt.date I get the following error

Can only use .dt accessor with datetimelike values

Luckily, I have saved the data frame with dates in the csv but I now want to create another column time in the format 23:00:00.051

EDIT

From the raw data file (15 million samples), the timestamp column looks like following (first 5 samples):

            timestamp

0           2016-06-13T00:00:00.051+01:00
1           2016-06-13T00:00:00.718+01:00
2           2016-06-13T00:00:00.985+01:00
3           2016-06-13T00:00:02.431+01:00
4           2016-06-13T00:00:02.737+01:00

After the following command

df['timestamp'] = df['timestamp'].apply(lambda x : pd.to_datetime(str(x)))

the timestamp column looks like with dtype as dtype: datetime64[ns]

0    2016-06-12 23:00:00.051
1    2016-06-12 23:00:00.718
2    2016-06-12 23:00:00.985
3    2016-06-12 23:00:02.431
4    2016-06-12 23:00:02.737

Then finally

df['dates'] = df['timestamp'].dt.date

0           2016-06-12
1           2016-06-12
2           2016-06-12
3           2016-06-12
4           2016-06-12

EDIT 2

Found the mistake. I had cleaned the data and saved the data frame in a csv file, so I don't have to do the cleaning again. When I read the csv, the timestamp dtype changes to object. Now how do I fix this?

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
chintan s
  • 6,170
  • 16
  • 53
  • 86
  • 2
    well it means you have some duff values so you can force these duff values to `NaT`: `df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')` then you can get rid of these values using `dropna` and then you can call `dt.date` as before – EdChum Sep 23 '16 at 13:36
  • `df.timestamp.isnull().sum()` returns 0 – chintan s Sep 23 '16 at 13:38
  • Sorry but unless you post your raw data and code with errors then this becomes a hypothetical posturing exercise which wastes time – EdChum Sep 23 '16 at 13:39
  • If checking with summing null values, you assume that incorrect values are null/interpreted as null. I'd doubt that – g_uint Sep 23 '16 at 13:40
  • related question: http://stackoverflow.com/questions/33365055/attributeerror-can-only-use-dt-accessor-with-datetimelike-values – sokoli Sep 23 '16 at 13:41
  • I have posted the code to reproduce. I will try your method too now. – chintan s Sep 23 '16 at 14:03
  • 2
    @chintans OT, instead of `df['timestamp'].apply(lambda x : pd.to_datetime(str(x)))`, consider `pd.to_datetime(df['timestamp'])`. – Ami Tavory Sep 23 '16 at 14:06
  • Is it because you have imported `datetime` as `dt` and then calling the `dt` accessor attribute resulting in it being overshadowed and thereby resulting in the error? – Nickil Maveli Sep 23 '16 at 14:16
  • That seems to be the mistake. I think @Ami 's solution will work. It just takes a lot of time to apply it to 15 million samples :) – chintan s Sep 23 '16 at 14:18
  • OK, found the mistake. I had cleaned the data and saved the data frame in a csv file, so I dont have to do the cleaning again. When I read the csv, the `timestamp` dtype changes to object. Now how do I fix this? – chintan s Sep 23 '16 at 14:57
  • @chintans look at the `converters` argument for `pd.read_csv`... – Jon Clements Sep 23 '16 at 15:03
  • 1
    @chintans To speed up the conversion, specify the format of your datetime strings --- see [this question](http://stackoverflow.com/questions/39548139/faster-way-of-converting-date-column-to-weekday-name-in-pandas/39548416#39548416). – Alicia Garcia-Raboso Sep 23 '16 at 15:16

4 Answers4

54

Do this first:

df['time'] = pd.to_datetime(df['timestamp'])

Before you do your extraction as usual:

df['dates'] = df['time'].dt.date
Gursel Karacor
  • 1,137
  • 11
  • 21
24

If date is in string form then:

import datetime

# this line converts the string object in Timestamp object
df['DateTime'] = [datetime.datetime.strptime(d, "%Y-%m-%d %H:%M") for d in df["DateTime"]]

# extracting date from timestamp
df['Date'] = [datetime.datetime.date(d) for d in df['DateTime']] 

# extracting time from timestamp
df['Time'] = [datetime.datetime.time(d) for d in df['DateTime']] 

If the object is already in the Timestamp format then skip the first line of code.

%Y-%m-%d %H:%M this means your timestamp object must be in the form like 2016-05-16 12:35:00.

Richard D
  • 327
  • 3
  • 16
Ajay Goyal
  • 296
  • 2
  • 4
3

You can use pandas built-in to_datetime object for this

df['timestamp'] = pd.to_datetime(df['timestamp'])
df['date'] = df['timestamp'].dt.date
df['time'] = df['timestamp'].dt.time
Colin Anthony
  • 1,141
  • 12
  • 21
0

When you are importing your csv, then use parse_dates parameter of pandas.read_csv(). For example, to import a column utc_datetime as datetime:

parse_dates = ['utc_datetime']
df = pandas.read_csv('file.csv', parse_dates=parse_dates)

To extract date from timestamp, use numpy instead of pandas:

df['utc_date'] = numpy.array(df['utc_datetime'].values, dtype='datetime64[D]')

Numpy datetime operations are significantly faster than pandas datetime operations.

sudesh
  • 23
  • 4