0

I am working on a data frame uploaded from CSV, I have tried changing the data typed on the CSV file and to save it but it doesn't let me save it for some reason, and therefore when I upload it to Pandas the date and time columns appear as object. I have tried a few ways to transform them to datetime but without a lot of success:

1) df['COLUMN'] = pd.to_datetime(df['COLUMN'].str.strip(), format='%m/%d/%Y') gives me the error:

AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

2) Defining dtypes at the beginning and then using it in the read_csv command - gave me an error as well since it does not accept datetime but only string/int.

Some of the columns I want to have a datetime format of date, such as: 2019/1/1, and some of time: 20:00:00

Do you know of an effective way of transforming those datatype object columns to either date or time?

user11395824
  • 71
  • 1
  • 7

1 Answers1

0

Based on the discussion, I downloaded the data set from the link you provided and read it through pandas. I took one column and a part of it; which has the date and used the pandas data-time module as you did. By doing so I can use the script you mentioned.

#import necessary library
import numpy as np
import pandas as pd

#load the data into csv
data = pd.read_csv("NYPD_Complaint_Data_Historic.csv")

#take one column which contains the datatime as an example
dte = data['CMPLNT_FR_DT']

# =============================================================================
# I will try to take a part of the data from dte which contains the 
# date time and convert it to date time
# =============================================================================

from pandas import datetime
test_data  = dte[0:10]
df1 = pd.DataFrame(test_data)
df1['new_col'] = pd.to_datetime(df1['CMPLNT_FR_DT'])
df1['year'] = [i.year for i in df1['new_col']]
df1['month'] = [i.month for i in df1['new_col']]
df1['day'] = [i.day for i in df1['new_col']]


#The way you used to convert the data also works


df1['COLUMN'] =  pd.to_datetime(df1['CMPLNT_FR_DT'].str.strip(), format='%m/%d/%Y')

It might be the way you get the data. You can see the output from this attached. As the result can be stored in dataframe it won't be a problem to save in any format. Please let me know if I understood correctly and it helped you. The month is not shown in the image, but you can get it.enter image description here

hemanta
  • 1,405
  • 2
  • 13
  • 23