I'm trying to convert the value of a column named date
in my csv file to another format, for instance:
Original records
transfer id,player id,player name,season,date
732058,1126,,12/13,Jul 1- 2012
581951,1126,,11/12,Jun 3- 2011
295000,1126,,09/10,Aug 12- 2009
98459,1126,,06/07,Nov 6- 2006
7267,1126,,03/04,Jul 2- 2003
...
and I'd like to get a result like
transfer id,player id,player name,season,date
732058,1126,,12/13,2012-07-01
581951,1126,,11/12,2011-06-03
295000,1126,,09/10,2009-08-12
98459,1126,,06/07,2006-11-06
7267,1126,,03/04,2003-07-02
...
as the data stored in csv file are string, I have written a method which can convert the date format:
import time
# convert date from original format to new format
def date_convert(_date,fmt_original,fmt_new):
if date_validate(_date,fmt_original):
timeArray=time.strptime(_date,fmt_original)
return time.strftime(fmt_new,timeArray)
else:
return '0001-01-01'
def date_validate(_date,fmt_original):
try:
time.strptime(_date, fmt_original)
return True
except ValueError:
return False
then I try to change the date in csv file,and I try to utilize pandas
, as is told by @MaxU:
and I write a code like
import pandas as pd
import date_format
df=pd.read_csv('the_transfer_info_test.csv',delimiter=',')
df.date=date_format.date_convert(df.date, '%b %d- %Y', '%Y-%m-%d')
print df
and at first I got an exception like this:
TypeError: expected string or buffer
I thought it may be related to the data type,as the df.date get a type of Series in pandas, so I coded as
df.date=date_format.date_convert(str(df.date), '%b %d- %Y', '%Y-%m-%d')
but it returns all 0001-01-01
which is the exception date in date_format, therefore I searched how to convert the Series to String and find an answer like
@Amit, and I tried methods below:
df['date'].astype(basestring)
df.date.apply(str)
df['date'].astype(str)
df['date'].astype('str')
but they don't work for me, I got the same exception like:
TypeError: expected string or buffer
I wonder how I can convert specific column values in csv file,either utilize pandas or not.
BTW, my python version is 2.7.12 with IDE PyCharm and Anoconda 4.0.0 and pandas 0.18.0.
Any help is appreciated,thank you.
Thanks to @jezrael, for my sample above it both work well, it's my fault that I meant to simplify my issue and simplified my question,actually my original data is like:
transfer id,player id,player name,season,date,move from,move from id,move to,move to id,market value,transfer fee
732058,1126,,12/13,Jul 1- 2012,e-frankfurt,24,1-fc-koln,3,£1.06m,Free transfer
581951,1126,,11/12,Jul 1- 2011,fc-st-pauli,35,eintracht-frankfurt,24,£1.70m,£425k
295000,1126,,09/10,Jul 1- 2009,alem-aachen,8,fc-st-pauli,35,£850k,Free transfer
98459,1126,,06/07,Jul 1- 2006,1860-munich,72,alemannia-aachen,8,£1.36m,£765k
7267,1126,,03/04,Jul 1- 2003,stuttgart-ii,102,tsv-1860-munich,72,-,£21k
...
and actually these approaches work well with part of my data I mean if I test it with afew lines with the same format,but when it comes to the original data which is about 40000 records,it's wired that those approaches do not work any more, for to_datetime
method,I got an exception like
ValueError: time data '-' does not match format '%b %d- %Y' (match)
while with the second method as parse_dates
, the date format remains the same like Jun 11- 2016
.
Again,any help will be appreciated.