
I'm trying to convert a column of Year values from int64 to datetime64 in pandas. The column currently looks like this:

         Year

         2003
         2003
         2003
         2003
         2003
         ... 
         2021
         2021
         2021
         2021
         2021

However, dataset['Year'].dtypes still reports int64.

That's after I used pd.to_datetime(dataset.Year, format='%Y') to convert the column from int64 to datetime64. How do I get around this?
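A minimal reproduction of this symptom might look like the sketch below. It assumes a small hypothetical `dataset` built in memory instead of the real data. pd.to_datetime returns a new Series rather than modifying the column it receives, so if the result is not stored anywhere, the column keeps its integer dtype.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: a Year column of plain integers
dataset = pd.DataFrame({'Year': [2003, 2003, 2021, 2021]})

# The converted Series is returned here but never stored
pd.to_datetime(dataset.Year, format='%Y')

# The original column is unchanged, so this still reports an integer dtype
print(dataset['Year'].dtypes)
```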

EcoHealthGuy

2 Answers


You should be able to convert from an integer:

import pandas as pd

df = pd.DataFrame({'Year': [2003, 2022]})

df['datetime'] = pd.to_datetime(df['Year'], format='%Y')

print(df)

Output:

   Year   datetime
0  2003 2003-01-01
1  2022 2022-01-01
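To confirm that the new column really has a datetime dtype, rather than relying on how the values print, one option is to check it with pandas.api.types.is_datetime64_any_dtype. This sketch simply rebuilds the DataFrame from the answer above.

```python
import pandas as pd

df = pd.DataFrame({'Year': [2003, 2022]})
df['datetime'] = pd.to_datetime(df['Year'], format='%Y')

# The original integer column is untouched; the new column holds datetimes
print(df.dtypes)

# Programmatic checks that avoid comparing dtype strings such as 'datetime64[ns]'
print(pd.api.types.is_datetime64_any_dtype(df['datetime']))  # True
print(pd.api.types.is_integer_dtype(df['Year']))             # True
```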
mozway
  • Could it maybe be that the data was originally read from a csv file? The stuff I have in the question is how the Year column prints out for me, and when I checked the data type before trying anything, it showed the values as int64, just like it's showing me now after using to_datetime. Maybe I'm missing something that's preventing to_datetime from working (white spaces I'm not seeing, missing values, etc.)? – EcoHealthGuy Jul 28 '22 at 20:12
  • Can you show your full code? Did you assign the output? – mozway Jul 28 '22 at 20:14
  • Good point! I'm gonna see if I can edit the question to include the whole code. Could be something I'm doing elsewhere causing this. – EcoHealthGuy Jul 28 '22 at 20:21
  • Actually, I tweaked something from @Nameer1811's answer and it looks like it's working now. Thank you both for helping me understand pandas better! I really appreciate it! – EcoHealthGuy Jul 28 '22 at 20:25
  • OK, so you just had forgotten to assign? ;) – mozway Jul 28 '22 at 20:30
  • Looks like it, @mozway! It's the little things, right? – EcoHealthGuy Jul 28 '22 at 21:47
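On the possibility raised in the comments that reading the data from a CSV file is the cause: a sketch using an in-memory CSV suggests it is not. Here io.StringIO stands in for the real file and the rows are made up for illustration. A Year column read by pd.read_csv comes back as an ordinary integer column, and it converts once the result of pd.to_datetime is assigned.

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file with a Year column (hypothetical data)
csv_text = "Year\n2003\n2003\n2021\n2021\n"
dataset = pd.read_csv(io.StringIO(csv_text))

# Read from CSV, the column is a plain integer column
print(dataset['Year'].dtypes)

# Storing the result is what produces a datetime column
dataset['date'] = pd.to_datetime(dataset['Year'], format='%Y')
print(dataset.dtypes)
```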

You have to assign the result of pd.to_datetime(df['Year'], format="%Y") to a column, for example df['date']. Once you have done that, you should see the values converted from integers to datetimes.

import pandas as pd

df = pd.DataFrame({'Year': [2000, 2000, 2000, 2000, 2000, 2000]})

df['date'] = pd.to_datetime(df['Year'], format="%Y")

print(df)

The output should be:

   Year       date
0  2000 2000-01-01
1  2000 2000-01-01
2  2000 2000-01-01
3  2000 2000-01-01
4  2000 2000-01-01
5  2000 2000-01-01

So essentially all you are missing is the assignment df['date'] = pd.to_datetime(df['Year'], format="%Y"); with it in place, the conversion should work fine.

Note that pd.to_datetime() does not return just the year (as far as I understood from your question, you wanted the year): each value becomes a full date, January 1 of that year, as the output above shows. If you want more information on what pd.to_datetime() returns, see the documentation.
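If only the year is needed after conversion, two standard options are the .dt.year accessor, which gives back integers, and .dt.to_period('Y'), which gives a year-resolution Period instead of a full timestamp. The sample values below are made up for illustration, and which option fits depends on what the column is used for afterwards.

```python
import pandas as pd

df = pd.DataFrame({'Year': [2000, 2000, 2021]})
df['date'] = pd.to_datetime(df['Year'], format="%Y")

# Full timestamps: each year becomes January 1 of that year at midnight
print(df['date'].iloc[0])  # 2000-01-01 00:00:00

# Extract just the year again as an integer column
df['year_only'] = df['date'].dt.year

# Or keep a year-resolution Period instead of a full timestamp
df['year_period'] = df['date'].dt.to_period('Y')

print(df)
```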

I hope this helps.

Nameer1811
  • It should be working if you have assigned the output. @mozway had a similar answer to mine so I believe his way would have worked too. – Nameer1811 Jul 28 '22 at 20:19
  • Looks like it's working now! I assigned it first to dataset['Year'], and now it's properly converting. Thank you for your answer! – EcoHealthGuy Jul 28 '22 at 20:26