I'm reading an excel file, with Pandas, containing Title
and Date
columns. When I manually set up a test version like this:
import pandas as pd
df = pd.DataFrame(data={'Title': ['Movie1', 'Movie2', 'Movie3', 'Movie4'],
'Date': ['1991-11', '1991', '1991', '1991-10-31']})
print(df)
It prints as expected and most importantly, i'm able to sort it exactly how I would like by using print(df.sort_values('Date'))
, Below is the output i'm ultimately trying to achieve. As you can see there are instances of YYYY/MM/DD, YYYY/MM and just YYYY.
Title Date
1 Movie2 1991
2 Movie3 1991
3 Movie4 1991-10-31
0 Movie1 1991-11
My issues occur when I try to run print(df.sort_values('Date'))
with the actual Excel file i'm reading, using read_excel
. I get TypeError: '<' not supported between instances of 'int' and 'str'
I've narrowed it down to how i'm entering YYYY-MM and YYYY-MM-DD dates into the Excel file. If I run it with just YYYY dates, it sorts correctly. In order to correctly display YYYY-MM and YYYY-MM-DD dates, in the Excel file, I must pre-pend a back-tick to them. Perhaps this is what is causing the problem.
Hopefully someone else has run into this before. Is there a way to read those dates with leading back-ticks correctly, using Pandas?
Or, is there a better way to enter dates into the Excel file for use with Pandas? (This may be just as much an Excel question as a Pandas question).