0

Pandas error when reading date from excel file. I am creating a dataframe using the following command.

df = pd.read_excel("report_file.xls", parse_dates=['operation_date'])
df.dtypes
operation_date  datetime64[ns]

Everything looks good. But when analyzing the dataframe, an error was found. After the number of the day matches the number of the month, the pandas is mistaken and reverses the day and month. For example, in October data it looks like this.

45 2021-10-13 11:50:34  ...                       329.97
46 2021-10-13 11:41:56  ...                       323.50
47 2021-10-13 11:41:55  ...                      2600.00
48 2021-10-10 02:05:13  ...                      1479.45
49 2021-09-10 20:22:01  ...                        40.00
50 2021-09-10 19:39:39  ...                        42.64
51 2021-09-10 19:39:39  ...                       350.00
52 2021-06-10 20:11:48  ...                        20.00
53 2021-06-10 13:34:25  ...                         1.96

You can see that after 2021-10-10 day number at the place of month.

user15228599
  • 11
  • 1
  • 3
  • What format does the XLS file use for the dates? It sounds like a localization problem: There are different ways to write dates (in the US it is month-day-year), which is one reason the year-mo-day was adopted as an unambiguous format. – alexis Oct 30 '21 at 09:43
  • XLS use day-month-year format – user15228599 Oct 30 '21 at 18:15

2 Answers2

1

Try passing the date format explicitly, something like this:

pd.read_excel(
    "report_file.xls",
    parse_dates=['operation_date'],
    date_parser=lambda x: pd.to_datetime(x, format='%Y-%m-%d %I:%M:%S')
)
funnydman
  • 9,083
  • 4
  • 40
  • 55
  • Unfortunately it didn't work for me.I got the following error information. "ValueError: time data 29.10.2021 19:49:07 doesn't match format specified".You have entered the correct path to find a solution. – user15228599 Oct 30 '21 at 18:04
0

I used another way. In my Excel file (report_file.xls) I have a column with name operation_date and dates in my table with date d-m-Y.

my table

At first I made a dataframe from file:

df = pd.read_excel('report_file.xls')
df.dtypes   # operation_date   object

Then I converted the date from a string to a datetime:

df['operation_date'] = pd.to_datetime(df['operation_date'],dayfirst=True)
df.dtypes   #  operation_date      datetime64[ns]

In this case, there were no problems in the date after 2021-10-10 that I described. Here is the code:

df = pd.read_excel("report_file.xls", parse_dates=['operation_date'])
45 2021-10-13 11:50:34  ...                       329.97
46 2021-10-13 11:41:56  ...                       323.50
47 2021-10-13 11:41:55  ...                      2600.00
48 2021-10-10 02:05:13  ...                      1479.45
49 2021-10-09 20:22:01  ...                        40.00
50 2021-10-09 19:39:39  ...                        42.64
51 2021-10-09 19:39:39  ...                       350.00
52 2021-10-06 20:11:48  ...                        20.00
53 2021-10-06 13:34:25  ...                         1.96

And now the date looks correct.

Jeremy Caney
  • 7,102
  • 69
  • 48
  • 77
user15228599
  • 11
  • 1
  • 3