4

I'm using Pandas to read a .xlsx file as follows:

    client_df = pd.read_excel(
    client_file, 
    header = 1,
    parse_dates = ['Event Date'], 
    date_parser = lambda x: datetime.strftime(x, '%Y-%m-%d')
    )

This works fine, but I'm getting this warning from Pandas:

.venv/lib/python3.8/site-packages/pandas/io/parsers.py:3339: FutureWarning:

Use pd.to_datetime instead.

return generic_parser(date_parser, *date_cols)

I read somewhere that Pandas is deprecating datetime.

  • Can anyone help me convert the date_parser expression to use pd.to_datetime?

EDIT: apologies for the lack of clarity. The client_file datetime objects is as follows:

YYYY-MM-DD HH:MM:SS

I need to extract the following, as a str object:

YYYY-MM-DD

EDIT: @jezrael's answer (below) working perfectly - thank you jezrael!

client_df = pd.read_excel(
client_file, 
header = 1,
parse_dates = ['Event Date'],
converters={'Event Date': lambda x: pd.to_datetime(x).strftime('%Y-%m-%d')}
)

EDIT 2: @MrFuppes pointed out that I didn't need to parse dates at all. His solution is probably better in this instance, as follows:

client_df = pd.read_excel(
client_file,
header = 1,
converters = {'Event Date': lambda x: x.strftime('%Y-%m-%d')}
)
Nando Machado
  • 182
  • 1
  • 11
  • Hey. The second option works but still gives me the future warning. I'm not sure what the first option is returning, but it fails to match the date as str... – Nando Machado May 10 '21 at 10:40
  • Can you check solutions form my edited answer? – jezrael May 10 '21 at 10:40
  • 1
    @NandoCarnielMachado I still think the conversion to pandas datetime is superfluous... you just need to set `converters={'Event Date': lambda x: x.strftime('%Y-%m-%d')}` in pd.read_excel, but *do not set* parse_dates. – FObersteiner May 10 '21 at 13:52
  • @MrFuppes Thanks, I'll see if that works. What is the parse_dates even doing there anyway? – Nando Machado May 10 '21 at 18:31
  • `parse_dates` is helpful if the column isn't formatted to date/time in Excel (the kwarg also exists for pd.read_csv btw., it's generally more useful there). The point is that pandas just uses other packages for the actual decoding of the Excel file (e.g. openpyxl). That code does the conversion from Excel serial date to Python datetime. pandas then converts to its built-in datetime dtype - unless you set the converter to intercept that... – FObersteiner May 10 '21 at 18:39
  • @MrFuppes Just tried as you suggested and didn't parse dates - it works better, thank you! – Nando Machado May 11 '21 at 07:42
  • But what is it that pandas doesn't like? Is the `date_parser` argument being phase out? Why would the original code stop working in the future? – adr Aug 02 '21 at 09:30

1 Answers1

2

Use Series.dt.strftime:

client_df = pd.read_excel(
    client_file, 
    header = 1,
    parse_dates = ['Event Date']
    )

client_df['Event Date'] = client_df['Event Date'].dt.strftime('%Y-%m-%d')

If need solution with read_excel use converters:

client_df = pd.read_excel(
    client_file, 
    header = 1,
    parse_dates = ['Event Date'],
    converters={'Event Date': lambda x: pd.to_datetime(x).strftime('%Y-%m-%d')}
    )

Another idea is use only strftime like pointed @MrFuppes:

client_df = pd.read_excel(
    client_file, 
    header = 1,
    parse_dates = ['Event Date'],
    converters={'Event Date': lambda x: x.strftime('%Y-%m-%d')}
    )
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    Great, the second option works perfectly! Thank you! I had tried converters before, but I found that it looked clumsy to handle use string slicing instead of strftime. – Nando Machado May 10 '21 at 10:50
  • 2
    Is there a way to handle this parsing when reading the CSV rather than after having created the dataframe? – Nando Machado May 10 '21 at 10:51
  • @NandoCarnielMachado - Added solution. – jezrael May 10 '21 at 11:00
  • 1
    Since the YYYY-MM-DD format doesn't change, wouldn't it be more efficient to just split on space and discard HH:MM:SS? Parsing to datetime seems unnecessary since the desired output dtype is string. – FObersteiner May 10 '21 at 11:24
  • 1
    @MrFuppes - yes, but it seems inoput data from excel are `datetimes` – jezrael May 10 '21 at 11:27
  • @MrFuppes Yes, that would be the case, but it's a datetime object, like jezrael says. – Nando Machado May 10 '21 at 13:38
  • 1
    @jezrael Thank you, the second solution is what I was looking for: no "Future warnings" and code working as expected. Thank you very much! – Nando Machado May 10 '21 at 13:44
  • @jezrael: see my comment under the question; if the column is formatted to date/time in excel, it is imported as datetime object, so you simple need to set a converter. no need to go from Python datetime to pandas datetime to string again. – FObersteiner May 10 '21 at 13:56