How to get time values as strings, during read_excel execution?

Question

i have to parse ODF-format turnstile's data file. In the file are employees entry/out time values in HH:MM:SS (like a 141:59:30).
link to sample file on GoogleDrive

My attempts to open the file with df = pd.read_excel(filename, engine="odf", ...) had crushed with exception: ParserError hour must be in 0..23: 141:59:30.

I tried to open file a several ways:

df = pd.read_excel(filename, engine="odf", skiprows=3) "skiprows" to cut useless header's rows.
df = pd.read_excel(filename, engine="odf", skiprows=3, dtype=str) "dtype=str" i thought, this option to represent all cells as string, to forbid automaticalliy datetime parsing.

But i haven't rid the ParserError exception. Can you to point me a way to get values like a '141:59:30' as string at time read_excel execution?

this really is a strange behaviour... the only working option I found was to format the column in the spreadsheet to "text". it will appear as fractional days (e.g. "141:59:30" will become ~5.91 days). now you can load to df. in Python/pandas, you can then convert to timedelta if you like, with pd.to_timedelta and unit='d'. — FObersteiner, Feb 14 '21 at 16:28

Sebastien A · Answer 1 · 2021-02-14T14:05:49.967

0

You can pass a dictionary to the dtype parameter where you input your column name as the key, and the data type as the value.

Could look something like this :

df = pd.read_excel(filename, engine="odf", skiprows=3, dtype={'time_col':str})

UPDATE

You could also try passing a converter function in the read statement.

def to_timedelta(x):
  return pd.to_timedelta(x)

df = pd.read_excel(filename, engine="odf", skiprows=3, converters={-1:to_timedelta})

edited Feb 14 '21 at 14:05

answered Feb 14 '21 at 04:54

Sebastien A

36
3

it doesn't helps for my case. I tried dtype={-1: str} as the time-value column ever is last and have no name. I'm asking the help here cause i have overtried many standard recipes – Gerasim Gerasimov Feb 14 '21 at 07:50
I have updated my answer with another possible solution. Make sure that -1 is referencing the right column, and not some extra blank column. Seems like you could also use column index 98, from your sample file. – Sebastien A Feb 14 '21 at 14:11
Thanks, but it didn't help me. ParserError exception is rising on df = pd.read_excel(...) string at once. Function (to_timedelta) is never called. – Gerasim Gerasimov Feb 14 '21 at 15:44

How to get time values as strings, during read_excel execution?

1 Answers1