1

i have to parse ODF-format turnstile's data file. In the file are employees entry/out time values in HH:MM:SS (like a 141:59:30).
link to sample file on GoogleDrive

My attempts to open the file with df = pd.read_excel(filename, engine="odf", ...) had crushed with exception: ParserError hour must be in 0..23: 141:59:30.

I tried to open file a several ways:

  1. df = pd.read_excel(filename, engine="odf", skiprows=3) "skiprows" to cut useless header's rows.
  2. df = pd.read_excel(filename, engine="odf", skiprows=3, dtype=str) "dtype=str" i thought, this option to represent all cells as string, to forbid automaticalliy datetime parsing.

But i haven't rid the ParserError exception. Can you to point me a way to get values like a '141:59:30' as string at time read_excel execution?

FObersteiner
  • 22,500
  • 8
  • 42
  • 72
  • this really is a strange behaviour... the only working option I found was to format the column in the spreadsheet to "text". it will appear as fractional days (e.g. "141:59:30" will become ~5.91 days). now you can load to df. in Python/pandas, you can then convert to timedelta if you like, with pd.to_timedelta and unit='d'. – FObersteiner Feb 14 '21 at 16:28

1 Answers1

0

You can pass a dictionary to the dtype parameter where you input your column name as the key, and the data type as the value.

Could look something like this :

df = pd.read_excel(filename, engine="odf", skiprows=3, dtype={'time_col':str})

UPDATE

You could also try passing a converter function in the read statement.

def to_timedelta(x):
  return pd.to_timedelta(x)

df = pd.read_excel(filename, engine="odf", skiprows=3, converters={-1:to_timedelta})

  • it doesn't helps for my case. I tried dtype={-1: str} as the time-value column ever is last and have no name. I'm asking the help here cause i have overtried many standard recipes – Gerasim Gerasimov Feb 14 '21 at 07:50
  • I have updated my answer with another possible solution. Make sure that -1 is referencing the right column, and not some extra blank column. Seems like you could also use column index 98, from your sample file. – Sebastien A Feb 14 '21 at 14:11
  • Thanks, but it didn't help me. ParserError exception is rising on df = pd.read_excel(...) string at once. Function (to_timedelta) is never called. – Gerasim Gerasimov Feb 14 '21 at 15:44