1

I am using: Python 3.7.2 & Pandas 0.24.2 And I try to read the following data (data.txt). Separated by whitespace, first column should be parsed as datetime objects:

       #00:00:00               col0       col1
       2019-03-28_08:58:00     1064      31965
       2019-03-28_09:08:00     1084      32565
       !2019-03-28_09:18:00    1104      33165
       2019-03-28_09:28:00     1124      33765

with pandas read_csv as:

import pandas as pd
import datetime 

def date_parser (s):
    return datetime.datetime.strptime(str(s),'%Y-%m-%d_%H:%M:%S')

df      = pd.read_csv(filepath_or_buffer='data.txt',
                      delim_whitespace = True,
                      index_col='#00:00:00',
                      parse_dates=True,
                      date_parser=date_parser,
                      comment='!',
                      )

All lines starting with a special character (here: !) should be skipped. It can be any other charakter. But with the commented line I always receive the error:

ValueError: time data 'nan' does not match format '%Y-%m-%d_%H:%M:%S'

I am thankful for any ideas

neo
  • 49
  • 2
  • 7
  • Are you sure it's not another line (not shown in the example input) that causes the problem? This seems to work ok. – Shaido May 09 '19 at 08:21
  • the error occurs when I insert the "!" in the 3rd data line. Without that it works. I used the exact same code and example input. – neo May 09 '19 at 08:25

2 Answers2

3

Try this approach:

df.columns = ["date", "c1", "c2"]
df.head()

date    c1  c2
0   2019-03-28_08:58:00 1064    31965
1   2019-03-28_09:08:00 1084    32565
2   2019-03-28_09:18:00 1104    33165
3   2019-03-28_09:28:00 1124    33765

df.dtypes
date    object
c1       int64
c2       int64
dtype: object

df.date = pd.to_datetime(df.date, format='%Y-%m-%d_%H:%M:%S')

df.dtypes

date    datetime64[ns]
c1               int64
c2               int64
dtype: object

After that you can also do these operations to extract year, hour or date like: df.date.dt.year df.date.dt.hour or df.date.dt.date

ashish14
  • 650
  • 1
  • 8
  • 20
  • Thank you very much. This approach also works wonderfully only I hoped to be able to do everything in one go. The whitespace was the problem. – neo May 09 '19 at 08:42
2

The example code you have provided is working fine for me. I'm using the same Pandas version as you and Python 3.7:

It's working...

I removed redundant whitespace from the input file you provided:

#00:00:00 col0 col1
2019-03-28_08:58:00 1064 31965
2019-03-28_09:08:00 1084 32565
!2019-03-28_09:18:00 1104 33165
2019-03-28_09:28:00 1124 33765
Nick Martin
  • 731
  • 3
  • 17
  • Thank you ... The whitespace is the problem. As you suggested with removing the redundant whitespace it works. The source provides it like that to make it readable in any text editor. But that seems to be a problem with parsing it again. – neo May 09 '19 at 08:41