I have a DataFrame with MultiIndex columns and timestamps at 5-minute intervals. I am trying to convert the "TimeStamp" column to a DatetimeIndex so I can use the solution from "How do you clean and forward fill a multiple day 1 minute time series with pandas?". My sample data:

            2002074266              2002074444
TimeStamp   Total        Freq       Total       Freq
HH:MM        Pc           Hz         Pc          Hz
06:50        3            10         7           13
06:55        4            11         8           14
07:00        5            12         9           15

Is this possible? I have tried the following code, but it does not work:

df['TimeStamp'] = pd.to_datetime(df['TimeStamp'])
df = df.set_index('TimeStamp')

It raises KeyError: 'TimeStamp'.

This is the line I use to read my CSV file:

df = pd.read_csv("Equipment.csv", header=[0,1], skiprows = 3, index_col=0)

Any guide/help will be appreciated. Thank you.

SunnyBoiz
  • The TimeStamp column doesn't exist, so what if you try removing the index_col argument from pd.read_csv(...) and then running your code again? – skarchmit Jul 17 '18 at 04:16

1 Answer

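The KeyError occurs because index_col=0 already moves the TimeStamp values into the index, so no column with that name exists. I think you first need to change the skiprows parameter to [2], so that only the units row (HH:MM, Pc, Hz) is skipped: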

df = pd.read_csv("Equipment.csv", header=[0,1], skiprows = [2], index_col=0)

print (df)
      2002074266      2002074444     
           Total Freq      Total Freq
06:50          3   10          7   13
06:55          4   11          8   14
07:00          5   12          9   15

Then the DatetimeIndex solution:

df.index = pd.to_datetime(df.index, format='%H:%M')
df = df.resample('1min').ffill()
print (df)
                    2002074266      2002074444     
                         Total Freq      Total Freq
1900-01-01 06:50:00          3   10          7   13
1900-01-01 06:51:00          3   10          7   13
1900-01-01 06:52:00          3   10          7   13
1900-01-01 06:53:00          3   10          7   13
1900-01-01 06:54:00          3   10          7   13
1900-01-01 06:55:00          4   11          8   14
1900-01-01 06:56:00          4   11          8   14
1900-01-01 06:57:00          4   11          8   14
1900-01-01 06:58:00          4   11          8   14
1900-01-01 06:59:00          4   11          8   14
1900-01-01 07:00:00          5   12          9   15
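
The 1900-01-01 date in the output above is the default that pandas fills in when the format contains only a time. If the recording date is known, one possible way to attach it is sketched below. The file_date value is hypothetical; the question does not state a date.

```python
import pandas as pd

# Times as they appear in the index after read_csv (strings in HH:MM form).
times = pd.Index(["06:50", "06:55", "07:00"])

# Hypothetical recording date; not taken from the question.
file_date = pd.Timestamp("2018-07-17")

# Treat each time of day as an offset from midnight and add it to the date.
stamps = file_date + pd.to_timedelta(times + ":00")
print(stamps)
```

Assigning stamps to df.index would then give a DatetimeIndex that carries the real date, which matters once several days are combined.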

TimedeltaIndex solution:

df.index = pd.to_timedelta(df.index + ':00')
df = df.resample('1min').ffill()
print (df)
         2002074266      2002074444     
              Total Freq      Total Freq
06:50:00          3   10          7   13
06:51:00          3   10          7   13
06:52:00          3   10          7   13
06:53:00          3   10          7   13
06:54:00          3   10          7   13
06:55:00          4   11          8   14
06:56:00          4   11          8   14
06:57:00          4   11          8   14
06:58:00          4   11          8   14
06:59:00          4   11          8   14
07:00:00          5   12          9   15
jezrael
  • Thanks Jezrael. I am new to Python, so sorry if this is a basic question. Will resample with ffill only fill between the start and end of the data if an interval of x minutes is missing? Also, I would like to know whether it is possible to verify that I have data from 06:50:00 to 07:00:00. Some files may be missing 07:00:00 due to data corruption (e.g. they only cover 06:50:00 to 06:56:00, so I need to ffill up to 07:00:00 using the previous data). – SunnyBoiz Jul 17 '18 at 10:03
  • @SunnyBoiz - You are right, it replaces values by forward filling and adds all missing rows so that the index is consecutive. – jezrael Jul 17 '18 at 10:06
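
On the follow-up in the comments: resample only spans the first to the last timestamp present in the data, so a file that ends early is not extended to 07:00:00. A minimal sketch of one way to handle that, assuming the expected window is known in advance, is to reindex against a fixed range. The sample values and the 06:50:00 to 07:00:00 window are assumptions for illustration.

```python
import pandas as pd

# Hypothetical truncated file: the 07:00 row is missing.
columns = pd.MultiIndex.from_product(
    [["2002074266", "2002074444"], ["Total", "Freq"]]
)
df = pd.DataFrame(
    [[3, 10, 7, 13], [4, 11, 8, 14]],
    index=["06:50", "06:55"],
    columns=columns,
)
df.index = pd.to_timedelta(df.index + ":00")

# Assumed expected window, one row per minute.
full_index = pd.timedelta_range(start="06:50:00", end="07:00:00", freq="1min")

# Timestamps that are expected but absent from the file.
missing = full_index.difference(df.index)
print("last expected timestamp present:", full_index[-1] in df.index)

# Reindex to the full window, carrying the last known row forward.
filled = df.reindex(full_index, method="ffill")
print(filled)
```

Checking missing before filling shows whether a file was truncated, and the reindex carries the last available row forward to the end of the window.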