1

I am new to python and this is my first question, please apologize any mistakes.

I have a big csv file with continuous measurements (measurements approx. every second, but interval is not fixed). I need to get mean value per minute. I found out that groupby would probably help me doing this but I am stuck with specifying the DATE_TIME column as index and dtype'datetime'. The csv file looks like this:

,DATE_TIME,N2O_dry
0,2016-03-01 02:32:02.651,0.70714453962
1,2016-03-01 02:32:03.762,0.7071444254000001
2,2016-03-01 02:32:05.257,0.70373171894
3,2016-03-01 02:32:05.953,0.70083729096
4,2016-03-01 02:32:07.049,0.69760065648
5,2016-03-01 02:32:07.928,0.6954438788699999
6,2016-03-01 02:32:08.726,0.6874527606899999
7,2016-03-01 02:32:10.005,0.6724201105500001
8,2016-03-01 02:32:10.851,0.6607286568199999
.
.
.
104503,2016-03-02 08:21:18.421,0.26879397415
104504,2016-03-02 08:21:19.532,0.26884030311
104505,2016-03-02 08:21:20.359,0.26887979686

So far I only succeeded in reading the file in a dataframe and specifying the DATE_TIME column as index and make the DATE_TIME column an dtype='datetime64[ns]' object with this:

import pandas

df=pandas.read_csv(file,usecols=[1,'N2O_dry'])
df=df.set_index('DATE_TIME')
df=pandas.to_datetime(df.index)

However, now i seem to be left only with the DATE_TIME column. Can somebody help me, please?

`

vera
  • 297
  • 3
  • 11

2 Answers2

0

I think you can add parameters parse_dates and index_col to read_csv and then use resample with mean (this works with pandas 0.18.0):

import pandas as pd
import io

temp=u""",DATE_TIME,N2O_dry
0,2016-03-01 02:32:02.651,0.70714453962
1,2016-03-01 02:32:03.762,0.7071444254000001
2,2016-03-01 02:32:05.257,0.70373171894
3,2016-03-01 02:32:05.953,0.70083729096
4,2016-03-01 02:32:07.049,0.69760065648
5,2016-03-01 02:32:07.928,0.6954438788699999
6,2016-03-01 02:32:08.726,0.6874527606899999
7,2016-03-01 02:32:10.005,0.6724201105500001
8,2016-03-01 02:32:10.851,0.6607286568199999"""
#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp),
                 usecols=[1,'N2O_dry'], 
                 parse_dates=['DATE_TIME'], 
                 index_col=['DATE_TIME'])
print df
                          N2O_dry
DATE_TIME                        
2016-03-01 02:32:02.651  0.707145
2016-03-01 02:32:03.762  0.707144
2016-03-01 02:32:05.257  0.703732
2016-03-01 02:32:05.953  0.700837
2016-03-01 02:32:07.049  0.697601
2016-03-01 02:32:07.928  0.695444
2016-03-01 02:32:08.726  0.687453
2016-03-01 02:32:10.005  0.672420
2016-03-01 02:32:10.851  0.660729

print df.resample('1Min').mean()
                     N2O_dry
DATE_TIME                   
2016-03-01 02:32:00   0.6925
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thank you! The parameters for the read_csv work perfectly! The resample function does not exactly do what I am looking for, as I try to get the mean value for each individual minute in the time-series and this function seems to give me the overall per minute mean value (output is just one value). I guess I was not specific enough when phrasing my problem. I would appreciate any further help, but I might be able to go from here! – vera Apr 10 '16 at 14:07
  • Hmmm, maybe the best is write [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) and desired output. – jezrael Apr 10 '16 at 14:13
0

If I understood correctly, then use

df.index = pd.to_datetime(df.index)

instead of

df = pd.to_datetime(df.index)

That should sort the problem with having only DATE_TIME column left. Then you get (in iPython):

In [27]:df.index
Out[27]: 
DatetimeIndex(['2016-03-01 02:32:02.651000', '2016-03-01 02:32:03.762000',
               '2016-03-01 02:32:05.257000', '2016-03-01 02:32:05.953000',
               '2016-03-01 02:32:07.049000', '2016-03-01 02:32:07.928000',
               '2016-03-01 02:32:08.726000', '2016-03-01 02:32:10.005000',
               '2016-03-01 02:32:10.851000'],
              dtype='datetime64[ns]', name=u'DATE_TIME', freq=None)

But still:

In [26]: df
Out[26]: 
                          N2O_dry
DATE_TIME                        
2016-03-01 02:32:02.651  0.707145
2016-03-01 02:32:03.762  0.707144
2016-03-01 02:32:05.257  0.703732
2016-03-01 02:32:05.953  0.700837
2016-03-01 02:32:07.049  0.697601
2016-03-01 02:32:07.928  0.695444
2016-03-01 02:32:08.726  0.687453
2016-03-01 02:32:10.005  0.672420
2016-03-01 02:32:10.851  0.660729
quapka
  • 2,799
  • 4
  • 21
  • 35
  • @vera Great, glad I could help. Consider accepting this answer (click the tick) so it won't show up in unanswered questions. – quapka Apr 10 '16 at 14:42