I have one DataFrame in Python 3.5, such as:
In [1]:tway5new.info()
<class 'pandas.core.frame.DataFrame'>
Index: 44 entries, to VOI
Columns: 43802 entries, 2011-01-01 00:00:00 to 2015-12-31 23:00:00
dtypes: int64(43802)
memory usage: 14.7+ MB
And the column name for this Dataframe is:
In [2]:tway5new.columns
Out[2]:
DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',
'2011-01-01 02:00:00', '2011-01-01 03:00:00',
...
'2015-12-31 20:00:00', '2015-12-31 21:00:00',
'2015-12-31 22:00:00', '2015-12-31 23:00:00'],
dtype='datetime64[ns]', name='timenew', length=43802, freq=None)
I want to subset this DataFrame into a sequence of relatively small data frame, i.e. each small data frame only incorporated one typical day record, such as: first small data frame contained information from '2011-01-01 00:00:00' to '2011-01-01 23:00:00', second small data frame contained information from '2011-01-02 00:00:00' to '2011-01-02 23:00:00', ....until 1826th small data frame contained information from '2015-12-31 00:00:00' to '2015-12-31 23:00:00'.
According to @EdChum help, I tried the following code:
df = tway5new.T
df.resample('d')
demean = lambda x: abs(x - x.mean())
Asub=df.groupby([df.index.year, df.index.month, df.index.day]).transform(demean)
#TO obtain the absolute difference between hours record and average hours record for each column###
Asubmax=Asub.groupby([df.index.year, df.index.month, df.index.day]).max()
AsubmaxID=Asubmax.idxmax(axis=1)
This gave the index of the maximum value for each typical day across columns, i.e, from 2011-01-01 to 2015-12-31:
with pd.option_context('display.max_rows',10,'display.max_columns',6):
print (AsubmaxID)
2011 1 1 UNF
2 NAT
3 NAT
4 NAT
5 NAT
2015 12 27 NAT
28 NAT
29 NAT
30 NAT
31 GOA
dtype: object
BmaxID=Asub.groupby([df.index.year, df.index.month, df.index.day]).idxmax(axis=1)
This gave the index of the maximum value for each hour record across columns, i.e, from 2011-01-01 00:00:00 to 2015-12-31 23:00:00:
with pd.option_context('display.max_rows',10,'display.max_columns',6):
print (BmaxID)
timenew
2011 1 1 2011-01-01 00:00:00 UNF
2011-01-01 01:00:00 NAT
2011-01-01 02:00:00 RTF
2011-01-01 03:00:00 UNF
2011-01-01 04:00:00 NAT
2015 12 31 2015-12-31 19:00:00 NAT
2015-12-31 20:00:00 NAT
2015-12-31 21:00:00 GOA
2015-12-31 22:00:00 NAT
2015-12-31 23:00:00 GOA
dtype: object
Now, how to find the index of maximum value for each typical day with exact hour record, for example, I know the index of overall maximum value for the first date 2011-01-01 was UNF column, but how can I obtain exact hour record for this UNF column on 2011-01-01, i.e. which hour record (either 2011-01-01 00:00:00 or 2011-01-01 03:00:00) has the maximum value?
Thanks a lot