1

I have been searching a lot but could not find an answer to my problem.

I have solar cells measurement data. The data has a timestamp value but no frequency. The number of values in an hour is not constant. Any number of values can occur in an hour. So i am struggling to create:

i. A column with Hourly averages of the data.

ii. Another column that holds number of counts of measurements that occured in each hour. This is important since my timestamp has no fix frequency. So it is important to know whether the mean calculated for a particular hour of the day has 1 or 2 or 14 measurements for example.

My dataframe image:

enter image description here

My code:

yr=df.index.year
mon=df.index.month
day= df.index.day
hr = df.index.hour
df.groupby([yr, mon, day, hr])['KP_sensor'].mean()

How do i add two columns at the end of this dataframe (Mean column and Number of counts of measurements occured in that hour).

Moh
  • 61
  • 6
  • You want to use `.transform()` like [this](https://stackoverflow.com/questions/30244952/python-pandas-create-new-column-with-groupby-sum) – CJR Dec 08 '18 at 23:40
  • @CJ59 yes this solved the problem of appending the column but I am still not sure how to calculate the number of counts of measurement in each hour. Could you please help me with that as well? Thanks! – Moh Dec 08 '18 at 23:50
  • you can try: `df['counts'] = df['hr'].map(df.groupby(['hr']).size())` – YOLO Dec 09 '18 at 00:01
  • @YOLO it has some sytanx errors. 'hr' is the same that i calculated above? Its dtype is int64. I tried without '' but it did not work. Could you look at it again? Thanks! – Moh Dec 09 '18 at 00:18

2 Answers2

0

Actually, looking at your problem you'd be better off with aggregate because you have multiple columns you'd like to take the mean of.

yr=df.index.year
mon=df.index.month
day= df.index.day
hr = df.index.hour
grouped_data = df.groupby([yr, mon, day, hr])
new_data = grouped_data.agg('mean')
new_data['counts'] = grouped_data.transform('count')
CJR
  • 3,916
  • 2
  • 10
  • 23
0

Here is one of many possible solutions. You already have part of it done.

df["hour"] = df.index.hour
df["date"] = df.index.date

dfMeans = df.groupby(["date", "hour"])[["Sol_Data", "Irr_Unc_perc", "ISF_stable", "KP_sensor", "Tamb"]].mean()
dfMeans.reset_index(inplace=True)

dfHours = pd.DataFrame(df.groupby(["date", "hour"])["hour"].count())
dfHours.columns = ["count"]
dfHours.reset_index(inplace=True)

dfFinal = pd.merge(dfHours, dfMeans, how="left", on=["date","hour"])
griggy
  • 446
  • 4
  • 7