How to add new columns at the end of a dataframe that contains hourly mean values and corresponding number of counts for each mean value per hour.

Question

I have been searching a lot but could not find an answer to my problem.

I have solar cells measurement data. The data has a timestamp value but no frequency. The number of values in an hour is not constant. Any number of values can occur in an hour. So i am struggling to create:

i. A column with Hourly averages of the data.

ii. Another column that holds number of counts of measurements that occured in each hour. This is important since my timestamp has no fix frequency. So it is important to know whether the mean calculated for a particular hour of the day has 1 or 2 or 14 measurements for example.

My dataframe image:

My code:

yr=df.index.year
mon=df.index.month
day= df.index.day
hr = df.index.hour
df.groupby([yr, mon, day, hr])['KP_sensor'].mean()

How do i add two columns at the end of this dataframe (Mean column and Number of counts of measurements occured in that hour).

You want to use `.transform()` like [this](https://stackoverflow.com/questions/30244952/python-pandas-create-new-column-with-groupby-sum) — CJR, Dec 08 '18 at 23:40
@CJ59 yes this solved the problem of appending the column but I am still not sure how to calculate the number of counts of measurement in each hour. Could you please help me with that as well? Thanks! — Moh, Dec 08 '18 at 23:50
you can try: `df['counts'] = df['hr'].map(df.groupby(['hr']).size())` — YOLO, Dec 09 '18 at 00:01
@YOLO it has some sytanx errors. 'hr' is the same that i calculated above? Its dtype is int64. I tried without '' but it did not work. Could you look at it again? Thanks! — Moh, Dec 09 '18 at 00:18

score 0 · Answer 1 · answered Dec 09 '18 at 00:36

Actually, looking at your problem you'd be better off with aggregate because you have multiple columns you'd like to take the mean of.

yr=df.index.year
mon=df.index.month
day= df.index.day
hr = df.index.hour
grouped_data = df.groupby([yr, mon, day, hr])
new_data = grouped_data.agg('mean')
new_data['counts'] = grouped_data.transform('count')

score 0 · Accepted Answer · answered Dec 09 '18 at 00:37

0

Here is one of many possible solutions. You already have part of it done.

df["hour"] = df.index.hour
df["date"] = df.index.date

dfMeans = df.groupby(["date", "hour"])[["Sol_Data", "Irr_Unc_perc", "ISF_stable", "KP_sensor", "Tamb"]].mean()
dfMeans.reset_index(inplace=True)

dfHours = pd.DataFrame(df.groupby(["date", "hour"])["hour"].count())
dfHours.columns = ["count"]
dfHours.reset_index(inplace=True)

dfFinal = pd.merge(dfHours, dfMeans, how="left", on=["date","hour"])

answered Dec 09 '18 at 00:37

griggy

446
4
7

Thanks @griggy, its awesome. Only thing if i could get the final dfFinal sorted according to timestamps? – Moh Dec 09 '18 at 01:00
@Moh, this should do it for you `dfFinal.sort_values(by=["date","hour"], ascending=False)` – griggy Dec 09 '18 at 11:29
@griddy Thanks :) – Moh Dec 09 '18 at 14:59

How to add new columns at the end of a dataframe that contains hourly mean values and corresponding number of counts for each mean value per hour.

2 Answers2