Find running average which equal

Question

Let say I have some data associated with football

Date   Home     Away  HomeGoal AwayGoal TotalGoal
2019   Arsenal  MU     5        1        6
2019   MCity    Liv    2        2        4
2019   MU       Liv    3        4        7
2019   MCity    MU     0        0        0

I want to create a column of data which show the average goals of that team in the recent 2 games. For example in the last row, I want to include a column that shows the average goal of MU in their last 2 games, which = (1+3)/2 = 2.

Is there any functions in python can achieve this?

So it does not matter if the team is home or away correct? and wouldn't that last row be 1.5? Since MU scored 0 and then previously 3? 3/2 = 1.5 — MattR, Feb 07 '20 at 13:17
@MattR it's not matter, I intend to find the average goals of that team in their last N matches. As the last row is the result data, so I don't include it to train the model. So it is correct that the average goal of MU should be 2. — Marco Lau, Feb 07 '20 at 14:39
@sammywemmy my expected output should be the actual numbers of goal. i m not at home right now, but the data includes other football data in matches — Marco Lau, Feb 07 '20 at 14:45
try creating two data frame with `Home` and `HomeGoal` as column and the other one with one `Away` and `AwayGoal` as column and then stack them together with new columns `Team` and `Goal`and then use `groupby(Team)[Goal].sum()` to count total goal — Shubham Shaswat, Feb 07 '20 at 14:52
Although you have a good answer, please note that SO is not a coding service. You don't just ask people to write the code for you, read [mcve] — anishtain4, Feb 07 '20 at 15:46
What is the issue, exactly? Have you tried anything, done any research? You seem to be able to read english, so you can read the Pandas docs. — AMC, Feb 08 '20 at 21:50

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

Try this way:

Split into two Data frames according to their Home and Away Goals

df1=df[['Date','Home','HomeGoal']]
df2 = df[['Date','Away','AwayGoal']]

all_dfs=[df1,df2]

Name the columns

for dfs in all_dfs:
    dfs.columns = ['Date','Team', 'Goal']

Concat two dfs together

new_df=pd.concat(all_dfs,ignore_index=True).reset_index(drop=True)

output:

Date       Team    Goal
0   2019    Arsenal 5
1   2019    Mcity   2
2   2019    MU      3
3   2019    Mcity   0
4   2019    MU      1
5   2019    Liv     2
6   2019    Liv     4
7   2019    MU      0

Average of last two games:

new_df[new_df['Team'] == 'MU'].sort_values('Date')['Goal'][:2].sum()/2

Total Goal wrt to Team in both away and home matches

new_df.groupby('Team')['Goal'].sum()

Output:

Team
Arsenal    5
Liv        6
MU         4
Mcity      2

How do you get the average of last two games the OP is asking for? — XXavier, Feb 07 '20 at 15:15

score 1 · Accepted Answer · answered Feb 07 '20 at 16:16

For your requirement, you don't care if a team is Home or Away, only how many goal it scored per match. Try this:

# Rename the columns to make the unstacking operation a bit easier
# Always a good idea to specify an explicit `copy` when you intend
# to change the dataframe structure
>>> tmp = df[['Home', 'Away', 'HomeGoal', 'AwayGoal']].copy()

# Arrange the columns into a MultiIndex to make stacking easier
>>> tmp.columns = pd.MultiIndex.from_product([['Team', 'Goal'], ['Home', 'Away']])

# This is what `tmp` look like:

           Team      Goal     
      Home Away Home Away
0  Arsenal   MU    5    1
1    MCity  Liv    2    2
2       MU  Liv    3    4
3    MCity   MU    0    0

# And now the magic
>>> tmp.stack() \
        .groupby('Team').rolling(2).mean() \
        .groupby('Team').tail(1) \
        .droplevel([1,2])

# Result
         Goal
Team         
Arsenal   NaN
Liv       3.0
MCity     1.0
MU        1.5

Here's how it works:

stack unpivots Home and Away so that for every match, we have 2 rows for Teams and Goal
groupby('Team').rolling(2).mean() gets the rolling average of goal scored over the last 2 games per team
groupby('Team').tail(1) gets the last of those rolling averages per team
By this time, the transitional dataframe has 3 levels in its index: team's name, match number and home/away indicator of the last game played. We only care about the first, so we will drop the other 2.

How is this question worth answering? – AMC Feb 08 '20 at 21:50 — AMC, Feb 08 '20 at 21:50

Find running average which equal

2 Answers2

output:

Output: