3

Let say I have some data associated with football

Date   Home     Away  HomeGoal AwayGoal TotalGoal
2019   Arsenal  MU     5        1        6
2019   MCity    Liv    2        2        4
2019   MU       Liv    3        4        7
2019   MCity    MU     0        0        0

I want to create a column of data which show the average goals of that team in the recent 2 games. For example in the last row, I want to include a column that shows the average goal of MU in their last 2 games, which = (1+3)/2 = 2.

Is there any functions in python can achieve this?

Marco Lau
  • 43
  • 3
  • So it does not matter if the team is home or away correct? and wouldn't that last row be 1.5? Since MU scored 0 and then previously 3? 3/2 = 1.5 – MattR Feb 07 '20 at 13:17
  • 1
    Cud u show a DataFrame with ur expected output? – sammywemmy Feb 07 '20 at 13:54
  • @MattR it's not matter, I intend to find the average goals of that team in their last N matches. As the last row is the result data, so I don't include it to train the model. So it is correct that the average goal of MU should be 2. – Marco Lau Feb 07 '20 at 14:39
  • @sammywemmy my expected output should be the actual numbers of goal. i m not at home right now, but the data includes other football data in matches – Marco Lau Feb 07 '20 at 14:45
  • try creating two data frame with `Home` and `HomeGoal` as column and the other one with one `Away` and `AwayGoal` as column and then stack them together with new columns `Team` and `Goal`and then use `groupby(Team)[Goal].sum()` to count total goal – Shubham Shaswat Feb 07 '20 at 14:52
  • Although you have a good answer, please note that SO is not a coding service. You don't just ask people to write the code for you, read [mcve] – anishtain4 Feb 07 '20 at 15:46
  • What is the issue, exactly? Have you tried anything, done any research? You seem to be able to read english, so you can read the Pandas docs. – AMC Feb 08 '20 at 21:50

2 Answers2

1

Try this way:

Split into two Data frames according to their Home and Away Goals

df1=df[['Date','Home','HomeGoal']]
df2 = df[['Date','Away','AwayGoal']]

all_dfs=[df1,df2]

Name the columns

for dfs in all_dfs:
    dfs.columns = ['Date','Team', 'Goal']

Concat two dfs together

new_df=pd.concat(all_dfs,ignore_index=True).reset_index(drop=True)

output:

Date       Team    Goal
0   2019    Arsenal 5
1   2019    Mcity   2
2   2019    MU      3
3   2019    Mcity   0
4   2019    MU      1
5   2019    Liv     2
6   2019    Liv     4
7   2019    MU      0

Average of last two games:

new_df[new_df['Team'] == 'MU'].sort_values('Date')['Goal'][:2].sum()/2

Total Goal wrt to Team in both away and home matches

new_df.groupby('Team')['Goal'].sum() 

Output:

Team
Arsenal    5
Liv        6
MU         4
Mcity      2

Community
  • 1
  • 1
Shubham Shaswat
  • 1,250
  • 9
  • 14
1

For your requirement, you don't care if a team is Home or Away, only how many goal it scored per match. Try this:

# Rename the columns to make the unstacking operation a bit easier
# Always a good idea to specify an explicit `copy` when you intend
# to change the dataframe structure
>>> tmp = df[['Home', 'Away', 'HomeGoal', 'AwayGoal']].copy()

# Arrange the columns into a MultiIndex to make stacking easier
>>> tmp.columns = pd.MultiIndex.from_product([['Team', 'Goal'], ['Home', 'Away']])

# This is what `tmp` look like:

           Team      Goal     
      Home Away Home Away
0  Arsenal   MU    5    1
1    MCity  Liv    2    2
2       MU  Liv    3    4
3    MCity   MU    0    0

# And now the magic
>>> tmp.stack() \
        .groupby('Team').rolling(2).mean() \
        .groupby('Team').tail(1) \
        .droplevel([1,2])

# Result
         Goal
Team         
Arsenal   NaN
Liv       3.0
MCity     1.0
MU        1.5

Here's how it works:

  • stack unpivots Home and Away so that for every match, we have 2 rows for Teams and Goal
  • groupby('Team').rolling(2).mean() gets the rolling average of goal scored over the last 2 games per team
  • groupby('Team').tail(1) gets the last of those rolling averages per team
  • By this time, the transitional dataframe has 3 levels in its index: team's name, match number and home/away indicator of the last game played. We only care about the first, so we will drop the other 2.
Code Different
  • 90,614
  • 16
  • 144
  • 163