
I have the following data definition about a football game:

Game = namedtuple('Game', ['Date', 'Home', 'Away', 'HomeShots', 'AwayShots',
                           'HomeBT', 'AwayBT', 'HomeCrosses', 'AwayCrosses',
                           'HomeCorners', 'AwayCorners', 'HomeGoals',
                           'AwayGoals', 'HomeXG', 'AwayXG'])

Here are some examples:

[Game(Date=datetime.date(2018, 10, 21), Home='Everton', Away='Crystal Palace', HomeShots='21', AwayShots='6', HomeBT='22', AwayBT='13', HomeCrosses='21', AwayCrosses='14', HomeCorners='10', AwayCorners='5', HomeGoals='2', AwayGoals='0', HomeXG='1.93', AwayXG='1.5'),
 Game(Date=datetime.date(2019, 2, 27), Home='Man City', Away='West Ham', HomeShots='20', AwayShots='2', HomeBT='51', AwayBT='6', HomeCrosses='34', AwayCrosses='5', HomeCorners='12', AwayCorners='2', HomeGoals='1', AwayGoals='0', HomeXG='3.68', AwayXG='0.4'),
 Game(Date=datetime.date(2019, 2, 9), Home='Fulham', Away='Man Utd', HomeShots='12', AwayShots='15', HomeBT='19', AwayBT='38', HomeCrosses='20', AwayCrosses='12', HomeCorners='5', AwayCorners='4', HomeGoals='0', AwayGoals='3', HomeXG='2.19', AwayXG='2.13'),
 Game(Date=datetime.date(2019, 3, 9), Home='Southampton', Away='Tottenham', HomeShots='12', AwayShots='15', HomeBT='13', AwayBT='17', HomeCrosses='15', AwayCrosses='15', HomeCorners='1', AwayCorners='10', HomeGoals='2', AwayGoals='1', HomeXG='2.08', AwayXG='1.27'),
 Game(Date=datetime.date(2018, 9, 22), Home='Man Utd', Away='Wolverhampton', HomeShots='16', AwayShots='11', HomeBT='17', AwayBT='17', HomeCrosses='26', AwayCrosses='13', HomeCorners='5', AwayCorners='4', HomeGoals='1', AwayGoals='1', HomeXG='0.62', AwayXG='1.12')]

And here are two almost identical functions that calculate home and away statistics for a given team.

def calculate_home_stats(team, games):
    """
    Calculates home stats for the given team.
    """
    home_stats = defaultdict(float)

    home_stats['HomeShotsFor'] = sum(int(game.HomeShots) for game in games if game.Home == team)
    home_stats['HomeShotsAgainst'] = sum(int(game.AwayShots) for game in games if game.Home == team)
    home_stats['HomeBoxTouchesFor'] = sum(int(game.HomeBT) for game in games if game.Home == team)
    home_stats['HomeBoxTouchesAgainst'] = sum(int(game.AwayBT) for game in games if game.Home == team)
    home_stats['HomeCrossesFor'] = sum(int(game.HomeCrosses) for game in games if game.Home == team)
    home_stats['HomeCrossesAgainst'] = sum(int(game.AwayCrosses) for game in games if game.Home == team)
    home_stats['HomeCornersFor'] = sum(int(game.HomeCorners) for game in games if game.Home == team)
    home_stats['HomeCornersAgainst'] = sum(int(game.AwayCorners) for game in games if game.Home == team)
    home_stats['HomeGoalsFor'] = sum(int(game.HomeGoals) for game in games if game.Home == team)
    home_stats['HomeGoalsAgainst'] = sum(int(game.AwayGoals) for game in games if game.Home == team)
    home_stats['HomeXGoalsFor'] = sum(float(game.HomeXG) for game in games if game.Home == team)
    home_stats['HomeXGoalsAgainst'] = sum(float(game.AwayXG) for game in games if game.Home == team)
    home_stats['HomeGames'] = sum(1 for game in games if game.Home == team)

    return home_stats


def calculate_away_stats(team, games):
    """
    Calculates away stats for the given team.
    """
    away_stats = defaultdict(float)

    away_stats['AwayShotsFor'] = sum(int(game.AwayShots) for game in games if game.Away == team)
    away_stats['AwayShotsAgainst'] = sum(int(game.HomeShots) for game in games if game.Away == team)
    away_stats['AwayBoxTouchesFor'] = sum(int(game.AwayBT) for game in games if game.Away == team)
    away_stats['AwayBoxTouchesAgainst'] = sum(int(game.HomeBT) for game in games if game.Away == team)
    away_stats['AwayCrossesFor'] = sum(int(game.AwayCrosses) for game in games if game.Away == team)
    away_stats['AwayCrossesAgainst'] = sum(int(game.HomeCrosses) for game in games if game.Away == team)
    away_stats['AwayCornersFor'] = sum(int(game.AwayCorners) for game in games if game.Away == team)
    away_stats['AwayCornersAgainst'] = sum(int(game.HomeCorners) for game in games if game.Away == team)
    away_stats['AwayGoalsFor'] = sum(int(game.AwayGoals) for game in games if game.Away == team)
    away_stats['AwayGoalsAgainst'] = sum(int(game.HomeGoals) for game in games if game.Away == team)
    away_stats['AwayXGoalsFor'] = sum(float(game.AwayXG) for game in games if game.Away == team)
    away_stats['AwayXGoalsAgainst'] = sum(float(game.HomeXG) for game in games if game.Away == team)
    away_stats['AwayGames'] = sum(1 for game in games if game.Away == team)

    return away_stats

I'm wondering if there is a way to abstract over these two functions and merge them into one without creating a wall of if/else statements to determine whether the team plays at home or away from home and which fields should be counted.

  • I think the problem comes from your data structure. It's probably a good idea to design it so this kind of abstraction becomes trivial. For example, playing away/home doesn't have any impact on the rest of the data (goals/shots/etc.) – cglacet Jun 09 '20 at 00:01
  • So could you provide an alternative data definition which will help to make the required abstraction? – Konstantin Kostanzhoglo Jun 09 '20 at 00:03
  • Sure, I'll have to make this in a regular answer tho, that's a bit long for a comment. – cglacet Jun 09 '20 at 00:07
  • Organize your data so accessing home and away stats is more uniform. For example, have nested `game.HomeStats` and `game.AwayStats` data structures that store home and away stats in the same format instead of using two sets of separate attributes. – user2357112 Jun 09 '20 at 00:08
  • Looking forward to seeing it! – Konstantin Kostanzhoglo Jun 09 '20 at 00:08
  • @user2357112 supports Monica Is it such a good deal to change the data definition and rewrite all the functions relying on this data definition just for the sake of making a little abstraction? – Konstantin Kostanzhoglo Jun 09 '20 at 00:11

2 Answers


A cleaner data structure allows for simpler code. In this case, your data already contains duplication (e.g., you have both HomeShots and AwayShots).

There are many possible answers to how you could structure data here. I'll just go over a solution that doesn't change too much from your original structure.

from collections import namedtuple

Statistics = namedtuple('Statistics', ['shots', 'BT', 'crosses', 'corners', 'goals', 'XG'])
Game = namedtuple('Game', ['home', 'away', 'date', 'home_stats', 'away_stats'])

You could use it like this (I haven't included all stats here, just a few to give an example):

def calculate_stats(games, team_name, home_stats_only=False, away_stats_only=False):

    home_stats = [g.home_stats._asdict() for g in games if g.home == team_name]
    away_stats = [g.away_stats._asdict() for g in games if g.away == team_name]

    if away_stats_only:
        input_stats = away_stats
    elif home_stats_only:
        input_stats = home_stats
    else:
        input_stats = home_stats + away_stats

    def sum_on_field(field_name):
        return sum(stats[field_name] for stats in input_stats)

    return {f:sum_on_field(f) for f in Statistics._fields}

Which can then be used to get both away/home stats:

from datetime import datetime

example_game_1 = Game(
    home='Burnley',
    away='Arsenal',
    date=datetime.now(),
    home_stats=Statistics(shots=12, BT=26, crosses=21, corners=4, goals=1, XG=1.73),
    away_stats=Statistics(shots=17, BT=26, crosses=22, corners=5, goals=3, XG=2.87),
)

example_game_2 = Game(
    home='Arsenal',
    away='Pessac',
    date=datetime.now(),
    home_stats=Statistics(shots=1, BT=1, crosses=1, corners=1, goals=1, XG=1),
    away_stats=Statistics(shots=2, BT=2, crosses=2, corners=2, goals=2, XG=2),
)

print(calculate_stats([example_game_1, example_game_2], 'Arsenal'))
print(calculate_stats([example_game_1, example_game_2], 'Arsenal', home_stats_only=True))
print(calculate_stats([example_game_1, example_game_2], 'Arsenal', away_stats_only=True))

Which prints:

{'shots': 18, 'BT': 27, 'crosses': 23, 'corners': 6, 'goals': 4, 'XG': 3.87}
{'shots': 1, 'BT': 1, 'crosses': 1, 'corners': 1, 'goals': 1, 'XG': 1}
{'shots': 17, 'BT': 26, 'crosses': 22, 'corners': 5, 'goals': 3, 'XG': 2.87}
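
If the totals also need to be split into what the team produced and what it conceded, as the For/Against keys in the question do, the nested structure reduces that to picking which side of each game belongs to the team. Here is a minimal sketch, assuming the Statistics and Game definitions above. The helper calculate_for_against, its venue parameter and the '_for'/'_against' key names are hypothetical, and the second date is a placeholder:

```python
from collections import namedtuple
from datetime import date

Statistics = namedtuple('Statistics', ['shots', 'BT', 'crosses', 'corners', 'goals', 'XG'])
Game = namedtuple('Game', ['home', 'away', 'date', 'home_stats', 'away_stats'])


def calculate_for_against(games, team_name, venue):
    """Sum a team's own and conceded stats at one venue ('home' or 'away').

    Hypothetical helper: the name and signature are illustrative only.
    """
    if venue not in ('home', 'away'):
        raise ValueError("venue must be 'home' or 'away'")
    other = 'away' if venue == 'home' else 'home'

    # Start every total at zero so teams without games still get a full dict.
    totals = {'games': 0}
    for field in Statistics._fields:
        totals[field + '_for'] = 0
        totals[field + '_against'] = 0

    for game in games:
        # Skip games where the team did not play at the requested venue.
        if getattr(game, venue) != team_name:
            continue
        own = getattr(game, venue + '_stats')
        opponent = getattr(game, other + '_stats')
        for field in Statistics._fields:
            totals[field + '_for'] += getattr(own, field)
            totals[field + '_against'] += getattr(opponent, field)
        totals['games'] += 1
    return totals


games = [
    Game('Burnley', 'Arsenal', date(2019, 5, 12),
         Statistics(12, 26, 21, 4, 1, 1.73), Statistics(17, 26, 22, 5, 3, 2.87)),
    Game('Arsenal', 'Pessac', date(2019, 5, 19),  # placeholder date
         Statistics(1, 1, 1, 1, 1, 1), Statistics(2, 2, 2, 2, 2, 2)),
]
away = calculate_for_against(games, 'Arsenal', 'away')
print(away['shots_for'], away['shots_against'], away['games'])
```

The only place that knows about home versus away is the pair of getattr lookups, so no per-field if/else is needed.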

When dealing with this kind of data, it's usually a good idea to use specialised tools such as pandas. Interactive tools like JupyterLab can also be very convenient.

cglacet
  • I'm pretty sure the home and away stats are for the two different teams in a game, not for two separate matches. – user2357112 Jun 09 '20 at 00:47
  • Ah, that makes sense indeed. – cglacet Jun 09 '20 at 00:49
  • @user2357112 supports Monica **this!** Given a regular championship for 20 teams, every team has 19 home and 19 away games and my functions calculate summary for them. There is no such a thing as `Match` in my program. – Konstantin Kostanzhoglo Jun 09 '20 at 00:50
  • I was staring at your code trying to figure out what's happening :) Seems like you get it as a summary for a knock-out tournament rather than a regular championship. – Konstantin Kostanzhoglo Jun 09 '20 at 00:52
  • @cglacet I have added some more examples into the code to make it clearer. – Konstantin Kostanzhoglo Jun 09 '20 at 00:57
  • I updated the answer, that doesn't change the idea too much, but this time it makes sense in terms of football (I guess) – cglacet Jun 09 '20 at 01:01
  • It seems like you are calculating stats for home and away teams overall, not for every team in particular. For example, when I call `calculate_home_stats('Arsenal', GAMES)` I get stats for **Arsenal** when they play at home, whereas your function seems to get overall home stats for all the teams combined. **Update** Apparently you made an edit while I was writing this. – Konstantin Kostanzhoglo Jun 09 '20 at 01:12
  • On the other hand that's still true, but easy to modify. – cglacet Jun 09 '20 at 01:23
  • You forgot to change the condition `if g.home == team_name` to `g.away` in the away case. – Konstantin Kostanzhoglo Jun 09 '20 at 01:25
  • Also your code produces overall stats with no division into home_stats and away_stats, which is the crux of the problem. – Konstantin Kostanzhoglo Jun 09 '20 at 01:28
  • The output for home_stats should be like this: `{'HomeShotsFor': 242, 'HomeShotsAgainst': 201, 'HomeBoxTouchesFor': 517, 'HomeBoxTouchesAgainst': 312, 'HomeCrossesFor': 362, 'HomeCrossesAgainst': 260, 'HomeCornersFor': 125, 'HomeCornersAgainst': 84, 'HomeGoalsFor': 42, 'HomeGoalsAgainst': 16, 'HomeXGoalsFor': 37.67, 'HomeXGoalsAgainst': 24.77, 'HomeGames': 19}` – Konstantin Kostanzhoglo Jun 09 '20 at 01:30
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/215546/discussion-between-cglacet-and-konstantin-kostanzhoglo). – cglacet Jun 09 '20 at 01:33

I recommend not using a named tuple but a simple tuple with a dictionary, for example:

game=(datetime.date(2019, 5, 12), 'Burnley', 'Arsenal', '12', '17', '26', '26', '21', '22', '4', '5', '1', '3', '1.73', '2.87')

And a mapping dictionary:

numtostr={0: 'Date', 1: 'Home', 2: 'Away', 3: 'HomeShots', 4: 'AwayShots', 5: 'HomeBT', 6: 'AwayBT', 7: 'HomeCrosses', 8: 'AwayCrosses', 9: 'HomeCorners', 10: 'AwayCorners', 11: 'HomeGoals', 12: 'AwayGoals', 13: 'HomeXG', 14: 'AwayXG'}
strtonum={'Date': 0, 'Home': 1, 'Away': 2, 'HomeShots': 3, 'AwayShots': 4, 'HomeBT': 5, 'AwayBT': 6, 'HomeCrosses': 7, 'AwayCrosses': 8, 'HomeCorners': 9, 'AwayCorners': 10, 'HomeGoals': 11, 'AwayGoals': 12, 'HomeXG': 13, 'AwayXG': 14}

Make similar mapping dictionaries for home_stats and away_stats (for example {0: 'HomeShotsFor', 1: 'HomeShotsAgainst', ...} for home_stats). To see how the mapping dictionaries work: if you want the HomeCrosses of a game, you can write

game[7]

or

game[strtonum['HomeCrosses']]
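
Rather than typing both dictionaries by hand, they can be derived from a single list of field names, which keeps the two in sync. This is a small sketch; FIELDS is a name introduced here for illustration and is not part of the original code:

```python
import datetime

# Field names in the same order as the values in the game tuple.
FIELDS = ['Date', 'Home', 'Away', 'HomeShots', 'AwayShots', 'HomeBT', 'AwayBT',
          'HomeCrosses', 'AwayCrosses', 'HomeCorners', 'AwayCorners',
          'HomeGoals', 'AwayGoals', 'HomeXG', 'AwayXG']

numtostr = dict(enumerate(FIELDS))                             # index -> name
strtonum = {name: index for index, name in enumerate(FIELDS)}  # name -> index

game = (datetime.date(2019, 5, 12), 'Burnley', 'Arsenal', '12', '17', '26', '26',
        '21', '22', '4', '5', '1', '3', '1.73', '2.87')

print(game[strtonum['HomeCrosses']])  # same element as game[7]
```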

Then the functions:

def calculate_home_stats(team, games):
    home_stats = [0] * 13
    for game in games:
        if game[1] == team:  # index 1 is the home team
            for index in range(12):
                # Sum every field after Date, Home and Away (the first 3 indices).
                # The values are stored as strings, so convert them first.
                home_stats[index] += float(game[index + 3])
            home_stats[12] += 1  # number of home games
    return home_stats

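def calculate_away_stats(team, games):
    away_stats = [0] * 13
    for game in games:
        if game[2] == team:  # index 2 is the away team
            for index in range(12):
                # The values are stored as strings, so convert them first.
                away_stats[index] += float(game[index + 3])
            away_stats[12] += 1  # number of away games
    return away_stats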

If you really want to merge both functions into one you can do this:

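def calculate_stats(team, games, homeaway):
    # homeaway is 'Home' or 'Away'; it selects which team index to compare.
    team_index = {'Home': 1, 'Away': 2}[homeaway]
    stats = [0] * 13
    for game in games:
        if game[team_index] == team:
            for index in range(12):
                # The values are stored as strings, so convert them first.
                stats[index] += float(game[index + 3])
            stats[12] += 1  # number of games counted
    return stats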

With this function, the only thing that changes is the index used to check for home or away, instead of redundant if/else statements that would require many changes.
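
For comparison, the same merge can be sketched against the question's original namedtuple, because getattr lets the field name be built from the side. This is only an illustration that assumes the question's field names; calculate_side_stats, STAT_FIELDS and the side parameter are hypothetical, and unlike the original functions every total except the game count comes out as a float:

```python
from collections import namedtuple
import datetime

Game = namedtuple('Game', ['Date', 'Home', 'Away', 'HomeShots', 'AwayShots',
                           'HomeBT', 'AwayBT', 'HomeCrosses', 'AwayCrosses',
                           'HomeCorners', 'AwayCorners', 'HomeGoals',
                           'AwayGoals', 'HomeXG', 'AwayXG'])

# Game field suffix -> label used in the result keys (as in the question).
STAT_FIELDS = {'Shots': 'Shots', 'BT': 'BoxTouches', 'Crosses': 'Crosses',
               'Corners': 'Corners', 'Goals': 'Goals', 'XG': 'XGoals'}


def calculate_side_stats(team, games, side):
    """Sum For/Against stats for `team` playing on `side` ('Home' or 'Away')."""
    other = {'Home': 'Away', 'Away': 'Home'}[side]
    stats = {}
    for label in STAT_FIELDS.values():
        stats[side + label + 'For'] = 0.0
        stats[side + label + 'Against'] = 0.0
    stats[side + 'Games'] = 0
    for game in games:
        # Skip games where the team was not on the requested side.
        if getattr(game, side) != team:
            continue
        for field, label in STAT_FIELDS.items():
            # Values are stored as strings, so convert before adding.
            stats[side + label + 'For'] += float(getattr(game, side + field))
            stats[side + label + 'Against'] += float(getattr(game, other + field))
        stats[side + 'Games'] += 1
    return stats


# Two of the example games from the question.
games = [Game(datetime.date(2019, 2, 9), 'Fulham', 'Man Utd', '12', '15', '19',
              '38', '20', '12', '5', '4', '0', '3', '2.19', '2.13'),
         Game(datetime.date(2018, 9, 22), 'Man Utd', 'Wolverhampton', '16', '11',
              '17', '17', '26', '13', '5', '4', '1', '1', '0.62', '1.12')]
away = calculate_side_stats('Man Utd', games, 'Away')
print(away['AwayShotsFor'], away['AwayShotsAgainst'], away['AwayGames'])
```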

Aphrodite
  • To be honest, I'm not sure what it changes. We still have two almost identical functions, which seem much less readable. Also, it is not clear what's wrong with using `namedtuple`. – Konstantin Kostanzhoglo Jun 09 '20 at 00:40
  • I edited the answer to better answer the question. I made the function less verbose so it is more easily changed and thus allow both functions to be merged without changing all the if else statement. I didn't use namedtuple because it means I cannot really do stats[index]+=game[index+3] with them, which constitutes all the if else statements we desperately don't want. – Aphrodite Jun 09 '20 at 01:01