Question
I am wondering how to design a relational database for NBA games.
There are several similar questions in StackOverflow but no one gives a satisfying answer.
Star Schema database for NBA/Basketball statistics
Database for Basketball teams/games/scores per season
Apparently the game data is interesting for basketball fans and tons of websites provide these statistics.
Also, video games for basketball, football, ... should maintain similar data too.
Is there a standard/recommended way to design these databases?
Example
The solution that comes into my mind looks like:
Team and Game
Country (CountryCode pk, CountryName)
City (Country pk fk, CityCode pk, CityName)
Team (TeamCode pk, TeamName, City fk)
Person (Name pk, Birthday pk, Height, Weight, HomeTown fkCity)
Employment (Employee pk fkPerson, JoiningDate pk, TeamCode fk2, DepartureDate, Salary, JerseyNumber)
Game (City pk fk1, Date pk, HostTeam fk2, VisitingTeam fk3)
GamePlayer (Game pk fk, Player pk fkEmployee, Position)
Game events
Shooting (Game pk fk1, TimeOfGame pk, Player fk2, ShootingType, Missed)
Rebound (Game pk fk1, TimeOfGame pk, Player fk2, IsOffensive)
Block (Game pk fk1, TimeOfGame pk, Blocker fk2, Blocked fk3)
Steal (Game pk fk1, TimeOfGame pk, Stealer fk2, Stealed fk3)
Other game events...
Another way is to store only the stats (derived data) for each game.
GameStats (Game pk fk1, Player pk fk2, Minutes, FieldGoal, FieldGoalMissed, ThreePointMade, ThreePointMissed, FreeThrow, FreeThrowMissed, DefensiveRebounds, OffensiveRebounds, Blocks, ...)
Historical Data
One difficulty I met for this data model is that players can change teams, change jersey numbers so we need to store historical data.
The events (Shooting, Rebound, Timeout, Substitution, ...) in a game are historical data too.
Derived Data
Another difficulty I met is to decide whether to store derived data.
Since derived data causes Update Anomaly, we should avoid them. However, I find it difficult to avoid derived data completely.
For example
- The game statistics are derived data. In order to avoid them, we have to store the events.
- The minutes a player played for a game is derived from the events (Substitution/Timeout/...).
- Some statistics are related, Turnover is derivable from Steal/Block/...
- For a game event, the game time can be derived from the time of day and other events (Substitution/Timeout/...).
Shall we make a compromise and store derived data for simplicity?
Does a truly normalized database contain derived data?
Edit 1 - Neutral Stadium
To account for neutral stadiums where no team is the home team, we can use the following design.
Stadium (City pk, Name pk, Capacity)
Game (Stadium pk fk, Date pk)
TeamRole (TeamRole pk) ['Home', 'Away', 'Neutral']
GameTeam (Game pk fk, Team pk fk, TeamRole pk fk)