I am playing around with the StatsBomb data for the 2018 FIFA World Cup and am trying to identify the central players in each team. I do this by constructing a pass network: a directed graph in which each edge runs from the player making a pass to the player receiving it. I then look at various centrality measures to gauge which players are most pivotal for playmaking.
There are essentially two ways to present this data to the algorithm. One is to give each event (pass) as an individual row when creating the graph, which produces multiple edges between two nodes. The other is to aggregate by passer and receiver and give each edge a weight equal to the number of times player A passed to player B.
When looking at strength scores, the results are exactly the same for both representations. However, strength alone is uninteresting: strikers receive a lot of balls but pass them onwards much less often, so they cannot be said to be in the thick of things.
A betweenness score would be more appropriate, since it measures how often player X is the bridge when the ball goes from A to B (at least to the best of my understanding of the weighted version of the betweenness measure).
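To make the two representations concrete, here is a minimal sketch on a made-up pass log (the players A, B, C and the counts are hypothetical, not StatsBomb data). It builds the same passes once as a multigraph and once as an aggregated weighted graph, and compares out-strength. It relies on `strength()` falling back to a plain edge count when the graph has no `weight` attribute.

```r
library(igraph)

# Hypothetical toy pass log (not StatsBomb data): one row per pass
toy_passes <- data.frame(
  from = c("A", "A", "A", "B", "A"),
  to   = c("B", "B", "B", "C", "C")
)

# Representation 1: multigraph, one edge per pass
g_multi <- graph_from_data_frame(toy_passes, directed = TRUE)

# Representation 2: one edge per passer-receiver pair, pass count as weight
toy_passes$weight <- 1
toy_weighted <- aggregate(weight ~ from + to, data = toy_passes, FUN = sum)
g_weighted <- graph_from_data_frame(toy_weighted, directed = TRUE)

# strength() sums the weight attribute when present, otherwise it counts edges
s_multi    <- strength(g_multi, mode = "out")
s_weighted <- strength(g_weighted, mode = "out")
print(s_multi)     # A = 4, B = 1, C = 0
print(s_weighted)  # A = 4, B = 1, C = 0
```

Both calls count the same thing (passes made), which is why the strength scores agree regardless of representation.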
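A toy example may help pin down what the weighted version actually computes. As far as I can tell from the igraph documentation, `betweenness()` uses the `weight` edge attribute as an edge length (a cost) when finding shortest paths, so a high pass count makes a connection look long rather than strong. The sketch below uses hypothetical counts (not match data): A passes to C directly 5 times, and only once each along A to B and B to C. The `1 / weight` transformation at the end is just one possible way to turn counts into distances, shown as an assumption rather than a recommendation.

```r
library(igraph)

# Hypothetical toy counts: A finds C directly 5 times, and only once via B
toy <- data.frame(
  from   = c("A", "B", "A"),
  to     = c("B", "C", "C"),
  weight = c(1, 1, 5)
)

# Weighted graph: the third column becomes the "weight" edge attribute
g_weighted <- graph_from_data_frame(toy, directed = TRUE)

# Multigraph: repeat each row `weight` times, then drop the weight column
g_multi <- graph_from_data_frame(
  toy[rep(seq_len(nrow(toy)), toy$weight), c("from", "to")],
  directed = TRUE
)

# Multigraph: every edge has length 1, so A -> C goes direct and B = 0
b_multi <- betweenness(g_multi)

# Weighted: the direct edge costs 5, the detour via B costs 2, so B = 1
b_weighted <- betweenness(g_weighted)

# Assumed transformation: inverse counts as distances, direct edge wins, B = 0
b_inverse <- betweenness(g_weighted, weights = 1 / E(g_weighted)$weight)

print(rbind(b_multi, b_weighted, b_inverse))
```

In this toy case the weighted graph routes the A to C connection through the rarely used link via B, which would be consistent with seldom-involved players scoring high in the aggregated version.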
However, here the results fluctuate wildly. The one-pass-per-row version, with multiple edges between two nodes, gives reasonably logical results. Looking at Argentina in the Argentina vs Croatia match, Messi is the second most central player after defender Nicolas Tagliafico, with strikers Aguero and later Higuain at the other end. If you remember, the game was a disaster for Argentina: https://www.independent.co.uk/sport/football/world-cup/world-cup-2018-argentina-jorge-sampaoli-group-nigeria-lionel-messi-tactics-a8411031.html
The weighted version, however, puts substitute Higuain at the top and also gives good scores to Aguero and to two other substitutes, Dybala and Pavon. This cannot be right, but I have no idea why the results are so different. Does it matter that at the individual level the passes have an ordering?
Here's my R code. StatsBombR needs to be installed from GitHub via devtools::install_github("statsbomb/StatsBombR"):
library(StatsBombR)
library(dplyr)
library(igraph)
# list the free matches for competition 43 (the World Cup)
matches <- FreeMatches(43)
# download match events
match <- get.matchFree(matches[9, ])
# keep passer, recipient and team; dropping rows with NA removes non-pass events
passes <- select(match, player.name, pass.recipient.name, team.name)
passes <- na.omit(passes)
#teams in match
teams <- unique(passes$team.name)
# two ways of presenting data: pass-per-row or aggregated player-wise
# teams[2] picks one side; print `teams` to check which team this is
teamPasses <- passes[passes$team.name == teams[2], 1:2]
weightPasses <- teamPasses %>%
  group_by(player.name, pass.recipient.name) %>%
  summarise(weight = n(), .groups = "drop")
#create graphs
net <- graph_from_data_frame(teamPasses, directed = TRUE)
net2 <- graph_from_data_frame(weightPasses, directed = TRUE)
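# graph_from_data_frame already stores the third column as the weight attribute;
# it is set again here only to be explicit
E(net2)$weight <- weightPasses$weight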
#scores
betweenness(net)
betweenness(net2)
strength(net, mode = "out")
strength(net2, mode = "out")