
I am playing around with StatsBomb data from the 2018 FIFA World Cup and am trying to identify the central players in each team. I do this by constructing a network of passes (a directed graph from the player making a pass to the player receiving it). Then I look at various centrality measures to gauge which players are most pivotal to the team's playmaking.

There are essentially two ways to present this data to the algorithm. One is to give each event (pass) as an individual row when creating the graph, which produces multiple edges between two nodes. The other is to aggregate by passer and receiver and give each edge a weight equal to the number of times player A passed to player B.

When looking at strength scores, the results are exactly the same. However, strength alone is uninteresting: strikers receive a lot of balls but pass them onwards much less often, so they cannot be said to be in the thick of things.

A betweenness score would be more appropriate, since it measures how often player X is the bridge when the ball goes from A to B (to the best of my understanding of the weighted version of the betweenness measure).

However, here the results fluctuate wildly. The one-pass-per-row version (multiple edges between two nodes) gives reasonably logical results: looking at Argentina vs Croatia, Messi is the second most central player after defender Nicolas Tagliafico, with strikers Aguero and, later, Higuain at the other end. If you remember, the game was a disaster for Argentina: https://www.independent.co.uk/sport/football/world-cup/world-cup-2018-argentina-jorge-sampaoli-group-nigeria-lionel-messi-tactics-a8411031.html

The weighted version, however, puts substitute Higuain at the top and also gives good scores to Aguero and two other substitutes, Dybala and Pavon. This cannot be right, but I have no idea why the results are so different. Does it matter that the individual-level data preserves the ordering of passes?

Here's my R code. StatsBombR needs to be installed from GitHub via devtools::install_github("statsbomb/StatsBombR")

library(StatsBombR)

library(dplyr)

library(igraph)

matches <- FreeMatches(43)

#download match events

match <- get.matchFree(matches[9,])

#take info about passes, remove non-pass events

passes <- select(match,player.name,pass.recipient.name,team.name)

passes <- na.omit(passes)

#teams in match

teams <- unique(passes$team.name)

#two ways of presenting data, pass-per-row or aggregated player-wise

teamPasses <- passes[passes$team.name==teams[2],1:2]

weightPasses <- teamPasses %>% 
    group_by(player.name, pass.recipient.name) %>% 
    summarise(weight=n())

#create graphs

net <- graph_from_data_frame(teamPasses, directed = TRUE)

net2 <- graph_from_data_frame(weightPasses, directed = TRUE)
E(net2)$weight <- weightPasses$weight

#scores

betweenness(net)

betweenness(net2)

strength(net, mode = "out")

strength(net2, mode = "out")
paqmo
Kellopeli

1 Answer


Regarding strength():

The values are identical; only the ordering of the vertices differs:

all.equal(
  sort(strength(net, mode="out")),
  sort(strength(net2, mode="out"))
)
# TRUE
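Sorting by value can misalign the names when several players share the same strength, so a comparison keyed on vertex name may be a little more robust. The following is a minimal sketch on a made-up five-pass edge list (the players A to D and the variable names are illustrative assumptions, not the match data):

```r
library(igraph)

# Hypothetical toy pass list: one row per pass
el <- data.frame(
  from = c("A", "B", "A", "D", "A"),
  to   = c("B", "C", "D", "C", "B")
)

# Aggregated version: one row per passer/receiver pair plus a pass count
elw <- aggregate(weight ~ from + to, data = transform(el, weight = 1), FUN = sum)

g  <- graph_from_data_frame(el,  directed = TRUE)  # multigraph, no weights
gw <- graph_from_data_frame(elw, directed = TRUE)  # simple graph, weight attribute

s_multi    <- strength(g,  mode = "out")  # without weights this is the out-degree
s_weighted <- strength(gw, mode = "out")  # sums E(gw)$weight over outgoing arcs

# Align both vectors on vertex name before comparing
same_strength <- all.equal(
  s_multi[sort(names(s_multi))],
  s_weighted[sort(names(s_weighted))]
)
same_strength
```

Indexing by sorted name compares each player's value with that same player's value in the other graph, regardless of the order in which the vertices were created.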

Regarding betweenness():

It is easier to explain on a simple graph like these two:

# data_frame() is deprecated; tibble() (re-exported by dplyr) replaces it
el <- tibble(
  from = c("A", "B", "A", "D", "A"), 
  to = c("B", "C", "D", "C", "B")
)
g <- graph_from_data_frame(el)

elw <- el %>%
  count(from, to) %>%
  rename(weight=n)

gw <- graph_from_data_frame(elw) 

plot(g)


In gw there are only 4 arcs (vs 5 in g) and arc A -> B has weight 2 while all others have weight 1. Let's focus on betweenness of B:

  • In the multigraph g it will be 2/3 because:

    1. B lies only on the shortest paths between A and C
    2. There are altogether 3 shortest paths from A to C: A->D->C and two paths A->B->C, as we can pick either of the two arcs in the (A,B) dyad.
    3. Two of those paths involve B
  • In the weighted graph gw it will be 0 because:

    1. The path A->B->C has weight 3
    2. The path A->D->C has weight 2
    3. The "shortest" (minimal weight) path is only one, and it does not involve B
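The reasoning in the two lists can be checked directly. This is a small sketch that rebuilds both toy graphs with base R data frames (so it does not depend on dplyr) and asks igraph for the weighted A-to-C distance and the betweenness scores; the variable names are illustrative:

```r
library(igraph)

# Multigraph: the arc A -> B appears twice
el <- data.frame(
  from = c("A", "B", "A", "D", "A"),
  to   = c("B", "C", "D", "C", "B")
)
g <- graph_from_data_frame(el, directed = TRUE)

# Weighted graph: A -> B collapsed into one arc of weight 2
elw <- data.frame(
  from   = c("A", "B", "A", "D"),
  to     = c("B", "C", "D", "C"),
  weight = c(2, 1, 1, 1)
)
gw <- graph_from_data_frame(elw, directed = TRUE)

# Minimum total weight from A to C: the route via D (1 + 1) beats the one via B (2 + 1)
d_AC <- distances(gw, v = "A", to = "C", mode = "out")

b_multi      <- betweenness(g,  directed = TRUE)                # multigraph, unweighted
b_weighted   <- betweenness(gw, directed = TRUE)                # uses E(gw)$weight by default
b_unweighted <- betweenness(gw, directed = TRUE, weights = NA)  # weights = NA ignores the attribute

b_multi
b_weighted
b_unweighted
```

Passing `weights = NA` makes igraph ignore the weight attribute, which shows that it is the weights, not the aggregation itself, that push B's score to zero.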

In other words, in the weighted graph a shortest path is the path with the minimum sum of arc weights, because betweenness() interprets edge weights as distances (costs). With pass counts as weights, you are therefore favouring paths between players who pass to each other the least, which is probably not what you are after.
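One common workaround, offered here only as an assumption about what the analysis wants rather than as the definitive fix, is to turn pass counts into costs so that frequent passing makes two players "closer". The sketch below uses the reciprocal of the count; `pass_cost` is a hypothetical name and other transformations are equally defensible:

```r
library(igraph)

# Same toy weighted graph: A -> B was passed twice, everything else once
elw <- data.frame(
  from   = c("A", "B", "A", "D"),
  to     = c("B", "C", "D", "C"),
  weight = c(2, 1, 1, 1)
)
gw <- graph_from_data_frame(elw, directed = TRUE)

# Assumption: cost of an arc = 1 / number of passes along it
pass_cost <- 1 / E(gw)$weight

# Supply the costs explicitly instead of relying on the weight attribute
b_cost <- betweenness(gw, directed = TRUE, weights = pass_cost)
b_cost
```

With these costs the route A->B->C totals 1.5 and A->D->C totals 2, so the only cheapest path from A to C now runs through B, the opposite of what the raw counts produce.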

Michał