I'm trying to analyze goal-scoring networks in hockey. I have data for the player who scored the goal and the player who assisted on that goal. My issue is that some goals do not have an assist, so I'm not sure what I should do in those situations.

So, an example for my data looks like this:

scorer <- c("Lidstrom", "Yzerman", "Fedorov", "Yzerman", "Shanahan")
assister <- c("", "Lidstrom", "Yzerman", "Shanahan", "Lidstrom")

mydata <- data.frame(scorer, assister)

And the output is:

    scorer assister
1 Lidstrom         
2  Yzerman Lidstrom
3  Fedorov  Yzerman
4  Yzerman Shanahan
5 Shanahan Lidstrom

When I'm dealing with unassisted goals, does it make sense to act as if the assist goes to the scorer?

EX:

    scorer assister
1 Lidstrom Lidstrom        
2  Yzerman Lidstrom
3  Fedorov  Yzerman
4  Yzerman Shanahan
5 Shanahan Lidstrom

Or does it make sense to create a new name "unassisted" for unassisted goals?

EX:

    scorer assister
1 Lidstrom UNASSISTED       
2  Yzerman Lidstrom
3  Fedorov  Yzerman
4  Yzerman Shanahan
5 Shanahan Lidstrom

Here's the rest of my code for the PageRank, assuming that something is filled in for the blank assister space:

library(igraph)
library(dplyr)

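# Each row becomes one directed edge from column 1 (scorer) to column 2 (assister).
# graph_from_edgelist() is the current name for the deprecated graph.edgelist().
my_network <- mydata %>%
  as.matrix() %>%
  graph_from_edgelist(directed = TRUE)

page_rank(my_network, directed = TRUE)$vector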

I can't just remove goals that are unassisted, so I'm trying to come up with some solution that doesn't defy any major graph theory principles (of which I'm not knowledgeable). Any ideas?

Evan O.
  • First, go Red Wings!!! Second, what do you hope to use the pagerank score for? I think the answer to your question will depend on what you plan to learn from this score. Third, I would say that your network is directed, i.e. the puck goes from the assister to the scorer to the goal. – emilliman5 Jan 12 '18 at 14:36
  • Thanks for the reply. So I guess what I'm trying to learn from this is what players are the most indispensable for goal scoring. So I can say that Fedorov, for example, influences goal scoring more than Yzerman does. I know pagerank may not be ideal for that, but I think it's one way to look at it. So I know the network is directed, but doesn't making it directed severely decrease the value of the assist? I think goals > assists, but I still think assists should have similar value to goals. If I make it undirected, at least assists hold the same value. Any ideas? – Evan O. Jan 13 '18 at 16:04
  • Making a graph directed does not change the weight placed on the edges or nodes; it merely represents the asymmetry of the system. I think you want to make the graph directed, because this will allow you to find scorers, assisters and scorer/assisters. In an undirected graph you will not be able to tell whether someone assisted or scored when examining a pair of nodes. And I think solo scores should be represented by "scorer -> scorer", which is the diagonal of the adjacency matrix. – emilliman5 Jan 15 '18 at 20:12

1 Answer

I agree with the suggestion of @emilliman5 outlined in the comments: for unassisted goals, just make an edge from the scorer to itself. Then use PageRank for finding the most influential players. Actually, PageRank can be a particularly good choice here because the principles underlying the PageRank score bear some similarity to what is going on in a "real" hockey match.
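A minimal sketch of the self-loop approach, using the sample data from the question. It assumes that edges should point from assister to scorer (the direction of the puck, as suggested in the comments), which is the reverse of the column order in the question's code; `assister_filled` is a name introduced here for illustration.

```r
library(igraph)

scorer   <- c("Lidstrom", "Yzerman", "Fedorov", "Yzerman", "Shanahan")
assister <- c("", "Lidstrom", "Yzerman", "Shanahan", "Lidstrom")

# Unassisted goals: treat the scorer as their own assister (a self-loop).
assister_filled <- ifelse(assister == "", scorer, assister)

# Assumption: edges run assister -> scorer, following the puck.
edges <- cbind(assister_filled, scorer)
g <- graph_from_edgelist(edges, directed = TRUE)

pr <- page_rank(g, directed = TRUE, damping = 0.85)$vector
sort(pr, decreasing = TRUE)
```

Repeated assister/scorer pairs become parallel edges, so frequent combinations carry more weight without any extra bookkeeping.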

Let me elaborate on this a bit. PageRank was originally invented for modeling the behaviour of a randomly chosen Internet user browsing the pages on the web. In each time step, the user can choose to follow a link on the web page currently being viewed, or surf to another, unrelated page, chosen uniformly from the set of all pages on the Internet. There is a fixed probability value that decides whether the user is going to follow a link (typically 0.85) or the user is going to "teleport" to a randomly chosen page (typically 0.15). The idea behind PageRank is that the most important pages are where the user is likely to spend a lot of time when following the rules above. The behaviour of the user is essentially a random walk over the set of webpages.
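To make the random-surfer description concrete, here is a rough base-R sketch of the iteration on a made-up three-page web. The pages `A`, `B`, `C` and their links are assumptions for illustration only, and the sketch ignores pages without outgoing links, which a real implementation has to handle.

```r
# Toy link structure (hypothetical): adj[i, j] = 1 means page i links to page j.
pages <- c("A", "B", "C")
adj <- matrix(c(0, 1, 1,
                0, 0, 1,
                1, 0, 0),
              nrow = 3, byrow = TRUE, dimnames = list(pages, pages))

d <- 0.85                    # probability of following a link
n <- nrow(adj)
P <- adj / rowSums(adj)      # each row: where the user goes when following a link

# Start from a uniform guess and repeatedly apply one step of the walk.
visit_prob <- rep(1 / n, n)
for (step in 1:100) {
  visit_prob <- (1 - d) / n + d * as.vector(visit_prob %*% P)
}
names(visit_prob) <- pages
visit_prob
```

The resulting vector is the long-run share of time the user spends on each page, which is what the PageRank score represents.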

Now, in a hockey game, the "user" is the hockey puck that is being passed from player to player. At each step, the puck is passed from one player to another, or a goal is scored, or the puck is accidentally passed to the opposing team. In the latter two cases, the puck ends up with the opposing team, and eventually it is returned to the first team at a randomly chosen player. (This is a first approximation; if you want to go deeper, you could keep on "tracking" the puck for the opposing team as well.) I think you can start seeing the similarities here. The assister-to-scorer network that you have captures a fragment of this, namely the last pass before each goal. From this point of view, I think it totally makes sense to think about unassisted goals as events where the player passed to himself before scoring.

Of course you would have a much better understanding of the team dynamics if your dataset contained all the passes, not only the ones that resulted in a goal. In fact, in that case, you could add an additional node called "GOAL" to your network, draw edges from scorers to the "GOAL" node, and then calculate the so-called personalized PageRank vector for the "GOAL" node, which would give you the most influential nodes from which the "GOAL" node is the easiest to reach. But this is more like a research question from this point onwards, and it is probably not a good fit for further discussion on Stack Overflow.
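A hedged sketch of the "GOAL" node idea follows. The pass log is entirely hypothetical, since the question's data contains no passes. The edges are reversed before the calculation, because the random walker only moves forward along edges and restarts at "GOAL", so reversing is what lets it reach the players who feed the goal.

```r
library(igraph)

# Hypothetical pass log (not from the question): each row is passer -> receiver,
# with "GOAL" as the receiver when a shot goes in.
passes <- rbind(
  c("Lidstrom", "Yzerman"),
  c("Yzerman",  "Fedorov"),
  c("Fedorov",  "GOAL"),
  c("Lidstrom", "Shanahan"),
  c("Shanahan", "Yzerman"),
  c("Yzerman",  "GOAL")
)

# Swap the columns so every edge is reversed (receiver -> passer).
g_rev <- graph_from_edgelist(passes[, 2:1], directed = TRUE)

# Restart distribution: all of the teleport probability goes to GOAL.
restart <- as.numeric(V(g_rev)$name == "GOAL")

ppr <- page_rank(g_rev, directed = TRUE, damping = 0.85,
                 personalized = restart)$vector
sort(ppr[names(ppr) != "GOAL"], decreasing = TRUE)
```

Players with higher scores are those from whom the puck tends to reach the goal in few passes, at least in this toy data.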

Tamás
  • Thanks for the response, Tamás! So if I wanted to find the PageRank for a dataset of three players per goal, i.e. `assister 1 -> assister 2 -> scorer`, would I need to use a personalized PageRank? Or is it okay to do that with a normal PageRank? – Evan O. Jan 25 '18 at 01:05
  • I would just use PageRank for the time being. Once you introduce that special "GOAL" node, you could try calculating the personalized PageRank instead, but I think that in that case, the arrows should be reversed such that all edges point from scorer to assister. (The random walker in PageRank always moves forward along the edges, so if we want to know from which nodes it is easiest to get to the GOAL node, we need to reverse the edges first). – Tamás Jan 26 '18 at 10:31
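
For the three-player case raised in the comments, one possible way to build the edge list for plain PageRank is sketched below. The `goals` data frame, its column names, the use of `NA` for a missing assist, and the helper `chain_edges` are all assumptions made for illustration.

```r
library(igraph)

# Hypothetical data: up to two assists per goal, NA where an assist is missing.
goals <- data.frame(
  assister1 = c("Fedorov", NA, NA),
  assister2 = c("Lidstrom", "Shanahan", NA),
  scorer    = c("Yzerman", "Yzerman", "Lidstrom"),
  stringsAsFactors = FALSE
)

# Turn one goal into consecutive passes: assister1 -> assister2 -> scorer.
chain_edges <- function(i) {
  players <- unname(unlist(goals[i, ]))
  players <- players[!is.na(players)]
  if (length(players) == 1) players <- rep(players, 2)  # unassisted: self-loop
  cbind(head(players, -1), tail(players, -1))
}

edges <- do.call(rbind, lapply(seq_len(nrow(goals)), chain_edges))
g3 <- graph_from_edgelist(edges, directed = TRUE)

pr3 <- page_rank(g3, directed = TRUE)$vector
sort(pr3, decreasing = TRUE)
```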