
Here is the route for TextRank:

  1. The document to be summarized is expressed as a tf-idf matrix
  2. (tf-idf matrix) × (tf-idf matrix)ᵀ = adjacency matrix of a graph whose vertices are the sentences of the document
  3. PageRank is applied to this graph -> returns a PR value for each sentence
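The route above can be sketched end to end with NumPy on a toy example (the TF-IDF values below are made up, and the simple power iteration stands in for the full PageRank algorithm):

```python
import numpy as np

# Toy TF-IDF matrix: 3 sentences (rows) x 4 words (columns); values are made up.
tfidf = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.4, 0.0, 0.6, 0.0],
    [0.0, 0.3, 0.3, 0.4],
])

# Step 2: adjacency matrix of the sentence graph.
adjacency = tfidf @ tfidf.T

# Step 3: PageRank-style power iteration on the row-normalized adjacency.
P = adjacency / adjacency.sum(axis=1, keepdims=True)
x = np.full(3, 1.0 / 3.0)       # start from the uniform distribution
for _ in range(100):
    x = x @ P                   # follow the transition probabilities
pr = x / x.sum()                # PR value per sentence
print(pr)
```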

Now, these PR values are supposedly the eigenvalues of that adjacency matrix. What is the physical meaning or intuition behind this?

Why are eigenvalues actually the ranks?

Here is the link for Page Rank: http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm

Here is an extract from above page:
PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web.

Link for TextRank: https://joshbohde.com/blog/document-summarization

mach

1 Answer


To begin with, your question is a bit mistaken. The eigenvalues are not the scores. Rather, the entries of the stationary eigenvector are the scores.

TextRank takes a graph-based approach to text. It has a number of variations, but they share the following steps:

  1. Create a weighted graph where the vertices are entities (words or sentences), and the weights are the transition probabilities between entities.

  2. Find the stochastic matrix associated with the graph, and score each entity according to its stationary distribution.

In this case, the graph is built as follows. First, a matrix is built where the rows are sentences and the columns are words; its entries are given by TF-IDF. To find the similarity between sentences, the normalized matrix is multiplied by its transpose. The reason is that, for each pair of sentences and each word, the product of the word's TF-IDF weight in the two sentences contributes to their similarity, and we need to sum over all words. If you think about it a bit, summing up the products is exactly what multiplication by the transpose does.
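A minimal NumPy check of that last point, using an arbitrary made-up matrix:

```python
import numpy as np

# Made-up TF-IDF matrix: rows are sentences, columns are words.
A = np.array([
    [0.2, 0.0, 0.7],
    [0.5, 0.3, 0.0],
])

# Similarity of sentence 0 and sentence 1: sum over words of the
# product of the word's TF-IDF weight in each sentence.
manual = sum(A[0, w] * A[1, w] for w in range(A.shape[1]))

# Multiplying the matrix by its transpose computes every such sum at once.
S = A @ A.T
print(S[0, 1], manual)  # the two agree
```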

So now we have a stochastic matrix P which can be interpreted as the probability of transition from sentence i to sentence j. The score is the stationary distribution x, which means that

P x = x = 1 · x.

This means that x is the eigenvector associated with the eigenvalue 1. By the Perron-Frobenius Theorem, this eigenvector exists under some mild conditions, and 1 is the largest eigenvalue. This last part is basically Pagerank.
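A short sketch of that last step, with a hand-made column-stochastic P (chosen to match the P x = x convention above; the matrix entries are arbitrary):

```python
import numpy as np

# A made-up column-stochastic matrix: each column sums to 1, so its
# entries can be read as transition probabilities into each state.
P = np.array([
    [0.1, 0.4, 0.5],
    [0.6, 0.2, 0.3],
    [0.3, 0.4, 0.2],
])

# Power iteration: repeatedly apply P. For a positive stochastic matrix
# the iterate converges to the eigenvector of the largest eigenvalue,
# which is 1 (Perron-Frobenius).
x = np.full(3, 1.0 / 3.0)
for _ in range(200):
    x = P @ x

print(np.allclose(P @ x, x))  # True: x is the stationary eigenvector
```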

Ami Tavory
  • Thanks .... So, we get some eigenvector using the above equation... BUT the doubt is: **how does finding some eigenvector relate to the PageRank algorithm?** What's the intuition/physical sense of this relation? Could you please elaborate on this – mach Sep 03 '16 at 05:49
  • The last part *is* Pagerank, basically: it finds the importance by solving for the stationary eigenvector of the stochastic matrix *P*. Text rank basically borrows this idea for sentence ranking applications, and specifies how to build *P* for this case. – Ami Tavory Sep 03 '16 at 06:45
  • And how is it that the P matrix is stochastic (tf-idf values can be anything and might not add up to 1)? – mach Sep 03 '16 at 07:07
  • @mach I explained that in the answer: it's the product of the *normalized* TF-IDF matrix by itself, with the rationale I outlined. – Ami Tavory Sep 03 '16 at 07:25
  • Is it possible to think of it visually: the P matrix acts on x (which is actually the PR vector) but doesn't change it (since it is an eigenvector), meaning the PR vector doesn't change... How else can one understand this intuition visually? – mach Sep 03 '16 at 07:43
  • @mach Yes, that's basically it. A visual way (which I'm unsure is more helpful) is to think of a graph whose edges specify transition probabilities (that's *P*). Now you need to ask what node probabilities would fit with these edge probabilities. Markov chains are not always the most intuitive things. – Ami Tavory Sep 03 '16 at 07:46
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/122560/discussion-between-mach-and-ami-tavory). – mach Sep 03 '16 at 08:04