
I read somewhere that the added S matrix of 1/n elements, together with the 0.15 fudge factor that Google uses, is not really accurate and is only there to solve a different problem.

On the other hand, I have read somewhere else that it does have a meaning and that it models random jumps: we first ask whether a surfer wants to keep clicking or not, so according to what I read the meaning is that 85% of the time they continue to click and 15% of the time they don't.
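For reference, as I understand it (and this may be exactly where I go wrong), the matrix being iterated is something like

```latex
G = 0.85\,H + 0.15\,S, \qquad S_{ij} = \frac{1}{n} \ \text{for all } i, j,
```

where H is the ordinary link-following transition matrix and n is the number of pages.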

My question is: this may be fine for the first click, but how does it work in later iterations? How can anyone land on a random page? Isn't the whole assumption of PageRank that every page is reached by following links from other pages?

If I can just land on a page without coming from somewhere else, then the ranking isn't accurate at all.

But most importantly, I don't understand what the added 1/n matrix means. If I am on a page, I can only click on the links I see. What does it mean to say that I can go somewhere else?

If they mean that I simply run a Google search again, then why not call that a second chain? Why include it in the first one?

Also, is it a 15% chance that I randomly jump, or a 15% chance that I stop surfing? (Or are those the same thing?)

And back to my first question: is it an inaccurate fudge factor that is only there to solve other problems, or does it really mean something, as described above, so that it IS correct to include it on its own merit?

  • I'm voting to close this question as off-topic because this is a computer science question, not a programming question. https://cs.stackexchange.com/ might be a better site for this question. – snakecharmerb Jun 19 '18 at 06:02

1 Answer


"Random jumps" could correspond to lots of things:

  • Entering an address in URL bar
  • Visiting a "Favorite" link
  • Visiting a home page (or any one of the links on it!)
  • Visiting a link from a content aggregator / social media

People do actually do these things when browsing online; going to a random page in your index is a very crude approximation of this behavior.

If you're Google or some other entity with lots of surfing/tracking data, you can actually measure the probabilities with which people "jump into" particular websites and get a better model! The random-jump probabilities don't need to be totally uniform; they just need to be non-zero for every website.

Random jumps are the simplest way to ensure the matrix (and the corresponding Markov chain) is ergodic, which makes it easier to analyze and guarantees convergence.
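For concreteness, here is a minimal power-iteration sketch in Python (a toy illustration, not Google's actual implementation; the function name, the `toy_web` graph, and the uniform `jump` distribution are all made up for this example; as noted above, the jump distribution only needs to be strictly positive):

```python
# Toy PageRank via power iteration (illustrative sketch only).
# links: dict mapping each page to the list of pages it links to.
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    pages = sorted(links)
    n = len(pages)
    jump = {p: 1.0 / n for p in pages}   # the "random jump" distribution (uniform here)
    rank = {p: 1.0 / n for p in pages}   # start from the uniform distribution

    for _ in range(max_iter):
        # Every page keeps a (1 - damping) chance of being jumped to.
        new_rank = {p: (1 - damping) * jump[p] for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # With probability `damping`, the surfer follows one of the page's links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page with no outlinks: the surfer has to jump somewhere.
                for target in pages:
                    new_rank[target] += damping * rank[page] * jump[target]
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank

toy_web = {"A": ["B"], "B": ["A"], "C": ["A", "B"]}
print(pagerank(toy_web))
```

Running it on the toy graph shows that page "C", which nothing links to, still ends up with a small positive rank; that floor is exactly what the 1/n jump term contributes.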

  • Thanks! My question is, though: when do you take into account the possibility of someone stopping surfing completely? Also, why is it called a fudge factor? It is correct. I read that some information is lost because of this fudge factor, yet according to your interpretation it is indeed correct to include it. – bilanush Jun 19 '18 at 07:10
  • You don't consider people stopping, because that doesn't fit into the Markov model. Stopping surfing is just another reason for a random jump, since they're going to start surfing again later (at a "random" page). – Curtis Fenner Jun 19 '18 at 13:49
  • The higher the "fudge factor", the less the shape of the network matters. At a 99% rate of randomly jumping, the actual links barely matter; at 0.0001%, the random jumping barely matters (but you might converge to something odd). – Curtis Fenner Jun 19 '18 at 13:50
  • OK, but then it's not a fudge factor; it's real. I don't understand why people say it is inaccurate and that information is lost. If there really is some percentage of the time in which people jump randomly, then it is an accurate measurement. Google didn't 'cheat' here; it's the correct equation. – bilanush Jun 19 '18 at 15:32
  • A quick note on _Random jumps are the simplest way ..._: ergodicity is one important aspect; otherwise your websites would decompose into disconnected sub-graphs. The other point is that a uniform random-jump factor keeps the _insanely large_ matrix sparse, so you can still efficiently compute the largest eigenvector (the PageRank) with the power method, and for larger fudge factors this usually converges faster. So besides the interpretation there is a _computational / technical_ reason. – jhp Jun 21 '18 at 14:18
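To make that last comment concrete, here is a rough sketch (assuming NumPy/SciPy are available; the tiny four-page graph and variable names are made up) of how one power-method step only needs a sparse matrix-vector product when the jump term is uniform:

```python
# Power method for PageRank without ever forming the dense "Google matrix"
# (illustrative sketch; real link graphs have billions of nodes).
import numpy as np
from scipy.sparse import csr_matrix

n = 4
# Column-stochastic link matrix H: H[i, j] = 1/outdegree(j) if page j links to page i.
# Toy graph: 0 -> 1, 1 -> 0, 2 -> {0, 1}, 3 -> 2 (no dangling pages, to keep it short).
rows = [1, 0, 0, 1, 2]
cols = [0, 1, 2, 2, 3]
vals = [1.0, 1.0, 0.5, 0.5, 1.0]
H = csr_matrix((vals, (rows, cols)), shape=(n, n))

d = 0.85                     # follow a link with probability d, jump with probability 1 - d
x = np.full(n, 1.0 / n)      # start from the uniform distribution
for _ in range(100):
    # The uniform jump term adds the same constant (1 - d) / n to every entry,
    # so only the sparse product H @ x is ever computed.
    x_new = d * (H @ x) + (1 - d) / n
    if np.abs(x_new - x).sum() < 1e-12:
        break
    x = x_new

print(x)                     # the PageRank vector of the toy graph
```

Page 3 in this toy graph has no incoming links, so its score settles at exactly (1 - d) / n, the floor supplied by the jump term.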