
GOOGLE PAGE RANK

The web page http://pt.efactory.de.e-pagerank-algorithm.shtml gives a rather extended discussion of the PageRank algorithm. We quote two sections below:

(I)

The PageRank Algorithm

The original PageRank algorithm was described by Lawrence Page and Sergey Brin in several publications. It is given by

PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

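where

PR(A) is the PageRank of page A,
PR(Ti) is the PageRank of the pages Ti which link to page A,
C(Ti) is the number of outbound links on page Ti, and
d is a damping factor which can be set between 0 and 1.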

So, first of all, we see that PageRank does not rank web sites as a whole, but is determined for each page individually. Further, the PageRank of page A is recursively defined by the PageRanks of those pages which link to page A.

The PageRanks of the pages Ti which link to page A do not influence the PageRank of page A uniformly. Within the PageRank algorithm, the PageRank of a page T is always weighted by the number of outbound links C(T) on page T. This means that the more outbound links a page T has, the less page A will benefit from a link to it on page T.
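The recursive definition above can be evaluated by simple repeated substitution. The following is only a minimal sketch, not the method used by Page and Brin or by the quoted site: the function name pagerank_v1, the three-page link graph and the choice d = 0.5 are all assumptions made for illustration.

```python
def pagerank_v1(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    `links` maps each page to the list of pages it links to (hypothetical input format).
    """
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # arbitrary starting values
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each page T linking here passes on PR(T) divided by its outbound link count C(T).
            inbound = sum(pr[t] / len(links[t]) for t in pages if page in links[t])
            new_pr[page] = (1 - d) + d * inbound
        pr = new_pr
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank_v1(links, d=0.5)
print(ranks)
print(sum(ranks.values()))
```

For this small graph the values settle at PR(A) = 14/13, PR(B) = 10/13 and PR(C) = 15/13, which add up to 3, the number of pages, rather than to 1.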

(II)

A Different Notation of the PageRank Algorithm

Lawrence Page and Sergey Brin have published two different versions of their PageRank algorithm in different papers. In the second version of the algorithm, the PageRank of a page A is given as

PR(A) = (1 - d)/N + d(PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

where N is the total number of pages on the web. The second version of the algorithm, indeed, does not differ fundamentally from the first one. Regarding the Random Surfer model, the second version's PageRank of a page is the actual probability of a surfer reaching that page after clicking on many links. The PageRanks then form a probability distribution over web pages, so that the sum of all pages' PageRanks will be one.

Anyone who takes the trouble to check the details will notice that in neither case does the set of PageRanks form a probability distribution. Taking Σi PR(Ti) = 1 as an additional equation, we see that the number of equations exceeds the number of variables by one. Page and Brin have possibly decided to mislead others in order to keep a trade secret. If this is the case, then there is no secret worth keeping.

An obvious interpolation between the two formulae is

PR(A) = (1 - d)t + d(PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

Treating t as an extra variable, the number of equations is now equal to the number of variables, and the system can be solved.

The solution is left as an exercise to the reader.
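For readers who want to check their answer numerically, here is a minimal sketch of one possible approach. It assumes the interpolated formula is taken together with the condition that all PageRanks sum to one, and it relies on the system being linear in the PageRanks and t: if x solves the formula with t = 1, then PR = t x solves it for any t, and the sum condition gives t = 1 / (sum of all x). The function name solve_interpolated, the example graph and d = 0.5 are assumptions made for illustration.

```python
def solve_interpolated(links, d=0.85, iterations=200):
    """Solve PR(A) = (1 - d) * t + d * sum(PR(T) / C(T)) together with sum(PR) = 1.

    Returns the PageRanks and the value of the extra variable t.
    """
    pages = list(links)
    # Step 1: solve the system with t = 1 by repeated substitution.
    x = {page: 1.0 for page in pages}
    for _ in range(iterations):
        x = {
            page: (1 - d) + d * sum(x[s] / len(links[s]) for s in pages if page in links[s])
            for page in pages
        }
    # Step 2: scale so that the PageRanks sum to one; this fixes t.
    t = 1.0 / sum(x.values())
    return {page: t * x[page] for page in pages}, t


# Hypothetical three-page web in which page C has no outbound links.
links_dangling = {"A": ["B", "C"], "B": ["C"], "C": []}
pr, t = solve_interpolated(links_dangling, d=0.5)
print(pr, t)
```

For this graph the sketch gives t = 16/33 with PR(A) = 8/33, PR(B) = 10/33 and PR(C) = 15/33, so t is neither 1 nor 1/N. If every page has at least one outbound link, the same procedure yields t = 1/N.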

See also http://stanford.edu/~epsalon/pagerank.pdf (referred to at http://en.wikipedia.org/wiki/PageRank) for whatever it is worth.

Created on 11 Jan 2010
