Google’s Page Rank (PR) Algorithm Cracked
Clinton Cimring, of Search Engine Partner, deduced that Google’s algorithm is:
PR(A) = (1-d) / N + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
Given that one may assume that an additional inbound link from page X increases the PageRank of page A by
d × PR(X) / C(X)
where PR(X) is the PageRank of page X and C(X) is the total number of its outbound links. But page A usually links to other pages itself. Thus, these pages get a PageRank benefit also. If these pages link back to page A, page A will have an even higher PageRank benefit from its additional inbound link.
The single effects of additional inbound links shall be illustrated by an example.
We regard a website consisting of four pages A, B, C and D which are linked to each other in circle. Without external inbound links to one of these pages, each of them obviously has a PageRank of 1. We now add a page X to our example, for which we presume a constant Pagerank PR(X) of 10. Further, page X links to page A by its only outbound link. Setting the damping factor d to 0.5, we get the following equations for the PageRank values of the single pages of our site:
PR(A) = 0.5 + 0.5 (PR(X) + PR(D)) = .5 + (5+PR(D))
PR(B) = 0.5 + 0.5 PR(A)
PR(C) = 0.5 + 0.5 PR(B)
PR(D) = 0.5 + 0.5 PR(C)
Since the total number of outbound links for each page is one, the outbound links do not need to be considered in the equations. Solving them gives us the following PageRank values:
PR(A) = 19/3 = 6.33
PR(B) = 11/3 = 3.67
PR(C) = 7/3 = 2.33
PR(D) = 5/3 = 1.67
We see that the initial effect of the additional inbound link of page A, which was given by
d × PR(X) / C(X) = 0,5 × 10 / 1 = 5
is passed on by the links on our site.
The Influence of the Damping Factor
The degree of PageRank propagation from one page to another by a link is primarily determined by the damping factor d. If we set d to 0.75 we get the following equations for our above example:
PR(A) = 0.25 + 0.75 (PR(X) + PR(D)) = 7.75 + 0.75 PR(D)
PR(B) = 0.25 + 0.75 PR(A)
PR(C) = 0.25 + 0.75 PR(B)
PR(D) = 0.25 + 0.75 PR(C)
Solving these equations gives us the following PageRank values:
PR(A) = 419/35 = 11.97
PR(B) = 323/35 = 9.23
PR(C) = 251/35 = 7.17
PR(D) = 197/35 = 5.63
First of all, we see that there is a significantly higher initial effect of additional inbound link for page A which is given by
d × PR(X) / C(X) = 0.75 × 10 / 1 = 7.5
This initial effect is then propagated even stronger by the links on our site. In this way, the PageRank of page A is almost twice as high at a damping factor of 0.75 than it is at a damping factor of 0.5. At a damping factor of 0.5 the PageRank of page A is almost four times superior to the PageRank of page D, while at a damping factor of 0.75 it is only a little more than twice as high. So, the higher the damping factor, the larger is the effect of an additional inbound link for the PageRank of the page that receives the link and the more evenly distributes PageRank over the other pages of a site.
The Actual Effect of Additional Inbound Links
At a damping factor of 0.5, the accumulated PageRank of all pages of our site is given by
PR(A) + PR(B) + PR(C) + PR(D) = 14
Hence, by a page with a PageRank of 10 linking to one page of our example site by its only outbound link, the accumulated PageRank of all pages of the site is increased by 10. (Before adding the link, each page has had a PageRank of 1.) At a damping factor of 0.75 the accumulated PageRank of all pages of the site is given by
PR(A) + PR(B) + PR(C) + PR(D) = 34
This time the accumulated PageRank increases by 30. The accumulated PageRank of all pages of a site always increases by
(d / (1-d)) × (PR(X) / C(X))
where X is a page additionally linking to one page of the site, PR(X) is its PageRank and C(X) its number of outbound links. The formula presented above is only valid, if the additional link points to a page within a closed system of pages, as, for instance, a website without outbound links to other sites. As far as the website has links pointing to external pages, the surplus for the site itself diminishes accordingly, because a part of the additional PageRank is propagated to external pages.
The justification of the above formula is given by Raph Levien and it is based on the Random Surfer Model. The walk length of the random surfer is an exponential distribution with a mean of (d/(1-d)). When the random surfer follows a link to a closed system of web pages, he visits on average (d/(1-d)) pages within that closed system. So, this much more PageRank of the linking page – weighted by the number of its outbound links – is distributed to the closed system.
For the actual PageRank calculations at Google, Lawrence Page and Sergey Brin claim to usually set the damping factor d to 0.85. Thereby, the boost for a closed system of web pages by an additional link from page X is given by
(0.85 / 0.15) × (PR(X) / C(X)) = 5.67 × (PR(X) / C(X))
So, inbound links have a far larger effect than one may assume in page rank.
The PageRank-1 Rule
Users of the Google Toolbar often notice that pages with a certain Toolbar PageRank have an inbound link from a page with a Toolbar PageRank which is higher by one. Some take this observation to doubt the validity of the PageRank algorithm presented here for the actual ranking methods of the Google search engine. It shall be shown, however, that the PageRank-1 rule complies with the PageRank algorithm.
Basically, the PageRank-1 rule proves the fundamental principle of PageRank. Web pages are important themselves if other important web pages link to them. It is not necessary for a page to have many inbound links to rank well. A single link from a high ranking page is sufficient.
To show the actual consistence of the PageRank-1 rule with the PageRank algorithm several factors have to be taken into consideration. First of all, the toolbar PageRank is a logarithmically scaled version of real PageRank values. If the PageRank value of one page is one higher than the PageRank value of another page in terms of Toolbar PageRank, than its real PageRank can at least be higher by an amount which equals the logarithmical basis for the scalation of Toolbar PageRank. If the logarithmical basis for the scalation is 6 and the toolbar PageRank of a linking Page is 5, then the real PageRank of the page which receives the link can be at least 6 times smaller to make that page still get a toolbar PageRank of 4.
However, the number of outbound links on the linking page thwarts the effect of the logarithmical basis, because the PageRank propagation from one page to another is divided by the number of outbound links on the linking page. But it has already been shown that the PageRank benefit by a link is higher than PageRank algorithm’s term d(PR(Ti)/C(Ti)) pretends. The reason is that the PageRank benefit for one page is further distributed to other pages within the site. If those pages link back as it usually happens, the PageRank benefit for the page which initially received the link is accordingly higher. If we assume that at a high damping factor the logarithmical basis for PageRank scalation is 6 and a page receives a PageRank benefit which is twice as high as the PageRank of the linking page divided by the number of its outbound links, the linking page could have at least 12 outbound links so that the Toolbar PageRank of the page receiving the link is still at most one lower than the toolbar PageRank of the linking page.
A number of 12 outbound links admittedly seems relatively small. But normally, if a page has an external inbound link, this is not the only one for that page. Most likely other pages link to that page and propagate PageRank to it. And if there are examples where a page receives a single link from another page and the PageRanks of both pages comply the PageRank-1 rule although the linking page has many outbound links, this is first of all an indication for the linking page’s toolbar PageRank being at the upper end of its scale. The linking page could be a “high” 5 and the page receiving the link could be a “low” 4. In this way, the linking page could have up to 72 outbound links. This number rises accordingly if we assume a higher logarithmical basis for the scalation of Toolbar PageRank. |