The Math Behind Google Search

My Passion Projects

Cognitive Harmony Inc.

Founding President of nonprofit Cognitive Harmony Inc.

Cognitive Harmony explores the Primordial prevention methods which aim to prevent or delay the onset of conditions like anxiety and depression in young adults.

At Cognitive Harmony we believe that the seeds of future anxieties and depression stem from the fact that we blend into groups or society to the extent that we lose our individuality.

Making us Aware of Ourselves – Realizing that OUR existence is NOT because of society/group but the group’s existence is because of US.

Member of:

Association for Computational Linguistics (ACL)

The Association for Computational Linguistics (ACL) is the international scientific and professional society for people working on problems involving natural language and computation. It was founded in 1962, originally named the Association for Machine Translation and Computational Linguistics (AMTCL). It became the ACL in 1968.

Association for the Advancement of Artificial Intelligence (AAAI)

The Association for the Advancement of Artificial Intelligence (AAAI) is an international scientific society devoted to promoting research in, and responsible use of, artificial intelligence. AAAI also aims to increase public understanding of artificial intelligence (AI), improve the teaching and training of AI practitioners, and provide guidance for research planners and funders concerning the importance and potential of current AI developments and future directions.

ACL Anthology

The ACL Anthology currently hosts publications and papers on the study of computational linguistics and natural language processing.

Computational Linguistics Journal

The Computational Linguistics journal is the primary archival forum for research on computational linguistics and natural language processing. The journal, sponsored by the Association for Computational Linguistics, has been published for the ACL by MIT Press since 1988, and has been Open Access since the beginning of 2009.

The Math Behind Google Search

Google Search is such an integral part of our lives that we overlook the underlying algorithm that supports its structure. We analyze the basic mathematical foundations of PageRank, namely Markov Chains. Through an example, we develop the intuitive nature of the algorithm and its elegance.

Have you ever wondered why some websites are placed above others on Google? The order is not random; it is determined by Google's PageRank algorithm. The basic concept of the algorithm is that more important webpages are linked to from more other webpages. In other words, the more links pointing to a specific page, the more important it is in the PageRank algorithm. The math underlying the algorithm is fascinating. It relies on Markov Chains, a concept from probability that is naturally expressed in the language of linear algebra.

Like much of linear algebra, Markov Chains are closely related to statistics and probability. The basic idea is that each state within a Markov Chain depends only on the previous state, through a fixed set of probabilities of moving from one state to another.

For example, imagine there are three islands: A, B, C. There are \( p_{A} \) birds on A, \( p_{B} \) on B, and \( p_{C} \) on C. Each year, 20% of the birds on A travel to B, 30% travel to C, and 50% stay on A. Of the birds on B, 10% travel to A, 80% travel to C, and 10% stay on B. Finally, of the birds on C, 5% travel to A, 90% travel to B, and the remaining 5% stay on C. What would the population of each island be after two years?

Since each year's populations depend only on the previous year's populations and the fixed migration percentages, we can model the migration as a Markov Chain. Writing \( p_{A}' \), \( p_{B}' \), and \( p_{C}' \) for the populations one year later, the yearly update is given by the following equations:

\( p_{A}' = 0.5p_{A} + 0.1p_{B} + 0.05p_{C} \)

\( p_{B}' = 0.2p_{A} + 0.1p_{B} + 0.9p_{C} \)

\( p_{C}' = 0.3p_{A} + 0.8p_{B} + 0.05p_{C} \)

If the populations of the islands are contained in a population vector \( p_{n} \) whose first, second, and third entries are \( p_{A} \), \( p_{B} \), and \( p_{C} \) respectively, and the probabilities are contained in the square probability matrix \( A \), then \( p_{n+1} \) is the matrix-vector product \( Ap_{n} \). Each row of \( A \) holds the probability coefficients of the corresponding equation above, so with \( a_{ij} \) denoting the element in the \( i^{th} \) row and \( j^{th} \) column:

\[ A = \begin{bmatrix} 0.5 & 0.1 & 0.05 \\ 0.2 & 0.1 & 0.9 \\ 0.3 & 0.8 & 0.05 \end{bmatrix} \]

Each column of \( A \) lists where the birds of one island end up, so every column sums to 1.

The population after the first year is \( p_{1} = Ap_{0} \), where \( p_{0} \) represents the initial population vector. Then \( p_{2} = Ap_{1} \), or \( p_{2} = A(Ap_{0}) = (AA)p_{0} = A^{2}p_{0} \). In this case, the population vector after two years is \( p_{2} = [0.285p_{A} + 0.1p_{B} + 0.1175p_{C};\ 0.39p_{A} + 0.75p_{B} + 0.145p_{C};\ 0.325p_{A} + 0.15p_{B} + 0.7375p_{C}] \), where \( p_{A} \), \( p_{B} \), and \( p_{C} \) are the initial populations.
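The two-year calculation can be checked numerically. The following is a minimal sketch in plain Python; the helper names `mat_vec` and `mat_mul` and the starting populations in `p0` are assumptions made up for illustration and are not part of the example above.

```python
# Transition matrix from the island example (order A, B, C).
# Column j holds the fractions of birds leaving island j, so columns sum to 1.
A = [
    [0.5, 0.1, 0.05],
    [0.2, 0.1, 0.9],
    [0.3, 0.8, 0.05],
]


def mat_vec(m, v):
    """Multiply matrix m by column vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def mat_mul(m, n):
    """Multiply matrix m by matrix n."""
    return [
        [sum(m[i][k] * n[k][j] for k in range(len(n))) for j in range(len(n[0]))]
        for i in range(len(m))
    ]


# Two-year transition matrix A^2.
A2 = mat_mul(A, A)

# Hypothetical starting populations (an assumption, not from the article).
p0 = [1000.0, 1000.0, 1000.0]

p1 = mat_vec(A, p0)          # populations after one year
p2 = mat_vec(A, p1)          # populations after two years, step by step
p2_direct = mat_vec(A2, p0)  # the same result from A^2 in a single product

for row in A2:
    print([round(x, 4) for x in row])
print([round(x, 1) for x in p2])  # approximately [502.5, 1285.0, 1212.5]
```

Both routes give the same vector, and the total number of birds is unchanged because no birds are created or lost in the migration.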

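For \( n \) years after the initial population, \( p_{n} = A^{n}p_{0} \). For large \( n \), \( A^{n}p_{0} \) stabilizes: it converges to a multiple of the eigenvector of \( A \) with eigenvalue 1, which describes the steady-state populations. This eigenvector can be found by computing the eigenvalues and eigenvectors of \( A \). Diagonalizing \( A = PDP^{-1} \) also gives \( A^{n} = PD^{n}P^{-1} \), which makes computing large powers of \( A \) much simpler than brute force. The Spectral Theorem guarantees such a decomposition for symmetric matrices; the matrix here is not symmetric, but it has three distinct eigenvalues and is therefore still diagonalizable.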
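One simple way to see the long-run behavior without an explicit decomposition is to apply \( A \) repeatedly until the vector stops changing (power iteration). The sketch below assumes a starting vector of equal shares and a stopping tolerance; both, along with the function names, are illustrative choices rather than anything specified above.

```python
# Transition matrix from the island example (order A, B, C).
A = [
    [0.5, 0.1, 0.05],
    [0.2, 0.1, 0.9],
    [0.3, 0.8, 0.05],
]


def step(m, v):
    """Advance the population shares by one year: returns m times v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def steady_state(m, v, tol=1e-12, max_iter=10_000):
    """Apply m repeatedly until successive vectors differ by less than tol."""
    for _ in range(max_iter):
        nxt = step(m, v)
        if max(abs(a - b) for a, b in zip(nxt, v)) < tol:
            return nxt
        v = nxt
    return v


# Start with equal shares on each island (an assumed starting point).
shares = steady_state(A, [1 / 3, 1 / 3, 1 / 3])
print([round(s, 4) for s in shares])  # approximately [0.1317, 0.4488, 0.4195]
```

The limit is the eigenvector with eigenvalue 1, scaled so its entries sum to 1. Solving \( Av = v \) by hand gives the exact shares \( 27/205 \), \( 92/205 \), and \( 86/205 \), and the iteration converges because the other two eigenvalues of \( A \) have magnitude below 1.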

In the context of PageRank, the entries of the probability matrix are the probabilities of a user moving from one page to another, where the probabilities are determined by the number of functioning links on each page. Iterating the Markov Chain until it stabilizes yields the steady-state eigenvector: a probability vector that ranks the importance of each webpage.
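The same iteration can be sketched for a tiny web. The four pages and their links below are entirely hypothetical, and the sketch assumes a user follows each link on a page with equal probability. It is a simplified illustration of the idea, not Google's production algorithm, which includes refinements (such as handling pages with no outgoing links) that are left out here.

```python
# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about", "contact"],
    "contact": ["home"],
}


def page_rank(links, iterations=200):
    """Repeatedly pass each page's rank, in equal shares, along its links."""
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        nxt = {page: 0.0 for page in pages}
        for page, outgoing in links.items():
            share = rank[page] / len(outgoing)
            for target in outgoing:
                nxt[target] += share
        rank = nxt
    return rank


ranks = page_rank(links)
ordering = sorted(ranks, key=ranks.get, reverse=True)
for page in ordering:
    print(page, round(ranks[page], 4))
```

In this toy graph "home" ends up ranked first because every other page links to it, while "contact" ranks last because only one page links to it and that page splits its rank three ways.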

PUBLISHED BY

Shlok Bhattacharya


Wednesday December 20, 2023
