My Passion Projects

Cognitive Harmony Inc.

Founding President of nonprofit Cognitive Harmony Inc.

Cognitive Harmony explores primordial prevention methods, which aim to prevent or delay the onset of conditions like anxiety and depression in young adults.

At Cognitive Harmony we believe that the seeds of future anxieties and depression stem from the fact that we blend into groups or society to the extent that we lose our individuality.

Making us Aware of Ourselves – Realizing that OUR existence is NOT because of society/group but the group’s existence is because of US.

Member of:

Association for Computational Linguistics (ACL)

The Association for Computational Linguistics (ACL) is the international scientific and professional society for people working on problems involving natural language and computation. It was founded in 1962, originally named the Association for Machine Translation and Computational Linguistics (AMTCL). It became the ACL in 1968.

Association for the Advancement of Artificial Intelligence (AAAI)

The Association for the Advancement of Artificial Intelligence (AAAI) is an international scientific society devoted to promoting research in, and responsible use of, artificial intelligence. AAAI also aims to increase public understanding of artificial intelligence (AI), improve the teaching and training of AI practitioners, and provide guidance for research planners and funders concerning the importance and potential of current AI developments and future directions.

ACL Anthology

The ACL Anthology currently hosts publications and papers on the study of computational linguistics and natural language processing.

Computational Linguistics Journal

The Computational Linguistics journal is the primary archival forum for research on computational linguistics and natural language processing. The journal, sponsored by the Association for Computational Linguistics, has been published for the ACL by MIT Press since 1988, and has been Open Access since the beginning of 2009.

The Math that Drives Machine Learning

How do computers find the line of best fit? Through math, of course. We explore the fundamentals of linear regression by deriving the line of best fit from data points using concepts from linear algebra, building intuition for each step of the proof along the way.

Machine learning is growing ever more prominent in society through models such as support vector machines, neural networks, random forests, and cluster analysis. All of these depend on a foundational understanding of mathematics, particularly linear algebra and calculus. The simplest, yet widely used, model is linear regression, which takes a collection of n-dimensional data points and finds the linear relationship that fits them most closely. But how does this work? Does it just loop through all the possible linear equations and identify which one fits best? Of course not: there are infinitely many linear equations, so an exhaustive search would never finish.

Instead, we can turn to mathematics to solve this problem directly. Assuming the 2-dimensional data points (x,y) do not all share the same \(\mathrm{x}\) value, we can use linear algebra concepts to find the line that best fits them. First, we organize the \(\mathrm{x}\) values into the vector \(\mathbf{X}\) and the \(\mathrm{y}\) values into the vector \(\mathbf{Y}\), ensuring that each \(\mathrm{y}\) value sits at the same position in \(\mathbf{Y}\) as its corresponding \(\mathrm{x}\) value does in \(\mathbf{X}\). We could then find the correlation coefficient \(r\) as the cosine of the angle between the two vectors once each has had its mean subtracted: \(r=\cos \Theta=\frac{(\mathbf{X}-\underline{x} \mathbf{1}) \cdot(\mathbf{Y}-\underline{y} \mathbf{1})}{\|\mathbf{X}-\underline{x} \mathbf{1}\|\,\|\mathbf{Y}-\underline{y} \mathbf{1}\|}\), where \(\underline{x}\) and \(\underline{y}\) are the means of the \(\mathrm{x}\) and \(\mathrm{y}\) values and \(\mathbf{1}\) is the vector in which all the elements are 1. Though this step is not necessary to find the best-fit line, it indicates the strength and direction of the relationship between the \(\mathrm{x}\) and \(\mathrm{y}\) values.

From there we can create a plane \(\mathrm{P}\) made up of every vector of \(y\) values that lies perfectly on a line given \(\mathbf{X}\). A line can be horizontal (every \(y\) value equal), so the \(\mathbf{1}\) vector must lie in \(\mathrm{P}\); the line \(y=x\) shows that \(\mathbf{X}\) itself lies in \(\mathrm{P}\) as well. Thus \(\mathrm{P}\) is spanned by \(\mathbf{X}\) and \(\mathbf{1}\): each vector within \(\mathrm{P}\) is a linear combination of \(\mathbf{X}\) and \(\mathbf{1}\), meaning that there is a linear equation that perfectly fits \(\mathbf{X}\) and that specific combination of \(y\) values. In practice, however, it is rarely possible to connect all the data points with a single straight line, so \(\mathbf{Y}\) usually does not lie in \(\mathrm{P}\). For this reason, we have to find the vector in \(\mathrm{P}\) that is closest to \(\mathbf{Y}\).
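The correlation coefficient mentioned above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `dot`, `center`, and `correlation` and the sample data are assumptions made for this example, not part of the original derivation. Note that both vectors are mean-centered before the cosine is taken, which is what makes the cosine equal to the usual correlation coefficient.

```python
import math


def dot(u, v):
    """Dot product of two equal-length lists of numbers."""
    return sum(a * b for a, b in zip(u, v))


def center(v):
    """Subtract the mean of v from every entry (v minus mean times the 1 vector)."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]


def correlation(xs, ys):
    """Cosine of the angle between the mean-centered X and Y vectors."""
    xc, yc = center(xs), center(ys)
    return dot(xc, yc) / (math.sqrt(dot(xc, xc)) * math.sqrt(dot(yc, yc)))


# Hypothetical sample data: y is roughly 2x, so r should be close to 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
r = correlation(xs, ys)
print(r)
```

A value of r near 1 or -1 means the centered vectors point in nearly the same or nearly opposite directions, while a value near 0 means they are close to perpendicular.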
Though it’s possible to use calculus to minimize the distance between \(\mathrm{P}\) and \(\mathbf{Y}\), we can instead rely on a basic geometric fact: the shortest path from a point to a plane runs perpendicular to the plane. So in this case, the vector we want is the shadow that \(\mathbf{Y}\) casts straight down onto \(\mathrm{P}\), which is exactly what a projection gives us. However, to find \(\mathrm{Proj}_{\mathrm{P}}(\mathbf{Y})\) one piece at a time, we need an orthogonal basis of \(\mathrm{P}\). To keep things simple, we’ll just find the vector \(\hat{X}\) that lies in \(\mathrm{P}\) and is orthogonal (or perpendicular) to \(\mathbf{1}\). To do this, we compute \(\hat{X}=\mathbf{X}-\mathrm{Proj}_{\mathbf{1}}(\mathbf{X})\). Since \(\mathrm{Proj}_{\mathbf{1}}(\mathbf{X})=\frac{\mathbf{X} \cdot \mathbf{1}}{\mathbf{1} \cdot \mathbf{1}} \mathbf{1}=\underline{x} \mathbf{1}\), this gives us \(\hat{X}=\mathbf{X}-\underline{x} \mathbf{1}\), where \(\underline{x}\) is the mean of all the \(x\) values. Now we can compute \(\mathrm{Proj}_{\mathrm{P}}(\mathbf{Y})\):

\(\mathrm{Proj}_{\mathrm{P}}(\mathbf{Y})=\mathrm{Proj}_{\hat{X}}(\mathbf{Y})+\mathrm{Proj}_{\mathbf{1}}(\mathbf{Y})=\mathrm{Proj}_{\hat{X}}(\mathbf{Y})+\underline{y} \mathbf{1}=\mathrm{c} \hat{X}+\underline{y} \mathbf{1}\), where \(\underline{y}\) is the mean of all the \(y\) values and \(\mathrm{c}\) is the constant \(\frac{\hat{X} \cdot \mathbf{Y}}{\hat{X} \cdot \hat{X}}\). Since \(\hat{X}=\mathbf{X}-\underline{x} \mathbf{1}\), we have \(\mathrm{Proj}_{\mathrm{P}}(\mathbf{Y})=\mathrm{c}(\mathbf{X}-\underline{x} \mathbf{1})+\underline{y} \mathbf{1}=\mathrm{c} \mathbf{X}+(\underline{y}-\mathrm{c} \underline{x}) \mathbf{1}\). If we substitute \(\mathrm{c}\) and \((\underline{y}-\mathrm{c} \underline{x})\) with \(\mathrm{m}\) and \(\mathrm{b}\) respectively, we get the linear combination representing the best-fit line: \(\mathrm{m} \mathbf{X}+\mathrm{b} \mathbf{1}\). In algebra, this translates to \(\mathrm{y}=\mathrm{mx}+\mathrm{b}\).
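The whole derivation can be followed step by step in code. The sketch below is a minimal plain-Python illustration, not a production implementation: the function names `dot` and `best_fit_line` and the sample data are assumptions chosen for this example. It centers X to get the vector orthogonal to the 1 vector, computes the constant c from the two dot products, and then reads off the slope m and intercept b.

```python
def dot(u, v):
    """Dot product of two equal-length lists of numbers."""
    return sum(a * b for a, b in zip(u, v))


def best_fit_line(xs, ys):
    """Return (m, b) for the best-fit line y = m*x + b via projection onto P."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    x_mean = sum(xs) / n                  # Proj_1(X) is x_mean times the 1 vector
    y_mean = sum(ys) / n                  # Proj_1(Y) is y_mean times the 1 vector
    x_hat = [x - x_mean for x in xs]      # X-hat: the part of X orthogonal to 1
    denom = dot(x_hat, x_hat)
    if denom == 0:
        # All x values are equal, so X and 1 do not span a plane.
        raise ValueError("x values must not all be equal")
    c = dot(x_hat, ys) / denom            # coefficient of Proj_{X-hat}(Y)
    m = c                                 # slope
    b = y_mean - c * x_mean               # intercept
    return m, b


# Hypothetical sample data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, b = best_fit_line(xs, ys)
print(m, b)  # prints 2.0 1.0
```

For data that does not lie exactly on a line, the leftover vector Y minus the projection is perpendicular to both X and the 1 vector, which is the geometric reason this line is the closest one.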

PUBLISHED BY

Shlok Bhattacharya

Friday October 20, 2023
