
The Clash of Math and Linguistics – Zipf’s Law

Zipf’s Law offers a view in which math and linguistics are interconnected. The connection is rarely discussed, since the two fields are often seen as complete opposites. However, Zipf’s Law makes statistical analysis of language considerably easier by building on linguistic patterns that might otherwise be taken for granted.

Apparently, the 100 most common words in the English language make up about 50% of what we say, hear and read. This highlights how small our working vocabulary is, despite the many thousands of words in an English dictionary.

This pattern is described by Zipf’s Law. In short, the law describes the relationship between a word’s popularity rank and how frequently it occurs in any given language.

More specifically, a word’s frequency is inversely proportional to its rank, measured relative to the most common word. For example, in the English language, “the” is the most common word. The next most common word is “of”, followed by “and”, and so on. On a frequency table, “of” shows up about half as frequently as “the”, while “and” shows up about one-third as frequently. Suppose a sample of 100 words contains the word “the” 30 times.

By the relationship above, “of” should then appear about 15 times and “and” about 10 times within that word set. Though this sample isn’t necessarily realistic, many novels follow this pattern, which raises the question: why is this happening? Answering it requires turning to linguistics and to how humans adapt to language.
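The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of the idealized relationship (count at a given rank = count of the top word divided by the rank); the function names zipf_expected_counts and rank_frequency are made up for this sketch, and real text will only roughly match the idealized numbers.

```python
from collections import Counter


def zipf_expected_counts(top_count, ranks):
    """Idealized Zipf's Law: expected count at rank r is top_count / r."""
    return [top_count / r for r in range(1, ranks + 1)]


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first.

    Assumption: words are split on whitespace and lowercased,
    which is a very rough form of tokenization.
    """
    counts = Counter(text.lower().split()).most_common()
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts, start=1)]


# "the" appears 30 times, so ranks 2 and 3 ("of", "and") should be near 15 and 10.
print(zipf_expected_counts(30, 3))  # [30.0, 15.0, 10.0]
```

To compare a real text against the idealized numbers, pass it to rank_frequency and look at how each word’s count compares with the top word’s count divided by its rank.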

Zipf’s Law is important because it is applied in many areas of statistics, such as population trends, neural networks, and even predictions of wartime casualties. It also reveals a relationship we don’t see often. Though Zipf’s Law seems to reside in the branch of linguistics, it connects linguistics and math, two fields that are typically treated as polar opposites.


Shlok Bhattacharya
