Zipf's Law
One of the oddest ideas in mathematical statistics is Zipf's Law: a great many kinds of data, especially from the physical and social sciences, can be approximated by a Zipfian distribution (a form of discrete power-law probability distribution). Clear, right?
In more understandable form, Zipf's law states that, in a text made up of ordinary English words, the frequency of any word is inversely proportional to its rank in the frequency table. For example, the most frequent word occurs approximately twice as often as the second most frequent word, three times as often as the third most frequent word, and so on. Zipf's law also holds for texts written in other languages.
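In symbols, the idealized law can be sketched as follows. The notation is introduced here for illustration and is not from the original note: f(r) is the frequency of the word of rank r, N is the number of distinct words, and H_N is the N-th harmonic number used to turn the proportionality into a probability.

```latex
f(r) \propto \frac{1}{r},
\qquad
\frac{f(1)}{f(r)} \approx r,
\qquad
P(r) = \frac{1/r}{H_N},
\quad
H_N = \sum_{k=1}^{N} \frac{1}{k}
```

Setting r = 2 and r = 3 in the middle relation gives the "twice as often" and "three times as often" examples above.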
George Kingsley Zipf, an American linguist, formulated his law in the first half of the twentieth century. He discovered it by tabulating and graphing word frequencies in literary texts.
If you don't believe it, grab a text and start counting the frequencies of different words. You need to count an entire text, or sample it randomly and repeatedly, before the described frequency patterns appear, and even then they usually hold only roughly.
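The counting can be automated. Below is a minimal Python sketch, assuming a simple definition of "word" (runs of letters, lowercased, with an optional internal apostrophe); the function name `rank_frequency` and the sample sentence are made up for illustration. If a text followed Zipf's law exactly, the last column (count times rank) would stay roughly constant.

```python
import re
from collections import Counter


def rank_frequency(text, top=10):
    """Return (rank, word, count, count * rank) for the `top` most frequent words."""
    # Assumption: a "word" is a run of letters, optionally with one internal apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    most_common = Counter(words).most_common(top)
    return [
        (rank, word, count, count * rank)
        for rank, (word, count) in enumerate(most_common, start=1)
    ]


if __name__ == "__main__":
    # A toy sample; a real test needs a whole book, not one sentence.
    sample = "the cat and the dog and the bird saw the fox"
    for rank, word, count, product in rank_frequency(sample, top=3):
        print(rank, word, count, product)
```

To try it on a real text, read the file into a string and pass it to `rank_frequency`. Expect the product column to drift, since real texts follow the law only approximately.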
A few odd twists. First, mathematician Benoît Mandelbrot subsequently generalized Zipf's Law and included it in his development of fractal dimensions. The revised law is now known as the Zipf-Mandelbrot Law.
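For reference, Mandelbrot's generalization is usually written in the form below. The symbols are not from the original note: r is the rank, and q and s are parameters fitted to the data.

```latex
f(r) \propto \frac{1}{(r + q)^{s}}, \qquad q \ge 0,\ s > 0
```

With q = 0 and s = 1 this reduces to the plain inverse-rank form of Zipf's Law.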
Second, Zipf's Law has been found to hold in diverse situations: the numbers of visitors to web sites, the sizes of towns, the sizes of companies in a country, income rankings, and, most recently, the linking together of packages in Linux software distributions.
The mathematics underlying Zipf's Law is not trivial. Also, no one really seems to know why the law occurs; it may be an artifact of a growth process.
NOTE: This news note was written in memory of Benoît Mandelbrot, who died this past week.
