9/21/2014

The The The The The The .....

Try this experiment....

  • Pick up ten books chosen at random (aim for variety). In each book, locate a random text selection about 100 words long, and tabulate a count of each word used. With all data combined, what are the most common written words (i.e., the Top Ten)?
  • Repeat the experiment, this time using recordings of people talking, such as a newscast, a talk show, or a conversation with friends. Again, tabulate a count of each word used. With all data combined, what are the most common spoken words (i.e., the Top Ten)?
First, do the two Top Ten lists overlap...or do they differ considerably?
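
If you type your selections into a computer, the tallying and comparison can be done automatically. The following is a minimal Python sketch, not a prescribed method: the two sample passages are made-up placeholders, the helper name top_words is hypothetical, and words are counted exactly as written (so "is" and "was" are tallied separately rather than grouped together).

```python
from collections import Counter
import re


def top_words(text, n=10):
    """Return the n most common words in text as (word, count) pairs."""
    # Lowercase everything so "The" and "the" are counted together,
    # then pull out runs of letters (apostrophes kept for words like "don't").
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Placeholder passages (assumptions): replace with your own selections.
written = "The cat sat on the mat and the dog sat by the door."
spoken = "I think you know it is not the cat that I saw."

written_top = top_words(written)
spoken_top = top_words(spoken)

# Words that appear in both Top Ten lists.
overlap = {w for w, _ in written_top} & {w for w, _ in spoken_top}

print("Written:", written_top)
print("Spoken: ", spoken_top)
print("In both:", sorted(overlap))
```

With real data, combine all ten written selections into one string (and likewise for the spoken transcripts) before calling top_words.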

Compare your results with this Top Ten data from the British National Corpus, based on its survey of a wide range of texts totaling almost 90 million words (W = written, S = spoken):

  1. the (W) vs. be (S)
  2. of (W) vs. the (S)
  3. and (W) vs. I (S)
  4. a (W) vs. you (S)
  5. in (W) vs. and (S)
  6. to (W) vs. it (S)
  7. is (W) vs. have (S)
  8. was (W) vs. a (S)
  9. it (W) vs. not (S)
  10. for (W) vs. do (S)
Why do you think some of the words in the lists differ?

Do you agree with the corresponding claim that "In written English, one word in every 16 is 'the'"? Explain.
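
One way to test the claim against your own data is to compute the share of all words that are "the" and compare it with 1/16 (about 6.25%). The sketch below is only an illustration: the sample sentence is a made-up placeholder, and the helper name share_of_word is hypothetical. A sample this short will not give a reliable estimate, so use your combined written selections instead.

```python
import re


def share_of_word(text, target="the"):
    """Return the fraction of words in text that equal target."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0  # avoid dividing by zero on empty input
    return words.count(target) / len(words)


claimed = 1 / 16  # "one word in every 16"

# Placeholder text (assumption): replace with your combined written selections.
sample = "The cat sat on the mat and the dog sat by the door."
observed = share_of_word(sample)

print(f"Claimed share:  {claimed:.4f}")
print(f"Observed share: {observed:.4f}")
```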

Source: Adapted from R. Ash's Top 10 of Everything: 2008