Thursday, January 12, 2006

Vocabulary

How large is your vocabulary? Is it voluminous? Or capacious? Or commodious?

Actually this may not be an easy question to answer. Linguists identify four different sets of words that you know based on how each word is used or received. You have a listening vocabulary, a speaking vocabulary, a reading vocabulary, and a writing vocabulary. I'm guessing that my reading vocabulary is the largest and my speaking vocabulary is the smallest, although I imagine it could be different for some folks.

Most people can understand more of the words they hear and read than they can actually use in a sentence. Some people call these your active vocabulary (what you can use) and your passive vocabulary (what you can understand). Vocabulary tests almost always measure what you can recognize from multiple choices, which is quite a bit different from what you can articulate out of the clear blue sky.

Once we pick the vocabulary we're talking about, however, we still can't answer the question of how big a vocabulary can be because we may not yet be sure what a word is.

Seriously?

Yes. The strictest definition of a word would be a string of letters with a space on either side. But what about a word's various forms like go, going, gone, goes, and went? Is each of these a "word," or are they just forms of the main word go?

Linguists call that base form a lemma. In English, the ratio of distinct word forms to lemmas is a fairly low 1.6:1 (so 1,000 distinct word forms represent only about 625 lemmas). This would mean that Shakespeare's works, though they include 29,066 word forms, probably use only about 18,000 lemmas. The ratio comes from something called the "Brown Corpus," a million-word sample of English documents that gives us a fair idea of how frequently words are actually used. And some words are used a lot more often than others.
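The lemma estimates above are just division by the 1.6:1 ratio. Here is a minimal Python sketch of that arithmetic; the function name estimate_lemmas is my own invention, and the ratio is only the rough average quoted above, not an exact property of any particular text.

```python
# Approximate ratio of distinct word forms to lemmas quoted in this post.
FORMS_PER_LEMMA = 1.6


def estimate_lemmas(word_forms: int, ratio: float = FORMS_PER_LEMMA) -> int:
    """Estimate the number of lemmas behind a count of distinct word forms."""
    return round(word_forms / ratio)


print(estimate_lemmas(1000))    # 625
print(estimate_lemmas(29066))   # 18166, i.e. "about 18,000"
```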

We actually use just a handful of words to do most of our communication. Almost half (49.6%) of the Brown Corpus is just the repetition of the 100 most frequently used lemmas, while 58% of the lemmas are used only one time out of a million words. The few common words are so important that, according to these figures, if you want to understand 80% of everything that's written, you only need to know 2,854 words, representing only 2,124 lemmas.
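If you are curious how a coverage figure like "80%" gets computed, here is a rough Python sketch. It counts how many of the most frequent words you need before they account for a chosen share of a text. The function name and the toy sentence are made up for illustration, and it counts raw word forms rather than lemmas (grouping go, goes, and went together would take a separate lemmatizing step that is not shown here).

```python
from collections import Counter


def words_needed_for_coverage(tokens, target=0.80):
    """Return how many of the most frequent distinct words are needed
    to account for at least `target` of all tokens in the text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    # Walk the words from most to least frequent, accumulating their counts.
    for rank, (_, freq) in enumerate(counts.most_common(), start=1):
        covered += freq
        if covered / total >= target:
            return rank
    return len(counts)


# Toy text: 13 tokens, 8 distinct words.
sample = "the cat sat on the mat and the dog sat on the rug".split()
print(words_needed_for_coverage(sample, 0.5))   # 3
print(words_needed_for_coverage(sample, 0.8))   # 6
```

Even in this tiny sample, three of the eight distinct words cover more than half the text, which is the same pattern the Brown Corpus shows on a much larger scale.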

That's right, you could function with only 3000 words. Well… almost. Those rare, low-frequency words are key to comprehension. If you didn't understand the last word of the previous sentence, that sentence was worthless to you. Therefore, the more words you know, the better off you are.

So even though the modern English language might contain about 170,000 lemmas (about 250,000 words total, maybe half a million counting technical terms), the average person only uses about 5000 words on a daily basis.

And though they may know more words than that, they may never find occasion to use them.

By the way, if you'd like to test your word power, I recommend www.wordsmart.com's wordsmart challenge. To improve your vocabulary you can buy a word-of-the-day calendar or use Reader's Digest's "Word Power," but the single best way to learn is to use the dictionary. When you encounter a new term, stop and look it up. You can't get a better return on your investment of time than this simple discipline. And this way you automatically ensure that you'll only be learning words that you'll actually encounter in life.
