3. Google is good but not perfect

The previous chapters gave you an idea about how you can benefit from Google in language learning, and presented some of the search options you may find useful in the future. This chapter sums up the benefits of using Google as a corpus tool, but also discusses its risks and limitations.

A. Why it's good to google for language learning

The advantages of googling. To begin with, Google offers you a quick and easy way of checking your language problems: the answer is only a mouse click away. Second, the Internet is larger than any other corpus, and it is constantly growing, so you are likely to get hits for new words and the latest language trends that you won't find in other corpora. Third, Google also provides results from discussion forums and blogs, whose language comes very close to spoken language. You may learn things that you perhaps wouldn't learn from traditional sources, which tend to focus on written texts.

Try it yourself!
Try a search for words you won't find in a dictionary just yet. Think of an interesting word or expression that you have seen or heard somewhere. It may be a line from a sitcom, lyrics of a song, or an item from an article in a newspaper or a magazine. Go to Google and look it up.

If you are short of ideas, you can try a search with the following, relatively recent additions to the English vocabulary:

  • "eye candy"
  • "jump the shark"
  • retrosexual

B. Issues concerning the use of the Internet as a corpus

Take a look at the following examples found on the Internet using Google as a tool.

  • Concerning of next pieces, explanations will be add later.
  • Notice of Decision Details Concerning of Stock Option Issue
  • Articles concerning about scales and the weighing industry
  • I am concerning about the issue of global warming, I am a Chinese.
  • I would agree that there are many people in Ukraine who are more concerning about eating and getting proper medical care than they are about religion.
  • For further information concerning on Himos...
  • Our products have helped our customers to boost their work concerning on network maintenance.
  • The subjects could be concerning on some of following topics.
Search results:
concerning of 69,000; concerning about 96,400; concerning on 156,000.

Now compare the sentences above with the sample below, which has been extracted from the British National Corpus (BNC). You will learn to use the BNC yourself as you proceed to the next section in the Corpus Library.

  • bringing with them a load of information concerning the course of the war
  • He turned down several questions concerning his past activities
  • The Intimate Machine raises many issues concerning the social impact of computers
  • light may be shed on the problems concerning the modes of production
  • there is anecdotal evidence concerning illiterate traders in other cultures who seem to have remarkable memories
Search results:
concerning on/about/of 0 hits; concerning 3,353.

Think about the two sets of sample sentences above and consider the questions below. Then click Bernie the Owl on the right for more information.

  • What do the samples tell you about the way concerning is used?
  • What could explain the difference between the search results?
  • In the light of the examples, what can be said about the reliability of the Internet as a source of English?

C. Why use a real corpus instead of sticking to Google?

This section on Google presented some ideas on how Internet search engines can be used for the benefit of your English. In sum, Google is easily available and fast to use, and provides multiple search options. It accesses the Internet, an enormous body of language that is certainly up to date, as new items are added every minute worldwide. You can use it as a dictionary, for checking your grammar, and for idioms, fixed expressions and collocations. Why would you want to use a real, more complicated corpus instead?

Despite these undeniable benefits, Google is not very practical as a corpus tool. For example, you can only search for specific words or phrases, not word categories or inflected forms. As you probably realised in the previous section, search results are not always reliable, because the texts on the Internet are produced by people of many different nationalities, levels of education and levels of English. Although the search results can be limited to certain domains (e.g. .edu, ac.uk, Google Scholar), you have virtually no control over the register. In other words, you cannot choose to search only within spoken language, newspaper articles or academic presentations. Furthermore, the way in which search results are presented is not ideal for language learning purposes.
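
For readers who like to see things spelled out, limiting a phrase search to a domain can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name build_query_url is made up for this example, and it assumes that Google treats quotation marks as an exact-phrase search and accepts the site: operator for restricting the domain. The code only builds a search address; it does not fetch or count any results.

```python
from urllib.parse import urlencode

def build_query_url(phrase, site=None):
    """Return a Google search address for an exact phrase.

    build_query_url is a hypothetical helper written for this example.
    If `site` is given, the search is limited to that domain (assumption:
    the site: operator is available).
    """
    query = f'"{phrase}"'           # quotation marks ask for the exact phrase
    if site:
        query += f" site:{site}"    # restrict the search to one domain
    return "https://www.google.com/search?" + urlencode({"q": query})

# Compare a standard pattern with a doubtful one on UK academic sites.
for phrase in ("concerning the", "concerning about"):
    print(build_query_url(phrase, site="ac.uk"))
```

Note that even with a domain restriction you are still choosing where the text was published, not what register it was written in.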

We are not saying that you shouldn't use Google - you should, in fact, as it can help you in many ways. But for some things you may want to use a tool that is specifically designed for language learning purposes. The "real" corpora can be used for almost all the same functions as Google, but in a more accurate and precise manner. They also do the things mentioned above that Google can't do:

  • You can look up single words and expressions, but also word categories, inflected forms, synonyms and words derived from the same root (for example actual, actually, actualizing).
  • The language in the corpora is mainly produced by people who speak English as their first language.
  • You can limit the search to a certain register - a function that can come in handy, as you will learn later.
  • The search results are presented in the form of a KWIC list, which makes it easy to get the big picture on the use of the language items you searched for.
  • As a bonus, you will be able to obtain frequency statistics and lists of collocates directly; you don't have to construct them yourself the way you did with depend on/of, for example.
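
To give a rough idea of what a KWIC (key word in context) list and a list of collocates look like, here is a minimal Python sketch. It is not how the BNC or COCA search tools work internally; it simply lines up the keyword in the five BNC sample sentences quoted in section B and counts the word that follows it. The function names kwic and right_collocates and the width setting are assumptions made for this example.

```python
import re
from collections import Counter

# The five sample sentences quoted from the BNC in section B.
SENTENCES = [
    "bringing with them a load of information concerning the course of the war",
    "He turned down several questions concerning his past activities",
    "The Intimate Machine raises many issues concerning the social impact of computers",
    "light may be shed on the problems concerning the modes of production",
    "there is anecdotal evidence concerning illiterate traders in other cultures "
    "who seem to have remarkable memories",
]

def kwic(sentences, keyword, width=30):
    """Return one line per hit, with the keyword starting in the same column."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    lines = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            left = sentence[:match.start()][-width:]   # context before the keyword
            right = sentence[match.end():][:width]     # context after the keyword
            lines.append(f"{left:>{width}}{match.group()}{right}")
    return lines

def right_collocates(sentences, keyword):
    """Count the word that immediately follows the keyword."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words[:-1]):
            if word == keyword:
                counts[words[i + 1]] += 1
    return counts

for line in kwic(SENTENCES, "concerning"):
    print(line)
print(right_collocates(SENTENCES, "concerning").most_common())
```

Because the keyword always starts in the same column, you can scan down the list and see at a glance what comes before and after it. In this small sample, concerning is followed directly by a noun phrase and never by a preposition.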

As you go on working in the Corpus Library, you will learn to use two of the actual corpus tools. The next section familiarises you with a search tool that uses the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA). The third section makes use of MICASE, a corpus of academic spoken language collected at the University of Michigan.

Before moving on to the next section, you may want to take a short introduction to the two corpus tools in the Corpus Library Help Centre.

That was all about Google for now. Move on to the next section: BYU: BNC & COCA.
