1. Corpus basics

In this section you'll get to know the basics of using a real corpus as a tool for language learning. The section shows you how a corpus works and what can be concluded from search results. Finally, it introduces the four steps of corpus investigation.



How a corpus works. A corpus is a collection of texts and/or speech in electronic form. Each word in the texts is given a tag according to the part of speech it represents, so that, for example, research can be studied separately as a noun or as a verb. When you look up a word, the results are usually presented as a KWIC-list (key word in context), although other formats are also possible. The KWIC-list shows you how the word you searched for is used in real sentences, and studying the sample gives you a wide range of information about that particular language item.
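To make the idea of tagging and KWIC-lists more concrete, here is a minimal sketch in Python. It is not how the BNC or COCA is implemented: the tiny corpus, the tag names (NOUN, VERB and so on) and the function name kwic are all made up for illustration, and the real corpora use their own, more detailed tagsets.

```python
# Toy corpus: each token is a (word, part-of-speech tag) pair.
# The sentences and the simplified tags are invented for this example.
tagged = [
    ("We", "PRON"), ("research", "VERB"), ("language", "NOUN"), (".", "PUNCT"),
    ("Her", "PRON"), ("research", "NOUN"), ("is", "VERB"), ("new", "ADJ"),
    (".", "PUNCT"),
]


def kwic(tokens, word, tag=None, span=2):
    """Return (left context, key word, right context) for each match.

    If `tag` is given, only matches carrying that part-of-speech tag are kept.
    `span` is the number of tokens shown on each side of the key word.
    """
    rows = []
    for i, (w, t) in enumerate(tokens):
        if w.lower() == word and (tag is None or t == tag):
            left = " ".join(x for x, _ in tokens[max(0, i - span):i])
            right = " ".join(x for x, _ in tokens[i + 1:i + 1 + span])
            rows.append((left, w, right))
    return rows


all_hits = kwic(tagged, "research")               # noun and verb uses
noun_hits = kwic(tagged, "research", tag="NOUN")  # noun uses only

for left, key, right in all_hits:
    # Right-align the left context so the key words line up in one column.
    print(f"{left:>12} | {key} | {right}")
```

Searching without a tag returns both uses of research, while adding tag="NOUN" narrows the list to the noun. This is the same kind of choice you make when you restrict a corpus search by part of speech.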


Two corpora, similar search interface

In this part of the Corpus Library you'll learn to use a corpus search tool through practical examples. The links next to Edna on the left take you directly to the British National Corpus (BNC, 1980s-1993) of 100 million words and to the Corpus of Contemporary American English (COCA, updated twice a year). Notice that the examples, instructions and screenshots are taken from the BNC, but they apply to the COCA as well, as the interface is practically the same.

Before going to the corpus, you may want to take a look at a BNC screenshot with explanations at the Corpus Library Help Centre. Also familiarise yourself with the information, explanations and support that are integrated into the corpus tool. When you enter the site, you will see an introduction at the bottom right. Read the text and click on any of the links to search the corpus. Feel free to modify the search to look for things that interest you. Notice that the American corpus (COCA) also provides a Brief tour for non-linguists, found under the More information... drop-down menu.



WARM-UP. When just one look can speak volumes

To get familiar with the BYU interface, let’s start with a simple search. Please read the instructions carefully, and then go to the BNC/COCA through a link on the left.

Try it yourself!

  • Type the word look in the search string field titled WORD(S). Press enter or click the search button.
  • In the upper half of the right-hand frame you get a list of the findings, in this case just one item. Click on the item to open a KWIC-list of 100 sample sentences in the lower frame. You can get more results by clicking the link at the top of the KWIC-list: examine at least the first 200 of the search results.
  • Notice also that by clicking the item number in the KWIC-list (the first column on the left) you get the expanded context (a larger piece of text and references) for the phrase in question.
  • Now observe the sample sentences and answer the questions below.
  • When ready with your answers, click Bernie the Owl on the right for feedback.

Type your answers under each question in the text field below:





Was that it? No, there is much more to corpora than this. The purpose of this exercise was to get acquainted with corpus searches and to illustrate the wide range of information that a simple search for a single word can yield. By looking into the search results instead of just looking at them, you can really get to know a word. In a way, a corpus offers a shortcut to the kind of knowledge that native speakers have of their language.

 

The BNC/COCA functions

The corpus interface we are about to use lets you work on the English language in several ways. With the BNC and COCA you can

  • search by word, phrase, part of speech (e.g. adjectives, prepositions) and lemma (e.g. all forms of be: am, are, were, being)
  • find and compare synonyms of a given word
  • find words that collocate (group together, are used side by side)
  • explore the usage (context, genre, collocates) of a word/expression
  • compare the use of words and their collocates across time periods and genres
  • find word families, i.e. words that stem from a specific word (e.g. conclude: inconclusive, conclusion)
  • explore a genre for its specific features (e.g. typical verbs in Academic English)

All these functions and possibilities serve the same end: researching the language to better understand the meaning, usage and context of words.
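As an illustration of what finding collocates means, the following Python sketch counts the words that appear within a small window around a chosen word. It is only a rough approximation under simple assumptions: the text is invented, the function name collocates and the window size are hypothetical, and the corpus tools themselves apply more refined ranking than raw counts.

```python
from collections import Counter


def collocates(tokens, node, window=2):
    """Count the words found within `window` tokens of each `node` occurrence."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start = max(0, i - window)
            # Neighbours on the left and on the right, excluding the node itself.
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


# Invented mini-text; a real corpus would supply millions of words.
text = "draw a conclusion reach a conclusion draw a firm conclusion"
tokens = text.split()

counts = collocates(tokens, "conclusion")
for word, n in counts.most_common():
    print(word, n)
```

Even in this tiny sample, draw and reach turn up next to conclusion more than once, which is the kind of pattern a collocate search is meant to bring out.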

 

The four steps in corpus investigation

Before proceeding to more advanced search possibilities, a short discussion is needed on the actual process of corpus investigation for the purpose of language learning. The process can be seen as consisting of four steps:

Step 1. Formulating the question.
At its simplest, the question derives naturally from the problem you want to solve, for example "Should I write nonetheless or none the less?"

Step 2. Devising a search strategy.
You have to create a search string and adjust the search options so that they extract the essential information from the corpus.

Step 3. Observing the examples.
Among the samples produced by your search you have to be able to discriminate between the relevant and the irrelevant ones.

Step 4. Drawing conclusions.
What do the samples selected in the previous step tell you? At this stage it is important not to jump to conclusions but to base your case on sound evidence and logic.
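The four steps can also be sketched as a small Python script, using the nonetheless / none the less question from Step 1. This is only an illustration: the four sentences are invented, so the counts say nothing about real usage, and the names corpus, variants and hits are hypothetical. In practice you would run the search in the BNC or COCA and read the samples yourself.

```python
import re

# Step 1. The question: "Should I write nonetheless or none the less?"

# Invented sample sentences standing in for a corpus.
corpus = [
    "The task was hard; nonetheless, she finished it.",
    "Nonetheless, the results were clear.",
    "He was tired but none the less determined.",
    "None of the less common words appeared.",
]

# Step 2. Search strategy: one whole-word, case-insensitive pattern per spelling.
variants = {
    "nonetheless": re.compile(r"\bnonetheless\b", re.IGNORECASE),
    "none the less": re.compile(r"\bnone the less\b", re.IGNORECASE),
}

# Step 3. Observe the examples: collect the sentences that match each variant.
# A human reader still has to check that each hit is relevant to the question.
hits = {
    name: [sentence for sentence in corpus if pattern.search(sentence)]
    for name, pattern in variants.items()
}

# Step 4. Draw conclusions: compare how often each variant occurs.
for name, examples in hits.items():
    print(f"{name}: {len(examples)} hit(s)")
    for sentence in examples:
        print("   ", sentence)
```

Notice that the last sentence contains the words none, the and less but matches neither pattern, because the search strings look for the exact phrase. How you write the search string in Step 2 decides what you get to observe in Step 3.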

In the Corpus Library you get help in these steps through the exercise instructions and feedback, especially in the first chapters. The tasks become more independent and less structured as you move on. If you want to learn more details about the steps in corpus investigation, click here.

 

Now that you know the principles, continue to the next section and try out Simple searches.

