1. Corpus basics: simple searches

In this chapter you'll get to know the basics of using a real corpus as a tool for language learning. This section shows you how the corpus can be used and what can be concluded from the search results. You'll also learn how to make simple searches with individual words and expressions to improve your grammar and vocabulary.



How a corpus works. A corpus is a collection of texts and/or speech in electronic form. In a tagged corpus such as the one used here, each word is given a tag according to the part of speech it represents: for example, research can be studied as a noun or as a verb. When you look up a word, the results are usually presented as a KWIC-list (Key Word In Context), although other formats are also possible. The KWIC-list shows you how the word you searched for is used in real sentences, and studying these samples gives you a wide range of information about that particular language item.
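To make the idea of a KWIC-list concrete, here is a minimal Python sketch that builds one from a made-up sentence. It only illustrates the principle (every hit shown with a few words of context on each side, keywords lined up in one column); the function name kwic, the width parameter and the sample text are hypothetical, and real interfaces such as VIEW work on millions of tagged words and offer many more options.

```python
"""A toy KWIC (Key Word In Context) concordancer.

Illustration only: this is not how VIEW or the BNC software is implemented.
"""


def kwic(tokens, keyword, width=3):
    """Return one (left context, keyword, right context) triple per hit.

    tokens  -- the text as a list of words (a hypothetical toy corpus)
    keyword -- the word to look up, matched case-insensitively
    width   -- number of context words shown on each side
    """
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# A made-up two-sentence "corpus", already split into words.
text = "Take a look at the data . They look forward to the results ."

for left, key, right in kwic(text.split(), "look"):
    # Right-align the left context so the keywords line up in one column.
    print(f"{left:>20} | {key} | {right}")
```

Even in this tiny sample, the right-hand context already hints at different uses of look (look at, look forward to), which is exactly the kind of pattern you will be looking for in the real KWIC-lists.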



WARM-UP. When just one look can speak volumes

To get familiar with the VIEW, let's start with a simple search. Please read the instructions carefully, and then go to the VIEW by clicking the button below Edna on the left.

Before going to the corpus you may want to take a look at a screenshot with explanations at the Corpus Library Help Center.

The VIEW site consists of three frames. When you enter the site, the lower frame of the main area shows an introduction titled Overview. Read the text and follow the links to find out more, if you like.


Try it yourself!

  • The search string box is located in the upper left corner of the VIEW site. Type the word look in the search string field. Press enter or click the search button.
  • In the upper half of the main frame you get a list of the findings, in this case just one item. Click on the item to open a KWIC-list of 100 sample sentences in the lower frame. You can get more results by clicking the link at the bottom of the KWIC-list: examine at least the first 200 search results.
  • Notice also that by clicking the item number in the KWIC-list (the first column on the left) you get the expanded context (a larger piece of text and additional information) for the phrase in question.
  • Now observe the sample sentences and answer the questions below.
  • When you have finished your answers, click Bernie the Owl on the right for feedback.

Type your answers under each question in the text field below:





Was that it? You may now be thinking: "So? Is this what corpora are about?" The answer is no, this is not all. The purpose of this exercise was to get you started with corpus searches and to illustrate the wide range of information a simple search for a single word can bring. By looking into the search results instead of just looking at them, you can really get to know a word. In a way, a corpus offers a shortcut to the kind of knowledge that native speakers have of their language.


 

Three basic suggestions for using a corpus

After completing the warm-up exercise you are probably eager to find out how to really benefit from a corpus in your language studies. Here are three basic suggestions for using the VIEW interface. Do the corresponding exercises, and you'll be well on your way to mastering the corpus.


A. You may know a word but do you know how to use it?

What do you do if you have a word or phrase in mind that you find interesting or would like to use in your presentation or essay, but you are a bit uncertain of how it can be used? A dictionary helps you understand the meaning, but it often does not offer enough examples of how a specific word or expression is used. This is exactly the type of situation where a corpus can be helpful.


Try it yourself!
Let's take a look at a word that is commonly used in academic language but still causes some confusion: research. Many Finnish students use the word in the singular, whereas some use the plural form researches. With the help of the VIEW interface we will try to find out whether one of these uses is more accurate than the other.

Again, read the instructions first and then go to the VIEW.

  • In the VIEW interface, type the word research in the search box.
  • Pay attention to the See POS tags option at the bottom and select yes. If it is set to no, all uses of the word are grouped together; POS tagging lets you focus on the word in a particular grammatical role.
  • You can check your search settings by clicking the screenshot on the right. When they are correct, press enter or click the search button in the VIEW.

Now you will be able to see the search results in the upper part of the main frame. You should see a list that shows how the word research is distributed across different parts of speech: it can be a noun (NN1), a verb (VVI & VVB), or the system may be uncertain which of the two it is (VVI-NN1). In this case, we are interested in research as a noun. Note the two figures for research (NN1) on the first line: the number of tokens (middle column) and the frequency per million words (last column).
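Why look at the per-million figure at all? Raw token counts depend on how much text was searched, so the same count can mean very different frequencies. A per-million frequency is, presumably, the token count divided by the number of words searched, multiplied by one million. The short Python sketch below shows that arithmetic; the function name per_million and the counts are hypothetical and only illustrate the calculation, so use the real figures from VIEW for the exercise.

```python
def per_million(tokens, corpus_size):
    """Normalised frequency: occurrences per million words of running text.

    tokens      -- how many times the word (with its tag) was found
    corpus_size -- total number of words in the corpus or section searched
    """
    # Multiply first, then divide, so whole-number results stay exact.
    return tokens * 1_000_000 / corpus_size


# Hypothetical counts: the same 500 hits in corpora of different sizes.
print(per_million(500, 2_000_000))   # 250.0
print(per_million(500, 50_000_000))  # 10.0
```

The same raw count of 500 corresponds to 250 occurrences per million words in the smaller corpus but only 10 in the larger one, which is why per-million figures are the fairer basis for comparing forms such as research and researches.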

Type the two figures in the assigned places in the table below.

Using the same settings, perform a similar search for the word researches. Add these numbers to the table as well.

Word (NN1) | Tokens in reg. 1 | Per mil. in reg. 1
Research | |
Researches | |


Now you have two sets of numbers, which should answer the initial question. Consider the questions below. When you are ready, click Bernie the Owl for feedback.

  • Is it more common to use research in the singular or in the plural?
  • What can you find out about the use of the two forms by examining the contexts in which they occur?





Good to know.
When a tagged corpus is created, each word is given a POS-tag according to the part of speech it represents, for example verb, adjective or preposition. The tagging system in the BNC (British National Corpus) used here is quite sophisticated: you can encounter POS-codes such as VBB (present-tense forms of the verb be, such as are and am) or PNX (reflexive pronouns such as myself). If you are interested, you can take a look at the POS-codes here. Selecting the See POS tags option, as you did above, is just one of the possibilities; you'll learn more about the use of POS-codes later.

B. Knowing how to use an expression

Corpus searches are not limited to single words: you can also look up strings of words and whole expressions. Unlike in Google, there is no need for quotation marks; simply typing the string in the search box is enough.
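In principle, a string search checks the words in order, so the whole sequence has to match, just as a phrase in quotation marks does in Google. The following toy Python sketch illustrates that idea on a made-up sentence; it is not VIEW's actual implementation, and the function name find_phrase and the sample text are hypothetical.

```python
def find_phrase(tokens, phrase):
    """Return the start index of every occurrence of a multi-word phrase.

    The words must appear consecutively and in the same order;
    matching is case-insensitive.
    """
    words = [w.lower() for w in phrase.split()]
    n = len(words)
    lowered = [t.lower() for t in tokens]
    # Slide a window of n words over the text and compare it with the phrase.
    return [i for i in range(len(lowered) - n + 1) if lowered[i:i + n] == words]


# A made-up sentence, already split into words.
text = "Staff as well as students attended , as well ."

print(find_phrase(text.split(), "as well as"))  # [1]
print(find_phrase(text.split(), "as well"))     # [1, 7]
```

The second as well in the sentence is not followed by as, so it only turns up in the shorter search: the corpus returns exactly the string you typed, not the individual words scattered around a sentence.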

Search tip!
Use the Reset button to clear the search settings after each search, at least when you don't want to reuse exactly the same settings. Do so this time too, before starting the task below.

Try it yourself!
Try to answer the following two questions using VIEW as a tool; the search strings to use are into account and come around. Then, as a third task, look up a string of your choice:

  • What longer phrase is into account most commonly part of?
  • How many different meanings for come around can you find?
  • Choose an expression or a string of words that you are interested in, and look it up in the corpus.


C. Using the corpus for your grammar

Just like Google, a corpus can also be a practical tool for checking your grammar. For example, you can check whether to use a plural or a singular verb form with words such as everybody, data and staff, or you can see how the meaning of a verb changes according to the preposition it takes, as you did in the warm-up exercise with the verb look. These are only a few examples, of course; as you become familiar with the corpus, you will realise its full potential.

Try it yourself!
In the exercise below you will see expressions and phrasal verbs that often confuse learners of English. Choose the right alternative to fill the blank in each sample sentence. All the answers can be found with the help of VIEW: start by typing the key word in question (the verb or noun before the blank) in the search string box and add the prep.ALL POS-tag from the drop-down menu. Feedback and search tips will appear on the right when you have given your answer.


1. Participants were asked to fill ___ the questionnaire.

Choice 1 in
Choice 2 up
Choice 3 no preposition
Choice 4 out

2. We should discuss ___ this matter more thoroughly, don't you think?

Choice 1 about
Choice 2 no preposition
Choice 3 with
Choice 4 on

3. I would like you to comment ___ the problem.

Choice 1 no preposition
Choice 2 about
Choice 3 to
Choice 4 on

4. What are the main differences ____ US English and British English?

Choice 1 in
Choice 2 at
Choice 3 to
Choice 4 between

5. What is the reason ____ all this happening?

Choice 1 for
Choice 2 with
Choice 3 of
Choice 4 in


Good to know.
POS-tags can be used in many ways. First, they can stand in for any word of a given part of speech (word class) in a phrase or expression: for instance, just above you looked up verb+preposition combinations with strings such as fill [pr*]. Second, POS-tags can be used as search terms on their own: for example, the string [nn*] [pr*] will get you a list of common noun+preposition combinations. Third, a POS-tag can be attached to the search word with a dot to define the word class of that word. In the following chapter you will search with the string research.[nn*], which lists the uses of research as a noun and excludes the cases where it is used as a verb.
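The three uses of POS-tags can be imitated on a tiny hand-tagged sample. The Python sketch below is a simplified model, not VIEW's real query engine: it assumes that a bracketed element such as [pr*] matches any word whose tag starts with PR (the * acting as a wildcard, case-insensitive), and that word.[tag] requires both the word and the tag to match. The function names matches and search, the sample sentence and its tags (chosen from the BNC's CLAWS C5 tagset for illustration) are all hypothetical.

```python
from fnmatch import fnmatchcase


def matches(token, element):
    """Check one (word, tag) token against one search element.

    Simplified, hypothetical forms:
      "depend"          -> the word only
      "[nn*]"           -> the tag only; * is a wildcard
      "research.[nn*]"  -> word and tag together
    """
    word, tag = token
    if "[" in element:
        # Split e.g. "research.[nn*]" into "research." and "nn*]".
        word_part, _, tag_part = element.partition("[")
        tag_pattern = tag_part.rstrip("]").upper()
        word_part = word_part.rstrip(".")
        if not fnmatchcase(tag, tag_pattern):
            return False
        # An empty word part, as in "[pr*]", accepts any word with that tag.
        return word_part == "" or word.lower() == word_part.lower()
    return word.lower() == element.lower()


def search(tagged, query):
    """Return every run of consecutive tokens that matches the query."""
    elements = query.split()
    n = len(elements)
    hits = []
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if all(matches(tok, el) for tok, el in zip(window, elements)):
            hits.append(" ".join(word for word, _ in window))
    return hits


# A made-up, hand-tagged sentence (tags for illustration only).
tagged = [("We", "PNP"), ("depend", "VVB"), ("on", "PRP"), ("the", "AT0"),
          ("results", "NN2"), ("of", "PRF"), ("research", "NN1"), (".", "PUN")]

print(search(tagged, "depend [pr*]"))    # ['depend on']
print(search(tagged, "[nn*] [pr*]"))     # ['results of']
print(search(tagged, "research.[nn*]"))  # ['research']
```

Note that [pr*] catches both PRP (prepositions in general) and PRF (the word of), which is why a wildcard is handy when you do not know or care about the exact tag.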

The four steps in corpus investigation

Finally, before proceeding to more advanced search possibilities, a short discussion is needed on the actual process of corpus investigation. The process can be seen as consisting of four steps:

Step 1. Formulating the question.
At its simplest, the question derives naturally from the problem you want to solve, for example "Should I write nonetheless or none the less?"

Step 2. Devising a search strategy.
In practice, this stage means creating a search string and adjusting the search options so that you can extract the essential information from the corpus.

Step 3. Observing the examples.
Among the samples your search returns, you have to be able to distinguish the relevant ones from the irrelevant ones.

Step 4. Drawing conclusions.
What do the samples selected in the previous step tell you? At this stage it is important not to jump to conclusions but to base them firmly on the evidence.

In the Corpus Library you get help with these steps through the exercise instructions and feedback, especially in the first chapters. The tasks become more independent and less structured as you move on. If you want to learn more about the steps in corpus investigation, click here.

 

Now that you know the basics, move on to the next section to learn about Words that fit together.

