The four steps in corpus investigation  

Step 1. Formulating a question

In practice

This step simply means that to begin a corpus search you need a question you want answered. In general this step should not pose many problems - questions often seem to come naturally, for example "Which preposition should I use with 'qualified'?" However, sometimes this step can be complicated: it may be difficult to get a grasp of the actual question when you are revising your own text, for instance, as the problematic areas are not always obvious.

In the Corpus Library the problems and questions are preformulated, so you do not have to worry about this stage at the beginning. However, as the basis of the actual search, this step should be considered the most important one.

Try to be precise and specific

Consider whether open questions would be better than the yes/no type
(e.g. "what comes after x?" instead of "does y come after x?")

Keep in mind both lexical and grammatical issues

Step 2. Devising a search strategy

In practice

Next, you have to know how to find what you are looking for. Devising your search strategy requires some understanding of how the corpus you intend to use works, what can be found in it and how. You also have to know certain things about how language is structured, for example how parts of speech are organised into phrases. Sometimes you may need to try several strategies or multiple searches to find what you are looking for.

In the Corpus Library you will become acquainted with several search strategies. In the examples the search string you should use is given to you and accompanied by guidance and tips. Later on, you can use the Library as a manual when running independent searches and devising your own strategies - pay attention to the Search tip and Good to know boxes.

Keep in mind the possible search options when planning your strategy - you may use our Help Center and the help-options provided in BNC/COCA and MICASE

Consider which parts in a phrase are fixed, which variable. Use POS-tags or wildcards to replace the variable parts.

If you get many irrelevant results, restrict your search

Be prepared to revise your strategy when needed. Trial and error is a good way to learn.
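
The idea of fixed and variable parts can be tried out on a small scale. The following is a minimal sketch in Python, not the query syntax of BNC/COCA or MICASE: the sample sentences, the function names and the convention that `*` stands for exactly one word are all assumptions made for illustration. The first query leaves the word after "qualified" open, and the second restricts the search by fixing a further word.

```python
import re

# Toy stand-in for a corpus (hypothetical data); a real search would be
# run through the interface of the corpus you are using.
SENTENCES = [
    "She is well qualified for the post.",
    "He qualified as a doctor in 1998.",
    "They are not qualified to teach.",
    "The team qualified for the final.",
]


def wildcard_to_regex(query: str) -> re.Pattern:
    """Turn a query such as 'qualified *' into a regex.

    Assumption for this sketch: '*' stands for exactly one word.
    """
    parts = [r"\w+" if token == "*" else re.escape(token) for token in query.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def search(query: str, sentences: list[str]) -> list[str]:
    """Return every matched string, in the order the sentences are listed."""
    pattern = wildcard_to_regex(query)
    return [m.group(0) for s in sentences for m in pattern.finditer(s)]


# Fixed part: "qualified"; variable part: the word that follows it.
print(search("qualified *", SENTENCES))

# Restricting the search by fixing one more word.
print(search("qualified * a", SENTENCES))
```

In a real corpus interface the variable slot could also be filled with a POS-tag, for example to find only prepositions after "qualified"; the exact notation depends on the corpus, so check its help pages.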

Step 3. Observing the examples

In practice

The sample sentences provided by the corpora are not always easy to interpret. You may get hundreds of examples, among which only a few may be relevant. It is important both to be able to discriminate between what is relevant and what is not, and to keep an open mind when viewing the results, as you may notice something you were not actually looking for.

The exercises in the Corpus Library are structured in a way that should gradually make it easier for you to interpret the search results and discern the essential. However, much depends on how you use your own observational and analytical skills.

Make sure you have understood the examples correctly and seek out the ones that best match your target sentence

Try to set your own assumptions aside, as the words you are expecting to find may not be there, and vice versa

Try not to focus solely on the most frequent usage of a word, as the type you are looking for may be a less common case
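
One way to avoid looking only at the most frequent usage is to list every word that follows the search word, together with its count. The following is a minimal sketch in Python under stated assumptions: the sample sentences and the function name `following_words` are hypothetical, and a real corpus interface would normally give you such a list itself.

```python
import re
from collections import Counter

# Toy stand-in for search results (hypothetical data).
SENTENCES = [
    "She is well qualified for the post.",
    "He qualified as a doctor in 1998.",
    "They are not qualified to teach.",
    "The team qualified for the final.",
]


def following_words(target: str, sentences: list[str]) -> Counter:
    """Count the word that comes directly after each occurrence of `target`."""
    pattern = re.compile(r"\b" + re.escape(target) + r"\s+(\w+)", re.IGNORECASE)
    counts = Counter()
    for sentence in sentences:
        counts.update(word.lower() for word in pattern.findall(sentence))
    return counts


counts = following_words("qualified", SENTENCES)

# List every follower, most frequent first, so that the less common
# cases stay visible as well.
for word, n in counts.most_common():
    print(word, n)
```

In this sample "for" occurs twice, while "as" and "to" occur once each. If your own sentence is about a profession, the less frequent "qualified as" may be the example that matters.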

Step 4. Drawing conclusions

In practice

You may have a large set of relevant examples, or just a few of them. Remember that what matters is the quality of your sample, not the quantity of the examples. However, if you get far more or far fewer search results than you expected, you should consider the significance of this. What does it tell you? What conclusions can be drawn?

Another point that should not be forgotten is the type of conclusions that can be drawn. For instance, the absence of a certain combination does not indicate that the combination does not exist at all; it only tells us that there is no evidence of it in the corpus.

In the Corpus Library feedback is given to you after each exercise. Use the feedback to verify your own findings and to develop your analytical and observational skills.

Even one example may be enough, if it is of good quality.

If you are surprised by a large/small number of examples, consider what this could mean. You may also need to revise your search strategy.

Remember to relate your conclusions to the initial question you formulated at Step 1.


Adapted from: Claire Kennedy and Tiziana Miceli, "An evaluation of intermediate students' approaches to corpus investigation",
in Language Learning and Technology, Vol. 5, No. 3, September 2001, pp. 77-90.