Corpus What?

The relatively new field of corpus linguistics has much to offer language teachers, says Federica Barbieri. See Luciana Diniz and Kate Moran's Portal article, "Corpus-Based Tools for Efficient Writing Instruction," Essential Teacher, September 2005 (pp. 36-39).

Over the past few decades, the field of applied linguistics has been enriched by a new way of doing linguistic analysis: corpus linguistics.

What Is Corpus Linguistics?

In a nutshell, corpus linguistics is an approach to the study of language that relies on the use of computer-assisted techniques to analyze large, principled databases of naturally occurring language (corpora). Corpus-based analysis is interested in the language actually used in naturally occurring texts rather than in what is theoretically possible.

To obtain descriptions of language use that represent the way language behaves in real life, corpus linguists base their analysis on large collections of texts stored on a computer.

What Should a Corpus Look Like?

First, a corpus should be large. If it is too small, it will not include enough examples of the feature you are trying to study. How large is large enough depends on the targeted linguistic features and on the goals of the analysis.

Size, however, isn't everything. A corpus must be representative of the language variety (i.e., its genre, register, and dialect) that it is supposed to describe. For example, a corpus of research articles would not be suitable for the study of the use of tag questions in casual conversation.
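Corpora and subcorpora rarely contain the same number of words, so raw counts of a feature are usually normalized, for example to a rate per 1,000 words, before they are compared. The short Python sketch below illustrates the idea with the tag-question example above. It is a toy, not a research tool. The regular expression is a hypothetical, rough pattern that catches only a few negative tags such as "isn't it?". The helper names (tokenize, per_thousand_words) and the two one-sentence "corpora" are made up for illustration.

```python
import re

# Hypothetical, deliberately rough pattern for a few negative tag questions
# such as "isn't it?" or "didn't you?". It misses many real tag questions.
TAG_QUESTION = re.compile(
    r"\b(?:isn't|aren't|wasn't|weren't|don't|doesn't|didn't|can't|won't|wouldn't)"
    r"\s+(?:it|you|they|he|she|we|there|I)\s*\?",
    re.IGNORECASE,
)

def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (a crude tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def per_thousand_words(texts: list[str]) -> float:
    """Return tag questions per 1,000 words across a list of texts."""
    hits = sum(len(TAG_QUESTION.findall(text)) for text in texts)
    words = sum(len(tokenize(text)) for text in texts)
    return 1000 * hits / words if words else 0.0

# Made-up one-sentence "subcorpora", for illustration only.
conversation = ["It's cold today, isn't it? You said that yesterday, didn't you?"]
research_articles = ["The results indicate a significant effect of register on lexical density."]

print(per_thousand_words(conversation))       # about 181.8 (2 hits in 11 words)
print(per_thousand_words(research_articles))  # 0.0
```

With real corpora, a rate of zero in a corpus of research articles would say nothing about conversation, which is exactly why the corpus has to match the variety being studied.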

What Does Corpus Linguistics Have to Do with Language Teaching?

Corpus linguistics has been applied to three domains of language teaching practice and research: (1) the development of instructional materials, (2) the analysis of learner language, and (3) classroom instruction.

Materials Development

By providing information about the frequency of use of linguistic features, corpus linguistics can inform decisions about priorities in EFL/ESL teaching materials (Conrad 2000). Numerous studies have compared corpus findings with EFL/ESL textbook presentations of particular linguistic features. These studies challenge textbook descriptions and suggest that teaching materials design could greatly benefit from the empirical information about language use provided by corpus linguistics.
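The most basic kind of frequency information a corpus can provide is a word frequency list. As a minimal sketch, assuming a crude regular-expression tokenizer and a hypothetical helper named frequency_list, such a list can be produced in a few lines of Python. Dedicated corpus software does the same job with much more careful tokenization.

```python
import re
from collections import Counter

def frequency_list(texts: list[str], top_n: int = 10) -> list[tuple[str, int]]:
    """Count word tokens across texts and return the top_n most frequent."""
    counts = Counter()
    for text in texts:
        # Crude tokenizer: lowercase runs of letters, keeping contractions like "don't".
        counts.update(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))
    # Words with equal counts keep the order in which they were first encountered.
    return counts.most_common(top_n)

corpus = ["The cat sat on the mat.", "The dog sat on the log."]
print(frequency_list(corpus, top_n=3))  # [('the', 4), ('sat', 2), ('on', 2)]
```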

Analysis of Learner Language

Since the early 1990s, corpus linguists have been compiling corpora of (mostly written) learner language in order to investigate second language acquisition and interlanguage. More recently, learner corpora have also been used in other areas, including classroom teaching, the development of teaching materials, and language teacher training.

Corpora in the Classroom

One of the best-known uses of corpora in the language classroom is concordancing, introduced by Tim Johns in the early 1980s (see Johns 1986). A concordancer is simply a computer program that searches a corpus for a selected word or phrase. The program then presents every instance of that word or phrase in the corpus in key-word-in-context (KWIC) format, that is, with the searched word in the centre of the screen, surrounded by the words that come before and after it. By reading corpus instances of the searched word or phrase as concordance lines, you can observe patterns of use that would otherwise go unnoticed. An excellent introduction to this method is Reading Concordances (Sinclair 2003).
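The core of a concordancer is easy to sketch. The Python function below (a hypothetical helper, kwic, with an arbitrary default context window of 30 characters) finds every occurrence of a search word and right-aligns the preceding text so that the search word lines up in a single central column, as in KWIC output. This is an illustration under those assumptions, not how any particular program works; concordancing software adds features such as sorting the lines by their context.

```python
import re

def kwic(texts: list[str], node: str, width: int = 30) -> list[str]:
    """Return one key-word-in-context line per occurrence of `node`."""
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        text = " ".join(text.split())  # collapse line breaks and extra spaces
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()].rstrip()
            right = text[match.end():match.end() + width].lstrip()
            # Right-align the left context so every hit lines up in one column.
            lines.append(f"{left:>{width}} {match.group(0)} {right}")
    return lines

texts = [
    "Corpus linguistics looks at real language in use.",
    "Teachers can use a corpus in the classroom.",
]
for line in kwic(texts, "corpus", width=20):
    print(line)
```

Cutting the context at a fixed number of characters, even mid-word, keeps every line the same length, which is what makes the vertical column of search words easy to scan.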

Corpus linguists have also been exploring other ways of using corpora in the classroom. Bernardini (2000), for example, advocates discovery learning, an extension of the data-driven learning approach pioneered by Tim Johns. In discovery learning, learners browse large corpora in open-ended, exploratory ways.


Through enhanced materials development, analysis of learner language, and discovery learning, corpus linguistics can enrich language teaching and research. The following resources are good places to begin exploring corpus linguistics.

Introductions to Corpus Linguistics

Biber, D., S. Conrad, and R. Reppen. 1998. Corpus linguistics: Investigating language structure and use. Cambridge: Cambridge University Press.

Hunston, S. 2002. Corpora in applied linguistics. Cambridge: Cambridge University Press.

Corpus Linguistics and Language Teaching

Aston, G., ed. 2001. Learning with corpora. Bologna, Italy: Cooperativa Libraria Universitaria Editrice Bologna.

Sinclair, J. M., ed. 2004. How to use corpora in language teaching. Amsterdam: John Benjamins.

Wichmann, A., S. Fligelstone, T. McEnery, and G. Knowles, eds. 1997. Teaching and language corpora. New York: Longman.

Concordancing Software

Barlow, M. 2000. MonoConc Pro 2.0. Houston, TX: Athelstan.

Scott, M. 2000. Wordsmith Tools (Version 3.0). Oxford: Oxford University Press.

Online Corpora

Davies, M. VIEW: Variation in English words and phrases. (interface for searching the British National Corpus)

University of Michigan, English Language Institute. Michigan Corpus of Academic Spoken English (MICASE).

Other Resources

Johns, T. 2000. Tim Johns data-driven learning page.

Zanettin, F. 2005. Corpus linguistics, translation, and language learning.

References

Bernardini, S. 2000. Systematising serendipity: Proposals for concordancing large corpora with language learners. In Rethinking language pedagogy from a corpus perspective, ed. L. Burnard and T. McEnery, 225-234. Frankfurt am Main: Peter Lang.

Conrad, S. 2000. Will corpus linguistics revolutionize grammar teaching in the 21st century? TESOL Quarterly 34:548-560.

Johns, T. 1986. Micro-Concord: A language learner's research tool. System 14 (2): 151-162.

Sinclair, J. 2003. Reading concordances. London: Longman.

Federica Barbieri is a PhD student in the applied linguistics program at Northern Arizona University, in the United States.