Spoken English Corpora

Academic Corpus: The Michigan Corpus of Academic Spoken English (MICASE)

This corpus contains about 1.8 million words of transcribed speech, drawn from almost 200 hours of recordings made at the University of Michigan. The transcribed speech events include lectures, classroom discussions, lab sections, seminars, advising sessions, and service encounters. The 61 transcripts are available for free download.

Conversational Corpus: Santa Barbara Corpus of Spoken American English

This corpus is based on a large body of recordings of naturally occurring spoken interaction from all over the United States. It includes face-to-face conversations, telephone conversations, card games, food preparation, on-the-job talk, classroom lectures, sermons, storytelling, and town hall meetings. The Santa Barbara Corpus is the main source of data for the spontaneous spoken portions of the American component of the International Corpus of English.

Television Corpus: Lexical Tutor website

This corpus was compiled by applied linguistics graduate students at Concordia University, Montreal. It covers 10 TV shows: five comedies and five dramas. The sub-corpora for the 10 shows were built by downloading transcripts freely available on the internet.