A corpus can broadly be defined as a “principled collection of texts available for qualitative and quantitative analysis” (O’Keeffe, McCarthy, and Carter, 2007, p. 1). It is principled in that it is built according to specific design criteria concerning the size, balance, and representativeness of the language in the texts. The analysis of such a collection may be quantitative (for example, the frequency of words in the texts) or qualitative (going beyond the word level with corpus techniques such as concordancing and cluster analysis).
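The two kinds of analysis mentioned above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the two-sentence toy corpus, the simple regular-expression tokenizer, and the function names `tokenize`, `word_frequencies`, and `concordance` are all hypothetical and are not taken from any of the tools listed on this page. Dedicated software such as Sketch Engine or AntConc handles tokenization, annotation, and large corpora far more carefully.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(texts):
    """Quantitative view: count how often each word form occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def concordance(texts, node, width=3):
    """Qualitative view: key-word-in-context (KWIC) lines for the word `node`.

    Each line is a (left context, node, right context) tuple with up to
    `width` tokens on either side of the hit.
    """
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == node:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, tok, right))
    return lines


# Hypothetical toy corpus, invented purely for illustration.
toy_corpus = [
    "The learners make a decision before they make a plan.",
    "Teachers make time for feedback.",
]

freqs = word_frequencies(toy_corpus)
kwic = concordance(toy_corpus, "make", width=2)

for left, node, right in kwic:
    # Align the node word in a central column, as concordancers usually do.
    print(f"{left:>15}  {node}  {right}")
```

Reading the aligned lines side by side is what lets an analyst spot recurring patterns around a word, such as "make a decision" or "make time", which a raw frequency count alone would not reveal.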
Different design criteria yield different types of corpora. Based on the medium or language form, there are spoken, written, and mixed corpora. With reference to the number of languages (and varieties of a language), there are monolingual corpora (one language, often represented by a national corpus such as the BNC), comparable corpora (two or more languages or varieties), and parallel corpora (two or more languages with their equivalent translations). Based on the date of origin, there are synchronic or monitor corpora (using contemporary language) and diachronic or historical corpora (tracking the evolution of language). Finally, with reference to the types of texts, there are general corpora (including many types of texts) and specialized corpora (texts of a particular type). Examples of specialized corpora are pedagogic corpora (all the language produced within classrooms, including that of teachers and textbooks) and learner corpora (language produced by learners).
Corpora are considered invaluable for reasons that have to do with the quality, quantity, and ease of processing of the language they contain. First, the texts in a corpus are genuine instances of language; a learner corpus, for instance, contains language as actually produced by learners in the classroom, not invented examples. Furthermore, corpora offer more and better samples of language; a concordance drawn from a native-speaker corpus, for instance, offers many more and better examples of language use than a dictionary. Finally, the digital form of corpora allows high-speed searches and analyses of the language they contain.
These advantages have made corpora indispensable in a number of disciplines. Corpora are used in (a) lexicography, (b) grammar, (c) translation, (d) discourse analysis, (e) forensic linguistics, (f) sociolinguistics, and (g) pedagogy (directly through Data-Driven Learning (DDL) or indirectly through materials design).
Links for Corpora
Corpus query system: Sketch Engine
Sketch Engine for Language Learning
English corpora
- BNC
- COCA
- List of corpora
- List of learner corpora
- Corpora uploaded to Sketch Engine
- International Corpus of Learner English (ICLE): ICLE connection guide – ICLE manual
Greek corpora
Software tools
- Sketch Engine
- #LancsBox: Lancaster University corpus toolbox
- Text inspector
- Wmatrix corpus analysis and comparison tool
- AntConc
- Wordsmith
AUTh on corpus linguistics
- PhD theses
- Xargia, Maria. 2022. The written Greek Adolescent English as a Foreign Language corpus: An analysis of argument construction. Ph.D. thesis, School of English, AUTh.
- Chasioti, Triantafyllia. 2020. An interdisciplinary approach to the study of literature: a corpus linguistic analysis of Margaret Atwood’s novels. http://ikee.lib.auth.gr/record/320243
- Fotiadou, Georgia. 2010. Voice morphology and transitivity alternations in Greek: evidence from corpora and psycholinguistic experiments. http://ikee.lib.auth.gr/record/115216
- Katsika, Kalliopi. 2009. Sentence processing strategies in adults and children: PP attachment in corpora and psycholinguistic experiments. http://ikee.lib.auth.gr/record/114376
- Papaioannou, Vasiliki. 2018. Teaching English as a Foreign Language through a Data-driven Learning perspective – using an annotated pedagogic corpus of English textbooks in a Greek high school class. https://ikee.lib.auth.gr/record/299808
- Voyiatzis, Anastasios. 2019. Words in crisis: Metaphor valence in persuasive communication. https://ikee.lib.auth.gr/record/305441
- Zapounidis, Thomas. 2017. Young learners’ input and output in the 3rd Experimental Primary School of Evosmos: The Young Learner Corpus of English (yoLeCorE). https://ikee.lib.auth.gr/record/295293
- MA theses
- Chardalias, Asterios. 2015. On Cognition, Culture, Discourse and Corpora: High Seas and Smitten Hearts. http://ikee.lib.auth.gr/record/281906
- Daviti, Aggeliki. 2016. A corpus-based study of future will and be going to in Greek EFL textbooks. https://ikee.lib.auth.gr/record/283342
- Karanasiou, Thaleia. 2018. The expression of epistemic modality in Greek EFL learners’ argumentative essays: A contrastive corpus-based study. https://ikee.lib.auth.gr/record/297658
- Kordali, Christiana. 2017. A cross-disciplinary comparison of research article abstracts: A corpus-based study. https://ikee.lib.auth.gr/record/295512
- Lazoglou, Maria. 2017. A rhetorical analysis of conference abstracts in Greek and English: a corpus-based analysis. https://ikee.lib.auth.gr/record/293592
- Matou, Palmyra. 2020. Frequency lists in EFL coursebooks (C2) compared to the frequency lists of native English corpora. http://ikee.lib.auth.gr/record/320683
- Margari, Aikaterini. 2017. Coxhead’s AWL use and collocations produced in Greek and American university students’ argumentative essays: a comparative study. https://ikee.lib.auth.gr/record/292580
- Moreti, Katerina. 2019. A rhetorical analysis of discussion sections in MA dissertations in English: a corpus-based comparison between native and non-native writers. https://ikee.lib.auth.gr/record/306393
- Xanthou, Despina-Christina. 2018. The effect of CLIL on the writing and vocabulary skills of 3rd graders in the 3rd Experimental Primary School of Thessaloniki. https://ikee.lib.auth.gr/record/297031
- Xargia, Maria. 2014. Greek teenagers’ argument construction in written English: a contrastive rhetoric study of argumentative writing
- Projects
- Pedagogic Corpora of English Language Course Books
- ECCo: English coursebooks corpus
A pedagogic corpus compiled from texts found in five different course book titles (CEFR level B1+): Docx file – ECCo metadata
- YoLeCorE: Young Learner Corpus of English
A learner corpus of Greek young learners (grade 4, i.e. 9-year-old children) learning English as a foreign language: Docx file
Links to other universities
- Centre for Corpus Research, University of Birmingham
- University Centre for Computer Corpus Research on Language, University of Lancaster
- “Corpus Linguistics: Method, Analysis, Interpretation”: Free online course from Lancaster University
- Lancaster Stats Tools online
- Centre for English Corpus Linguistics, Université catholique de Louvain
- Vienna-Oxford International Corpus of English, University of Vienna
Associations
Journals
- International Journal of Corpus Linguistics, John Benjamins Publishing Company