Learner Corpora
This is a first draft of a collection of links to learner corpora. It includes freely available public corpora, commercial corpora that are not already on jones (see JonesServer), and corpora that are not publicly available, for which you may contact the author/referent to try to obtain a copy or sample for your research.
Corpus | Author/Referent | Links | Notes |
Cambridge Learner Corpus, part of the Cambridge International Corpus (CIC) | Cambridge University Press and Cambridge ESOL | | |
Corpus Escrito del Español L2 (CEDEL2) | Cristóbal Lozano | | |
Corpus parlato di italiano L2 | Osservatorio project | | |
English L2 - Hebrew L1 corpus | Tina Waldman | | |
EVA spoken corpus | A. Hasselgren | | |
FRIDA (French Interlanguage Database) | Sylviane Granger | http://www.fltr.ucl.ac.be/fltr/germ/etan/cecl/Cecl-Projects/Frida/fridatext.htm | |
International Corpus of Learner English (ICLE) | Sylviane Granger | | |
ISLE Speech Corpus | | | |
English L2 - Japanese L1 learner corpus | Asao Kojiro | | |
JEFLL (Japanese EFL Learner) Corpora | Yukio Tono (Meikai University, Japan) | | |
JPU Corpus | József Horváth | | |
LONGDALE - Longitudinal Database of Learner English | Sylviane Granger | | |
Longman Learners' Corpus | http://www.pearsonlongman.com/dictionaries/corpus/learners.html | | |
Louvain International Database of Spoken English Interlanguage (LINDSEI) | Gaëtanelle Gilquin, Claire Hugon, and Sylviane Granger | http://www.fltr.ucl.ac.be/fltr/germ/etan/cecl/Cecl-Projects/Lindsei/lindsei.htm | |
Multimedia Adult ESOL Learner Corpus | Adult ESOL Lab School | http://www.labschool.pdx.edu/research/methods/maelc/intro.html | |
PICLE - Polish sub-corpus of ICLE | Przemek Kaszubski | | |
Spanish Learner Language Oral Corpus (SPLLOC) | Laura Dominguez | | |
Standard Speaking Test (SST) Corpus | Communication Research Laboratory and ALC Press | | |
Thai English Learner Corpus (TELC) | Assumption University, Thailand | | |
Tswana Learner English Corpus (TLEC) | Bertus van Rooy | | |
VOICE (Vienna-Oxford International Corpus of English) | Vienna University, supported by Oxford University Press | | |
Other
MICASE (native, near-native, and non-native) | University of Michigan, English Language Institute | |
TED Translanguage English Database | European Language Resources Association (ELRA) and the LDC | http://www.elda.org/catalogue/en/speech/S0031.html http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002S04 http://www.phonetik.uni-muenchen.de/Forschung/Publications/Lamel_ICSLP94.ps |
COMPARA Portuguese-English parallel corpus | | |
Chinese Learner English Corpus (CLEC) | Shanghai Foreign Language Education Press | |
Learner Business Letters Corpus | Someya Yasumasa | |
The English-Swedish Parallel Corpus | Lund University Department of English | |
English-Norwegian Parallel Corpus (ENPC) | University of Oslo Department of British and American Studies | http://www.hf.uio.no/ilos/forskning/forskningsprosjekter/enpc/ |
HKUST (Hong Kong University of Science and Technology) Corpus of Learner English | J. Milton | |
TELEC Secondary Learner Corpus (TSLC) | TELEC (Teachers of English Language Education Center), Department of Curriculum Studies, The University of Hong Kong | no webpages available |
Corpus of Young Learner Interlanguage (CYLI) | Vrije Universiteit Brussel | no webpages available |
ELFA Corpus | Tampere University | http://www.tay.fi/laitokset/kielet/engf/research/elfa/corpus.htm |
Tools
The Linguistic Annotation Wiki might be useful if you are building your own corpus.
Other sites
Here are some other sites with lists of learner corpora.
Thank you to Elena Cotos at Iowa State, whose list of learner corpora led to the creation of this list.
