Automatic part-of-speech tagging of the Tartu Corpus of Estonian Learner English with CLAWS7: impact of learner errors
Articles
Liina Tammekänd
University of Tartu, Estonia
Reeli Torn-Leesik
University of Tartu, Estonia
Published 2023-12-28
https://doi.org/10.15388/Taikalbot.2023.20.9

Keywords

learner English
automatic POS-tagging
learner errors
TCELE
CLAWS7

How to Cite

Tammekänd, L., & Torn-Leesik, R. (2023). Automatic part-of-speech tagging of the Tartu Corpus of Estonian Learner English with CLAWS7: impact of learner errors. Taikomoji Kalbotyra, 20, 126-140. https://doi.org/10.15388/Taikalbot.2023.20.9

Abstract

The present paper, a continuation of Tammekänd and Torn-Leesik’s (2022) study, examines how learner errors affect the CLAWS7 tagger’s automated assignment of part-of-speech (POS) tags in a 24,812-word sample of the Tartu Corpus of Estonian Learner English (TCELE). Learner errors that caused tagging errors in the sample were identified, and a working error taxonomy was created on that basis. The POS-tagged and error-tagged samples were then collated and compared to map correlations between learner errors and tagging errors. Error groups that correlated with significantly increased rates of tagging errors were identified, and possible reasons for the impact of learner errors on the tagger’s performance were suggested. The CLAWS7 tagger misanalysed only 2.8% of forms representing learners’ language errors but assigned wrong tags to roughly every fifth spelling error (22%).
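
The comparison of error-tagged and POS-tagged tokens described in the abstract amounts to a per-category rate: for each learner-error group, the share of tokens that received a wrong POS tag. The following is a minimal sketch of that arithmetic, not the authors' procedure. The record layout, the category labels (`spelling`, `language`, `no_error`), the function name `tagging_error_rates` and all token values are assumptions invented for illustration and are not taken from TCELE.

```python
from collections import Counter

# Hypothetical toy records: (token, learner-error category or None, tagger was wrong).
# All values are invented for illustration; none come from TCELE.
records = [
    ("becouse", "spelling", True),
    ("recieve", "spelling", False),
    ("informations", "language", False),
    ("goed", "language", True),
    ("the", None, False),
    ("house", None, False),
]


def tagging_error_rates(records):
    """Return {category: share of tokens in that category that were mis-tagged}."""
    totals = Counter()
    wrong = Counter()
    for _token, category, mis_tagged in records:
        key = category or "no_error"  # tokens without a learner error form their own group
        totals[key] += 1
        if mis_tagged:
            wrong[key] += 1
    return {key: wrong[key] / totals[key] for key in totals}


rates = tagging_error_rates(records)
```

Groups whose rate is markedly higher than that of the `no_error` group would be the ones worth inspecting more closely; whether a difference is significant would need a statistical test that this sketch does not include.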

This work is licensed under a Creative Commons Attribution 4.0 International License.
