Abstracts AALL’09

Abstracts of presentations accepted for the CALICO Pre-Conference Workshop on

Automatic Analysis of Learner Language (AALL’09)
From a better understanding of annotation needs to the development and standardization of annotation schemes

(Alphabetical list, by first author)

Combining Automated and Manual Techniques for Increased Efficiency and Accuracy in Corpus Error Annotation

Øistein E. Andersen (University of Cambridge, UK)

Manual error annotation of learner corpora is notoriously time-consuming and error-prone, whereas existing automatic techniques cannot detect and correct all kinds of error reliably. We present a method whereby the two can be combined and show that this hybrid paradigm enables faster and more consistent annotation by allowing the human annotator to concentrate on complex/interesting errors and letting the computer take care of simple/uninteresting ones which nevertheless have to be corrected. Further analysis of the data is expected to provide more information about the limits of automatic annotation as well as the amount of context needed for the error annotation task.
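
The abstract does not detail the implementation, but the triage idea can be sketched as follows; the rules, confidence values, and threshold here are invented for illustration and are not the authors’ system:

```python
# Hybrid annotation triage (illustrative sketch): high-confidence
# mechanical corrections are applied automatically, anything else is
# left for the human annotator.
import re

AUTO_RULES = [
    # (pattern, replacement, confidence) for simple/uninteresting errors
    (re.compile(r"\bteh\b"), "the", 0.99),      # common typo
    (re.compile(r" +([,.;:])"), r"\1", 0.95),   # space before punctuation
    (re.compile(r"  +"), " ", 0.99),            # doubled spaces
]

def triage(sentence, threshold=0.9):
    """Apply high-confidence corrections; flag the rest for manual review."""
    needs_review = False
    for pattern, replacement, confidence in AUTO_RULES:
        if pattern.search(sentence):
            if confidence >= threshold:
                sentence = pattern.sub(replacement, sentence)
            else:
                needs_review = True
    return sentence, needs_review

print(triage("I saw teh  cat ."))  # ('I saw the cat.', False)
```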

[ back to program ]

Ambiguous Errors in Annotation Schemes and Evaluation

Adriane Boyd (The Ohio State University)

Ambiguous errors provide a particular challenge in the development of annotation schemes and evaluation metrics. We will discuss a range of possibilities for the treatment of ambiguous errors in learner corpus annotation schemes and examine the effects of annotation choices when learner corpora are used for direct evaluation of systems that analyze learner language.

[ back to program ]

What Results Give Which Strategies in Error Annotation?

Ana Díaz-Negrillo and Salvador Valera (University of Jaén, Spain)

This paper presents various strategies in error tagging, specifically their shortcomings and benefits regarding: i) identification and classification of errors, and areas of special difficulty; ii) scope of error tagging, that is, how much context is to be tagged; and iii) amount and type of information in error tags, that is, how many types of information may be provided and for what purposes. The discussion aims to point out the results obtained with each strategy and, thus, the directions that may be desirable for standards, depending on the purposes of annotation.

[ download the slides, back to program ]

Morphological Analysis for Russian Learner Language

Markus Dickinson and Joshua Herring (Indiana University)

We explore the development of a system for detecting and diagnosing morphological errors in Russian. Important questions concern what type of linguistic analysis is needed for accurate feedback, whether adequate resources are available, and how existing resources can be adapted under the time and resource constraints typical of the development stages of an ICALL project. We use a freely-available part-of-speech (POS) lexicon to quickly build a finite-state morphological analyzer for Russian learner language. This implementation allows for the detection of errors in paradigm choice, morphological choice, allomorphy, and spelling.
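
As a rough illustration of the lexicon-driven approach, the sketch below splits known forms into stem and ending and diagnoses unattested learner forms; the two-entry lexicon and the diagnosis labels are toy assumptions, whereas the actual system compiles a full POS lexicon into a finite-state analyzer:

```python
# Toy sketch of lexicon-driven analysis for learner Russian: split known
# word forms into stem + ending, then diagnose unattested learner forms.
LEXICON = {
    # stem -> set of valid endings (first-declension noun 'kniga' = book)
    "книг": {"а", "у", "и", "е", "ой"},
    # verb stem 'chita-' (to read), non-past endings
    "чита": {"ю", "ешь", "ет", "ем", "ете", "ют"},
}
ALL_ENDINGS = set().union(*LEXICON.values())

def diagnose(form):
    """Return a coarse diagnosis for a learner-produced word form."""
    for i in range(len(form), 0, -1):
        stem, ending = form[:i], form[i:]
        if stem in LEXICON:
            if ending in LEXICON[stem]:
                return "well-formed"
            if ending in ALL_ENDINGS:
                return "known stem + ending from the wrong paradigm"
            return "known stem + unknown ending (possible spelling error)"
    return "unknown stem"

print(diagnose("книгу"))   # well-formed
print(diagnose("книгою"))  # known stem + unknown ending (possible spelling error)
```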

[ download the slides, back to program ]

The Use of NLP Technologies to Engineer Oral Proficiency Test Items

Ross Hendrickson and Deryle Lonsdale (Brigham Young University)

This presentation describes a newly developed tool for creating Elicited Imitation (EI) test items. Prior work in EI test administration by us and others has resulted in the identification of several salient features that prove useful in assessing language achievement. We have selected twenty-four of these features for processing within the EI item development tool, and in this presentation we explain what they are and why they were chosen. The tool operates in an analysis mode and a generation mode. It can be used with a GUI for interactive item development, or via the command line for batch processing.

[ download the slides, back to program ]

Multilevel Learner Corpora

Hagen Hirschmann, Amir Zeldes, and Anke Lüdeling (Humboldt-University Berlin, Germany)

This paper presents multilevel corpus architectures as a means of studying learner language. Limitations of current inline-annotated corpora are discussed and contrasted with advantages offered by state-of-the-art rich annotation architectures. The focus here is on the representation of conflicting and competing annotation. As an example, we introduce the Annis web interface and the German learner corpus Falko, which utilize a relational database representation of a stand-off XML format, establishing first steps in applying multilevel architectures to learner data.
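
A toy sketch of the stand-off idea follows: every annotation layer points at token spans instead of marking up the text inline, so conflicting target hypotheses can coexist. The layer names and the example are illustrative assumptions, not Falko’s actual schema:

```python
# Stand-off, multilevel annotation (illustrative): layers refer to token
# spans, so competing analyses of the same span can be stored side by side.
tokens = ["He", "go", "to", "school", "yesterday"]

layers = {
    # two competing target hypotheses over the same span
    "target_hypothesis_1": [{"span": (1, 2), "value": "went"}],
    "target_hypothesis_2": [{"span": (1, 2), "value": "goes"}],
    "pos":                 [{"span": (i, i + 1), "value": tag}
                            for i, tag in enumerate(["PRP", "VB", "TO", "NN", "RB"])],
    "error":               [{"span": (1, 2), "value": "tense/agreement"}],
}

def annotations_at(index):
    """All annotations, from every layer, covering the token at `index`."""
    return {name: [a for a in anns if a["span"][0] <= index < a["span"][1]]
            for name, anns in layers.items()}

print(annotations_at(1))  # both target hypotheses surface side by side
```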

[ download the slides, back to program ]

Towards an Automatic Analysis of Metaphorization in Students’ Writings

Huaqing Hong and Jianzhen Zheng (Nanyang Technological University, Singapore)

A number of researchers have pointed out that mastery of grammatical metaphor, i.e. the reconstrual of experience at a more abstract, general level, represents a landmark in the development of children’s writing ability and affords them access to educational and school knowledge. It is thus important to identify how students use this kind of rhetorical device in their writing. However, there has been no handy tool that can automatically capture the distributional properties of metaphorization in learner language. This paper reports our proposal to develop an automatic analysis toolkit to examine and compare the patterns of grammatical metaphor in a corpus of Singaporean students’ writings.

[ back to program ]

Does Automated Feedback in a Proofreading Tool Help an English Language Learner?

Claudia Leacock (Butler Hill Group), Michael Gamon (Microsoft Research), and Chris Brockett (Microsoft Research)

Microsoft Research’s ESL Assistant is a prototype web-based proofreading tool developed for English Language Learners. It focuses on error types typically made by non-native writers and enables them to launch web searches for the original string and the suggested revision. A freely-available version was launched in June 2008, for which we have logs from thousands of users. These logs contain original text, suggested rewrites, and whether the suggestions were accepted. An analysis of these logs shows that users can distinguish correct suggested rewrites from incorrect ones and that, when the system offers an incorrect solution, users can supply the correct solution on their own.

[ download the slides, back to program ]

Human Judgment on Article and Noun Number Usage

John Lee (MIT), Joel Tetreault (Educational Testing Service), and Martin Chodorow (Hunter College of CUNY)

Errors involving the noun phrase are frequent in English learner text. For each English noun phrase, the writer must make challenging choices with regard to the number of the head noun and the presence or absence of an article. While the task of predicting the correct article has been well studied, that of predicting noun number has received scant attention. In this project, we address this gap by examining three issues that have not been adequately analyzed: the interaction between article and number, the possibility of multiple correct answers, and the importance of context.

[ back to program ]

Annotation of Spelling Errors for Korean Learner Corpora

Seok Bae Jang (Brigham Young University), Sun-Hee Lee (Wellesley College), and Markus Dickinson (Indiana University)

We provide an annotation scheme for spelling errors of Korean language learners and the relevant error taxonomy that can be used to guide automatic error diagnosis and feedback instruction in a Korean ICALL system. Although the Korean writing system reflects syllabic structure, there are frequent mismatches between a syllable and a character, due to phonological rules and morphological boundaries. With an annotation scheme of five categories layered on top of other error annotation, we develop an annotated corpus with writing samples of Korean language learners and examine the efficacy of our new annotation scheme and taxonomy for Korean.

[ download the slides, back to program ]

Automating Measures of L2 Syntactic Complexity

Xiaofei Lu (Pennsylvania State University)

A major challenge facing second language acquisition (SLA) researchers who apply syntactic complexity measures to large language samples is the labor-intensiveness of manual analysis. This has not only limited the size of the language samples analyzed in many studies, but has also made it difficult to empirically evaluate the validity of the complexity measures using large-scale corpus data. This study aims to fill this important gap by designing and evaluating a computational system that can automatically measure the syntactic complexity of language samples of any size using fourteen different syntactic complexity metrics current in SLA research.
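
Two of the simpler measures can be approximated with surface heuristics, as in the sketch below; the subordinator list is an illustrative assumption, and the actual system works from richer syntactic analyses:

```python
# Crude sketch of two surface-level complexity measures: mean length of
# sentence (MLS) and a rough dependent-clause count via subordinator matching.
import re

SUBORDINATORS = re.compile(
    r"\b(because|although|when|while|if|that|which|who|since)\b", re.I)

def complexity(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = sum(len(s.split()) for s in sentences)
    clauses = sum(len(SUBORDINATORS.findall(s)) for s in sentences)
    return {
        "mean_length_of_sentence": words / len(sentences),
        "dependent_clauses_per_sentence": clauses / len(sentences),
    }

print(complexity("I stayed home because it rained. The rain stopped."))
```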

[ download the slides, back to program ]

Improving Automated Oral Testing: Identifying Features and Enhancing Speech Recognition

Jeremiah McGhee, Aaron Johnson, Ross Hendrickson, Meghan Eckerson, Malena Weitze, Ben Millard, Ray Graham, Deryle Lonsdale, and Dan Dewey (Brigham Young University)

This paper extends our work in the crossover between elicited imitation (EI) tests and automatic analysis techniques, including speech recognition. We have administered an EI test to hundreds of adult ESL learners, automatically scored the responses, and correlated the results with other human-scored tests. We have also performed post-hoc analysis to assess how linguistic features contributed to EI item difficulty. In this presentation we synthesize and integrate these two strands of research. With a new hand-annotated corpus of some 40,000 EI item responses, we can more extensively explore interactions with internal test characteristics and externally-supplied holistic data about the learners themselves.

[ download the slides, back to program ]

Towards Analyzing Korean Learner Particles

Chong Min Lee (Georgetown University), Soojeong Eom (Georgetown University), and Markus Dickinson (Indiana University)

Building on the idea of modifying treebank annotation to detect Korean particle errors, we extend the work in several ways. First, we evaluate the system with a Korean learner corpus instead of artificial data. Several issues arise for proper evaluation: some errors, e.g., misspelled particles, should be modified, and discourse-error annotation needs to be added to evaluate different error types. Second, a pre-built POS tagger should be adapted for ill-formed input. Finally, to overcome the genre limitations of one corpus, we merge the annotation of two corpora. With these steps, we can use the parse output to begin error detection.

[ download the slides, back to program ]

Students Naturally Speaking

Trevor Shanklin (San Diego State University)

In this presentation, we will outline the procedures we have followed, using NVivo 8.0 qualitative analysis software, to analyze a pilot study of spoken data from 15 English tests. The software allows us to tag audio files directly with comments, although written transcripts are also provided. The procedures should then extend to test data in the eight languages currently covered. The analysis is intended to 1) help assess the success of individual test items; 2) help develop feedback to the test taker; and 3) offer support to curriculum planners. The initial structures investigated are the use of past-tense verbs in items requiring past narration and the use of subordination.

[ download the slides, back to program ]

Towards Automatically Acquiring Models of ESL Errors

Joel Tetreault (Educational Testing Service) and Martin Chodorow (Hunter College of CUNY)

One of the biggest challenges in developing automatic NLP grammar error detection systems for learner writing is the relatively scant amount of annotated training data available. We believe that current models could be improved if they were augmented with knowledge of typical learner errors, thus circumventing some of the problems associated with building a training corpus. In this pilot study, we propose a method to automatically mine examples of common learner errors from the web. While this work focuses on preposition errors, the methodology can be applied to other error types as well.
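
The core of the mining idea can be sketched as a comparison of corpus frequencies for confusable alternatives; the `hit_count` stub and its toy counts stand in for whatever web or corpus backend is actually used, and are not the authors’ pipeline:

```python
# Sketch: for a confusable preposition slot, compare how often each
# alternative occurs in a large reference corpus; a form that is rare
# relative to its alternatives is a candidate learner error.
PREPOSITIONS = ["in", "on", "at", "to", "for", "of"]

def hit_count(phrase):
    """Stub: return the frequency of `phrase` in a large reference corpus."""
    toy_counts = {"depends on": 900, "depends of": 3, "depends in": 5,
                  "depends at": 1, "depends to": 2, "depends for": 1}
    return toy_counts.get(phrase, 0)

def likely_error(left, prep, ratio=0.05):
    """Flag `left + prep` if it is rare relative to the best alternative."""
    observed = hit_count(f"{left} {prep}")
    best = max(hit_count(f"{left} {p}") for p in PREPOSITIONS)
    return best > 0 and observed < ratio * best

print(likely_error("depends", "of"))  # True: 'depends of' is a common L2 error
print(likely_error("depends", "on"))  # False
```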

[ back to program ]

Increasing the Reliability of a Part-of-Speech Tagging Tool for Use with Learner Language

Sylvie Thouesny (Dublin City University, Ireland)

Since errors in part-of-speech tagging result in larger errors in the analysis of incorrect grammatical or lexical forms, it is essential to encode all components in a text with robust and consistent tags. Following a description of the creation and annotation of a learner language corpus, this presentation will explain how the tagging accuracy of TreeTagger was increased by means of (a) identifying the lemmas that are unknown to the tagger, (b) checking the part-of-speech tags automatically obtained against an extended set of common-sense rules based on recurrent tagging errors, and (c) cross-referencing the part-of-speech tags with the error-encoded tags.
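
Step (b) might be sketched as follows over TreeTagger-style (token, tag, lemma) triples; the single rule shown is an invented example of a “common-sense rule”, not Thouesny’s actual rule set:

```python
# Sketch: post-correct TreeTagger output with rules derived from
# recurrent tagging errors. TreeTagger emits '<unknown>' for lemmas
# missing from its lexicon, which the rule below exploits.
tagged = [("the", "DT", "the"), ("frendship", "NP", "<unknown>"),
          ("was", "VBD", "be"), ("important", "JJ", "important")]

def correct_tags(triples):
    fixed = []
    for i, (token, tag, lemma) in enumerate(triples):
        prev_tag = fixed[i - 1][1] if i > 0 else None
        # Rule: <unknown> lemma tagged as proper noun (NP) right after a
        # determiner is more plausibly a misspelled common noun (NN).
        if lemma == "<unknown>" and tag == "NP" and prev_tag == "DT":
            tag = "NN"
        fixed.append((token, tag, lemma))
    return fixed

print(correct_tags(tagged))
```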

[ download the slides, back to program ]

Automated Pronunciation Scoring for L2 English Learners

Su-Youn Yoon, Mark Hasegawa-Johnson, and Richard Sproat (University of Illinois at Urbana-Champaign)

This study aims at developing an automated pronunciation scoring method for second language learners of English. The method is characterized by the combination of confidence scoring and classifiers: classifiers were trained for the specific English phonemes on which L2 learners make frequent errors. By combining the two methods, the accuracy of the automated scoring method increased from 83% to 86% (a 17% relative error reduction). Furthermore, in contrast to the confidence scoring method, the SVM classifiers can indicate what the incorrect phonemes are like. This information can be used to provide valuable feedback for correcting the errors.
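
A minimal sketch of the combination, assuming synthetic acoustic features and scikit-learn in place of the study’s actual data and training setup:

```python
# Sketch: a phoneme-level ASR confidence score backed up by an SVM
# trained for an error-prone phoneme. Features, data, and the decision
# rule are synthetic illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic feature vectors for one error-prone phoneme contrast
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(2, 1, (50, 4))])
y = np.array([1] * 50 + [0] * 50)        # 1 = correct, 0 = mispronounced
clf = SVC(probability=True).fit(X, y)

def score_phoneme(confidence, features, conf_threshold=0.6):
    """Accept on high ASR confidence; otherwise defer to the classifier."""
    if confidence >= conf_threshold:
        return "correct"
    p_correct = clf.predict_proba(features.reshape(1, -1))[0, 1]
    return "correct" if p_correct >= 0.5 else "mispronounced"

print(score_phoneme(0.4, rng.normal(2, 1, 4)))  # likely 'mispronounced'
```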

[ back to program ]

Automated Speech Fluency Scoring Using Simple Speech Technology

Su-Youn Yoon and Richard Sproat (University of Illinois at Urbana-Champaign)

This study aims at developing an automated scoring method for the speech fluency of second language learners. Developing an automated scoring method using a word recognizer is a difficult task due to insufficient training data and the low accuracy of word recognition on learner speech. Temporal features are strongly correlated with L2 learners’ fluency levels, and they do not require word identity information. Therefore, a phone recognizer and a disfluency detection algorithm were used instead of a word recognizer in this study. The temporal features, calculated with the proposed method, showed statistically significant correlations with human experts’ scores. The method can also be applied to languages with fewer resources.
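
The temporal features can be illustrated over a toy phone-level alignment of (start, end, label) triples, with 'sil' marking silence; the alignment and this particular feature set are assumptions for illustration, not the study’s exact features:

```python
# Sketch: temporal fluency features from a phone-level alignment,
# requiring no word identities -- only phone labels and timestamps.
alignment = [(0.00, 0.35, "dh"), (0.35, 0.60, "ax"), (0.60, 1.40, "sil"),
             (1.40, 1.80, "k"), (1.80, 2.10, "ae"), (2.10, 2.45, "t")]

def fluency_features(phones, min_pause=0.2):
    total = phones[-1][1] - phones[0][0]
    pauses = [e - s for s, e, p in phones if p == "sil" and e - s >= min_pause]
    speech = total - sum(pauses)
    n_phones = sum(1 for _, _, p in phones if p != "sil")
    return {
        "phonation_time_ratio": speech / total,   # speaking vs. total time
        "articulation_rate": n_phones / speech,   # phones per second of speech
        "mean_pause_duration": (sum(pauses) / len(pauses)) if pauses else 0.0,
    }

print(fluency_features(alignment))
```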

[ back to program ]