Vol 26, No. 2 (January 2009)


CALL Evaluation for Early Foreign Language Learning: A Review of the Literature and a Framework for Evaluation

Eduardo Garcia Villada
Drake University

This paper provides a critical review of the literature published between 1980 and 2005 on the evaluation of resources for computer-assisted language learning (CALL), with a particular emphasis on elementary education. An analysis of that literature indicates that approaches to evaluation have been somewhat monolithic and possibly linked to a postpositivist approach to research in education. Today's elementary classroom, where foreign languages are learned, requires a more interpretivist approach to the evaluation of CALL resources, informed by the principles of second language acquisition (SLA) theory, instructional design theory, early foreign language (FL) teaching methodologies, and the connection of the FL curriculum with other curriculum areas to promote meaningful student-centered interactions. A critique of the literature using the factors of multivocality, contextualization, and interpretation is followed by the proposition that these factors may provide a framework for future CALL evaluation. Because the evaluation of CALL has a complex relationship with elementary education, CALL evaluation may best be integrated within teacher education programs.



KEYWORDS: CALL evaluation, CALL for Foreign Languages at the Elementary School (FLES), Evaluation Paradigms, "Perspectives Interaction Paradigm," Interpretivism.


Computer-assisted language learning (CALL) materials can be valuable in the elementary classroom (Ford-Guerrera, 1997; Oxford, 1998) and take a number of forms. This paper reviews research on the evaluation of CALL materials for this age group. The review includes guidelines designed to evaluate materials such as practice exercises with audio, text, and video for vocabulary, grammar, and pronunciation, as well as instructional software in the form of tutorials, simulations, games, and problem solving. Researchers and developers of CALL materials have emphasized the need for evaluation of such materials (e.g. Chapelle, 2001).

The evaluation of CALL materials intersects with instructional design, teaching practice, and learning activities. Over the last three decades, CALL design and its evaluation have gone through a process of conceptualization (Gruba, 2004), and outcomes of this process are tools and procedures developed to evaluate the instructional effectiveness of CALL materials. Accordingly, CALL evaluation draws its foundations from accepted models of instructional design, pedagogy, aesthetics, and usability of the materials (Chapelle, 2001; Decoo, 1984; Hamburger, 1990; Hubbard, 1987, 1988; Laurillard, 1991; Levy, 1997; Pederson, 1987; Phillips, 1986; Thompson, 1999). As a result, CALL evaluation serves to inform those who design and select CALL materials on how well those materials may meet the needs of FL teachers and students. These needs vary considerably with age and pedagogy in FL education (Curtain & Dahlberg, 2004).

The evaluation of CALL has been commonly approached from one of three perspectives: that of the developer, the teacher, or the student. These single-sided perspectives reflect the postpositivist epistemology that has been widely accepted, and is perhaps dominant, in behavioral and social science research (Guba, 1990). From this standpoint, the main purpose of CALL evaluation has been to determine the effectiveness or impact that the use of CALL may have on language learning.

As suggested by Hamburger (1990) and Hubbard (1987), CALL evaluation is too complex to be addressed by single-sided perspectives. In the postpositivist research tradition, where teachers are "guided" by experts, teachers may not feel responsible for doing CALL evaluation. In other cases, due to the complexity of defining their own evaluation criteria and finding time, teachers who do evaluate CALL materials often adopt evaluation tools that are developed by others who may not have a classroom perspective. When experts do the design, development, and evaluation of CALL materials alone, teachers have little influence over the relevance of those materials to specific classroom and student contexts. To address this issue, teachers' voices and experiences (Freeman & Johnson, 1998; Widdowson, 1993), and students' voices (Lincoln, 1995) need to be folded into the discourse of CALL studies to encourage improvement in current research and evaluation paradigms, particularly in the field of elementary FL education.

As an alternative, an integrated, multiple perspective approach to CALL evaluation research that reflects an interpretivist epistemology (Guba, 1990) could provide a means of integrating these missing voices. The purpose of such an approach is to facilitate professional development for practitioners so that teachers take an active role as researchers, generate their own findings, reflect deeply on their observations, and share their ideas with other teachers (Gage, 1989; Kelly & McAnear, 2003; Widdowson, 1993). From an interpretivist standpoint, the link between CALL evaluation research and practice indicates a multifaceted and contextualized process in which findings take into account diverse voices (Freeman & Johnson, 1998; Lincoln, 1995; Widdowson, 1993), including the combined perspectives of the developer, the teacher, and the students (Squires & McDougall, 1994).

There is relevant literature for CALL evaluation, but much of it explicitly targets materials for adults or does not specify an age group at all. The research on adults may inform the evaluation of CALL for elementary schools, but pedagogical approaches for young learners are very different from those for adult learners (Kennedy, 1988; White & Genesee, 1996). Pedagogical approaches for children learning a second language might include songs, rhymes, games, and physical activities, while approaches for adults can include more printed materials and the use of established cognitive strategies, such as first to second language skills transfer. Age groups may also learn differently. For example, research indicates that adults learning a second language learn more quickly than children, but children learn more easily and over time attain better levels of second language proficiency than adults (Larsen-Freeman & Long, 1991). Consequently, it is important to consider young learners--and how teachers work in elementary classrooms--and to address evaluation from this perspective, because an evaluation of CALL materials that is not specific to this context is unlikely to work for elementary education. Approaches to CALL evaluation need to take into consideration what researchers, developers, teachers, and students do with those materials (Chapelle, 1990), especially when the materials are used in communicative activities mediated through technology with others outside the classroom. In addition to the literature on CALL evaluation, the standards promoted by the general literature on software evaluation (Squires & McDougall, 1994) and by the International Society for Technology in Education (ISTE) (Kelly & McAnear, 2003) identify software evaluation as an expected competency for teachers, student teachers, and administrators.

The purpose of this paper is to argue that an interpretivist framework should be used for the evaluation of CALL for the elementary classroom. It does so by reviewing the literature on CALL evaluation over the last 25 years. The paper first reviews and critiques that literature and then proposes an alternative framework to help inform developers, researchers, and teachers and to improve the evaluation of CALL resources for elementary education.


Before detailed studies are reviewed, it is useful to critique recent literature reviews on CALL. In 2003, two major reviews of the CALL literature were published (Liu, Moore, Graham, & Lee, 2003; Zhao, 2003). Liu et al. reviewed the research literature published during 1990-2000 on the topic of computer use in FL learning, with a specific focus on how computer use can enhance language acquisition. A single database, the Educational Resources Information Center (ERIC), was the source of their data, yielding 246 articles published in 21 journals during this time period. Of those 246 articles, Liu et al. examined only the empirical research studies, which narrowed the literature to 70 articles. One important finding was that the majority of these studies were conducted at the college level; only two investigated K-12, and both of those focused on high school settings. The authors organized the literature by topic or trend, but the specific topic of CALL evaluation and selection was not addressed in their review.

Zhao (2003) published a review on technology use and FL learning, limited to articles written in English and published between 1997 and 2001. Zhao's purpose was to provide a meta-analysis of effectiveness of CALL materials in terms of student learning. Again, only one database, ERIC, was searched for articles on CALL and second language acquisition (SLA), with a result of over 300 articles. Zhao applied a number of selected criteria to determine whether there were suitable empirical studies on effectiveness among these articles. First, Zhao narrowed the number of journals down from 22 to just 5, using quantitative and also subjective processes to determine "representative" and "more research-oriented" journals. From these few journals, Zhao applied additional criteria for examining articles in these journals and ended up with only nine studies that met all his criteria. The review focused on college-level language learners, and the evaluation and selection of CALL materials were not addressed. However, Zhao noted a "shocking" lack of research on K-12 second language learning with CALL and concluded this meant that K-12 second language teachers were not using technology and that K-12 and university researchers were not interested in conducting empirical research in K-12 second language settings.

K-12 FL teachers are using technology in their teaching, although there are few empirical research studies documenting their use (Rosenbusch, García Villada, & Padgitt, 2003a, 2003b). Accordingly, based on published reports, the review of the literature in this article considers the specific topic of how K-12 instructors have selected and/or evaluated the CALL materials they use and the issues that emerge from their selection/evaluation of these materials. One expected outcome of this study is to promote and enhance K-12 instructors' use of technology, as well as to contribute to the body of research in this neglected area.



For the present review, four databases--ERIC, Linguistics and Language Behavior Abstracts (LLBA), Dissertations Abstracts International (DAI), and WorldCat--were searched to find relevant literature from 1980-2005 on the topic of CALL evaluation for foreign languages at the elementary school level. These four databases were chosen as a means of retrieving comprehensive results from the published journal articles, books, and dissertations.

The specific search string used in these databases was "(computer assisted language learning or computer assisted instruction) and (elementary or K-12 or children) and (foreign language or second language or FLES or Spanish) and (software evaluation or software selection)." In the ERIC and LLBA databases, the search was also limited to only peer-reviewed journal articles. The search produced different results from each database since the four databases are constructed differently and do not contain exactly the same journals or content, even though there is some overlap.
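For readers who wish to replicate or extend the search, the boolean structure of that string can be sketched programmatically. The sketch below is illustrative only: it reproduces the grouping reported above, but ERIC, LLBA, DAI, and WorldCat each have their own query syntax and field codes, so the composed string would need to be adapted per vendor.

```python
# Illustrative reconstruction of the boolean search string used in the
# four database queries. Terms within each inner list are OR-ed; the
# four concept groups are then AND-ed together. This mirrors the string
# reported in the text; it is not vendor-specific query syntax.
concept_groups = [
    ["computer assisted language learning", "computer assisted instruction"],
    ["elementary", "K-12", "children"],
    ["foreign language", "second language", "FLES", "Spanish"],
    ["software evaluation", "software selection"],
]

def build_query(groups):
    """AND together parenthesized OR-groups of keywords."""
    return " and ".join("(" + " or ".join(terms) + ")" for terms in groups)

query = build_query(concept_groups)
print(query)
```

Dropping the final group from `concept_groups` reproduces the relaxed searches described below, in which "software evaluation or software selection" was removed to test whether any additional relevant records existed.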

For example, the entire search string strategy did not retrieve any records at all in the DAI database. When the keywords "software evaluation or software selection" were removed, the search only produced four citations which, lacking these critical key concepts, were not relevant. Moreover, the DAI database only produced one citation when the keywords "evaluation" and "selection" were excluded, but the keyword "software" was included. These results indicate a lack of research at the dissertation level conducted on this topic from 1980 to 2005.

Similarly, the LLBA database did not produce any results with the entire search string, but when the keywords "software evaluation or software selection" were left out, the search produced 16 citations. When the keywords "evaluation" and "selection" were excluded, but the keyword "software" was included in the search string, two citations were found. Again, database search results that omitted the key concepts of "software evaluation" and "software selection" were judged to be not relevant to the specific focus of this paper.

The search in the WorldCat database produced eight records, and the results from the ERIC database yielded 10 citations when the entire search string was used. To verify that records from these four databases lacking the keywords (and thus the key concepts) of "evaluation" and "selection" were not relevant, abstracts of these materials (research articles, dissertations, books, and web resources) were examined, and an initial set of 30 records was selected. From these records, the cited references or bibliographies in the peer-reviewed articles were also reviewed.

Based on this review, the record set was then narrowed to only those materials addressing selection guidelines and checklist criteria, which resulted in a set of only 24 citations. Table 1 presents those citations listed in alphabetical order.


[Table 1. The 24 selected citations, listed in alphabetical order; graphic not reproduced. *NS = not specified]

Of these 24 entries, one focuses on K-6, another covers 9-12, three others cover K-12, and three more cover K-16. The remaining citations cover college years or do not specify a particular grade focus. The 24 records included 14 journal articles, 5 books, 2 ERIC documents, and 3 web resources.

These results indicate that little research was published in the decades between 1980 and 2005 on the topic of CALL evaluation for FL learning in elementary schools. In part, these results corroborate the findings of Liu et al. (2003) and Zhao (2003). However, the reviews by Liu et al. and Zhao confirm that there is a substantial body of research on CALL for adults. While only eight of the 24 citations found in the present study deal with K-12, the remaining 16 articles were still included because they have the potential to guide the research that has not yet been done in elementary schools.

CALL evaluation includes multiple perspectives, especially those of the teacher, the developer, and the student. A thorough review of the entries in Table 1 reveals that 22 of these works address CALL selection and evaluation from the teacher perspective, six from the developer perspective, one from the student perspective, and one from an unspecified perspective. Note that some works address more than one perspective, so these numbers add up to more than 24. The objectives of the evaluations identified in the studies in Table 1 are as follows: to select CALL materials (11), to establish criteria for CALL selection or design (5), to determine the effectiveness of CALL (2), to judge CALL design (2), to facilitate training in CALL selection (2), to develop an SLA theory in the context of CALL (1), and to submit CALL reviews for publication (1). This suggests that the areas of interest to the K-12 community are practical assistance in the selection of CALL materials and, to a lesser extent, the establishment of criteria for selection or design.


The critical analysis in this paper is founded in interpretivism (Guba, 1990). Therefore, it is important to understand some of the foundations that influence and support the evaluation of CALL resources from the combined theoretical perspectives of CALL design and FL teaching and learning.


Willis, Jost, and Nilakanta (2002) articulate the value of adopting an interpretivist approach in instructional design development and research. According to Willis et al., a major difference between research from the postpositivist and the interpretivist paradigms resides in the goal of the research. Postpositivism assigns universality to research conclusions and seeks the "Truth," while interpretivist research seeks understanding. "Truth" implies more control over the research process, while "understanding" implies the production of knowledge in context. These paradigmatic contrasts were also studied by Levy (1997) as they pertain to the field of CALL design and research. Willis et al. have called attention to the foundational issues of research on instructional design and technology. Based on their assumptions, the spirit of interpretivism in instructional design and technology may be captured in three elements. The first element is multivocality: interpretivism allows multiple perspectives and voices on the topics of educational research and practice. The second element is contextualization: interpretivism emphasizes local and authentic realities instead of the "one size fits all" that is sought in educational practice. The third element is interpretation: interpretivism allows the unfolding of meaning through the interpretation and the insights of the researcher and the practitioner. Interpretivism provides guidance, not specific rules, for interpreting knowledge about the world. An interpretivist view of the relationship between theory and practice in educational technology constitutes a theoretical foundation for a holistic CALL evaluation method that combines elements of design, teaching, and learning in local and authentic classroom settings.

Squires and McDougall (1994) proposed a general software evaluation approach, the "perspectives interaction paradigm" (PIP), that includes the designers', teachers', and students' perspectives in the evaluation of software. PIP may be valuable for CALL evaluation in elementary school contexts.


This section discusses the 24 publications listed in Table 1 in relation to the three elements important in interpretivist research, namely, multivocality, contextualization, and interpretation because these elements may well support a framework of interpretivist CALL evaluation. The discussion concludes with a review of those papers that attempt to consider several of these elements together.


Multivocality

Multivocality refers to the diversity of voices and points of view of the participants in a specific setting, as opposed to a single, unified, monolithic voice or perspective (Willis et al., 2002). Fourteen of the 24 publications selected for this review take the single perspective of the teacher. For example, authors such as Evans and Gibson (1989), Lillie, Hannum, and Stuck (1989), Hertz (1984), and Strei (1983) proposed checklists from the teacher perspective for FL software evaluation (see column 3 in Table 1). Similarly, Hamerstrom, Lipton, and Suter (1985) and Phillips (1986) developed guidelines from the teacher's perspective for software evaluation in the context of specific projects whose goals were to develop CALL materials. In few cases is the perspective of students combined with that of teachers (Curtain & Shinall, 1987; Hubbard, 1987, 1988; Komoski & Plotnick, 1995). Designers', teachers', and students' perspectives are only rarely taken into account and combined into a holistic view (Chapelle, 1998, 2001) to provide the best multivocality.

Further detail is now provided for six of the papers, five from the perspective of the teacher and one from the designer. The CALICO Journal has considered software evaluation a topic of interest to its readers since the early 1980s (CALICO, 1983; Burston, 2003). In 1983, in a column titled "Wanted: Courseware Reviewers and Reviews," the journal editors provided guidelines for consideration by software reviewers and invited readers to add more ideas to the proposed guidelines. In the following issue of the CALICO Journal, Strei (1983) argued that current forms of software evaluation were "excessively wordy and imprecise or so brief" and that a set of standardized evaluation guidelines would benefit the language teaching profession. Thus, he developed a 3-page checklist to guide the evaluation of CALL materials for selection. The checklist was intended for computer drills and provided space for technical information related to the software, the different levels of students' language skills, and the language skills being practiced. The guidelines did not go beyond descriptive features of the software and disregarded student views. Students' prior knowledge, students' preferred language-learning strategies, and the diversity of learners were issues not considered.

Hertz (1984) proposed a checklist for the evaluation of language arts computer materials that may have application in the assessment of CALL materials. The checklist is a series of questions related to the description of the material, with "yes" and "no" response categories. Hertz's checklist considers the single perspective of the teacher and assumes an average type of learner. Despite some consideration of learner characteristics, the checklist does not provide space for adaptability to learner diversity, nor does it give consideration to the student's point of view. Although it may be assumed that the designer's view is included in Hertz's checklist because there are questions about aesthetic aspects of the materials, the checklist fails to consider basic principles of instructional design, such as instructional methodologies and strategies in the presentation and design of instructional activities. The designer's view is therefore only superficially represented.

Taylor's (1985) guidelines present software evaluation from the teacher's perspective but position teachers as passive consumers of knowledge prescribed in the guidelines. Instead of teachers developing their own selection criteria, teachers' views are excluded, and they are directed instead to rely on the recommendations of organizations that produce a significant number of software evaluations, such as the Association for Supervision and Curriculum Development and the Children's Software Review. Each publishes a fee-based searchable database, called "Only the Best" and "Children's Software Finder," respectively. These databases provide reviews of commercially available multimedia products, including software, video games, and websites. Like Taylor's guidelines, the guidelines used in these databases lack input about specific software titles from people who have knowledge of elementary classroom practices, especially teachers.

Laurillard (1991) approaches evaluation from the designer perspective with the student in mind. She grounds her guidelines for CALL design on psycholinguistics research in general and SLA in particular. Laurillard emphasizes the importance of the use of language-learning theory in the design of CALL materials. She is interested in the demands imposed on learners when they learn language with the goal of obtaining communicative competence, and argues that in such situations learners are overloaded with cognitive demands. Thus, learners' prior knowledge has to be considered before giving consideration to instructional design principles in CALL. Those principles, according to Laurillard, are understood as the classical elements of computer-based instruction, and they include aspects such as presentation of the information, triggers of motivation, opportunities for practice, and feedback for remediation. Laurillard predicts success in CALL design when learner variables and design principles that are based on SLA and instructional design are taken into account.

These six examples demonstrate the lack of depth of a single perspective, which becomes even more problematic when the absence of contextualization is considered.



Contextualization

Contextualization involves the identification of the particular conditions in the settings in which the CALL materials are being used. With the exception of Hertz (1984), the majority of the CALL evaluation guidelines discussed so far are intended for college-level students or do not specify a targeted grade level. For example, neither the guidelines provided by the CALICO editors (1983) nor Strei's (1983) checklist considers the context or local situations surrounding the software in use. Those who do specify a precollegiate level in their guidelines are Taylor (1985), Hamerstrom et al. (1985), Lillie et al. (1989), and Treadwell (1999).

During the 1980s, CALL evaluation was primarily completed with checklists that aimed to evaluate CALL materials within a technical, postpositivist paradigm. These checklists included features of the products, technical requisites, suitability and adequacy of the activities, documentation, strengths and weaknesses, and appraisals in the form of an overall recommendation. Taylor (1985) argues that teachers have to evaluate context and judge how appropriate the instructional strategies are for their particular classroom situations, but his guidelines position the teachers as passive consumers of knowledge. Even when Taylor asks how the software matches local and state curricular needs, he does not articulate a practical way to put the evaluation in context and does not address questions about how relevant the features of the software might be in different contexts.

In the 1990s, evaluation checklists continued to be used (Laurillard, 1991; CARLA, 1998; Thompson, 1999), but their use was criticized in favor of empirical evaluations of CALL use in classroom contexts (Chapelle, 1998). For example, Laurillard's (1991) guidelines consider the student variables, but the author does not address pedagogical issues when CALL materials are used in classroom contexts. Once again, teachers are seen as consumers of the body of knowledge prescribed in the guidelines, and it is implied that as long as CALL is developed using "theory," teachers and students will be satisfied when they engage with the CALL materials; in reality, this is hardly the case. CALL evaluation goes beyond the evaluation of CALL design.

Recently, Susser (2001) came to the defense of checklists. He concentrates on six of the common arguments against checklists, namely, the "yes/no" format, the distance from classroom realities, the lack of experimental evidence, the commitment to specific methodology, the validity, and the required expertise for use. Susser argues that a possible reason why checklists have been the target of criticism is that there are two opposing principles of pedagogy, technical and humanistic, that place practitioners in different camps of preference with respect to checklists. The technical camp favors checklists, and the humanistic camp perceives them as a presence/absence dichotomy that is fundamentally opposed to choice. One could interpret Susser's explanation to be founded on the postpositivist/interpretivist contrast expressed by Guba (1990).

Over the years, evaluation guidelines have become more contextualized. For example, research on elementary FL program evaluation (Rosenbusch et al., 2003a) shows that the language-learning outcomes of young children in technology-enriched environments need to be studied in the context of the specific program, include the perspectives of teachers and students, and be analyzed longitudinally over time. Similarly, the checklist approach to CALL evaluation of the 1980s and 1990s has given way to the use of multiple research methodologies for CALL evaluation in which classroom-based research and student language outcomes are analyzed. CALL usage and its evaluation in the context of the elementary classroom are not simple versions of college classroom settings; they are inherently very different. However, while context is important, the element of interpretation remains to be addressed.



Interpretation

Interpretation implies that the evaluator's critiques and reflections have a necessary personal and subjective component. This element of interpretivist evaluation is frequently overlooked by those who have proposed CALL evaluation guidelines. For example, Strei (1983) expects the evaluator (the teacher) to provide comments and a summary of the judgment to justify adoption or rejection but does not articulate how teachers might critique or reflect on software, in addition to using his checklist. Similarly, teachers who use Hertz's (1984) checklist are not encouraged to provide an opinion or to give an appraisal of the software under consideration. Interpretation in Hertz's checklist is addressed by questions with a "yes/no" response category. For example, using an interpretivist evaluation approach, the question "Does the level of difficulty vary according to the demonstrated ability level of the student?" is likely to merit a reflective answer rather than a simple "yes/no" response. As an alternative to the use of checklists, Owston and Dudley-Marling (1986) suggest a short written narrative of the software's uniqueness but do not elaborate on the nature of the narrative. The element of interpretation is also missing in the guidelines formulated by Hamerstrom et al. (1985), Phillips (1986), and Taylor (1985). Guidelines from these authors do not consider any narrative appraisal, interpretation, or opinion in CALL evaluation.

From a CALL developer's perspective, O'Neal and Fairweather (1984) argue that the most important part of an evaluation is to complete an analysis of "training and development" needs before selecting CALL development tools. Instead of focusing on the evaluation of CALL materials, the authors concentrate on the issue of productivity (the most appropriate and cost-effective authoring tool) and the time it takes for a developer to learn a tool and develop an hour of instruction. O'Neal and Fairweather do propose a multivocal and contextualized evaluation, maintaining that the evaluation model that yields the right authoring tool for maximum effectiveness in a given context draws on a combination of multiple lessons, multiple users, and multiple developers. Even so, the authors assume a postpositivist view of evaluation by suggesting that "above all, throughout the evaluation, [evaluators] maintain the distinction between those notions that are based on evaluative data and those that are based purely on opinion whether [the evaluator's] or others" (p. 46). The value of multiple methodologies (empirical and judgmental) for examining CALL is thus not recognized by these authors (see Chapelle, 2001).

In the elementary school context, interpreting software may be restricted by the fact that teachers are busy and find it difficult to evaluate CALL materials in depth before use. It is also difficult for elementary FL teachers to get involved as CALL designers and/or offer their reflections on their CALL practice (Rosenbusch et al., 2003b).

Although teachers are busy, there have been numerous concerted efforts by local, state, and national FL teacher organizations to promote the use of technology at the K-12 level. For example, through the creation and funding of language resource centers throughout the United States, the US Department of Education has substantially increased the participation of teachers and scholars in the infusion of research, technology, and best practices into FL education. In particular, the National K-12 Foreign Language Resource Center (NFLRC) at Iowa State University has sponsored summer institutes with the goal of promoting teachers' use of technology and action research, as well as establishing partnerships between teacher educators and K-12 FL teachers. The impact of one of those institutes has been documented by Rosenbusch, Kemis, and Moran (2000). Following an action research institute held at the NFLRC at Iowa State University in 2002, the institute leaders worked with participants through the year and met with them again the following summer. Several participants published research projects; however, not all of the research focused on early language learning (Donato & Hartman, 2002; Rosenbusch, personal communication, 2007).


Other language resource centers, such as those housed at the University of Michigan, the University of Minnesota, San Diego State University, the University of Oregon, the University of Hawai'i, Georgetown University, George Washington University, and the Center for Applied Linguistics, also continue to train teachers in the use of technology. For example, the Center for Applied Second Language Studies (CASLS) at the University of Oregon promotes the development of computer-enhanced materials and assessments for young students. These initiatives from language resource centers include teacher training in the classroom use and integration of technology, as well as classroom-based, teacher-led action research on FL instruction.

The Center for Applied Linguistics also maintains an extensive collection of printed resources, called Ñandutí, which are available on the web for FL teachers in grades preK-8 (Center for Applied Linguistics, 2007). Ñandutí has a section for teachers interested in the use of technology in the classroom and, through the use of a listserv, also provides a searchable online forum for teachers in which public postings and queries are archived. Teachers often discuss, network, and help each other with technology-related topics for elementary classrooms. The Foreign Language Teaching Forum (FLTEACH) is another listserv for language teachers that archives its postings and makes that information searchable.

National, local, or state conference presentations provide another means of disseminating teacher-led CALL evaluations and classroom-based research results. For example, during the 2006 American Council on the Teaching of Foreign Languages (ACTFL) conference, there were a total of 622 presentations listed in the program; only 65 (10.5%) of these presentations were on the topic of technology, and 36 (6%) were on research. Of those 65 presentations on technology, 38 (58%) were intended for a general audience and targeted all applicable levels of instruction, 18 (28%) pertained to higher education, and only 9 (14%) were applicable to PreK-12 contexts. One of these PreK-12 presentations was on the use of WebQuests for learning French. Similarly, of the 36 presentations on the topic of research, 3 (8%) focused on action research, all targeting a general audience. On the topic of teacher-led, research-based CALL evaluation for early (PreK-6) language learning, there were no presentations explicitly listed in the program.

Journals such as Learning Languages and The Language Educator, whose readers are FL teacher educators and teachers of young students, have sections that review CALL materials. These sections are generally brief descriptions of the materials that may or may not include the reviewer's judgments about the merits of the piece under consideration. Teachers may submit reviews to these journals, but there still needs to be a distinction made between reviews that teachers do independently for classroom use and reviews for publication. In addition, teacher magazines and newsletters publish brief commentaries and reviews on CALL, but that information is not indexed and thus not retrievable by other researchers. Unfortunately, searching a standard database of professional literature, such as ERIC, on the topic of teacher-led action research on CALL evaluation and use in elementary schools (using the search string: "[computer or technology] and action research and evaluation and elementary school and foreign language") retrieves only three records.

Despite these activities, it is difficult to identify and retrieve published research and research reports due to a number of factors. For example, searches in the two listserv archives mentioned above are useful for retrieving queries and postings but do not retrieve full-text materials from research reports, conference papers, or presentations. Various professional organizations and associations may sponsor certain research projects and post reports and information about them on their websites. However, that information may or may not be available from one day to the next because of the ephemeral nature of website content in general.


As for conference proceedings, published reports, and scholarly journal articles, one major difficulty in identifying and accessing published literature on technology training and action research initiatives involves the absence of a centralized repository agency to collect and provide easy access to reports and findings of such research.

Working towards Interpretivism

With the increasing complexity of CALL evaluation, researchers have begun to inject multiple perspectives, context, and interpretation into their evaluation guidelines. Some of the authors and works included in this literature review have taken more ambitious approaches, attempting to address two or more of the three elements of interpretivism in combination, with varying degrees of success.

Curtain and Shinall (1987) propose a program for training teachers in the use, evaluation, and selection of technology for FL learning. The authors stress the importance of a multiplicity of voices and, in theory, of context and interpretation, and they outline the content to be included in teacher training in CALL evaluation. They use the concept of multiple voices (teachers' and students') and make explicit the importance of teachers developing insight into evaluation and creating their own evaluation criteria. Curtain and Shinall express concern about the several roles that teachers are asked to play when using CALL. Teachers sometimes assume the responsibility of both developing and teaching with electronic materials. Thus, according to these authors, it is essential to offer teacher training that includes the development of expertise in content, learning theory, lesson design, and software evaluation, along with an appreciation for computer programming. In their view, training should also prepare teachers to judge materials, try them out in their own context, and test whether initial impressions change over time. They argue that training in evaluation is important and that CALL content and the needs of students should be addressed in such training. Although the program that Curtain and Shinall propose is ambitious, they ultimately settle for a checklist of questions that lacks an organizational scheme. The questions appear to comprise random criteria, many of which can be answered with a simple "yes" or "no." For example, the question "is the material authentic culturally?" (p. 285) can be answered with a single "yes" or "no" without elaboration. Even though the teacher-training program they propose is feasible, their decision to use a checklist is perhaps due to practical constraints such as teachers' lack of time. In practice, then, Curtain and Shinall's evaluation falls short of including the interpretivist elements of contextualization and interpretation.

In addition to the teacher training issues in CALL evaluation and development, researchers have addressed multiple elements of the interpretivist approach to evaluation. For example, Hubbard (1987, 1988) suggests software evaluation guidelines that include multivocality and contextualization, but he does not consider interpretation in his checklist. On the other hand, working mainly from the single point of view of the instructional designer, Pederson (1987) also proposes guidelines that consider the element of contextualization and the relevant foundations for conducting research on CALL. Pederson maintains that "CALL is highly context-bound and must, therefore, take such variables as learner differences, learning task, and the computer's coding options into account" (p. 100). This may translate into the idea that multiple perspectives must be used to evaluate CALL, a perspective that considers the learner's, the teacher's, and the designer's point of view. In fact, Pederson notes that "the wise language teacher should examine evaluative research reports carefully for clear educational objectives, a specific target audience, and an adequate evaluative consensus from classroom teachers, students and CALL experts" (p. 109). Pederson situates the learner at the center, maintaining that "because language is processed internally by individuals with many different attitudes, learning styles, and learning preferences, the key learner variable(s) that are called into play must be considered in research design along with the task(s) and coding element(s)" (p. 115). In essence, Pederson proposes a more comprehensive approach to CALL evaluation because it considers multivocality and contextualization. However, these guidelines are lacking in the area of interpretation because no provision is made for interpreting what happens in classroom contexts when teachers and students interact with each other and with the CALL resources.

Other authors, such as Evans and Gibson (1989), have developed CALL evaluation criteria based on elements that they used to develop a prototype for a searchable database of FL software for undergraduate students using a computer laboratory. The criteria were packaged as a template, and graduate students (mainly high school teachers) in an FL technology class conducted the evaluations. This information was ultimately used to provide recommendations for software purchases in the laboratory. The organizational scheme captures descriptive information about the materials being evaluated, as well as their content and goals. However, critical information such as the perspective of language students was not based on student performance or on any observation of authentic classroom use. Instead, the authors provided a general statement citing "very positive" student satisfaction with the format of the reviews and the appearance of the compilation. Although the overall appraisals and judgments made by graduate students were an important component of the project, classroom use and contextualization of the software were not considered. One interesting aspect of this evaluation procedure is that it included conflicting or contradictory reviews that may account for the multiplicity of voices and opinions in an interpretivist evaluation. However, a holistic evaluation with the additional perspectives of teachers and designers, along with elements of judgment, interpretation, and experimentation, was lacking.

Hamburger (1990) argues that CALL evaluation depends on the answers given to questions that address language-learning goals, the evaluation standards used to judge CALL, second language and linguistics theoretical perspectives, and whether the evaluation focuses on the whole CALL system or on its parts. The author emphasizes the effectiveness of CALL as a way to provide an "impartial assessment of the CALL system" (p. 24). He also stresses the context in which teachers and students provide a perspective on the design and interact with CALL (i.e., multivocality and contextualization). However, Hamburger does not agree with evaluations led by a teacher who is also the designer because he believes that such evaluations contain flaws in their experimental design. In this sense, Hamburger favors a postpositivist view of CALL evaluation focused on retention of learning, impartial and external evaluations, and the effectiveness of the CALL system. Clearly, the element of interpretation is missing from these guidelines. According to Hamburger, CALL evaluation should focus on the content and expertise of the system, as well as on how the system represents student responses in context and students' spontaneous actions. Hamburger concludes that CALL evaluation depends on teaching objectives, theoretical perspectives, the roles of the CALL system, and the stages of CALL development.

Goodfellow (1993) has developed guidelines for creating software for vocabulary learning from the perspective of the "teacher as designer." The author maintains that interest in the learning process and language theory "has to some extent replaced the measure of performance as the object of CALL evaluation" (p. 101). Later, Goodfellow (1995) exemplifies the combination of CALL design guidelines and language acquisition models with a program entitled "Storyboard" that "illustrate[s] the complexity of the interrelation between vocabulary knowledge, reading ability, inference and production skills" (p. 210). Goodfellow's example illustrates a CALL evaluation that considers the user and the context but fails to provide an interpretation of the learning experience: the learner variables included are limited to data collected on the learner's linguistic competence in a CALL context, excluding the learner's affective reactions and experiences while learning with CALL.


Komoski and Plotnick (1995) propose a list of seven steps for general software selection that an evaluator should take into account when conducting "responsible software evaluations." Their evaluation is taken from the teacher's perspective and considers what is important for students. In the opinion of Komoski and Plotnick, evaluators should first establish the objectives of the evaluation by asking questions such as why the software is needed and what the evaluators' needs are. After that, the evaluators specify the type of software they are looking for, identify software titles by searching software databases, and read reviews written by others. The evaluators then decide how the software will be relevant to their context and recommend titles for selection. After the software is used by students, the evaluators track the students' performance and rate the software features based on the criteria that the evaluators have developed. The evaluators are encouraged to create written records of software recommendations in which anecdotal experiences from evaluators and students are collected for further analysis and the ultimate improvement of the software. Komoski and Plotnick do not intend to impose these guidelines but rather suggest that teachers adapt them. Komoski and Plotnick's "responsible software evaluation" comes close to being an interpretivist evaluation because it includes multivocality (teachers and students, but not designers), contextualization, and interpretation of the evaluator's (the teacher's) experiences, incorporating anecdotal information about what happens while students are using the software.

Chapelle (1998) argues against the use of simple checklists, stating instead that CALL evaluation should be guided by SLA research. Further, Chapelle (2001) considers three important principles of CALL evaluation. First, she insists that CALL evaluation criteria should be built on SLA research. Second, she states that a theory of CALL evaluation is necessary. Third, she proposes that multiple research methods should be used in CALL evaluation. Those methods are guided by the distinction between judgmental and empirical approaches, in which evaluation is an integral part of CALL design and observation of task completion by learners is critical. In addition, Chapelle states that the evaluation criteria and theory should be used to evaluate the CALL materials, the activities that teachers design, and the activities that students engage in.

These more complex approaches to evaluation developed over time and moved away from the idea of a simple checklist to one that includes the perspectives of developers, teachers, researchers, and students; the context in which the materials are used; and the interpretations brought by those who design, use, and evaluate the CALL materials. It is particularly important to evaluate CALL for elementary classrooms where CALL materials must fit the needs of young learners, be child-centric, and enhance FL learning.


For over 20 years, authors have proposed software evaluation guidelines that typically consider just one element (multivocality, contextualization, or interpretation) of interpretivist evaluation at a time. Generally, those guidelines reflect either the teacher's or the developer's perspective. Even though authors deem what students do with CALL materials to be an important part of evaluation, this review demonstrates that their approaches lack the multivocality, contextualization, and interpretation consistent with FL learning in the elementary classroom. When authors consider the teacher's perspective, the evaluation guidelines do not go beyond a descriptive approach; judgments accompanied by interpretation are lacking. If teachers are to assume the role of developer, as Curtain and Shinall (1987) suggest, they may consider an evaluation model that examines the design and development aspects of CALL not only from the instructional design perspective but also in combination with SLA learning theories and FL teaching perspectives.


Authors who have provided CALL evaluation guidelines for teachers may assume that teachers will use them, but those guidelines do not typically elaborate on the ideas or hypotheses that teachers may have and want to test when working with CALL, nor do they offer criteria for the context in which the software is used. Multivocality and the interpretation of teachers' and students' experiences with CALL are often missing. Postpositivist guidelines do not take the experiences of teachers and students as a point of departure; they disregard the diversity of tasks in CALL, as well as the prior knowledge of teachers and students.

A few authors have considered a multifaceted view of evaluation from the combined perspective of either designer, teacher, or student (Chapelle, 2001; Hamburger, 1990; Hubbard, 1987, 1988; Laurillard, 1991; Pederson, 1987), but their work may not fully address two important interpretivist elements, namely, the adaptability to local contexts (contextualization) and the interpretation of teacher and student experiences with CALL.

The evolution of the guidelines shows the introduction of new themes into the discussion of software evaluation. A major focus of more generic software evaluation research in the 1980s was the effectiveness of computer-assisted instruction in delivering content and producing learning outcomes (Kulik & Kulik, 1991). The review presented here reveals other themes complementary to CALL evaluation, such as the adaptability of general software evaluation guidelines to the specific context of language learning; teacher education; searchable databases for language laboratory use and the management of resources; the blending of ideas from several disciplines in the design of CALL resources, including SLA, language-learning theories, and instructional design and development; and the publication and dissemination of findings in the FL teaching profession. In addition, this literature review illustrates how the published guidelines have been valuable in framing the components of CALL evaluation and have shaped the evaluations published in the professional literature over the last 20 years. However, as new paradigms for research on CALL are suggested (Chapelle, 1997, 2001), these guidelines can be improved to provide an interpretivist view of the CALL selection and evaluation processes. Thus, more flexible, open-ended ways to conduct CALL evaluation for software selection are needed, and, more importantly, ways based on the discovery of selection criteria and on reflection about the particular needs and contexts that help teachers choose specific CALL materials.

This analysis of the literature on the evaluation of CALL resources comes at a time when FL educators are beginning to take an interpretivist approach to FL teaching (Kohonen, 2001; Reagan, 1999). An interpretivist approach to CALL evaluation would combine issues of instructional design and SLA theory, FL teaching methods, and classroom-based practice with teachers' dialogue and reflection. The shortcomings of previously published guidelines and the use of checklists as an instrument for CALL evaluation illustrate the need for teacher training in CALL evaluation. The reviewed literature also suggests how FL teachers can be assisted in making more informed decisions about CALL design and software selection by trying alternative evaluation strategies to see whether those approaches give better results than strategies such as checklists (McDougall & Squires, 1995, 1997).

A brief illustration of the development and evaluation of a CALL activity for elementary first and second grade classrooms is provided here. The designers were high school teachers who adapted and used Dodge's (1995) WebQuest template to develop the Cinderella WebQuest. These teachers had experience in web design and knowledge of computer integration across the curriculum. The teachers, as designers, evaluated the activities and infused K-2 specific content (addition concepts) with the reading and writing of fairy tales. An elementary school ESL teacher considered the Math standards (National Council of Teachers of Mathematics, 2000), the ESL standards (Teaching English to Speakers of Other Languages, 2000), the student technology standards (International Society for Technology in Education, 2000), and the potential of the Cinderella WebQuest for use in her multiage, multicultural classroom context. The ESL teacher adapted the Cinderella WebQuest materials to teach Math and English to young ESL learners in first and second grade. In evaluating the activities, the ESL teacher was guided by Chapelle's (2001) judgmental evaluation criteria for teacher-developed CALL activities. The outcome of this evaluation was to adopt and adapt the WebQuest for the specific language-learning context of the ESL elementary school. The ESL teacher used the activity in her classroom, and her students reported on their enjoyment of the activity, their attitudes toward it, and its relevance through a self-assessment instrument.

An empirical evaluation was conducted in the classroom, and data were collected from students' task performances and individual classroom presentations as evidence of student language learning through negotiation of meaning and problem solving. Finally, a website was developed to share the experience and provide instructions for teachers and students interested in using the WebQuest. A more fully developed process of CALL evaluation from an interpretivist perspective is found in García Villada (2006), and a description of a program in which elementary teachers developed CALL projects for teaching Spanish in elementary schools is provided by Rosenbusch et al. (2003a, 2003b).

Proposed Multivocality

The idea of bringing the perspectives of those who interact with CALL resources into the evaluation is supported in recent formulations of FL education paradigms that propose interpretivist methodologies for FL teaching and learning (Kohonen, 2001; Reagan, 1999). Traditionally, SLA experts, language laboratory personnel, multimedia developers, and media specialists have been responsible for the task of selecting and evaluating CALL resources, and their perspectives in CALL assessment speak with a monolithic "expert" voice. The limitations of this sort of postpositivist approach to software evaluation, inherent in checklists and other technical classification paradigms (Squires & McDougall, 1996), are linked to the fact that the view of general software evaluation is changing.

For example, Squires and McDougall (1994) have proposed the Perspectives Interaction Paradigm (PIP), which includes the diverse voices of the main actors in software evaluation: designer, teacher, and student. PIP is intended for general software evaluation, but it shows promise for CALL evaluation. Evaluations using the PIP approach are an important alternative to the conventional postpositivist approach. As stated by Squires and McDougall (1996),

The Perspectives Interactions Paradigm provides a comprehensive framework for thinking about educational software, and moves away from the (predominantly technical) attributes of educational software packages, and toward more educational issues, such as learning processes, classroom activities, teacher roles, curriculum issues, and student responsibility for learning. This is achieved by generating considerations associated with the interactions between pairs of the perspectives of the teacher, the student(s) and the designer. (p. 155)

The interactions put forward by Squires and McDougall (1996) are one-on-one interactions between teacher and student, teacher and designer, and designer and student. One modification of the PIP evaluation would be to add a three-way conversation, instead of a two-way dialogue, among the multiple actors that play a role in the evaluation. In addition, local, school, district, and national level voices could also be included, as Guskey (2002) has done in his evaluation approach to teacher professional development. Critiquing not just the classroom context but also the wider, multilevel context (local, state, national) may lead to a multilevel approach to professional development for CALL evaluation. As Curtain and Shinall (1987) pointed out, teachers often assume the tasks of developing and teaching with CALL materials. Thus, FL teachers require training that includes the development of expertise in content, learning theory, and CALL design. Training should also prepare FL teachers to judge and try out materials and to conduct empirical CALL evaluations. For these reasons, Guskey's (2002) suggestions for professional development in evaluation may be useful for FL teacher development.

The need for a bridge between research and practice in CALL evaluation has also been identified in this literature review. FL teachers would benefit from collaborating and dialoguing on issues of CALL design and evaluation, as well as on educational technology research and theory. The perspectives of designers, who may be interested primarily in technical issues or aesthetics rather than in pedagogical issues and real classroom teaching, are also important, as are the perspectives of evaluators testing the effectiveness of the material and the perspectives of students. Therefore, an interpretivist evaluation method is clearly needed that brings in all of these multiple and equally valid perspectives and juxtaposes the opinions of designers, teachers, and students.

Proposed Contextualization

Contextualization is considered by Chapelle (1998), by Squires and McDougall's (1994) PIP model, and by Pederson (1987) and Komoski and Plotnick (1995). However, most of the literature does not focus on the K-6 context, and specific guidelines for FLES and CALL have not been documented in the literature (Nutta, Feyton, Norwood, Meros, Yoshii, & Ducker, 2002; Zhao, 2003).

CALL evaluation guidelines do not tell much about K-6 teachers' frameworks or their perspectives on using CALL in their classrooms. Contextualization emphasizes that the focus of the evaluation is not on the instructional material itself, but on its potential use within the context for which it is intended. In this sense, CALL evaluation depends more on the activities developed by teachers--and on what students do with the activity--than on the resource itself. Teachers plan, review, and use materials and activities suitable for their particular situations and students. CALL material that fits well with one teacher's specific situation may not be useful for other teachers even in the same school. In FL learning in the elementary school context, evaluators need to specify the type of FLES program, the type of school, and the type of setting in which the CALL resources are to be used.

In an interpretivist framework, teachers could benefit from developing their own CALL materials. This would help teachers become familiar with options in authoring systems, built-in features of the software, and the software's limitations (O'Neal & Fairweather, 1984; Sussex, 1991; Wildner, 2000). In addition, students often react favorably to the use of CALL materials that are developed by their own teachers (Pederson, 1987; Kreutzer & Neunzig, 1997).

A focus on context cannot originate from outside CALL "experts." Teachers themselves are in the best position to provide the insider's story about the benefits of CALL in their classrooms, and that information can be shared with designers, researchers, and other teachers. However, the lack of time in teachers' already full schedules makes it difficult for them to participate in classroom-based research and in CALL design and evaluation.


Proposed Interpretation

The third element of interpretivist evaluation comes from the tradition of hermeneutics. As pointed out in this review, interpretation is frequently overlooked by those who have proposed CALL evaluation guidelines. In his work on educational evaluation, Eisner (1976) referred to interpretation as a component of his "connoisseurship" model. According to Eisner, evaluators should possess "refined taste" and be sensitive to educational phenomena. Eisner's model proposes that in order to carry out an educational evaluation, the evaluator would need to critique, interpret, and evaluate the phenomena under investigation.

Similarly, in his discussion of the paradigm debate in educational research, Gage (1989) envisioned an interpretivist approach in which teachers become actively involved in research on teaching, generate their own findings, reflect deeply on their observations, and share their ideas with other teachers. It follows, then, that an important component of interpretivist evaluation is that teachers keep reflective journals of the evaluation results as a way of documenting their opinions and experiences. If evaluators submitted their interpretations as narratives, they would tell the story of how CALL was used, the considerations given to pedagogy, and the way the lesson was taught. They could also address questions such as: Was the CALL material used effectively? What were the students' experiences? Are students learning, and if so, what outcomes were observed? What acceptable evidence of CALL use did students demonstrate? Did students interact with each other during the CALL activity? What were the teachers' impressions and observations about the use of CALL with their students?

An interpretivist framework suggests the telling of the story, relating what happens in context, and incorporating student and teacher voices in the descriptions of CALL use at the elementary school. However, narrations, critiques, and reflections on classroom experiences are not addressed in guidelines for evaluating CALL. One possible explanation is that they are part of a qualitative approach to inquiry that has not been the tradition in CALL evaluation. The postpositivist empirical tradition has emphasized objectivity over subjectivity and the voices of experts over those of practicing teachers.

Overview of the Proposed Framework

The discussion above offers a basis for three elements of an evaluation framework for CALL specifically appropriate for elementary education: multivocality, contextualization, and interpretation. It is apparent that the multiple voices of the actors who design, use, and evaluate CALL resources, the support teachers require to develop and evaluate CALL, and the context in which those materials and activities are used are essential elements of an interpretivist CALL evaluation. In addition, an interpretation that involves telling the story of the use of CALL in the classroom, including teachers' and students' perspectives, constitutes a fundamental element of the framework.

Conclusion
Over time, the evaluation of CALL materials has developed to reflect the complexity of the FL classroom into which the use and evaluation of CALL must be integrated. The literature shows that single-sided approaches have commonly been used in the evaluation of CALL materials for FL education, especially in the earlier stages. These approaches have produced evaluations that focus narrowly on the effectiveness of CALL for learning and promote checklists that lead to generalizations not always applicable to specific contexts. An interpretivist approach, by contrast, provides guidance to developers and teachers who are interested in developing, using, or evaluating CALL.


This paper has analyzed the literature and presented a multivocal, contextual, and interpretive framework for future CALL evaluation. A review of the literature in the field of CALL evaluation revealed that examples of interpretivist evaluations are lacking. Interpretivism emphasizes issues such as biases in evaluation, classroom-based research, familiarization with research on CALL evaluation, awareness of the limitations of single-sided approaches, and formulation of teacher-developed evaluation criteria.

Interpretivist CALL evaluation is a conversation among the multiple actors who play a role in the evaluation process. The perspective of FL teachers would enrich the CALL design and evaluation process by providing the classroom view that is relevant to SLA researchers, teacher educators, and CALL designers. The perspectives of designers would expand beyond technical issues of design and aesthetics to include the pedagogical realities of actual classroom contexts. The perspectives of evaluators testing the effectiveness of CALL materials, and those of the students, are equally important.

In interpretivist CALL evaluation, teachers are in the best position to share their own stories about the benefits of CALL for their students. Teachers plan, review, and use materials and activities suitable for their particular situations and students. The focus of an interpretivist evaluation is not simply on the CALL resource itself, but on the potential use of the CALL material within the context for which it is intended. Thus, interpretivist CALL evaluation depends more on the activities developed by teachers and what students do with the activity than on the resource itself.

A final component of an interpretivist framework proposes that teachers critique, interpret, and evaluate their experiences with CALL materials. Such an approach would also involve teachers doing research on teaching, generating their own findings, reflecting deeply on their observations, and sharing their ideas with other teachers. It is admittedly challenging for elementary FL teachers to become involved as CALL designers or to offer reflections on their CALL practice. However, teachers can draw on support from local, state, and national FL teacher organizations to infuse technology at the K-12 level. As noted earlier, even when elementary FL teachers engage in technology use and action research, it is often quite difficult to identify and access published research or research reports describing their activities and what was learned. A centralized repository that collects such reports and findings and makes them easily accessible could have a positive effect on the growth and dissemination of teachers' knowledge.

The goal is for this integrated approach to FL teacher development to bring teachers' and students' perspectives and experiences with CALL into current educational research on CALL evaluation and, in doing so, to strengthen both teacher education programs and the connection between research and practice in FL education. Further research is required to turn these recommendations into an approach that can be used in K-6 FL teacher education.

References
Burston, J. (2003). Software selection: A primer on sources and evaluation. CALICO Journal, 21, 29-40. Retrieved December 1, 2008, from

CALICO. (1983). Wanted: Courseware reviewers and reviews. CALICO Journal, 1, 53-54. Retrieved December 1, 2008, from

CALICO. (2001). Software evaluation outline. Retrieved October 22, 2008, from


CARLA. (1998). Language software database search. Center for Advanced Research on Language Acquisition (CARLA) and the College of Liberal Arts Language Center at the University of Minnesota, Minneapolis. Retrieved October 22, 2008, from

Center for Applied Linguistics. (2007). Ñandutí Foreign Language Learning Grades PreK-8. Available from

Chapelle, C. (1990). The discourse of computer-assisted language learning: Toward a context for descriptive research. TESOL Quarterly, 24, 199-225.

Chapelle, C. (1997). CALL in the year 2000: Still in search of research paradigms. Language Learning & Technology, 1, 19-43. Retrieved April 30, 2002, from

Chapelle, C. (1998). Multimedia CALL: Lessons to be learned from research on instructed SLA. Language Learning & Technology, 2, 22-34. Retrieved April 30, 2002, from

Chapelle, C. (2001). Computer applications in second language acquisition: Foundations for teaching, testing and research. Cambridge: Cambridge University Press.

Curtin, C. O., & Shinall, S. L. (1987). Teacher training for CALL and its implications. In W. F. Smith (Ed.), Modern media in foreign language education (pp. 255-285). Lincolnwood, IL: National Textbook Company.

Curtain, H., & Dahlberg, C. (2004). Languages and children, making the match: New languages for young learners (3rd ed.). Boston: Pearson/A and B.

Decoo, W. (1984). An application of didactic criteria to courseware evaluation. CALICO Journal, 2, 42-46. Retrieved December 1, 2008, from

Dodge, B. (1995). WebQuests: A technique for Internet-based learning. Distance Educator, 1, 10-13.

Donato, R., & Hartman, D. (2002). Action research in foreign language education: 2002 Summer Institutes at Iowa State University; June 27-July 3, 2002. Available from

Eisner, E. (1976). Educational connoisseurship and criticism: Their form and functions in educational evaluation. Journal of Aesthetic Education, 10, 135-150.

Evans, G., & Gibson, A. (1989). What's in it for me?: Evaluating software and video. Journal of Educational Techniques and Technologies, Winter (1989-1990), 54-58.

Ford-Guerrera, R. (1997). Technology & the elementary foreign language classroom. (ERIC Document Reproduction Service No. ED410750)

Freeman, D., & Johnson, K. (1998). Reconceptualizing the knowledge-base of language teacher education. TESOL Quarterly, 32(3), 397-417.

Gage, N. (1989). The paradigm wars and their aftermath: A "historical" sketch of research on teaching since 1989. Educational Researcher, 18, 4-10.

García Villada, E. (2006). Technology integration for teaching and learning Spanish in elementary schools: Voices of designers, teachers, and students. Unpublished doctoral dissertation, Iowa State University, Ames.

Goodfellow, R. (1993). CALL for vocabulary: Requirements, theory and design. Computer Assisted Language Learning, 6, 99-122.

Goodfellow, R. (1995). A review of the types of CALL programs for vocabulary instruction. Computer Assisted Language Learning, 8, 205-226.

Gruba, P. (2004). Computer assisted language learning (CALL). In A. Davies & C. Elder (Eds.), The handbook of applied linguistics (pp. 623-648). Malden, MA: Blackwell.


Guba, E. (Ed.). (1990). The paradigm dialog. Newbury Park, CA: Sage Publications.

Guskey, T. (2002). Does it make a difference?: Evaluating professional development. Educational Leadership, 59(6), 45-51.

Hamburger, H. (1990). Evaluation of L2 systems, learners and theory. Computer Assisted Language Learning, 1, 19-27.

Hamerstrom, H., Lipton, G., & Suter, S. (1985). Computers in the foreign language classroom: No longer a question. CALICO Journal, 3, 19-21, 48. Retrieved December 1, 2008, from

Hertz, R. (1984). A software evaluation guide for the language arts. CALICO Journal, 1, 21-23. Retrieved December 1, 2008, from

Hubbard, P. (1987). Language teaching approaches, the evaluation of CALL software, and design implications. In W. F. Smith (Ed.), Modern media in foreign language education: Theory and implementation (pp. 227-254). Lincolnwood, IL: National Textbook Company.

Hubbard, P. (1988). An integrated framework for CALL courseware evaluation. CALICO Journal, 6, 51-72. Retrieved December 1, 2008, from

International Society for Technology in Education (ISTE). (2000). Technology foundations standards for students. Eugene, OR: Author.

Kelly, M. G., & McAnear, A. (Eds.). (2003). National educational technology standards for teachers: Preparing teachers to use technology. Eugene, OR: International Society for Technology in Education.

Kennedy, B. (1988). Adult versus child L2 acquisition: An information processing approach. Language Learning, 38, 477-496.

Kohonen, V. (2001). Towards experiential foreign language education. In C. N. Candlin (Ed.), Experiential learning in foreign language education (pp. 8-60). Harlow, NY: Longman.

Komoski, K., & Plotnick, E. (1995). Seven steps to responsible software selection. ERIC Digest. (ERIC Document Reproduction Service No. ED382157)

Kreutzer, M., & Neunzig, W. (1997). Computer assisted learning teacher training methodology and evaluation of a seminar for language teachers. CALICO Journal, 14, 65-79. Retrieved December 1, 2008, from

Kulik, J., & Kulik, C. (1991). Effectiveness of computer-based instruction: An updated analysis. Computers in Human Behavior, 7, 75-94.

Larsen-Freeman, D., & Long, M. H. (1991). Introduction to second language acquisition research. New York: Longman.

Laurillard, D. (1991). Principles for computer-based software design for language learning. Computer Assisted Language Learning, 4, 141-152.

Levy, M. (1997). Computer-assisted language learning: Context and conceptualization. Oxford, UK: Clarendon Press.

Lillie, D. L., Hannum, W. H., & Stuck, G. B. (1989). Computers and effective instruction: Using computers and software in the classroom. New York: Longman.

Lincoln, Y. (1995). In search of students' voices. Theory into Practice, 34, 88-93.

Liu, M., Moore, Z., Graham, L., & Lee, S. (2003). A look at the research on computer-based technology use in second language learning: A review of the literature from 1990-2000. Journal of Research on Technology in Education, 34, 250-273.

McDougall, A., & Squires, D. (1995). A critical examination of the checklist approach in software selection. Journal of Educational Computing Research, 12, 263-274.


McDougall, A., & Squires, D. (1997). A framework for reviewing teacher professional development programmes in information technology. Journal of Information Technology for Teacher Education, 6, 115-126.

National Council of Teachers of Mathematics (NCTM). (2000). Principles and standards for school mathematics: Problem-solving standards preK-2. Retrieved April 30, 2002, from

National Standards in Foreign Language Education Project. (1996). Standards for foreign language learning: Preparing for the 21st century. Lawrence, KS: Allen Press.

Nutta, J., Feyton, C., Norwood, A., Meros, J., Yoshii, M., & Ducker, J. (2002). Exploring new frontiers: What do computers contribute to teaching foreign languages in elementary school? Foreign Language Annals, 35, 293-306.

O'Neal, F., & Fairweather, P. (1984). The evaluation and selection of computer-based training authoring systems. CALICO Journal, 2, 27-30, 46. Retrieved December 1, 2008, from

Owston, R., & Dudley-Marling, C. (1986). A criterion-based approach to software evaluation. Journal of Research on Computing in Education, 20, 234-244.

Oxford, R. (1998). Uses of advanced technology for early language learning. In M. Met (Ed.), Critical issues in early second language learning (pp. 137-147). Glenview, IL: Scott Foresman-Addison Wesley.

Pederson, K. M. (1987). Research on CALL. In W. F. Smith (Ed.), Modern media in foreign language education: Theory and implementation, (pp. 99-131). Lincolnwood, IL: National Textbook Company.

Phillips, M. (1986). Approaches to the design of software language teaching. CALICO Journal, 3, 37-38, 48. Retrieved December 1, 2008, from

Reagan, T. (1999). Constructivist epistemology and second/foreign language pedagogy. Foreign Language Annals, 32, 413-425.

Rosenbusch, M., García Villada, E., & Padgitt, J. (2003a). IN-VISION project evaluation: The impact of one year of language study on K-5 students. In K. H. Cárdenas & M. Klein (Eds.), Traditional values and contemporary perspectives in language teaching: Selected papers from the 2003 Central States Conference (pp. 149-165). Valdosta, GA: Lee Bradley.

Rosenbusch, M., García Villada, E., & Padgitt, J. (2003b). The attitudes of classroom teachers toward early language learning. NECTFL Review, 5(2), 21-28.

Rosenbusch, M., Kemis, M., & Moran, K. (2000). Changing practice: Impact of a national institute on foreign language teacher preparation for the K-6 level of instruction. Foreign Language Annals, 33, 305-319.

Squires, D., & McDougall, A. (1994). Choosing and using educational software: A teachers' guide. London: The Falmer Press.

Squires, D., & McDougall, A. (1996). Software evaluation: A situated approach. Journal of Computer Assisted Learning, 12, 146-161.

Strei, G. (1983). Format for the evaluation of courseware used in computer-assisted language instruction (CALI). CALICO Journal, 1, 43-46. Retrieved December 1, 2008, from

Susser, B. (2001). A defense of checklists for courseware evaluation. ReCALL, 13, 261-276.

Sussex, R. (1991). Author languages, authoring systems, and their relation to the changing focus of computer-aided language learning. System, 19, 15-27.

Taylor, R. (1985). Microcomputer courseware evaluation sources. ERIC Digest. (ERIC Document Reproduction Service No. ED270102)


Teaching English to Speakers of Other Languages (TESOL). (2000). ESL standards for grades PreK-3. Retrieved April 30, 2002, from

Thompson, I. (1999). Foreign language multimedia software. NFLRC, University of Hawaii. Retrieved October 22, 2008, from

Treadwell, M. (1999). 1001 of the best Internet sites for educators K-college. (ERIC Document Reproduction Service No. ED429560)

White, L., & Genesee, F. (1996). How native is near-native? The issue of ultimate attainment in adult second language acquisition. Second Language Research, 12, 233-265.

Widdowson, H. (1993). Innovation in teacher development. Annual Review of Applied Linguistics, 13, 260-275.

Wildner, S. (2000). Technology integration into preservice foreign language teacher education programs. CALICO Journal, 17, 223-250. Retrieved December 1, 2008, from

Willis, J., Jost, M., & Nilakanta, R. (2002). Qualitative research methods for education and instructional technology. Greenwich, CT: Information Age.

Zhao, Y. (2003). Recent developments in technology and language learning: A literature review and meta-analysis. CALICO Journal, 21, 7-27. Retrieved December 1, 2008, from

Acknowledgments
I am grateful to many persons who provided information, critiques, and support during the writing of earlier drafts of this paper. Special thanks to Professors Helen Ewald and Jerry Willis. I wish to express my sincere appreciation to Professors Carol Chapelle, Niki Davis, Marcia Rosenbusch, Ann Thompson, and Donna Merkley. I am grateful to the anonymous reviewers of my work who helped me better capture the elements of an interpretivist CALL evaluation framework for foreign languages in the elementary school.

Author's Biodata
Eduardo García Villada (Ph.D., Iowa State University) is Assistant Professor of Second Language Acquisition in the Drake University Language Acquisition Program (DULAP). He has 12 years of experience as a college Spanish language instructor and 3 years as an elementary school Spanish teacher, and he is a Spanish Oral Proficiency Interview tester certified by the American Council on the Teaching of Foreign Languages. His research focuses on the interplay among CALL, Spanish language proficiency, and students' experiences and attitudes toward learning the Spanish language and Latino cultures through CALL materials.

Author's Address
Eduardo García Villada, Ph.D.

Drake University

Language Acquisition Program

217 Meredith Hall

2507 University Avenue

Des Moines, IA 50311

Phone: 515 271 4505

Fax: 515 271 1870