Test Development

The ELTiS evaluates students’ ability to comprehend academic English, which is defined as English used in secondary school classrooms in the United States in the service of learning. The scores can therefore be used to evaluate prospective international exchange students’ ability to understand spoken and written language in secondary classrooms in the United States. These scores may also be used to place students in English language development programs and to evaluate their proficiency upon completing such programs.

It is important to note that the ELTiS is not a measure of achievement in content area learning such as mathematics or history, or of a student’s ability to deal with the various social and psychological factors associated with student exchange programs. The scores provide information about a student’s ability to comprehend English in school contexts.

Rigorous Test Development
The ELTiS is based on a comprehensive research program that helped identify the English language demands in United States classrooms. The program started with a literature review to identify published research related to English learners studying in English-medium content area classrooms and to identify areas needing further study.

In addition, the team organized an Advisory Board composed of leading scholars in the fields of second language acquisition, language testing, education, and psychometrics. The members of the Advisory Board played a critical role in shaping and directing the research effort.

An extensive needs analysis was conducted, which focused on the following five research strands:

Strand 1: Analysis of Selected State English Language Proficiency (ELP) and Academic Content Standards.
Conducted from 2003 to 2005, this research focused on the language use requirements that standards in each academic content area pose for students, as well as the language learning objectives students are expected to achieve in English as a Second Language (ESL) classes. The standards came from 13 states, including California, Florida, Illinois, New York, North Carolina, and Texas.
Strand 2: Analysis of Textbooks Used in Mainstream Content Classes.
Under the direction of the Ballard & Tighe assessment team, experienced teachers and researchers analyzed textbooks to understand the types of written materials English language learners (ELLs) have to use and comprehend in mainstream content classes. The analysis included 96 chapters from textbooks being used at various grade levels in U.S. schools in 2003-2004 to teach language arts/reading, science, math, and social studies.
Strand 3: Analysis of Classroom Videos and Live Classroom Observations.
Researchers investigated content area classroom interaction using 57 classroom videos and 16 live observations to identify the kinds of tasks that students encounter and the skills they need to participate in classroom activities. The classrooms were located in 17 states nationwide. The observations focused on classroom interaction between the teacher and the students as well as interaction among students in pair and group work.
Strand 4: Analysis of Statewide Achievement Tests.
Researchers studied and analyzed statewide achievement tests to understand the language demands students encounter on the tests.
Strand 5: Feedback from Teacher Focus Groups.
Ballard & Tighe researchers conducted focus groups with both mainstream and ESL/bilingual teachers to cross-validate the results of the other four research strands and fill in any gaps. The data encompassed over 20 hours of group discussion with 100 educators from Arizona, Colorado, Georgia, Illinois, and Texas. Two of the focus groups consisted of content area teachers in language arts/reading, math, science, and social studies; the other four consisted of ESL/bilingual teachers.

The data from the five strands of research provided the basis for the ELTiS test blueprint and item specifications. Once the specifications had been drafted, they were submitted to the Advisory Board and to a group of experienced educators for review and revised on the basis of their comments. The finalized specifications guided item writing by teachers and testing specialists, followed by review rounds with another group of teachers and testing specialists. In subsequent rounds of stringent review, testing experts evaluated the items for technical appropriateness, current classroom teachers across the United States evaluated them for grade-level appropriateness, and a select panel of educators screened them for bias and cultural/language sensitivity. Draft test forms were then constructed for field testing with English learners and native English speakers from 17 states in the United States. The items that met both the content criteria and the psychometric quality criteria for a good standardized test were assembled into pilot test forms, which were field tested to evaluate the tests, gather psychometric data, and finalize the procedures for operational test administration.
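This document does not specify which psychometric quality criteria were applied when screening field-test items. As a hedged illustration only, classical item analysis of the kind commonly used at this stage screens items on difficulty (the proportion of examinees answering correctly) and discrimination (the corrected item-total correlation). The function below is a hypothetical sketch of that general technique, not the actual ELTiS procedure:

```python
# Illustrative classical item analysis. The statistics and any screening
# thresholds shown here are common rules of thumb in test development,
# NOT the criteria actually used for the ELTiS.

def item_statistics(responses):
    """Compute per-item difficulty and discrimination.

    responses: list of score vectors, responses[s][i] = 1 if student s
    answered item i correctly, else 0.
    Returns a list of (difficulty, discrimination) tuples, one per item.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]  # each student's total score
    stats = []
    for i in range(n_items):
        scores = [row[i] for row in responses]
        # Difficulty: proportion of students answering the item correctly.
        p = sum(scores) / n_students
        # Discrimination: correlation of the item with the rest of the test
        # (total score minus this item, to avoid inflating the correlation).
        rest = [totals[s] - scores[s] for s in range(n_students)]
        stats.append((p, _pearson(scores, rest)))
    return stats

def _pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # no variance, correlation undefined; report 0
    return cov / (vx * vy) ** 0.5
```

In practice, an item with very extreme difficulty or near-zero (or negative) discrimination would be flagged for revision or removal before pilot forms are assembled; the specific cutoffs vary by program.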