In our last NISS Affiliate Update we provided an overview of the types of research that NISS Research Associates are involved in. We are following this up with a more in-depth and technical view of this work. Our first installment is from NISS Research Fellow Ya Mo.
The Impact of Writing Prompts on English Language Learners’ Writing Performance
One of my research interests is studying predictors of students’ performance and items’ functioning in large-scale assessments using psychometric measures and quantitative methods. In one of our current papers, “The impact of writing prompts on English language learners’ writing performance,” written with colleagues from Boise State University and the National Institute of Statistical Sciences, we used a combination of differential item functioning (DIF) and textual analysis to examine 2007 and 2011 National Assessment of Educational Progress (NAEP) data for features of writing prompts that might have affected English Language Learners (ELLs) and non-ELLs differently.
Because all DIF analyses must be conditioned on students’ ability, either parametrically or nonparametrically, the five plausible values in the NAEP data, which are random draws from each student’s estimated ability distribution, were less than ideal but one of the few feasible options for a conditioning variable. We compared the results obtained using the five plausible values as conditioning variables with those obtained using the sum scores of the two prompts each student responded to, and with item response theory (IRT) person ability estimates derived from the two prompt scores under the generalized partial credit model; except for a couple of cases, the results were consistent. Thus, students’ plausible values were used as the conditioning variable in the subsequent analyses.
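As a rough illustration of how the candidate conditioning variables relate to one another (not the paper’s actual comparison, which contrasted the full DIF results under each choice), the sketch below computes the sum of the two prompt scores and checks how closely each plausible value ranks students relative to it. The file name and column names are hypothetical.

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical file and column names: pv1..pv5 are the five NAEP plausible
# values; prompt1_score and prompt2_score are the rubric scores on the two essays.
df = pd.read_csv("naep_writing_sample.csv")

# One candidate conditioning variable: the sum of the two prompt scores.
df["sum_score"] = df["prompt1_score"] + df["prompt2_score"]

# How closely does each plausible value order students relative to the sum score?
for pv in ["pv1", "pv2", "pv3", "pv4", "pv5"]:
    rho, _ = spearmanr(df[pv], df["sum_score"])
    print(f"Spearman rho between {pv} and the sum score: {rho:.3f}")
```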
We used two methods, Standardized Mean Difference (SMD) and logistic regression with residual analyses, to detect DIF in the prompts. The SMD method requires that students’ observed scores, or in this case their plausible ability values, be categorized into groups; the mean essay scores of ELLs and non-ELLs are then compared within each group.
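A minimal sketch of that logic is below, assuming a single essay score per examinee, a simple quantile stratification of the conditioning ability, and focal-group (ELL) weighting of the within-stratum differences; the exact stratification and weighting scheme used in the paper may differ.

```python
import numpy as np
import pandas as pd

def standardized_mean_difference(scores, ability, is_ell, n_strata=5):
    """Stratify examinees on the conditioning ability, compare ELL vs. non-ELL
    mean essay scores within each stratum, and weight the within-stratum
    differences by the ELL (focal) group sizes."""
    df = pd.DataFrame({"score": scores, "ability": ability, "ell": is_ell})
    df["stratum"] = pd.qcut(df["ability"], q=n_strata, labels=False, duplicates="drop")

    diffs, weights = [], []
    for _, grp in df.groupby("stratum"):
        ell, non_ell = grp[grp["ell"] == 1], grp[grp["ell"] == 0]
        if len(ell) == 0 or len(non_ell) == 0:
            continue  # skip strata missing one of the groups
        diffs.append(ell["score"].mean() - non_ell["score"].mean())
        weights.append(len(ell))  # focal-group weighting

    smd = np.average(diffs, weights=weights)
    # Express as an effect size relative to the overall score spread.
    return smd / df["score"].std(ddof=1)
```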
The logistic regression method employed in this study was the proportional odds model. We used a three-step modeling process based on logistic regression (Zumbo, 1999) as the main method of analysis, along with a residual-based procedure similar to Breland et al.’s (2004) residual analysis design. We dichotomized the polytomous essay scores into five binary variables according to the cumulative logit dichotomization scheme. The five dichotomized essay variables were simultaneously regressed on one of the plausible values for each examinee, the ELL group indicator, and the ability-by-group interaction variable, with the predictors entered in a stepwise fashion. After the nested models were fitted to the data, we compared model-data fit (chi-square statistics) and the size of the R-squared coefficients.
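The sketch below illustrates the three nested models for a single cumulative-logit dichotomization, fit separately for simplicity rather than simultaneously within the proportional odds framework as in the paper. It assumes essay scores on a 1-6 scale (an assumption for illustration) and uses statsmodels’ binary logit; the chi-square tests and pseudo R-squared gains correspond to adding the group effect (uniform DIF) and the ability-by-group interaction (non-uniform DIF).

```python
import pandas as pd
import statsmodels.api as sm

def zumbo_dif_for_dichotomization(score, ability, group, cut):
    """Three nested logistic models (Zumbo, 1999) for one cumulative-logit
    dichotomization of the essay score: y = 1 if score >= cut, else 0."""
    y = (score >= cut).astype(int)
    base = pd.DataFrame({"const": 1.0, "ability": ability})
    with_group = base.assign(group=group)
    with_inter = with_group.assign(inter=ability * group)

    m1 = sm.Logit(y, base).fit(disp=0)        # step 1: ability only
    m2 = sm.Logit(y, with_group).fit(disp=0)  # step 2: + group (uniform DIF)
    m3 = sm.Logit(y, with_inter).fit(disp=0)  # step 3: + interaction (non-uniform DIF)

    return {
        "chi2_group": 2 * (m2.llf - m1.llf),
        "chi2_interaction": 2 * (m3.llf - m2.llf),
        "r2_gain_group": m2.prsquared - m1.prsquared,
        "r2_gain_interaction": m3.prsquared - m2.prsquared,
    }

# Hypothetical usage: scores of 1-6 yield five dichotomizations (cuts 2 through 6).
# results = [zumbo_dif_for_dichotomization(df["score"], df["pv1"], df["ell"], c)
#            for c in range(2, 7)]
```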
In parallel, we calculated the expected essay score for each examinee through ordinal logistic regression using only the plausible value, and calculated residual scores by subtracting the expected essay score from the observed essay score, separately for the ELL and non-ELL groups. We then computed residual-based effect sizes by dividing the mean residual-score difference between the ELL and non-ELL groups by the pooled standard deviation of the residual scores for both groups, to gauge the magnitude of the group difference.
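A sketch of that calculation is below, using statsmodels’ ordinal (proportional odds) model. For simplicity it fits one pooled model to all examinees and then splits the residuals by group; whether the model is fit pooled or separately per group is a detail the sketch glosses over, and the function and variable names are illustrative, not the paper’s code.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

def residual_effect_size(score, ability, is_ell):
    """Fit an ordinal logistic model of the essay score on the conditioning
    ability only, form expected scores from the predicted category
    probabilities, take observed-minus-expected residuals, and standardize the
    ELL vs. non-ELL mean residual difference by the pooled residual SD."""
    score = pd.Series(score).astype(int)
    exog = pd.DataFrame({"ability": np.asarray(ability)})
    is_ell = np.asarray(is_ell)

    fit = OrderedModel(score, exog, distr="logit").fit(method="bfgs", disp=0)
    probs = np.asarray(fit.predict(exog))   # n x K category probabilities
    categories = np.sort(score.unique())    # the K observed score levels
    expected = probs @ categories           # model-implied expected essay score
    residual = score.to_numpy() - expected

    ell, non_ell = residual[is_ell == 1], residual[is_ell == 0]
    pooled_sd = np.sqrt(((len(ell) - 1) * ell.var(ddof=1)
                         + (len(non_ell) - 1) * non_ell.var(ddof=1))
                        / (len(ell) + len(non_ell) - 2))
    return (ell.mean() - non_ell.mean()) / pooled_sd
```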
In line with previous research on writing assessment, we created a coding taxonomy that examined the rhetorical specificity, cohesion, word use, and cultural accommodation of the prompts. Two researchers with expertise in language development and literacy instruction independently coded the prompts with this taxonomy and provided qualitative feedback describing whether they felt each prompt might favor a particular group of students.
We then used Coh-Metrix 3.0 (McNamara et al., 2014) to compare and contrast the wording and text structures of the eleven prompts. Coh-Metrix’s detailed technical analysis, based on 108 measures, complemented the experts’ judgment by providing information on the number and length of words, sentences, and paragraphs; statistics for cohesion analysis; a detailed index of the frequencies of different parts of speech; a Flesch Reading Ease score; and a Coh-Metrix L2 Readability score.
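Coh-Metrix itself is a standalone tool, and its 108 measures are not reproduced here. As a loose illustration of this kind of surface-level prompt comparison, the sketch below uses the open-source textstat package to compute a Flesch Reading Ease score and simple word and sentence counts for hypothetical prompt texts (the actual NAEP prompts are not quoted).

```python
# pip install textstat  (a stand-in for Coh-Metrix, not the tool used in the study)
import textstat

# Hypothetical prompt texts for illustration only.
prompts = {
    "prompt_A": "Write a letter to your principal proposing a new school club.",
    "prompt_B": "Describe a place that is special to you and explain why it matters.",
}

for name, text in prompts.items():
    print(name,
          "| Flesch Reading Ease:", textstat.flesch_reading_ease(text),
          "| words:", textstat.lexicon_count(text),
          "| sentences:", textstat.sentence_count(text))
```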
Based on the DIF and textual analyses, the prompts identified as functioning differently for ELLs and non-ELLs varied in syntactic simplicity, word concreteness, connectivity, and coherence. The results suggest that giving ELLs and non-ELLs a fair opportunity to respond to writing prompts involves using words that are concrete and specific, sentences that contain different forms of coreference and a range of connectives, and text that is made coherent through anaphoric reference. In addition, specific rhetorical properties should be identified explicitly, and an adequate culturally relevant context and explanation of cultural phenomena should be provided for students.