Measurement Model

Grounded in the Rasch Model

Numerical ratings from performance-based assessments are at the ordinal level of measurement. Once a critical mass of data has been obtained for each assessment, the ordinal-level raw ratings can be transformed into interval-level scaled scores through a scale calibration process grounded in the Rasch model. This conversion improves the accuracy of scores and reduces measurement noise, which in turn improves sensitivity to intervention-induced changes and to correlations with other variables. Rasch analysis also assesses the validity of the instrument, particularly whether the scale items target the full spectrum of the overall trait being measured. It therefore provides a valuable means of assessing the characteristics of an instrument designed to measure a wide range of abilities.

The Rasch model assumes that the probability of a student being rated at a specific performance level on any item is a logistic function of the distance between the item’s level of difficulty and the student’s level of ability. Consequently, the probability of receiving a higher rating is expected to increase monotonically with the difference between the student’s level of ability and the level of difficulty of the task. Where the scale data meet the Rasch model expectations, the ordinal raw score can be transformed into an interval scale. Among a number of advantages, normally distributed interval-level measurement allows for the use of parametric analysis of data.
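
For the simplest case of a dichotomous item, the logistic relationship described above can be illustrated with a minimal sketch. The function name and the logit values below are hypothetical and chosen only for illustration; they are not drawn from the assessment data.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of success on a dichotomous item under the Rasch model.

    Both arguments are expressed in logits; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical abilities (in logits) against an item of difficulty 0.0.
for ability in (-2.0, 0.0, 2.0):
    print(ability, round(rasch_probability(ability, difficulty=0.0), 3))
# -2.0 0.119
# 0.0 0.5
# 2.0 0.881
```

When ability equals difficulty the probability is 0.5, and it rises monotonically as ability exceeds difficulty.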

The measurement model used in the analysis will be selected from among three candidate models grounded in the Rasch model. The composite model combines all items within one unidimensional model. A consecutive model would employ three independent unidimensional models. A multidimensional model would integrate the three dependent domains within one model. Four critical aspects of performance-based assessments must be considered prior to the selection of the final measurement model from amongst the three proposed models.

  1. Rating process. The student ratings are obtained through a process in which teachers rate student performance. The raw score ratings assigned by teachers adhere to an ordinal level of measurement. The measurement model should address the ordinal-level teacher ratings.
  2. Level of measurement. Some items may be more difficult than others, so a weighting scheme should be developed to account for item difficulty. The ordinal-level raw score ratings further limit the ability to perform basic arithmetic operations on groups of scores. The measurement model should allow for the establishment of an interval-level scale for the assessment.
  3. Varied rating levels. The number of rating options per item can range from j to k across the instrument. The measurement model should allow a varying number of rating levels per item.
  4. Multidimensionality. If knowledge domains are related and performance in one domain affects performance in the others, the measurement model should allow for the inclusion of multiple domains into one multidimensional model.

MRCMLM

Assessments that satisfy the above criteria are ideal candidates for the application of the multidimensional random coefficients multinomial logit model, or MRCMLM (Adams, Wilson, & Wang, 1997). The approach integrates the Partial Credit Model (Masters, 1982) and is applied when multiple dimensions are present within a single overarching construct. MRCMLM is grounded in the 1-parameter logistic (1PL) IRT model, commonly referred to as the Rasch model (Rasch, 1960).
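
As an illustration of how the Partial Credit Model accommodates items with differing numbers of rating levels, the following minimal sketch computes category probabilities for a single item on a single dimension. It is not the full MRCMLM; the function name and the step-difficulty values are hypothetical and serve only to illustrate the form of the model.

```python
import math


def pcm_probabilities(ability, thresholds):
    """Category probabilities for one item under the Partial Credit Model.

    thresholds[k - 1] is the step difficulty between categories k - 1 and k,
    so an item with m thresholds has m + 1 rating categories.
    """
    # Cumulative sums of (ability - threshold); category 0 has an empty sum.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (ability - tau))
    # Subtract the maximum before exponentiating, for numerical stability.
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical items with three and four rating levels, respectively.
three_level = pcm_probabilities(0.5, [-1.0, 1.0])
four_level = pcm_probabilities(0.5, [-1.5, 0.0, 1.5])
print(len(three_level), len(four_level))  # 3 4
```

With a single threshold the expression reduces to the dichotomous Rasch model, which is why the same function can serve items with any number of rating levels.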

Marginal maximum likelihood estimation, with a Monte Carlo sampling technique for the multiple dimensions, will be used. Parameter estimates for the measurement model will be obtained using the ConQuest 4.5 modeling software (Wu, Adams, & Wilson, 1997).
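
The idea behind Monte Carlo evaluation of a marginal likelihood can be sketched as follows. This is not ConQuest's algorithm; it is a simplified illustration that assumes dichotomous items, two dimensions with unit variances and an assumed correlation rho, and each item loading on exactly one dimension. All names and values are hypothetical.

```python
import math
import random


def rasch_p(theta, delta):
    """Dichotomous Rasch success probability (logit scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def marginal_likelihood(responses, difficulties, dims, rho, n_draws=5000, seed=0):
    """Monte Carlo estimate of the marginal likelihood of one response pattern.

    Abilities are drawn from a bivariate standard normal with correlation rho,
    and the conditional likelihood is averaged over the draws.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # Build correlated abilities from two independent normal draws.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        theta = (z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        likelihood = 1.0
        for x, delta, d in zip(responses, difficulties, dims):
            p = rasch_p(theta[d], delta)
            likelihood *= p if x == 1 else 1.0 - p
        total += likelihood
    return total / n_draws


# Hypothetical pattern: three items, the first two on dimension 0, the last on 1.
estimate = marginal_likelihood([1, 0, 1], [-0.5, 0.5, 0.0], [0, 0, 1], rho=0.6)
print(estimate)
```

In a full estimation routine this quantity would be evaluated for every respondent and maximized over the item and population parameters; the sketch shows only the integration step.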

The multidimensional approach has been used to examine a wide range of constructs such as management characteristics of teachers and principals (Fox, 2002), basic science competency and teacher personality (Wang, Chen, & Cheng, 2004), fifth-grade writing ability (Yao & Schwarz, 2006), mathematics problem-solving (Wu & Adams, 2006), hospital anxiety and depression (Pallant & Tennant, 2007), and mathematics achievement (Ackerman, Gierl, & Walker, 2003).