Abstract:
Quality control of judges' rating behaviors is directly related to the validity of examinees' scores in a performance assessment. The purpose of the present study was twofold: (a) to compare two different estimation techniques under Rasch and nonlinear mixed modeling perspectives; and (b) to compare judge severity estimates under two different scoring designs. The SAS NLMIXED and FACETS software packages were used to evaluate the accuracy of the two estimation techniques. One of the designs under investigation was the judge scoring design of a live English proficiency test. Results indicated that the two analytical methods performed comparably in estimating the true values of judge severity. Of the two judge scoring designs, the spiral design performed with an acceptable degree of accuracy, whereas in the nested design the estimates of the true values of the model effects, including judge severity, were substantially compromised. The present study illustrates effective ways to strategize a judge scoring design and to estimate the true values of judge severity in performance testing.