
News & Events

GETTING STARTED WITH LLM EVALUATION: A PRIMER FOR PSYCHOMETRICIANS

WORKSHOP

Research Papers & Reports

Publications that synthesize our team’s long-standing and emerging research in assessment, psychometric methods, and AI evaluation.

Validity arguments for constructed response scoring using generative AI applications.

Casabianca, J. M., McCaffrey, D. F., Johnson, M. S., Alper, N., & Zubenko, V. (2025).

Measuring the accuracy of true score predictions for AI scoring evaluation.

McCaffrey, D. F., Casabianca, J. M., & Johnson, M. S. (2025).

The rise of artificial intelligence in educational measurement: Opportunities and ethical challenges.

Bulut, O., Beiting-Parrish, M., Casabianca, J. M., Slater, S. C., Jiao, H., Song, D., Ormerod, C. M., Fabiyi, D. G., Ivan, R., Walsh, C., Rios, O., Wilson, J., N., S., Wongvorachan, T., Liu, J. X., Tan, B., & Morilova, P. (2024).

Empirical Bayes estimation for evaluating subgroup biases in artificial intelligence scoring.

Kwon, S., McCaffrey, D. F., Jewsbury, P., & Casabianca, J. M. (2025).

Best practices for constructed-response scoring.

McCaffrey, D. F., Casabianca, J. M., Ricker-Pedley, K., Lawless, R., & Wendler, C. (2022).

Recent & Upcoming Events

Events showcasing our team’s recent contributions and scheduled engagements across conferences, workshops, and invited sessions.

| Title | Event Type | Meeting | Location | Date (DD/MM/YYYY) |
| --- | --- | --- | --- | --- |
| Can AI generated rationale provide evidence that AI scores are valid? | Paper | Artificial Intelligence in Measurement and Education (AIME) Conference | Pittsburgh, PA, USA | 28/10/2025 |
| Getting Started with LLM Evaluation: A Primer for Psychometricians | Workshop | Artificial Intelligence in Measurement and Education (AIME) Conference | Pittsburgh, PA, USA | 27/10/2025 |
| Evaluating Rationales: A Comparative Study of LLMs and Human Raters in Assessing Language Learners’ Essays | Paper | National Council on Measurement in Education | Denver, CO, USA | 26/04/2025 |
| Validity Evidence for Use and Interpretation of Scores from Generative AI | Paper | National Council on Measurement in Education | Denver, CO, USA | 25/04/2025 |
| The Where, What, and How of the Job Market for Measurement Professionals | Panel Discussion | National Council on Measurement in Education | Denver, CO, USA | 24/04/2025 |
| Best Practices for AI Scoring | Workshop | National Council on Measurement in Education | Denver, CO, USA | 23/04/2025 |
| Best Practices for AI Scoring of Constructed Responses | Workshop | International Association for Educational Assessment | Philadelphia, PA, USA | 22/09/2024 |