
Applied AI Sciences

AI Evaluation & Applied AI Sciences

Honestly, it can be hard to keep up with AI developments. At BroadMetrics, that's our job. We are here to guide you on your AI journey from a psychometric and measurement perspective.

Our specialty is the application of AI Psychometrics: developing methods to measure and understand the capabilities, biases, and ethical considerations of AI models. Evaluating AI through the lens of psychometrics supports a principled validation process that goes beyond traditional machine learning metrics. Recall and precision are great preliminary checks, but given the high stakes of many applications, they are no longer enough on their own.

In addition to evaluating AI with established assessment batteries (IQ tests, for example), there are new constructs to measure: trust, safety, clarity, and accuracy. To measure these constructs, we design rating scales, such as Likert scales, which are then applied by humans. Yes, we need humans to make sure our AI is working as intended! We recruit and train human raters to apply the rating scales. This is particularly important when evaluating free-response outputs, for example when the AI gives text summaries as feedback or suggestions.

Do you simply need data for training your LLM? We can help with that! Our expertise in running a high-quality rating process can reduce error in human labeling and annotation. We may partner with annotation companies, but we will add a psychometric flair to the process.

Human Rater Systems for Evaluation

Design and management of expert rating systems to evaluate AI

AI Validation Audits & Studies

Use of validity frameworks from the field of psychometrics and assessment to comprehensively evaluate your AI

Generative AI Research

Research comparing LLMs, prompting techniques, and more

LLM Evaluation

Statistical evaluation of AI using state-of-the-art metrics

LLM-as-a-Judge

Use of LLMs with rating scales and rubrics to evaluate LLM outputs

AI Construct Definitions

Identification and definition of AI-relevant constructs

Human Labeling Data

Collection of human annotations to create training data for LLMs

AI Evaluation in Assessments

Comprehensive evaluation of AI usage in assessment (of humans) along with validity argumentation

AI Assessments

Identification of relevant, established standardized assessments to evaluate AI


Example Use Cases

1. Establish System to Collect Expert Human Annotations

Suppose you want to fine-tune an LLM to perform a specific task, but you don’t have any data to do it. Don’t just collect data; optimize your fine-tuning with high-quality human annotation data. Work with us to develop a principled system to recruit subject matter experts, evaluate their knowledge to qualify them, develop training materials and benchmarks, and monitor their annotations to ensure they will be useful for training. We will conduct the development work and, if needed, manage the implementation.
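One common way to monitor annotators against a benchmark is a chance-corrected agreement statistic. The following is a minimal sketch using Cohen's kappa, not a description of BroadMetrics' actual process; the function name, the labels, and the two label lists are hypothetical and chosen only for illustration.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical data: a candidate expert's labels vs. benchmark labels.
expert = ["safe", "safe", "unsafe", "safe", "unsafe", "safe"]
benchmark = ["safe", "unsafe", "unsafe", "safe", "unsafe", "safe"]
kappa = cohens_kappa(expert, benchmark)
print(f"kappa = {kappa:.2f}")
```

A kappa near 1 suggests the annotator closely matches the benchmark, while values near 0 suggest agreement no better than chance. Any qualification cutoff would be a project-specific choice.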


2. Develop Likert Scales to Measure Relevant AI Constructs for Evaluation

Suppose you are an AI developer evaluating generative AI responses in a healthcare context. The AI generates open-ended suggestions to improve health outcomes. To ensure that the suggestions are valid, safe, unbiased, and accurate, you need human feedback, but you don’t have a way to structure it. Consult with us to design a series of scales that your human experts can use to map their evaluations of the AI outputs to numeric ratings. We will help define the relevant constructs and develop items to evaluate them, so that the scales yield appropriate measurements of the outputs.
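To illustrate what mapping expert evaluations to numeric scales can look like in practice, here is a minimal sketch that summarizes 1-to-5 Likert ratings per construct and flags constructs where raters disagree widely. The construct names, the ratings, and the `max_spread` threshold are all hypothetical assumptions, not values from an actual engagement.

```python
from statistics import mean

# Hypothetical 1-5 Likert ratings of one AI output:
# {construct: [rating from each human expert]}
ratings = {
    "safety": [5, 4, 5],
    "accuracy": [4, 2, 5],
    "clarity": [3, 3, 4],
}

def summarize(ratings, max_spread=2):
    """Mean rating per construct, plus a flag when raters diverge."""
    summary = {}
    for construct, scores in ratings.items():
        spread = max(scores) - min(scores)
        summary[construct] = {
            "mean": round(mean(scores), 2),
            "spread": spread,
            # Wide disagreement may signal an unclear item or a rater needing retraining.
            "needs_review": spread > max_spread,
        }
    return summary

summary = summarize(ratings)
for construct, stats in summary.items():
    print(construct, stats)
```

Treating Likert ratings as numbers and averaging them is a simplification; a fuller psychometric analysis would examine the scale's properties before relying on means.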

3. Proposal to Integrate AI into Certification Testing Program

Suppose you are the leader of a small certification company that wants to make the leap to incorporate AI into its assessment pipeline. Consult with us to conduct a thorough review of your testing program to determine where you can integrate new AI applications or refine existing ones.

We will review all of your systems, from item development through score reporting, to derive a plan for your future.

4. Presentation to School District on AI Literacy and AI in the Classroom

Suppose you are a visionary faculty member in your school district and want to educate your faculty and staff on the basics of AI, how AI might be used in the classroom, how students might use AI, AI literacy, and pain points to watch out for. We will develop a workshop that prepares your staff for the future of their classrooms, and we will serve as an ongoing consultant to the district as we all navigate the changing landscape of education together.


Assessment | AI | Analytics

BroadMetrics is a psychometric consultancy dedicated to creating solutions to a broad array of assessment problems. In addition to the application of psychometrics to typical assessment scenarios, we apply measurement science principles to the applied AI sciences, evaluating AI and generating quality human labeling data.

100 Menlo Park Drive, Ste 101

Edison, NJ 08818


© 2025 BroadMetrics, LLC
