
Applied AI Sciences
AI Evaluation & Applied AI Sciences
Honestly, it can be hard to keep up with AI developments. At BroadMetrics, that's our job. We are here to mentor you on your AI journey from a psychometric and measurement perspective.
We specialize in the application of AI Psychometrics: developing methods to measure and understand the capabilities, biases, and ethical considerations of AI models. Evaluating AI through the lens of psychometrics ensures a principled validation process that goes beyond the basics of traditional machine learning metrics. Recall and precision are great preliminary steps, but with the high stakes involved in many applications, they're just not enough anymore.
In addition to evaluating AI with established batteries of assessments (IQ tests, for example), there are new constructs to measure: trust, safety, clarity, accuracy. To measure these constructs, we design rating scales or Likert scales, which are then applied by humans. Yes, we need humans to make sure our AI is working as intended! We recruit and train human raters to apply the rating scales for evaluation. This is particularly important when evaluating free-response outputs, for example when the AI provides text summaries as feedback or suggestions.
Do you simply need data for training your LLM? We can help with that! Our expertise in ensuring a high-quality rating process can reduce the error in human labeling and annotation. We may partner with annotation companies, but we will add a psychometric flair to the process.
Human Rater Systems for Evaluation
Design and management of expert rating systems to evaluate AI
AI Validation Audits & Studies
Use of validity frameworks from the fields of psychometrics and assessment to comprehensively evaluate your AI
Generative AI Research
Research comparing LLMs, prompting techniques, and more
LLM Evaluation
Statistical evaluation of AI, using state-of-the-art metrics
LLM-as-a-Judge
Use of LLMs with rating scales and rubrics to evaluate LLM outputs
AI Construct Definitions
Identification and definition of AI-relevant constructs
Human Labeling Data
Collection of human annotations to create training data for LLMs
AI Evaluation in Assessments
Comprehensive evaluation of AI use in assessments (of humans), along with validity argumentation
AI Assessments
Identification of relevant established standardized assessments to evaluate AI

Example Use Cases
1
Establish System to Collect Expert Human Annotations
Suppose you want to fine-tune an LLM to perform a specific task, but you don’t have any data to do it. Don’t just collect data; optimize your fine-tuning with high-quality human annotation data. Work with us to develop a principled system to recruit subject-matter experts, evaluate their knowledge to qualify them, develop training materials and benchmarks, and monitor their annotations to ensure they will be useful for training purposes. We will conduct the development work and manage the implementation if needed.
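To make the qualification step concrete, here is a minimal, purely illustrative sketch (not a description of any specific engagement): candidate raters label a small set of gold-standard benchmark items, and only those whose agreement with the gold labels clears a threshold are admitted to the project. The item IDs, labels, and 90% threshold below are hypothetical.

```python
# Illustrative sketch only: qualifying candidate annotators against a small
# gold-standard benchmark before admitting them to a labeling project.
# Item IDs, labels, and the 90% threshold are hypothetical examples.

GOLD_BENCHMARK = {
    "item_01": "acceptable",
    "item_02": "unacceptable",
    "item_03": "acceptable",
    "item_04": "needs_revision",
}

QUALIFICATION_THRESHOLD = 0.90  # minimum agreement with gold labels

def qualify_rater(candidate_labels: dict[str, str]) -> tuple[float, bool]:
    """Return the candidate's agreement rate with the gold benchmark
    and whether they meet the qualification threshold."""
    scored = [
        candidate_labels.get(item) == gold
        for item, gold in GOLD_BENCHMARK.items()
    ]
    agreement = sum(scored) / len(scored)
    return agreement, agreement >= QUALIFICATION_THRESHOLD

# Example: a candidate who misses one of the four benchmark items
candidate = {
    "item_01": "acceptable",
    "item_02": "unacceptable",
    "item_03": "needs_revision",
    "item_04": "needs_revision",
}
rate, passed = qualify_rater(candidate)
print(f"agreement = {rate:.2f}, qualified = {passed}")  # agreement = 0.75, qualified = False
```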
2
Develop Likert Scales to Measure Relevant AI Constructs for Evaluation
Suppose you are an AI developer evaluating generative AI responses in a healthcare context. The AI generates open-ended suggestions to improve health outcomes. To ensure that the suggestions are valid, safe, unbiased, and accurate, you need human feedback, but you don’t have a way to structure it. Consult with us to design a set of rating scales your human experts can use to map their evaluations of the AI outputs onto numeric values. We will help define the relevant constructs and develop items to measure them, ensuring they yield appropriate measurements of the outputs.
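As a purely illustrative sketch (the construct names, item wording, anchors, and scores below are hypothetical, not drawn from any real project), here is how a simple 5-point Likert rubric and the aggregation of expert ratings per AI output might look in code.

```python
# Illustrative sketch only: a 5-point Likert rubric for hypothetical constructs
# (safety, accuracy, clarity) and a simple aggregation of rater scores per AI output.
from statistics import mean

# The shared 1-5 response scale every rater uses.
LIKERT_ANCHORS = {
    1: "Strongly disagree",
    2: "Disagree",
    3: "Neither agree nor disagree",
    4: "Agree",
    5: "Strongly agree",
}

# Each item is a statement a human expert rates on the scale above.
RUBRIC = {
    "safety": "The suggestion poses no risk of harm if followed.",
    "accuracy": "The suggestion is consistent with accepted clinical guidance.",
    "clarity": "The suggestion is clearly worded and easy to act on.",
}

# ratings[output_id][construct] -> scores from independent raters (made-up data)
ratings = {
    "output_17": {
        "safety": [5, 4, 5],
        "accuracy": [4, 4, 3],
        "clarity": [5, 5, 4],
    },
}

for output_id, constructs in ratings.items():
    summary = {c: round(mean(scores), 2) for c, scores in constructs.items()}
    print(output_id, summary)  # output_17 {'safety': 4.67, 'accuracy': 3.67, 'clarity': 4.67}
```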
3
Proposal to Integrate AI into Certification Testing Program
Suppose you are the leader of a small certification company who wants to make the leap and incorporate AI into your assessment pipeline. Consult with us to conduct a thorough review of your testing program and determine where you can integrate new AI applications or refine existing ones.
We will review all of your systems, from item development to score reporting, to derive a plan for your future.
4
Presentation to School District on AI Literacy and AI in the Classroom
Suppose you are a visionary faculty member in your school district who wants to educate faculty and staff on the basics of AI: how AI might be used in the classroom, how students might use AI, AI literacy, and pain points to watch out for. We will prepare a workshop that readies your staff for the future of their classrooms, and we will serve as an ongoing consultant to the district as we all navigate the changing landscape of education together.