Flexible evaluation methods supporting both direct assessment and pairwise comparison
AI-assisted criteria refinement with synthetic data generation
Built-in trustworthiness metrics including positional bias analysis
Scalable toolkit built on the Unitxt evaluation library
Integration with diverse LLM judges including IBM Granite Guardian, Llama 3, Mixtral, and GPT-4
Test case catalog with community contribution support
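To illustrate the first and third items, here is a minimal sketch of a pairwise comparison that doubles as a positional-bias check: the same pair of responses is judged twice with the candidate order swapped, and an inconsistent verdict flags the judge as position-biased. The `pairwise_with_bias_check` helper and the toy judges are hypothetical illustrations of the technique, not EvalAssist or Unitxt APIs.

```python
from typing import Callable, Optional, Tuple

# A judge sees a criterion and two candidate responses and answers
# "A" (first slot wins) or "B" (second slot wins).
Judge = Callable[[str, str, str], str]

def pairwise_with_bias_check(judge: Judge, criterion: str,
                             resp_a: str, resp_b: str) -> Tuple[Optional[str], bool]:
    """Run the comparison twice with the candidate order swapped.

    Returns (winning response or None, consistency flag). An inconsistent
    verdict across the two orderings indicates positional bias.
    """
    first = judge(criterion, resp_a, resp_b)   # original order
    second = judge(criterion, resp_b, resp_a)  # same pair, slots swapped
    # Map the positional labels back to the underlying responses.
    winner_first = resp_a if first == "A" else resp_b
    winner_second = resp_b if second == "A" else resp_a
    consistent = winner_first == winner_second
    return (winner_first if consistent else None), consistent

# Toy deterministic judges standing in for a real LLM (hypothetical):
def length_judge(criterion: str, a: str, b: str) -> str:
    return "A" if len(a) >= len(b) else "B"    # position-independent

def first_slot_judge(criterion: str, a: str, b: str) -> str:
    return "A"                                 # always prefers the first slot

winner, ok = pairwise_with_bias_check(
    length_judge, "completeness", "short answer", "a much more detailed answer")
print(winner, ok)        # the longer response wins consistently

_, biased_ok = pairwise_with_bias_check(
    first_slot_judge, "completeness", "short answer", "a much more detailed answer")
print(biased_ok)         # the order-swap exposes the positional bias
```

A real trustworthiness metric would aggregate this consistency rate over many test cases rather than a single pair, but the swap-and-compare core is the same.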