ZeroSumEval: An Extensible Framework for Scaling LLM Evaluation with Inter-Model Competition

Abstract

We present ZeroSumEval, an extensible framework for scaling LLM evaluation through inter-model competition. By pitting models against one another in competitive evaluation protocols, the framework enables systematic, head-to-head comparison of language models.
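To illustrate the core idea behind competition-based evaluation, the following is a minimal sketch (not the ZeroSumEval API; all names, including the play_match stub, are hypothetical) of a round-robin harness in which models play zero-sum matches and are ranked by an Elo-style rating instead of a static benchmark score.

```python
# Minimal sketch of competition-based evaluation: models play pairwise
# zero-sum matches and are ranked by Elo-style ratings.
# All names below are illustrative placeholders, not the ZeroSumEval API.
from itertools import combinations

K = 32  # Elo update step size

def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def play_match(model_a: str, model_b: str) -> float:
    """Placeholder for a zero-sum game between two models.
    Returns 1.0 if model_a wins, 0.0 if it loses, 0.5 for a draw."""
    return 0.5  # stub: a real harness would run an actual game here

def round_robin(models: list[str]) -> dict[str, float]:
    """Play every pairing once and update Elo ratings after each match."""
    ratings = {m: 1000.0 for m in models}
    for a, b in combinations(models, 2):
        score_a = play_match(a, b)
        exp_a = expected_score(ratings[a], ratings[b])
        ratings[a] += K * (score_a - exp_a)
        ratings[b] += K * ((1.0 - score_a) - (1.0 - exp_a))
    return ratings

if __name__ == "__main__":
    print(round_robin(["model-1", "model-2", "model-3"]))
```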

Publication
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (System Demonstrations)