A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations

Abstract

This paper presents a systematic survey and critical review of how large language models are evaluated. It identifies the challenges and limitations of current evaluation practices and offers recommendations for making them more rigorous.

Publication: Preprint