Evaluation

ZeroSumEval: An Extensible Framework for Scaling LLM Evaluation with Inter-Model Competition

Evaluates LLMs by pitting them against one another in competitive, zero-sum matchups, so that benchmark difficulty scales with model capability rather than with a static dataset.

Beyond Fertility: STRR as a Metric for Multilingual Tokenization Evaluation

Proposes STRR, a metric for multilingual tokenization evaluation that captures tokenizer quality across languages where fertility-based measures fall short.

Humanity's Last Exam

A large-scale, collaboratively sourced benchmark of expert-written questions spanning many subjects, designed to test the limits of frontier AI systems.

A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations

Surveys the LLM evaluation literature, cataloguing methodological challenges and limitations, and distills recommendations for more rigorous evaluation practice.

BenLLM-Eval: A Comprehensive Evaluation into the Potentials and Pitfalls of Large Language Models on Bengali NLP

A comprehensive evaluation of LLMs on Bengali NLP tasks, examining their potential and pitfalls for low-resource language processing.

When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards

Shows that LLM leaderboard rankings are highly sensitive to evaluation choices, shifting significantly under minor methodological changes such as reordering multiple-choice options or altering prompt format.

A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets

A comprehensive evaluation of ChatGPT on 140 benchmark tasks spanning diverse fields, highlighting its strengths and weaknesses and assessing its ability to follow multiple queries within a single instruction.