What Language Model to Train if You Have One Million GPU Hours?

Abstract

We investigate which language model to train given a fixed compute budget of one million GPU hours. We study scaling laws for training large language models under constrained resources, providing practical guidance for large-scale model training decisions.
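To make the budget concrete, here is a minimal back-of-the-envelope sketch (not taken from the paper) of how a fixed GPU-hour budget translates into a parameter/token split. It assumes the common C ≈ 6ND FLOPs approximation (Kaplan et al., 2020) and a Chinchilla-style ~20 tokens-per-parameter ratio (Hoffmann et al., 2022); the per-GPU throughput figure is a placeholder assumption, not a measured number.

```python
import math

GPU_HOURS = 1_000_000      # the fixed budget from the title
FLOPS_PER_GPU = 1.0e14     # assumed sustained per-GPU throughput (placeholder)
TOKENS_PER_PARAM = 20      # Chinchilla-style compute-optimal ratio (assumption)

def compute_budget_flops(gpu_hours: float, flops_per_gpu: float) -> float:
    """Total training FLOPs available under the budget."""
    return gpu_hours * 3600 * flops_per_gpu

def optimal_split(total_flops: float, tokens_per_param: float) -> tuple[float, float]:
    """Solve C = 6 * N * D with D = tokens_per_param * N for (N, D)."""
    n_params = math.sqrt(total_flops / (6 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    c = compute_budget_flops(GPU_HOURS, FLOPS_PER_GPU)
    n, d = optimal_split(c, TOKENS_PER_PARAM)
    print(f"Budget: {c:.2e} FLOPs -> ~{n / 1e9:.1f}B params, ~{d / 1e9:.0f}B tokens")
```

Under these assumed numbers the budget works out to roughly 3.6e23 FLOPs, i.e. on the order of a 55B-parameter model trained on about 1.1T tokens; changing the throughput or tokens-per-parameter assumptions shifts the split accordingly.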

Publication
Findings of the Association for Computational Linguistics: EMNLP