What Language Model to Train if You Have One Million GPU Hours?

Abstract

We investigate which language model to train given a fixed compute budget of one million GPU hours. We study scaling laws for training large language models under constrained resources, providing practical guidance for large-scale model training decisions.
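To make the budget concrete, here is a minimal back-of-the-envelope sketch (not taken from the paper) of how a fixed GPU-hour budget translates into a parameter/token split. It assumes the common C ≈ 6ND FLOPs approximation (Kaplan et al., 2020) and a Chinchilla-style ~20 tokens-per-parameter ratio (Hoffmann et al., 2022); the per-GPU throughput figure is a placeholder assumption, not a measured number.

```python
import math

GPU_HOURS = 1_000_000      # the fixed budget from the title
FLOPS_PER_GPU = 1.0e14     # assumed sustained per-GPU throughput (placeholder)
TOKENS_PER_PARAM = 20      # Chinchilla-style compute-optimal ratio (assumption)

def compute_budget_flops(gpu_hours: float, flops_per_gpu: float) -> float:
    """Total training FLOPs available under the budget."""
    return gpu_hours * 3600 * flops_per_gpu

def optimal_split(total_flops: float, tokens_per_param: float) -> tuple[float, float]:
    """Solve C = 6 * N * D with D = tokens_per_param * N for (N, D)."""
    n_params = math.sqrt(total_flops / (6 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    c = compute_budget_flops(GPU_HOURS, FLOPS_PER_GPU)
    n, d = optimal_split(c, TOKENS_PER_PARAM)
    print(f"Budget: {c:.2e} FLOPs -> ~{n / 1e9:.1f}B params, ~{d / 1e9:.0f}B tokens")
```

Under these assumed numbers the budget works out to roughly 3.6e23 FLOPs, i.e. on the order of a 55B-parameter model trained on about 1.1T tokens; changing the throughput or tokens-per-parameter assumptions shifts the split accordingly.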

Publication
Findings of the Association for Computational Linguistics: EMNLP