scaling-laws

What Language Model to Train if You Have One Million GPU Hours?

An investigation of scaling laws, with practical guidance for training language models under a constrained compute budget.