This thesis investigates transfer learning approaches for adapting large language models, focusing on multilingual generalization and scalable training dynamics.