This talk summarizes the paper Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. It assumes the audience is already familiar with the Attention Is All You Need paper, but it also discusses some of that paper's high-level concepts.
References:
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (Dai et al., 2019)
- Attention Is All You Need (Vaswani et al., 2017)
@NTU, Singapore. Intern '20, '21, '22 at Amazon Web Services (@awscloud). T0, BLOOMZ, UXLA, xCodeEval. I train LLMs at SDAIA! Scaling maximalist, training lead, and core maintainer of ALLaM.