language-model

mBART: pretraining a seq2seq architecture

This talk summarizes the paper [`mBART: Multilingual Denoising Pre-training for Neural Machine Translation`](https://arxiv.org/abs/2001.08210) and some related pretraining concepts.
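
As a rough illustration of the pretraining objective, here is a minimal sketch (not the authors' code; the function and variable names are assumptions for illustration). The 35% masking ratio, Poisson(3.5) span lengths, and sentence permutation follow the noising scheme described in the paper: the encoder sees the noised document and the decoder is trained to reconstruct the original text.

```python
"""Illustrative sketch of mBART-style denoising pretraining data creation.
Helper names (permute_sentences, infill_spans, make_denoising_example, MASK)
are illustrative, not from the paper's codebase."""
import random
import numpy as np

MASK = "<mask>"  # stand-in for the tokenizer's mask token


def permute_sentences(sentences):
    """Sentence permutation: shuffle the order of sentences in the document."""
    sentences = list(sentences)
    random.shuffle(sentences)
    return sentences


def infill_spans(tokens, mask_ratio=0.35, poisson_lambda=3.5):
    """Text infilling: replace spans (lengths ~ Poisson(lambda)) with a single
    MASK token until roughly `mask_ratio` of the tokens are covered."""
    tokens = list(tokens)
    budget = int(len(tokens) * mask_ratio)
    while budget > 0 and len(tokens) > 1:
        span = min(max(np.random.poisson(poisson_lambda), 1), budget, len(tokens) - 1)
        start = random.randrange(0, len(tokens) - span)
        tokens[start:start + span] = [MASK]
        budget -= span
    return tokens


def make_denoising_example(document):
    """Build one (noised encoder input, original decoder target) pair."""
    source = infill_spans(" ".join(permute_sentences(document)).split())
    target = " ".join(document).split()
    return source, target


if __name__ == "__main__":
    doc = ["the cat sat on the mat .", "it was a sunny day ."]
    src, tgt = make_denoising_example(doc)
    print("encoder input :", src)
    print("decoder target:", tgt)
```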

Transformer-XL

This talk summarizes the paper [`Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context`](https://arxiv.org/abs/1901.02860). It assumes that the audience is already familiar with the [`Attention Is All You Need`](https://arxiv.org/abs/1706.03762) paper, and it also discusses some of that paper's high-level concepts.
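
To give a flavour of the paper's core idea, below is a minimal sketch (assumed class and variable names, not the paper's implementation) of segment-level recurrence: hidden states from the previous segment are cached and reused as extra attention context with gradients stopped, which is what lets the model attend beyond a fixed-length segment. The real model also uses relative positional encodings and a longer, fixed-size memory; this sketch uses plain `nn.MultiheadAttention` for brevity.

```python
"""Illustrative sketch of Transformer-XL-style segment-level recurrence.
Simplified: one attention layer, memory of exactly one previous segment,
absolute (default) positions instead of relative positional encodings."""
import torch
import torch.nn as nn


class RecurrentSelfAttention(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x, memory=None):
        # x:      (batch, seg_len, d_model) -- current segment
        # memory: (batch, mem_len, d_model) -- cached states from the previous segment
        if memory is not None:
            # Extend the attention context with the cached states; stop-gradient
            # at the segment boundary, as in segment-level recurrence.
            context = torch.cat([memory.detach(), x], dim=1)
        else:
            context = x
        out, _ = self.attn(query=x, key=context, value=context)
        new_memory = x.detach()  # cache this segment's states for the next segment
        return out, new_memory


if __name__ == "__main__":
    layer = RecurrentSelfAttention()
    segments = torch.randn(3, 2, 16, 64)  # 3 segments, batch=2, seg_len=16, d_model=64
    mem = None
    for seg in segments:
        out, mem = layer(seg, mem)
    print(out.shape, mem.shape)
```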