This talk summarizes the papers [`mixup`](https://arxiv.org/abs/1710.09412), [`MixMatch`](https://arxiv.org/abs/1905.02249), [`DivideMix`](https://openreview.net/pdf?id=HJgExaVtwr), and [`UDA`](https://arxiv.org/abs/1904.12848).
This talk summarizes the paper [`Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context`](https://arxiv.org/abs/1901.02860). It assumes that the audience is already familiar with the [`Attention Is All You Need`](https://arxiv.org/abs/1706.03762) paper, and it also discusses some of that paper's high-level concepts.