Demonstrates multitask, multilingual generalization in language models.
The crystallization of modeling methods around the Transformer architecture has been a boon for practitioners. Simple, well-motivated architectural variations can transfer across tasks and scale, increasing the impact of modeling research. However, …
Over 2,000 prompts for roughly 170 datasets are available through the PromptSource framework.
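As a quick illustration, prompts can be fetched and applied to dataset examples in a few lines (a minimal sketch following the PromptSource README; the `ag_news` dataset and the `classify_question_first` template name are just examples):

```python
from datasets import load_dataset
from promptsource.templates import DatasetTemplates

# Load one example from the ag_news dataset
example = load_dataset("ag_news", split="train")[1]

# Fetch all prompt templates written for ag_news
ag_news_prompts = DatasetTemplates("ag_news")

# Pick one template by name and apply it to the example
prompt = ag_news_prompts["classify_question_first"]
result = prompt.apply(example)
print("INPUT:", result[0])
print("TARGET:", result[1])
```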
T0 shows zero-shot task generalization on English natural language prompts, outperforming GPT-3 on many tasks, while being 16x smaller!
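For reference, zero-shot inference with a released checkpoint looks like this (a sketch following the Hugging Face model card; `bigscience/T0_3B` is the smaller released variant):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B")

# Any natural-language prompt works; no task-specific fine-tuning needed
prompt = ("Is this review positive or negative? "
          "Review: this is the best cast iron skillet you will ever buy")
inputs = tokenizer.encode(prompt, return_tensors="pt")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```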
We propose a transductive approach to few-shot cross-lingual classification.
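The transductive idea in a nutshell: predictions on the unlabeled evaluation pool itself are used to refine the classifier. A minimal self-training sketch, not the paper's exact procedure; `model`, `unlabeled_loader`, and the confidence threshold are all hypothetical:

```python
import torch

def transductive_step(model, unlabeled_loader, optimizer, threshold=0.9):
    """One round of pseudo-labeling on the unlabeled pool:
    keep only confident predictions and train on them."""
    model.eval()
    pseudo = []
    with torch.no_grad():
        for batch in unlabeled_loader:
            probs = torch.softmax(model(batch["input_ids"]), dim=-1)
            conf, labels = probs.max(dim=-1)
            keep = conf > threshold
            if keep.any():
                pseudo.append((batch["input_ids"][keep], labels[keep]))

    model.train()
    for inputs, labels in pseudo:
        loss = torch.nn.functional.cross_entropy(model(inputs), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```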
We propose AugVic, a data augmentation framework for sequence-to-sequence models (e.g., NMT) that leverages a language model.
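One common way to realize LM-based augmentation is mask-and-fill: mask a few tokens in a sentence and let a masked LM propose replacements, yielding "vicinal" variants. A hedged sketch of that general idea, not AugVic's exact recipe; `bert-base-multilingual-cased` is a stand-in choice:

```python
import random
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

def vicinal_variant(sentence, mask_ratio=0.15):
    """Mask a fraction of tokens and fill them with MLM predictions."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    # candidate positions: everything except [CLS]/[SEP]
    positions = list(range(1, len(ids) - 1))
    masked = random.sample(positions, max(1, int(len(positions) * mask_ratio)))
    ids[masked] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = mlm(ids.unsqueeze(0)).logits[0]
    ids[masked] = logits[masked].argmax(dim=-1)
    return tokenizer.decode(ids[1:-1])
```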
We propose UXLA, a novel data augmentation framework for self-supervised learning in zero-resource transfer learning scenarios.
This talk summarizes the paper [`mBART`](https://arxiv.org/abs/2001.08210) and covers some pretraining concepts.
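As a concrete anchor for the talk, here is how a pretrained mBART checkpoint is typically used for translation (a sketch following the `transformers` docs; `facebook/mbart-large-en-ro` is one released English-to-Romanian fine-tuned variant):

```python
from transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained(
    "facebook/mbart-large-en-ro", src_lang="en_XX", tgt_lang="ro_RO"
)
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")

batch = tokenizer("UN Chief Says There Is No Military Solution in Syria",
                  return_tensors="pt")
# mBART decodes with the target language code as the start token
generated = model.generate(
    **batch, decoder_start_token_id=tokenizer.lang_code_to_id["ro_RO"]
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```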
This talk summarizes the paper [`Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context`](https://arxiv.org/abs/1901.02860). It assumes that the audience is already familiar with the [`Attention Is All You Need`](https://arxiv.org/abs/1706.03762) paper and also discusses some of its high-level concepts.
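The talk's core mechanism, segment-level recurrence, fits in a short sketch: hidden states from the previous segment are cached with gradients stopped and prepended as extra attention context. The `layer(h, context)` interface below is hypothetical, standing in for attention where queries come from `h` and keys/values from `context`:

```python
import torch

def forward_segments(layers, segments, mem_len=128):
    """Process a long sequence segment by segment, Transformer-XL style.
    `segments` is an iterable of (batch, seg_len, d_model) tensors."""
    mems = [None] * len(layers)  # one cached memory per layer
    for seg in segments:
        h, new_mems = seg, []
        for i, layer in enumerate(layers):
            # cache this layer's input, gradient-stopped, for the next segment
            new_mems.append(h.detach()[:, -mem_len:])
            context = h if mems[i] is None else torch.cat([mems[i], h], dim=1)
            h = layer(h, context)  # attend from h over [memory; h]
        mems = new_mems
        yield h
```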