language-model

Multitask Prompted Training Enables Zero-Shot Task Generalization

Shows multitask and multilingual generalization in language models.

What Language Model to Train if You Have One Million GPU Hours?

The crystallization of modeling methods around the Transformer architecture has been a boon for practitioners. Simple, well-motivated architectural variations can transfer across tasks and scale, increasing the impact of modeling research. However, …

PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts

Over 2,000 prompts for roughly 170 datasets are available through the PromptSource framework.
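
As a quick, hedged sketch of how these prompts can be accessed (the `ag_news` dataset and the `classify_question_first` template name are only illustrative; check `all_template_names` to see what a dataset actually ships with):

```python
from datasets import load_dataset
from promptsource.templates import DatasetTemplates

# Grab one example from a dataset (ag_news is just an illustration)
example = load_dataset("ag_news", split="train")[1]

# Load all prompt templates written for that dataset
ag_news_prompts = DatasetTemplates("ag_news")
print(ag_news_prompts.all_template_names)

# Pick one template by name (assumed name; check the printed list above)
prompt = ag_news_prompts["classify_question_first"]

# Applying a template returns the prompted input and the target text
input_text, target_text = prompt.apply(example)
print("INPUT:", input_text)
print("TARGET:", target_text)
```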

Multitask Prompted Training Enables Zero-Shot Task Generalization

T0 shows zero-shot task generalization on English natural language prompts, outperforming GPT-3 on many tasks, while being 16x smaller!
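
A minimal zero-shot inference sketch using the publicly released checkpoints on the Hugging Face Hub (`bigscience/T0_3B` is the smaller variant; `bigscience/T0pp` is the largest):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Smaller public checkpoint; swap in "bigscience/T0pp" for the largest one
tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B")

# The task is specified purely through a natural language prompt, no fine-tuning
prompt = "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```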

Nearest Neighbour Few-Shot Learning for Cross-lingual Classification

We propose a transductive approach for few-shot cross-lingual classification.
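
As a loose illustration of the nearest-neighbour idea only (not the paper's transductive procedure; the encoder name, examples, and k are all assumptions), labels can be propagated from a few labelled examples to target-language queries in a shared multilingual embedding space:

```python
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import KNeighborsClassifier

# Multilingual sentence encoder (illustrative choice, not the paper's encoder)
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# A handful of labelled English examples (the "few shots")
support_texts = ["The movie was wonderful", "I hated every minute of it"]
support_labels = ["positive", "negative"]

# Unlabelled target-language examples to classify
query_texts = ["La película fue maravillosa", "Fue una pérdida de tiempo"]

support_emb = encoder.encode(support_texts, normalize_embeddings=True)
query_emb = encoder.encode(query_texts, normalize_embeddings=True)

# Plain k-nearest-neighbour labelling in the shared embedding space
knn = KNeighborsClassifier(n_neighbors=1, metric="cosine")
knn.fit(support_emb, support_labels)
print(knn.predict(query_emb))
```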

AugVic: Exploiting BiText Vicinity for Low-Resource NMT

We propose AugVic, a data augmentation framework for sequence-to-sequence models (i.e., NMT) that uses a language model.
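
A rough sketch of the general idea of LM-based vicinal augmentation, not AugVic's actual procedure: mask a token in a sentence and let a masked LM propose in-vicinity variants (`xlm-roberta-base` is an illustrative model choice):

```python
import random
from transformers import pipeline

# Illustrative masked LM; AugVic's own setup differs
fill_mask = pipeline("fill-mask", model="xlm-roberta-base")
mask_token = fill_mask.tokenizer.mask_token

def vicinal_variants(sentence: str, top_k: int = 3):
    """Mask one random word and let the LM propose nearby replacements."""
    words = sentence.split()
    idx = random.randrange(len(words))
    masked = " ".join(words[:idx] + [mask_token] + words[idx + 1:])
    return [c["sequence"] for c in fill_mask(masked, top_k=top_k)]

print(vicinal_variants("The committee approved the new budget yesterday"))
```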

UXLA: A Robust Unsupervised Data Augmentation Framework for Zero-Resource Cross-Lingual NLP

We propose UXLA, a novel data augmentation framework for self-supervised learning in zero-resource transfer learning scenarios.

mBART: pretraining a seq2seq architecture

This talk summarizes the paper [`mBART`](https://arxiv.org/abs/2001.08210) and some pretraining concepts.
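
For reference, a short usage sketch with the mBART checkpoints released on the Hugging Face Hub (here the English-to-Romanian fine-tuned model; the names and language codes follow the `transformers` documentation as I recall it):

```python
from transformers import MBartForConditionalGeneration, MBartTokenizer

# mBART checkpoint fine-tuned for English->Romanian translation
tokenizer = MBartTokenizer.from_pretrained(
    "facebook/mbart-large-en-ro", src_lang="en_XX", tgt_lang="ro_RO"
)
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")

text = "UN Chief Says There Is No Military Solution in Syria"
inputs = tokenizer(text, return_tensors="pt")

# Start decoding with the target-language code token
generated = model.generate(
    **inputs, decoder_start_token_id=tokenizer.lang_code_to_id["ro_RO"]
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```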

Transformer-XL

This talk summarizes the paper [`Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context`](https://arxiv.org/abs/1901.02860). It assumes that the audience is already familiar with the [`Attention Is All You Need`](https://arxiv.org/abs/1706.03762) paper and also discusses some of its high-level concepts.
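
A minimal PyTorch sketch of the segment-level recurrence idea discussed in the talk: the attention context of the current segment is extended with cached, gradient-stopped hidden states from the previous segment. Relative positional encodings and everything else from the real model are omitted, and all names and shapes here are illustrative.

```python
from typing import Optional

import torch
import torch.nn as nn

class RecurrentSegmentAttention(nn.Module):
    """Toy segment-level recurrence: keys/values cover the cached previous
    segment plus the current one, while queries come from the current segment."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor, memory: Optional[torch.Tensor] = None):
        # x: (batch, seg_len, d_model); memory: hidden states of the previous segment
        context = x if memory is None else torch.cat([memory, x], dim=1)
        out, _ = self.attn(query=x, key=context, value=context)
        # Cache this segment's states for the next segment, with gradients stopped
        return out, x.detach()

layer = RecurrentSegmentAttention()
segments = torch.randn(3, 2, 16, 64)  # 3 consecutive segments: (batch=2, seg_len=16, d_model=64)
memory = None
for segment in segments:
    out, memory = layer(segment, memory)  # each segment also attends over the previous one
```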