This talk goes through the pretraining objectives of seq2seq architectures. It also discusses how mBART's pretraining differs from that of XLM and its derivatives.