This talk goes through the pretraining objectives of seq2seq architectures. It also discusses how mBART's pretraining differs from that of XLM and its derivatives.
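For context on the contrast: mBART pretrains the full seq2seq model with a denoising objective, where spans of the input are masked out (text infilling) and the decoder reconstructs the original text, whereas XLM pretrains an encoder with (translation) masked language modeling. A minimal, illustrative sketch of the text-infilling noise step follows; the function name is hypothetical, and the 35% mask ratio and Poisson(λ=3.5) span lengths follow the values reported in the mBART paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def text_infill(tokens, mask_token="<mask>", mask_ratio=0.35, poisson_lambda=3.5):
    """Replace randomly sampled spans with a single <mask> token.

    Span lengths are drawn from Poisson(lambda); roughly mask_ratio of
    the tokens end up masked. The seq2seq model is then trained to
    reconstruct the original (un-noised) sequence on the decoder side.
    Illustrative sketch only, not mBART's actual implementation.
    """
    n = len(tokens)
    budget = int(mask_ratio * n)  # total number of tokens to mask
    out, i = [], 0
    while i < n:
        if budget > 0 and rng.random() < 0.5:
            span = max(1, int(rng.poisson(poisson_lambda)))
            span = min(span, budget, n - i)
            out.append(mask_token)  # whole span collapses to one <mask>
            i += span
            budget -= span
        else:
            out.append(tokens[i])
            i += 1
    return out

src = "we shall overcome some day".split()
noisy = text_infill(src)
# encoder input:  noisy tokens (plus a language id such as [en_XX])
# decoder target: the original sentence
```

XLM-style masked language modeling, by contrast, would predict individual masked tokens with an encoder alone, which is the key difference the talk explores.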
A computer science enthusiast working in deep learning and natural language processing.