M Saiful Bari

Applied Scientist

Amazon AGI

Biography

M Saiful Bari (Maruf) works on Multilingual Capability for the Nova series models at Amazon AGI. Before joining Amazon, he was the Training Lead and one of the Core Maintainers of ALLaM, a sovereign foundation model for English and Arabic language technologies. His research focuses on understanding and advancing large language models, particularly in the areas of scaling, training dynamics, and systematic evaluation of frontier (-> superintelligence) models. He investigates these aspects through three primary areas: (1) studying the scaling behaviors and training dynamics of large language models in terms of data and truthfulness, (2) developing robust evaluation methodologies for frontier-class (-> superintelligence) models with scalable oversight, and (3) exploring efficient learning paradigms through transfer learning at scale. This research combines theoretical frameworks with large-scale empirical studies, leading to both methodological innovations and practical applications.

Interests

Artificial Intelligence
Deep Learning
Natural Language Processing

Large Language Models (LLMs)
Multi-lingual NLP (Machine Translation, Cross-lingual tasks)
NLP for Programming
LLM Safety/Alignment Research

News

2025

May	Joined Amazon AGI-F to work on `Multilingual Capability` of Nova series models.
May	Moved to the SF Bay Area. Roaming around `Sunnyvale.`
April	Left the ALLaM team (😭) at NCAI, SDAIA to join the Nova team at Amazon AGI-F. The ALLaM team has since moved to the newly formed Humain, which has raised a `$100B sovereign investment` to disrupt `GenAI` in the Middle East.
March	Presenting ALLaM at ICLR 2025. See you in Singapore.
April	ZeroSumEval accepted at ACL 2025 as a demo paper. Project moved to FAIR.
March	Check out the fully automated agentic evaluation paper: ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition
January	Check out the new HLE benchmark. [paper] [data]

2024

25 December	Received MIT "Innovators Under 35" Award.
17 October	One paper has been accepted at EMNLP 2024 on evaluation. See you in Miami.
13 October	Our paper on scaling evaluation (`scalable oversight`) for frontier-class models has just dropped. Feel free to reach out to me for details.
10 October	We announced ALLaM during the keynote of the Global AI Summit as one of the main priorities for sovereign AI in the Kingdom of Saudi Arabia. We also released the model and evaluation details for the 34B model pretrained from scratch. Keynote [YouTube Link].
22 July	We released our technical paper on ALLaM. Feel free to reach out to me if you have any queries.
22 May	My lab's alignment work was recently released. ALLaM was revealed at IBM Think Keynote [Youtube Link]. ALLaM is a nationwide LLM effort of Saudi Arabia. Paper coming soon ...
15 May	Our paper on xCodeEval has been accepted at ACL 2024. Unfortunately, I won't be traveling to Thailand :(.
16 May	Two papers accepted at ACL'24. See you in Bangkok.

2023

6 Aug	Joined SDAIA as a Senior Research Scientist.
7 May	Three papers accepted at ACL'23. See you in Toronto.
15 March	Check out our recent pre-print xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval paper.

2022

20 Dec	Check out our recent pre-print SPT: Semi-Parametric Prompt Tuning for Multitask Prompted Learning paper.
18 Dec	Check out our recent pre-print BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting paper.
26 Nov	Returned from Amazon d2l summer internship.
3 Nov	Check out our recent pre-print Crosslingual Generalization through Multitask Finetuning paper.
1-Oct	Our paper (What Language Model to Train if You Have One Million GPU Hours?) got accepted at EMNLP'22 findings.
05-Jul	Joined Amazon d2l team as a summer intern.
11-Apr	Check out our new ACL'22 workshop paper What Language Model to Train if You Have One Million GPU Hours?.
7-Feb	Check out our pre-print PromptSource paper.
2-Feb	Check out my recent talk on the T0++ paper.

2021

19-Nov	Our paper T0++ got accepted as a spotlight paper at ICLR'22.
15-Oct	Our paper Multitask Prompted Training Enables Zero-Shot Task Generalization is online. Model (T0++), Dataset (P3)
22-Sept	Check out my recent talk on Finetuned Language Models Are Zero-Shot Learners at the NTU-NLP lab.
25-Aug	My internship work from Amazon got accepted as a short paper at EMNLP.
3-May	Two papers, UXLA (main conference) and AugVic (Findings), got accepted at ACL-IJCNLP 2021.

2020

6-Nov	Returned to my PhD studies after completing my internship with the Amazon Lex team.
15-Oct	Our paper LNMap was accepted at EMNLP-2020.
03-Aug	I will be starting my internship at Amazon with the Lex team.
28-Aug	Passed my PhD Qualifying Examination (QE).
28-Apr	Preprint of our new paper LNMap released on arXiv.
24-Apr	Preprint of our new paper MultiMix released on arXiv.
25-Mar	Gave a talk on mBART (paper) at NTU-NLP.
13-Feb	Presenting our Cross-lingual-NER paper at AAAI-2020.
4-Jan	Got AAAI-2020 travel scholarship. See you in New York.

2019

9-Nov	I will be a Teaching Assistant for the graduate deep learning course on NLP at NTU.
11-Nov	A paper accepted at AAAI 2020.
15-Jun	A paper accepted at ACL 2019.
25-Jan	Showed our MT system to industry and government stakeholders.
15-Jan	Joined the NTU-NLP lab as a PhD student.

Experience

April 27, 2025 – Present

San Francisco Bay Area

Applied Scientist

Amazon AGI

Currently:
- I work on Multilingual Training (both pretraining and post-training) of the Nova series models.

My current research interests are:
- Anatomy of Pretraining (see: ALLaM, BLOOM [1] [2] )
- Alignment of LLMs. (see: T0, BLOOMZ, SPT)
- Robust evaluation of frontier models. (see: xCodeEval , ChatGPTEval [1] [2] [3])

August 6, 2023 – April 17, 2025

Saudi Arabia

Senior Research Scientist

National Center for Artificial Intelligence (NCAI), SDAIA

Training and aligning large language models. During my time at NCAI, SDAIA,
- I developed the core technology behind ALLaM.
- I was a Core Maintainer of ALLaM.
- I led the Training Team of ALLaM.
For my work on ALLaM, I received the MIT TR35 (Innovators Under 35) award.

July 5, 2022 – October 10, 2022

Santa Clara, CA

Applied Scientist Intern

Amazon Development Center U.S., Inc.

SPT: Semi-Parametric Prompt Tuning for Multitask Prompted Learning.

July 1, 2021 – January 28, 2022

East Palo Alto, CA (WFH)

Applied Scientist Intern (Part Time)

Lex Team, Amazon Web Services, Inc.

Cross-lingual transfer learning in low-resource and low-parameter scenarios.

August 3, 2020 – November 6, 2020

East Palo Alto, CA (WFH)

Applied Scientist Intern

Lex Team, Amazon Web Services, Inc.

Conducted research on a cross-lingual Lex bot.

September 15, 2018 – January 6, 2019

Bangladesh

Software Engineering Intern

Aubichol IT Limited

Worked at an early-stage startup, where my key responsibility was designing the architecture of a sports analytics system and a translation system.

September 15, 2017 – August 27, 2018

Singapore

Research Assistant

Nanyang Technological University

Conducted research on natural language processing.
Responsibilities included:
* Research on MT, NER, and adversarial training.
* Maintaining internal GPU servers.
* Maintaining the group website.

Recent & Upcoming Talks

The ChatGPT moment: The past, current and future (potential) of LLMs

The talk summarizes how ChatGPT’s release marked a turning point in Generative AI, driven by refined integration of traditional …

Aug 18, 2024 9:00 AM — 10:00 AM Amazon Web Service

Pathways to semi-(un)supervised* NLP Brain

This talk discusses the evolving field of transfer learning, from LSTMs to large language models, and shows new direction on the …

May 1, 2023 12:30 PM — 12:30 PM Cohere for AI

M Saiful Bari

Multitask Prompted Training Enables Zero-Shot Task Generalization

A talk on T0++ paper.

Feb 7, 2022 4:30 PM — 5:30 PM MICL Lab, Singapore

M Saiful Bari

Publications

A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets

The paper comprehensively evaluates ChatGPT’s performance on various academic tasks, covering 140 tasks across diverse fields, …

Md Tahmid Rahman Laskar, M Saiful Bari, Mizanur Rahman, Md Amran Hossen Bhuiyan, Shafiq Joty, Jimmy Xiangji Huang

xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval

We introduce xCodeEval, the largest executable multilingual multitask benchmark to date consisting of 25M document-level coding …

Mohammad Abdullah Matin Khan, M Saiful Bari, Xuan Long Do, Weishi Wang, Md Rizwan Parvez, Shafiq Joty
