SBMARUF
tokenization
Stop Taking Tokenizers for Granted: They Are Core Design Decisions in Large Language Models
Reframing tokenization as a core modeling decision in LLMs rather than a preprocessing step, arguing for context-aware tokenizer and model co-design.
Beyond Fertility: STRR as a Metric for Multilingual Tokenization Evaluation
A new metric for multilingual tokenization evaluation that goes beyond fertility-based measures.