Tokenization

Stop Taking Tokenizers for Granted: They Are Core Design Decisions in Large Language Models

Reframes tokenization as a core modeling decision in LLMs rather than a mere preprocessing step, and argues for context-aware co-design of the tokenizer and the model.

Beyond Fertility: STRR as a Metric for Multilingual Tokenization Evaluation

Proposes STRR, a new metric for evaluating multilingual tokenization that goes beyond fertility-based measures.