TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish
Published in Findings of EMNLP 2024
TurkishMMLU is a benchmark for evaluating LLMs on diverse and challenging tasks in Turkish, based on a localized version of the MMLU benchmark. The dataset spans 57 subjects across STEM, humanities, social sciences, and professional fields, targeting higher-order reasoning and domain knowledge.
We analyze the performance of multilingual and Turkish-specific LLMs, highlighting critical gaps in reasoning, translation quality, and culturally relevant content. TurkishMMLU provides a valuable tool for both model evaluation and instruction-tuning in underrepresented languages.
The dataset and code are publicly available for the research community to extend and apply in multilingual NLP research.
Recommended citation: Yüksel, A., Köksal, A., Şenel, L. K., Korhonen, A., & Schütze, H. (2024). "TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish." *Findings of the Association for Computational Linguistics: EMNLP 2024*.
