TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish

Published in Findings of EMNLP 2024

TurkishMMLU introduces a benchmark for evaluating LLMs on diverse and challenging tasks in Turkish, based on a localized version of the MMLU benchmark. The dataset spans 57 subjects across STEM, humanities, social sciences, and professional fields, targeting higher-order reasoning and domain knowledge.

We analyze the performance of multilingual and Turkish-specific LLMs, highlighting critical gaps in reasoning, translation quality, and culturally relevant content. TurkishMMLU provides a valuable tool for both model evaluation and instruction-tuning in underrepresented languages.

The dataset and code are publicly available for the research community to extend and apply in multilingual NLP research.

Recommended citation: Yüksel, A., Köksal, A., Şenel, L. K., Korhonen, A., & Schütze, H. (2024). "TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish." *Findings of the Association for Computational Linguistics: EMNLP 2024*.


Arda Yüksel
