Ali Safaya & Taner Sezer: Turkish Data Depository Project: Towards a Unified Turkish NLP Research Platform
Talk given by Ali Safaya & Taner Sezer at KUIS AI Talks on March 22, 2022.
[Supported by KUIS AI and Türkiye Açık Kaynak Platformu (TAKP)]
Title: Turkish Data Depository Project: Towards a Unified Turkish NLP Research Platform
Abstract:
The Turkish language has been left out of state-of-the-art Natural Language Processing (NLP) due to a lack of organized research communities. The lack of organized platforms makes it hard for foreign and junior researchers to contribute to Turkish NLP. We present the Turkish Data Depository (TDD) project as a remedy for this. The main goal of the TDD subprojects is to collect and organize Turkish NLP datasets and to provide a research basis for Turkish NLP. In this talk, I will present the results of our ongoing efforts to build TDD. I will go over our recently published user-friendly hub for Turkish NLP datasets. Moreover, I will present our recently accepted ACL’22 paper on Mukayese, a benchmarking platform for various Turkish NLP tools and tasks, ranging from spell-checking to Natural Language Understanding (NLU) tasks.
Short Bios:
Ali Safaya is an AI Fellow researcher at KUIS AI Center and a Computer Science PhD student at Koç University. He graduated with honours in Computer Engineering from Sakarya University in 2019. His research interests include Natural Language Understanding and Reading Comprehension. Additionally, he has been working on developing general NLP tools for the Turkish and Arabic languages.
Taner Sezer is a lecturer in the Linguistics Department at Mersin University, the director of the TS Corpus project, and a PhD student at Hacettepe University. He is responsible for the development of the TDD Corpus, the text tools, and their web interfaces and APIs. He contributes resources to the progress of the TDD projects.