Abstract: Like many other historical languages, Classical Arabic lacks adequate training datasets and accurate “off-the-shelf” models that can be readily used in processing pipelines. In this paper, we discuss our ongoing work to develop and train deep learning models specially designed to handle various tasks related to Classical Arabic texts. We concentrate on Named Entity Recognition, classification of person relationships, toponym classification, detection of onomastic section boundaries, onomastic element classification, and date recognition and classification. Our efforts aim to address the difficulties tied to these tasks and to deliver effective solutions for analyzing Classical Arabic texts. Although this work is still under development, the preliminary results presented in the paper suggest satisfactory to excellent performance of the fine-tuned models on the tasks for which they were trained.
- Information on the event: https://www.ancientnlp.com/alp2023/
- Proceedings download: https://www.ancientnlp.com/alp2023/accepted_papers/
BibTeX citation:
@inproceedings{eis1600_enchancingNLPmodels_2023,
title = "Enhancing State-of-the-Art NLP Models for Classical Arabic",
author = "Tariq Yousef and Lisa Mischer and Hakimi, {Hamid Reza} and Maxim Romanov",
year = "2023",
language = "English",
isbn = "978-954-452-087-8",
pages = "160--169",
booktitle = "Ancient Language Processing Workshop",
}
References
Yousef, Tariq, Lisa Mischer, Hamid Reza Hakimi, and Maxim Romanov. “Enhancing State-of-the-Art NLP Models for Classical Arabic.” In Ancient Language Processing Workshop, 160–69, 2023.