Semi-Supervised Pattern Based Algorithm for Arabic Relation Extraction

Published in IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI), 2016

Recommended citation: Injy Sarhan, Yasser El-Sonbaty, Mohamed Abou Elnasr, “A Semi-Supervised Pattern-Based Algorithm for Arabic Relation Extraction”, 28th IEEE International Conference on Tools with Artificial Intelligence, San Jose, California, USA. (2016, Nov.).

While several relation extraction algorithms have been developed in the past decade, mainly for English, only a few researchers have targeted Arabic, owing to its complexity and rich morphology. This paper proposes a semi-supervised, pattern-based bootstrapping technique to extract semantic relations between entities in Arabic text. To suit the morphologically rich Arabic language, the technique uses stemming, semantic expansion using synonyms, and an automatic scoring technique that measures the reliability of the generated patterns and extracted relations. To further improve performance, a dependency parser is then used to omit negative relations. The proposed system was tested on two corpora that differ in both size and genre, achieving a highest F-measure of 75.06%. The effect of adding stemming and synonyms was also tested experimentally. The results show that this bootstrapping methodology achieves higher performance than existing state-of-the-art methods and can be expanded to include more relations for use in various NLP tasks.
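For readers unfamiliar with pattern-based bootstrapping, the general idea can be sketched roughly as follows. This is a generic, simplified illustration and not the paper's implementation: the function names, the toy English sentences, the confidence formula (the fraction of a pattern's matches that are already known), and the `min_conf` threshold are all assumptions made for the example. The stemming, synonym expansion, and dependency-parser filtering described in the abstract are omitted.

```python
from collections import defaultdict


def middle_context(sentence, e1, e2):
    """Return the text between e1 and e2, or None if they do not occur in that order."""
    i = sentence.find(e1)
    if i < 0:
        return None
    j = sentence.find(e2, i + len(e1))
    if j < 0:
        return None
    middle = sentence[i + len(e1):j].strip()
    return middle or None


def bootstrap(sentences, entities, seeds, iterations=2, min_conf=0.5):
    """Grow a set of (e1, e2) relation pairs from a few seed pairs (hypothetical helper)."""
    known = set(seeds)
    for _ in range(iterations):
        # 1. Induce candidate patterns from sentences that mention a known pair.
        patterns = set()
        for s in sentences:
            for e1, e2 in known:
                m = middle_context(s, e1, e2)
                if m:
                    patterns.add(m)

        # 2. Collect every entity pair that each candidate pattern matches.
        matches = defaultdict(set)
        for s in sentences:
            for e1 in entities:
                for e2 in entities:
                    if e1 == e2:
                        continue
                    m = middle_context(s, e1, e2)
                    if m in patterns:
                        matches[m].add((e1, e2))

        # 3. Score each pattern (assumed formula: share of its matches already
        #    known) and keep the pairs of patterns that pass the threshold.
        new_pairs = set()
        for pattern, pairs in matches.items():
            confidence = len(pairs & known) / len(pairs)
            if confidence >= min_conf:
                new_pairs |= pairs
        known |= new_pairs
    return known


# Toy data, invented for illustration only.
sentences = [
    "Cairo is the capital of Egypt",
    "Paris is the capital of France",
    "Rome is the capital of Italy",
    "Paris lies in France",
    "Rome lies in Europe",
    "Cairo lies in Africa",
]
entities = ["Cairo", "Egypt", "Paris", "France", "Rome", "Italy", "Europe", "Africa"]
seeds = {("Cairo", "Egypt"), ("Paris", "France")}

result = bootstrap(sentences, entities, seeds)
```

In this toy run the pattern "is the capital of" matches two seed pairs out of three and is kept, so a third pair is learned from it, while "lies in" matches only one known pair out of three and is rejected by the threshold.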

Download paper here.