JACIII Vol.20 No.6 pp. 941-960
doi: 10.20965/jaciii.2016.p0941


From Linguistic to Conceptual: A Framework Based on a Pipeline for Building Ontologies from Texts

Ali Benafia*1, Smaine Mazouzi*2, Ramdane Maamri*3, Zaidi Sahnoun*3, and Sara Benafia*4

*1University of Hadj Lakhdar (University Batna 1)
05 Avenue Chahid Boukhlouf, Batna 05000, Algeria

*2University of Skikda
BP 26, ElHadaik Road, Skikda 21000, Algeria
*3University of Constantine 2 – Abdelhamid Mehri
Constantine, Algeria
*4Enaf of Batna
Foresterie, Route de Tazoult, Batna, Algeria

Received: October 2, 2015
Accepted: August 4, 2016
Online released: November 20, 2016
Published: November 20, 2016
Keywords: ontology, information extraction, text analysis, similarity measure, linguistic processing

This paper presents a novel approach to extracting information from corpora in order to build ontologies for a wide range of applications. Our goal is to propose a domain-independent method, based on a distributional analysis of semantic units, that brings out all the candidate informative elements (concepts, entities, semantic relations, named entities, etc.). The method relies on a pipeline of four main stages that extracts information from unstructured text as a sequence of decomposable representations (sentences in triplets, ‘argumental structure’, etc.) until a consistent final ontology is obtained. We applied the pipeline to repeated samples of 100 articles randomly drawn from a text corpus (the 2013 annual edition of ‘Le Monde’). The evaluation of a trial implementation of our system shows an accuracy of up to 74%. The results indicate that the proposed methodology is quite generic and can be easily adapted to new domains.

