JACIII Vol.20 No.6 pp. 941-960
doi: 10.20965/jaciii.2016.p0941


From Linguistic to Conceptual: A Framework Based on a Pipeline for Building Ontologies from Texts

Ali Benafia*1, Smaine Mazouzi*2, Ramdane Maamri*3, Zaidi Sahnoun*3, and Sara Benafia*4

*1University of Hadj Lakhdar (University Batna 1)
05 Avenue Chahid Boukhlouf, Batna 05000, Algeria

*2University of Skikda
BP 26, ElHadaik Road, Skikda 21000, Algeria
*3University of Constantine 2 – Abdelhamid Mehri
Constantine, Algeria
*4Enaf of Batna
Foresterie, Route de Tazoult, Batna, Algeria

Received: October 2, 2015
Accepted: August 4, 2016
Online released: November 20, 2016
Published: November 20, 2016
Keywords: ontology, information extraction, text analysis, similarity measure, linguistic processing

This paper presents a novel approach to extracting information from corpora in order to build ontologies for a wide range of applications. Our goal is a domain-independent method, based on a distributional analysis of semantic units, that brings out all candidate informative elements (concepts, entities, semantic relations, named entities, etc.). The method relies on a pipeline of four main stages that extracts information from unstructured text as a series of decomposable representations (sentences as triplets, ‘argumental structure’, etc.) until a consistent final ontology is obtained. We applied the pipeline to repeated samples of 100 articles randomly drawn from a text corpus (‘Le Monde’, 2013 annual edition). Evaluation of the trial implementation of our system shows an accuracy of up to 74%. The results indicate that the proposed methodology is quite generic and can be easily adapted to new domains.
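The repeated-sampling evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `repeated_sample_accuracy`, the `score_article` callback (assumed to return 1 when the pipeline output for an article is judged correct and 0 otherwise), and the number of rounds are all hypothetical choices made for illustration.

```python
import random
from statistics import mean


def repeated_sample_accuracy(corpus, score_article, sample_size=100, rounds=10, seed=0):
    """Score several random samples of a corpus and return one accuracy per sample.

    `score_article` is a hypothetical callback: it should return 1 if the
    output produced for an article is judged correct, and 0 otherwise.
    """
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    accuracies = []
    for _ in range(rounds):
        # Draw `sample_size` distinct articles for this round.
        sample = rng.sample(corpus, sample_size)
        accuracies.append(mean(score_article(article) for article in sample))
    return accuracies


if __name__ == "__main__":
    # Toy stand-in for a corpus: article identifiers only (an assumption).
    toy_corpus = list(range(1000))
    # Toy scorer: pretends three out of four articles are processed correctly.
    toy_scores = repeated_sample_accuracy(toy_corpus, lambda a: 1 if a % 4 else 0)
    print("best sample accuracy:", max(toy_scores))
    print("mean sample accuracy:", mean(toy_scores))
```

Reporting the best accuracy across samples is one possible reading of the "up to 74%" figure. The mean across samples is printed alongside it because the abstract does not say how the figure was aggregated.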



Last updated on Mar. 28, 2017