Performance Analysis of an Ontology Based Crawler Operating in a Distributed Environment

Authors: Wael A. Gab-ALLAH, Ben Bella S. Tawfik, Hamed M. Nassar

Crawlers are increasingly used to retrieve information from distributed information sources such as the Web. We have implemented one that uses several novel algorithms and techniques: a novel IR architecture, an efficient query expansion algorithm based on WordNet, a new ontology-based crawling technique, and a new rapid filtering algorithm based on semantic similarity. Experimental results for the implemented crawler, named the Ontology Based Distributed Information Retrieval (OBDIR) system, are superior to those obtained from systems based on the standard Breadth First (BF) search technique. In this paper we analyze the performance of the OBDIR system. We develop a probabilistic model that captures the operational dimensions of the system. The model makes heavy use of Bayes’ theorem and can help establish a foundational theory for distributed information retrieval (DIR). We study performance metrics such as recall and precision, and touch on other performance tools such as accuracy and ROC space. The study shows that careful choice of keywords greatly enhances the crawler's performance.
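
For readers unfamiliar with the metrics named in the abstract, the following minimal sketch shows how precision, recall, accuracy, and a point in ROC space are conventionally computed from retrieval counts. It uses the standard textbook definitions, not the paper's probabilistic model; the class name `RetrievalCounts`, the function names, and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalCounts:
    """Confusion counts for one crawl (hypothetical container, not from the paper)."""
    tp: int  # relevant documents that were retrieved
    fp: int  # irrelevant documents that were retrieved
    fn: int  # relevant documents that were missed
    tn: int  # irrelevant documents that were correctly skipped


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def precision(c: RetrievalCounts) -> float:
    # Fraction of retrieved documents that are relevant.
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: RetrievalCounts) -> float:
    # Fraction of relevant documents that were retrieved.
    return _ratio(c.tp, c.tp + c.fn)


def accuracy(c: RetrievalCounts) -> float:
    # Fraction of all documents that were classified correctly.
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)


def roc_point(c: RetrievalCounts) -> tuple[float, float]:
    # A single crawl maps to one point (false positive rate, true positive rate).
    return _ratio(c.fp, c.fp + c.tn), recall(c)


if __name__ == "__main__":
    # Made-up counts for illustration only; these are not results from the paper.
    counts = RetrievalCounts(tp=40, fp=10, fn=20, tn=930)
    print(f"precision = {precision(counts):.3f}")
    print(f"recall    = {recall(counts):.3f}")
    print(f"accuracy  = {accuracy(counts):.3f}")
    print(f"ROC point = {roc_point(counts)}")
```

Under these definitions, a crawler that retrieves fewer irrelevant pages raises precision and moves its ROC point toward the left edge of the plot, while one that misses fewer relevant pages raises recall and moves the point upward.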

Authors and Affiliations

Wael A. Gab-ALLAH
Faculty of Computers & Informatics, Suez Canal University, Ismailia, Egypt
Ben Bella S. Tawfik
Faculty of Computers & Informatics, Suez Canal University, Ismailia, Egypt
Hamed M. Nassar
Faculty of Computers & Informatics, Suez Canal University, Ismailia, Egypt

Keywords: Information retrieval, Web search, Focused crawler, Ontology


Publication Details

Published in : Volume 2 | Issue 3 | May-June 2016
Date of Publication : 2017-12-31
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 334-339
Manuscript Number : IJSRST162389
Publisher : Technoscience Academy

Print ISSN : 2395-6011, Online ISSN : 2395-602X

Cite This Article :

Wael A. Gab-ALLAH, Ben Bella S. Tawfik, Hamed M. Nassar, "Performance Analysis of an Ontology Based Crawler Operating in a Distributed Environment", International Journal of Scientific Research in Science and Technology (IJSRST), Print ISSN: 2395-6011, Online ISSN: 2395-602X, Volume 2, Issue 3, pp. 334-339, May-June 2016.
Journal URL : https://ijsrst.com/IJSRST162389