Web Data Segmentation for Terrorism Detection using Named Entity Recognition Technique

Authors

  • Pooja S. Kade, Computer Science & Engineering, RTMNU University, A.C.E, Wardha, Maharashtra, India
  • Prof. N. M. Dhande, Computer Science & Engineering, RTMNU University, A.C.E, Wardha, Maharashtra, India

Keywords:

Data Mining, Web Mining, Patterns, DOM Tree Technique, Object Recognition, Segmentation

Abstract

Terrorism has grown steadily, and its roots run deep in some parts of the world. As terrorist activity increases, it has become very important to control terrorism and stop its spread at an early stage. The internet has been identified as a major channel for spreading terrorism through speeches, images and videos. Terrorist organizations use the internet to brainwash individuals and young people, and they promote terrorist activities through provocative web pages that encourage vulnerable people and college students to join them. We therefore propose an efficient web data mining system and segmentation technique that detects such web pages and marks them automatically for human review. Websites built on different platforms have different data structures, which makes them difficult for a single algorithm to read, so we use the DOM Tree concept to extract the web data and SIFT (Scale-Invariant Feature Transform) to extract local image features from the organized web data. We also use the k-means algorithm for segmentation and k-nearest neighbors (KNN) for classification. In this way the system can judge whether a web page may be promoting terrorism. Such a system can be useful in anti-terrorism work and for search engines that need to classify web pages into categories.
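The abstract names three processing stages: DOM-based extraction of page text, k-means segmentation, and KNN classification. The Python sketch below illustrates how such a pipeline might fit together under stated assumptions; it is not the authors' implementation. It approximates the DOM tree walk with the standard library's event-driven `html.parser`, splitting page text at a hypothetical set of block-level tags. Each text block becomes a bag-of-words vector over a small hypothetical vocabulary; the blocks are clustered with Lloyd's k-means (the deterministic farthest-first initialization is our own choice); and each block is labelled by majority vote among its k nearest hand-labelled examples. The names `BLOCK_TAGS`, `VOCAB`, `extract_blocks`, `vectorize`, `kmeans` and `knn_classify`, the labels `review` and `benign`, and the training examples are all hypothetical. The SIFT image-feature stage is omitted because it needs an image-processing library such as OpenCV.

```python
from collections import Counter
from html.parser import HTMLParser
import math

# Assumption: these block-level tags delimit text segments of the page.
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3"}
# Hypothetical keyword vocabulary (illustration only; a real system would learn features).
VOCAB = ["attack", "recruit", "weapon", "news", "sports", "weather"]


class BlockExtractor(HTMLParser):
    """Visits tags in document order (a depth-first DOM walk) and collects text per block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if not self._skip:  # ignore script/style contents
            self._buf.append(data)

    def flush(self):
        # Collapse whitespace and keep the segment only if it contains text.
        text = " ".join(" ".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []


def extract_blocks(html):
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text outside a block tag
    return parser.blocks


def vectorize(text, vocab=VOCAB):
    """Bag-of-words counts over the fixed (hypothetical) vocabulary."""
    counts = Counter(w.strip(".,!?;:").lower() for w in text.split())
    return [float(counts[w]) for w in vocab]


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def knn_classify(x, train, k=3):
    """Majority vote among the k labelled vectors closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    page = ("<html><body><h1>Local news</h1><p>Weather and sports news today.</p>"
            "<script>var x = 1;</script>"
            "<div>Join us, we recruit for the attack, bring a weapon.</div></body></html>")
    # Hypothetical hand-labelled training examples for the KNN step.
    train = [(vectorize("recruit attack weapon"), "review"),
             (vectorize("attack weapon"), "review"),
             (vectorize("news sports weather"), "benign"),
             (vectorize("sports news"), "benign")]
    blocks = extract_blocks(page)
    vectors = [vectorize(b) for b in blocks]
    for block, cluster, vec in zip(blocks, kmeans(vectors, k=2), vectors):
        print(cluster, knn_classify(vec, train), block)
```

Run as a script, the demo prints the k-means cluster, the KNN label and the text of each block, and it skips the `<script>` contents. In practice the vocabulary, the number of clusters and `k` would need to be chosen from labelled data rather than fixed by hand.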

Published

2017-04-30

Section

Research Articles

How to Cite

Pooja S. Kade and Prof. N. M. Dhande, "Web Data Segmentation for Terrorism Detection using Named Entity Recognition Technique," International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN: 2395-602X, Print ISSN: 2395-6011, Volume 3, Issue 3, pp. 217-222, March-April 2017.