Twitter Data Analysis for BOT Classification

Authors

  • Shweta Patil, Department of Computer Engineering, Zeal College of Engineering and Research, Narhe, Pune, Maharashtra, India
  • Aparna V. Mote, Department of Computer Engineering, Zeal College of Engineering and Research, Narhe, Pune, Maharashtra, India

Keywords:

Twitter, Twitter Bots, Metadata, Feature engineering, Machine Learning, Classification.

Abstract

With millions of users, Twitter is one of the best-known microblogging platforms. Users are free to write about anything they like, including politics, sports, food, and fashion. Twitter has been the target of various attacks, including the dissemination of disinformation, phishing links, and malware. It is therefore important to determine whether tweets are posted by actual people or by Twitter bots. Existing approaches rely on the user's tweets to make this determination, placing more emphasis on accuracy than on efficiency. In this study, a feature engineering pipeline is created to effectively distinguish between Twitter bots and actual users using user metadata such as default name, description, etc. Several machine learning algorithms are discussed. The proposed approach obtained an accuracy of 98%. The performance of several classifiers, namely the Decision Tree, Random Forest, Multinomial Naive Bayes, KNN, and Logistic Regression classifiers, is compared to find the best one.
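As a rough illustration of the kind of pipeline the abstract describes, the following minimal sketch turns user metadata into numeric features and compares candidate classifiers by accuracy. It is not the authors' implementation. The field names (default_profile, description, followers_count, friends_count), the four chosen features, the toy profiles and labels, and the two rule-based stand-in classifiers are all assumptions made for illustration; in practice the stand-ins would be replaced by trained models such as those named in the abstract.

```python
from typing import Callable, Dict, List, Optional, Sequence


def extract_features(profile: Dict[str, object]) -> List[float]:
    """Map one user-metadata record to a numeric feature vector.

    The field names and the choice of features are assumptions;
    the abstract only mentions "default name, description, etc.".
    """
    description: Optional[str] = profile.get("description")  # may be None
    text = description or ""
    followers = float(profile.get("followers_count", 0))
    friends = float(profile.get("friends_count", 0))
    return [
        1.0 if profile.get("default_profile", False) else 0.0,  # untouched default profile
        1.0 if text.strip() else 0.0,                            # has a non-empty description
        float(len(text)),                                        # description length
        followers / (friends + 1.0),                             # follower/friend ratio
    ]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def compare_classifiers(
    classifiers: Dict[str, Callable[[List[float]], int]],
    features: Sequence[List[float]],
    labels: Sequence[int],
) -> Dict[str, float]:
    """Score every classifier on the same data and return name -> accuracy."""
    return {
        name: accuracy(labels, [predict(x) for x in features])
        for name, predict in classifiers.items()
    }


# Made-up toy records (1 = bot, 0 = human); not data from the paper.
TOY_PROFILES = [
    {"default_profile": True, "description": "", "followers_count": 3, "friends_count": 299},
    {"default_profile": False, "description": "Coffee and cricket.", "followers_count": 200, "friends_count": 99},
    {"default_profile": True, "description": "Hello", "followers_count": 50, "friends_count": 49},
    {"default_profile": False, "description": None, "followers_count": 0, "friends_count": 999},
]
TOY_LABELS = [1, 0, 0, 1]

# Hand-written rules standing in for trained models (placeholders only).
STAND_IN_CLASSIFIERS = {
    "default_profile_rule": lambda x: 1 if x[0] == 1.0 else 0,
    "missing_description_rule": lambda x: 1 if x[1] == 0.0 else 0,
}

if __name__ == "__main__":
    X = [extract_features(p) for p in TOY_PROFILES]
    for name, score in compare_classifiers(STAND_IN_CLASSIFIERS, X, TOY_LABELS).items():
        print(f"{name}: accuracy = {score:.2f}")
```

Because the classifiers are passed in as plain callables, any model exposing a per-sample prediction function can be dropped into the same comparison. The scores on this toy data say nothing about real-world performance.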

Published

2022-05-30

Section

Research Articles

How to Cite

[1]
Shweta Patil, Aparna V. Mote, "Twitter Data Analysis for BOT Classification," International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN: 2395-602X, Print ISSN: 2395-6011, Volume 9, Issue 3, pp. 586-588, May-June 2022.