Twitter Data Analysis for BOT Classification
Keywords:
Twitter, Twitter Bots, Metadata, Feature engineering, Machine Learning, Classification.
Abstract
With millions of users, Twitter is one of the best-known microblogging platforms. Users are free to write about anything they like, including politics, sports, food, and fashion. The platform has also been the target of various attacks, including the spread of disinformation, phishing links, and malware, so it is important to determine whether tweets are posted by real people or by Twitter bots. Existing approaches rely on a user's tweets to make this determination and place more emphasis on accuracy than on efficiency. In this study, a feature engineering pipeline is built to distinguish Twitter bots from real users efficiently using user metadata such as default name and description. Several machine learning algorithms are discussed. The performance of the Decision Tree, Random Forest, Multinomial Naive Bayes, KNN, and Logistic Regression classifiers is compared to find the best classifier, and the proposed approach achieves an accuracy of 98%.
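The metadata-based feature engineering step described in the abstract might look roughly like the following minimal sketch. It is not the authors' pipeline: the field names (`name`, `screen_name`, `description`, `followers_count`, `friends_count`), the definition of a "default name", and the follower ratio feature are all assumptions made for illustration, since the abstract does not list the exact features used.

```python
from typing import Dict, List

# Hypothetical feature names; the abstract only mentions "default name,
# description, etc." and does not give the real feature list.
FEATURE_NAMES = [
    "has_default_name",
    "has_description",
    "description_length",
    "followers_per_friend",
]


def extract_features(user: Dict[str, object]) -> List[float]:
    """Turn one user's metadata record into a numeric feature vector."""
    name = str(user.get("name") or "")
    screen_name = str(user.get("screen_name") or "")
    description = str(user.get("description") or "").strip()
    followers = int(user.get("followers_count") or 0)
    friends = int(user.get("friends_count") or 0)

    # Assumption: a "default name" is a display name that was never changed
    # from the handle. The paper may define it differently.
    has_default_name = float(name.strip().lower() == screen_name.strip().lower())
    has_description = float(bool(description))
    description_length = float(len(description))
    # Illustrative extra feature (not from the abstract); the +1 avoids
    # division by zero for accounts that follow nobody.
    followers_per_friend = followers / (friends + 1)

    return [has_default_name, has_description, description_length, followers_per_friend]


# Two made-up records, used only to show the shape of the output.
users = [
    {"name": "bot_4821", "screen_name": "bot_4821", "description": "",
     "followers_count": 3, "friends_count": 999},
    {"name": "Ada L.", "screen_name": "ada_codes", "description": "Engineer and runner",
     "followers_count": 250, "friends_count": 249},
]
X = [extract_features(u) for u in users]
```

Each row of `X` is a fixed-length numeric vector, so it could be passed to any of the classifiers named in the abstract. Because only per-account metadata is read and no tweet text is processed, the cost per user stays small, which is the efficiency argument the abstract makes.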
License
Copyright (c) IJSRST

This work is licensed under a Creative Commons Attribution 4.0 International License.