Enhanced Diabetes Prediction using Random Forest and XG Boost Machine Learning Classifiers with Dual Datasets

Authors

  • S. Mohan, M.Tech Student, Department of Electronics and Communication Engineering, S.V. University College of Engineering, Tirupati, A.P., India
  • Dr. D. Gowrisankar Reddy, Associate Professor, Department of Electronics and Communication Engineering, S.V. University College of Engineering, Tirupati, A.P., India

Keywords:

XG Boost Classifier, BMI, Random Forest Classifier, Attributes, Blood Glucose Level

Abstract

Diabetes is a widespread chronic condition with significant global health implications. Early and accurate prediction of diabetes can enable timely intervention and improve patient outcomes. This paper evaluates Random Forest and XG Boost machine learning classifiers for diabetes prediction on two distinct datasets. The first dataset includes the attributes Pregnancies, Glucose levels, Blood Pressure, Skin Thickness, Insulin levels, BMI (Body Mass Index), Diabetes Pedigree Function, and Age. On this dataset, the Random Forest classifier achieves an accuracy of 91%, while the XG Boost classifier achieves 93%. The second dataset consists of attributes related to Hypertension, Heart Disease, Smoking History, BMI, HbA1c_level (glycated hemoglobin level), Blood Glucose Level, Diabetes Pedigree Function, and Age. On this dataset, the Random Forest classifier attains an accuracy of 96.98% and the XG Boost classifier 97.25%. These results indicate that both classifiers are effective for diabetes prediction, with XG Boost achieving the higher accuracy on both datasets. Such predictive models can help healthcare professionals identify individuals at risk of diabetes, enabling early intervention and better disease management.
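
The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes scikit-learn is available, uses synthetic data with eight numeric features in place of either dataset, and uses scikit-learn's GradientBoostingClassifier as a stand-in for XG Boost. The split ratio, hyperparameters, and variable names are all illustrative choices, so the printed accuracies are not comparable to the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 8 numeric features and a binary label.
# This is NOT either of the datasets used in the paper.
X, y = make_classification(
    n_samples=800, n_features=8, n_informative=5, n_redundant=1, random_state=0
)

# Hold out 20% of the samples for evaluation (an assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # Stand-in for XG Boost: scikit-learn's own gradient-boosted trees.
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```

If the xgboost package is installed, its XGBClassifier follows the same fit/predict interface and could replace the stand-in model in the dictionary.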

Published

2023-09-30

Issue

Volume 10, Issue 5

Section

Research Articles

How to Cite

[1] S. Mohan, Dr. D. Gowrisankar Reddy, "Enhanced Diabetes Prediction using Random Forest and XG Boost Machine Learning Classifiers with Dual Datasets," International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN: 2395-602X, Print ISSN: 2395-6011, Volume 10, Issue 5, pp. 434-446, September-October 2023.