Improving Accuracy for Diabetes Mellitus Prediction Using Data Pre-Processing and Various New Learning Models


  • Garvit Khurana  School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
  • Prof. Arun Kumar  School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India



Machine learning, Diabetes, Sugar, Data Analysis, Support vector machines, Prediction algorithms, Classification algorithms


Data mining applied to medical data has been successful at converting raw records into useful information. This information helps medical experts improve the diagnosis and treatment of diseases. Type 2 diabetes mellitus is one of the world's silent killer diseases. According to the World Health Organization, 346 million people worldwide suffer from diabetes. Diabetes is diagnosed or predicted using various data mining techniques, such as association, classification, clustering and pattern recognition. The study raised the related open issue of identifying relationships among the major factors that lead to the development of diabetes. This can be done by mining patterns between the independent and dependent variables in the dataset. This paper compares the classification accuracies of various machine learning models. The objective of the paper is to determine whether a person has diabetes and which features are most responsible for it. Because its incidence is continuously increasing, more and more families are affected by diabetes mellitus. Most diabetic people know little about their health. In this study, we propose a novel model based on data mining techniques for predicting type 2 diabetes mellitus. Diabetes, often referred to by doctors as a metabolic disease, is a condition in which a person has high blood glucose (blood sugar) because of inadequate insulin production.
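To make the idea of comparing classification accuracies after pre-processing concrete, the following is a minimal sketch and not the authors' method. It assumes synthetic data with two hypothetical features (named glucose and bmi purely for illustration), uses min-max scaling as the pre-processing step, and compares a majority-class baseline against a simple k-nearest-neighbours classifier on a held-out split. The function names, class proportion and feature distributions are all assumptions introduced here.

```python
import random


def make_synthetic(n, seed=0):
    """Generate synthetic (features, label) rows. This is NOT real patient data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = int(rng.random() < 0.35)  # assumed class proportion
        # Hypothetical features on different scales (illustrative values only).
        glucose = rng.gauss(150 if label else 105, 20)
        bmi = rng.gauss(33 if label else 27, 4)
        rows.append(([glucose, bmi], label))
    return rows


def fit_min_max(train):
    """Learn per-feature (min, max) bounds from the training rows only."""
    columns = list(zip(*(x for x, _ in train)))
    return [(min(c), max(c)) for c in columns]


def apply_min_max(rows, bounds):
    """Rescale every feature with the training bounds (pre-processing step)."""
    scaled = []
    for x, y in rows:
        new_x = [(v - lo) / (hi - lo) if hi > lo else 0.0
                 for v, (lo, hi) in zip(x, bounds)]
        scaled.append((new_x, y))
    return scaled


def majority_predict(train, x):
    """Baseline: always predict the most frequent training label."""
    positives = sum(y for _, y in train)
    return int(positives * 2 > len(train))


def knn_predict(train, x, k=5):
    """k-nearest neighbours with squared Euclidean distance and majority vote."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    return int(sum(y for _, y in nearest) * 2 > k)


def accuracy(predict, train, test):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(predict(train, x) == y for x, y in test)
    return hits / len(test)


def compare_models(n=400, seed=0):
    """Return held-out accuracy for each model on a 75/25 split."""
    data = make_synthetic(n, seed)
    split = int(0.75 * n)
    train, test = data[:split], data[split:]
    bounds = fit_min_max(train)
    train_s = apply_min_max(train, bounds)
    test_s = apply_min_max(test, bounds)
    return {
        "majority_baseline": accuracy(majority_predict, train_s, test_s),
        "knn_k5": accuracy(knn_predict, train_s, test_s),
    }


if __name__ == "__main__":
    for name, acc in compare_models().items():
        print(f"{name}: {acc:.3f}")
```

The scaling bounds are fitted on the training rows alone so that no information from the test rows leaks into pre-processing. The baseline is included because accuracy on an imbalanced dataset is only meaningful relative to always predicting the majority class.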








Research Articles

How to Cite

Garvit Khurana, Prof. Arun Kumar, "Improving Accuracy for Diabetes Mellitus Prediction Using Data Pre-Processing and Various New Learning Models", International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN: 2395-602X, Print ISSN: 2395-6011, Volume 6, Issue 2, pp. 502-515, March-April 2019. Available at doi :