Exploring the Best Imputation Technique for Handling Missing Data: A Review and Comparative Analysis of Methods

Authors

  • Dr. Rashmi Dahra, Department of Computer Applications, GVM Institute of Technology and Management, Sonepat, Haryana, India
  • Dr. Manju Papreja, Department of Computer Applications, GVM Institute of Technology and Management, Sonepat, Haryana, India
  • Dr. Renu Kakkar, Department of Computer Applications, GVM Institute of Technology and Management, Sonepat, Haryana, India

Keywords:

Imputation, Missingness, MCAR, MAR, MNAR

Abstract

Missing data is a persistent problem in statistics and data analysis. It has a significant impact on data-driven projects, because machine learning models rely heavily on high-quality data to produce accurate solutions to real-world problems. This paper presents a comprehensive comparison of imputation techniques (methods that replace missing data with estimated values) based on several factors, such as the type of data, the distribution of variables, and the amount and pattern of missing data. The approaches considered include single imputation, multiple imputation, hot deck imputation, machine learning-based imputation, and listwise deletion (which discards incomplete cases rather than imputing them). The advantages and disadvantages of each technique are discussed, along with their assumptions and software availability. The paper aims to provide a practical guide for researchers and practitioners in selecting an appropriate imputation technique based on the characteristics of their data and their research question.
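
As a rough illustration of three of the approaches named in the abstract, the following Python sketch applies listwise deletion, mean imputation (one form of single imputation), and a random hot deck to a small toy dataset. The dataset, the function names, and the choice of a random (rather than nearest-neighbour) hot deck are assumptions made for this example only; they are not taken from the paper.

```python
import random
from statistics import mean

# Hypothetical toy dataset: each row is (age, income); None marks a missing value.
rows = [
    (25, 40000.0),
    (32, None),
    (47, 82000.0),
    (None, 51000.0),
    (51, 90000.0),
]


def listwise_deletion(data):
    """Drop every row that contains at least one missing value."""
    return [row for row in data if None not in row]


def mean_imputation(data):
    """Single imputation: replace each missing value with its column mean."""
    columns = list(zip(*data))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        tuple(col_means[j] if v is None else v for j, v in enumerate(row))
        for row in data
    ]


def hot_deck_imputation(data, seed=0):
    """Random hot deck: fill each gap with an observed value from the same column."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    columns = list(zip(*data))
    donors = [[v for v in col if v is not None] for col in columns]
    return [
        tuple(rng.choice(donors[j]) if v is None else v for j, v in enumerate(row))
        for row in data
    ]


complete_cases = listwise_deletion(rows)
mean_filled = mean_imputation(rows)
hot_deck_filled = hot_deck_imputation(rows)
```

Listwise deletion shrinks the sample, mean imputation keeps every row but reduces the variance of the imputed column, and the hot deck preserves observed values at the cost of added randomness. Which trade-off is acceptable depends on the amount and pattern of missingness, as the abstract notes.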

Published

2023-12-30

Section

Research Articles

How to Cite

[1]
Dr. Rashmi Dahra, Dr. Manju Papreja, Dr. Renu Kakkar, "Exploring the Best Imputation Technique for Handling Missing Data: A Review and Comparative Analysis of Methods," International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN: 2395-602X, Print ISSN: 2395-6011, Volume 10, Issue 6, pp. 135-143, November-December 2023.