A Novel Framework for Missing Data Imputation and Feature Selection in Incomplete Information Systems Using Hybrid Rough Set Theory

Authors

  • G.V. Suresh, Research Scholar, Jawaharlal Nehru Technological University, Hyderabad, Telangana, India
  • Dr. E. Sreenivasa Reddy, Professor, University College of Engineering, Acharya Nagarjuna University, Andhra Pradesh, India

Keywords:

Data Mining, Uncertain Data, Rough Set Theory, Classification, Information Gain, Conditional Mutual Information.

Abstract

Today’s real-world data is highly susceptible to noise and missing values because of its typically huge size and its origin in heterogeneous sources. Decision making can be improved if data is preprocessed, and data cleaning is a significant part of preprocessing. Missing values occur in almost every type of data source. Imputing them with appropriate values is a challenging task, and wrong imputations may affect the resulting decisions. The main objective of this article is to investigate the impact of feature selection on the imputation of missing values across various datasets. We used five distinct datasets with a range of feature sizes in our experiments, and we compared three distinct forms of feature selection techniques and imputation approaches. The findings demonstrate the effectiveness of an approach that combines feature selection with imputation, which may be suitable for a number of medical datasets. In fact, our experimental results show that well-designed feature selection algorithms can produce the best results.

Published

2021-10-30

Section

Research Articles

How to Cite

[1]
G.V. Suresh, Dr. E. Sreenivasa Reddy, "A Novel Framework for Missing Data Imputation and Feature Selection in Incomplete Information Systems Using Hybrid Rough Set Theory", International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN: 2395-602X, Print ISSN: 2395-6011, Volume 8, Issue 5, pp. 591-603, September-October 2021.