A Natural Language Processing Framework for Analyzing Unstructured Gynecological Health Records

Authors

  • Paril Ghori  

Keywords

Bag of Words, Electronic Health Records, Natural Language Processing, Principal Component Analysis, Text Mining.

Abstract

The rapid growth of healthcare data, particularly in electronic health records (EHRs), has created a demand for techniques that can extract, process, and analyze clinical information effectively. This paper presents a Natural Language Processing (NLP) framework for unstructured text extracted from gynecological patient records. The preprocessing pipeline applies segmentation, tokenization, case folding, abbreviation expansion, and stemming to normalize the text, followed by dimensionality reduction. Negation detection and frequency analysis are then used to identify patterns and relationships within the data. The framework was validated on a dataset of 18,341 gynecological anamnesis records. The analysis identified ICD codes, frequent trigrams, and affirmative and negative expressions in order to characterize the records. The evaluation yielded an accuracy of 94.15%, a precision of 92.87%, a recall of 91.34%, and an F1-score of 92.10%, indicating that the approach is robust. The results show that the framework can extract key terms and insights that support clinical decision-making and research. This work highlights the potential of NLP to transform unstructured clinical data into structured formats, enabling better management of health information and enriching biomedical ontologies for broader use in healthcare informatics.
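
The preprocessing steps named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abbreviation table, the negation cues, the suffix-stripping `stem` function, and the sample record are all hypothetical stand-ins chosen for illustration, and the dimensionality-reduction step is omitted. The last helper only checks that the reported F1-score is consistent with the reported precision and recall, using the standard definition F1 = 2PR / (P + R).

```python
import re
from collections import Counter

# Hypothetical abbreviation table; the paper's actual dictionary is not given here.
ABBREVIATIONS = {"pt": "patient", "hx": "history", "c/o": "complains of"}

# Hypothetical negation cues for a simple rule-based check.
NEGATION_CUES = {"no", "not", "denies", "without"}


def segment(record):
    """Split a record into sentences on ., ;, ! or ? (a simplification)."""
    return [s.strip() for s in re.split(r"[.;!?]+", record) if s.strip()]


def tokenize(sentence):
    """Case-fold, split on whitespace, trim punctuation, expand abbreviations."""
    tokens = []
    for raw in sentence.lower().split():
        raw = raw.strip(",:()")
        if not raw:
            continue
        tokens.extend(ABBREVIATIONS.get(raw, raw).split())
    return tokens


def stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def is_negated(tokens):
    """Flag a sentence as negative if it contains any negation cue."""
    return any(t in NEGATION_CUES for t in tokens)


def trigrams(tokens):
    """Return all runs of three consecutive tokens."""
    return list(zip(tokens, tokens[1:], tokens[2:]))


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Invented sample record, used only to exercise the functions above.
record = "Pt denies pelvic pain. Pt reports irregular bleeding."
trigram_counts = Counter()
for sentence in segment(record):
    tokens = tokenize(sentence)
    label = "negative" if is_negated(tokens) else "affirmative"
    trigram_counts.update(trigrams([stem(t) for t in tokens]))
    print(label, tokens)

print(round(f1_score(92.87, 91.34), 2))
```

Negation is checked on the unstemmed tokens so that cue words such as "denies" are matched before the stemmer alters them. In practice a language-specific stemmer and a curated clinical abbreviation list would replace the placeholders used here.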

Published

2021-07-30

Section

Research Articles

How to Cite

[1] Paril Ghori, "A Natural Language Processing Framework for Analyzing Unstructured Gynecological Health Records," International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN: 2395-602X, Print ISSN: 2395-6011, Volume 8, Issue 4, pp. 756-769, July-August 2021.