Image Purification Technique for Myanmar OCR Applying Skew Angle Detection and Free Skew

Authors

  • Chit San Lwin  School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, P. R. China
  • Wu Xiangqian  Department of Mathematics, Kyaing Tong University, Kyaing Tong City, Shan State, Myanmar

DOI:

https://doi.org/10.32628/IJSRST19615

Keywords:

Myanmar Script, Document Image Analysis, Skew Angle Detection, Free Skew, Border Edge Discarding, Line Segmentation

Abstract

Optical Character Recognition (OCR) is a technology widely adopted for automatically converting hardcopy text into editable text. Because the technology is language dependent, it is far less developed for less widely supported languages such as Myanmar. In addition, the uniqueness and complexity of the Myanmar writing system, including touching and complex characters, continue to pose serious challenges for OCR researchers. In this paper, we propose a new technique for developing a Myanmar OCR system. Our technique implements skew angle detection and free skew, noisy border correction, extra page elimination, and line segmentation on scanned images of Myanmar text. The proposed method is tested on 430 documents comprising printed and handwritten Myanmar text with various fonts and sizes, multi-column layouts, tables, stamps or photos, and background effects. Our method achieves an accuracy of 100% for line segmentation and 99.92% for skew angle detection and free skew. The ability of our method to perform global and local skew angle detection, free skew, and line segmentation with high accuracy on different handwritten and digital text images of the Myanmar character set confirms the robustness of the technique, its reliability, and its suitability for application to many other related languages.
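
To make the terms in the abstract concrete, the following is a minimal sketch of skew angle detection, skew removal, and line segmentation using the classic projection-profile approach. It is not the authors' algorithm, which the abstract does not specify. The function names (`deskew`, `estimate_skew`, `segment_lines`), the binary list-of-lists image format, the sign convention for the angle, and the use of a vertical shear as a small-angle stand-in for rotation are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Image = List[List[int]]  # hypothetical format: binary image, 1 = ink, 0 = background


def deskew(img: Image, angle_deg: float) -> Image:
    """Undo a skew of angle_deg with a vertical shear about the image centre.

    Assumed convention: a positive angle means text rows drift downward
    (row index grows) as x grows. A shear only approximates a rotation,
    which is reasonable for small angles.
    """
    height, width = len(img), len(img[0])
    slope = math.tan(math.radians(angle_deg))
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if img[y][x]:
                y_new = int(round(y - slope * (x - width / 2)))
                if 0 <= y_new < height:  # pixels sheared off the page are dropped
                    out[y_new][x] = 1
    return out


def row_profile(img: Image) -> List[int]:
    """Horizontal projection profile: number of ink pixels in each row."""
    return [sum(row) for row in img]


def estimate_skew(img: Image, max_angle: float = 5.0, step: float = 0.5) -> float:
    """Try candidate angles and keep the one with the sharpest row profile.

    Sharpness is scored as the sum of squared row counts, which peaks when
    text lines are horizontal. Candidates are visited in order of increasing
    magnitude, so ties resolve to the smallest correction.
    """
    steps = int(round(max_angle / step))
    candidates = [0.0]
    for i in range(1, steps + 1):
        candidates += [i * step, -i * step]
    best_angle, best_score = 0.0, -1
    for angle in candidates:
        score = sum(v * v for v in row_profile(deskew(img, angle)))
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def segment_lines(img: Image, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return inclusive (top, bottom) row ranges of consecutive rows with ink."""
    lines: List[Tuple[int, int]] = []
    start = None
    for y, count in enumerate(row_profile(img)):
        if count >= min_ink and start is None:
            start = y
        elif count < min_ink and start is not None:
            lines.append((start, y - 1))
            start = None
    if start is not None:
        lines.append((start, len(img) - 1))
    return lines
```

In this sketch a page would be processed as `segment_lines(deskew(img, estimate_skew(img)))`. Real documents such as those described above (multi-column layouts, stamps, noisy borders) would need binarization, border removal, and local skew handling first, none of which this sketch attempts.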

Published

2019-01-30

Section

Research Articles

How to Cite

[1]
Chit San Lwin, Wu Xiangqian, "Image Purification Technique for Myanmar OCR Applying Skew Angle Detection and Free Skew", International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN: 2395-602X, Print ISSN: 2395-6011, Volume 6, Issue 1, pp. 186-203, January-February 2019. Available at doi: https://doi.org/10.32628/IJSRST19615