Extraction of Text from Images of Big Data

Authors

  • Swati Keshav Meshram  Department of Computer Engineering (Data Science), Zeal College of Engineering and Research, Narhe, Pune, Maharashtra, India
  • Prof. B. A. Chaugule   Department of Computer Engineering (Data Science), Zeal College of Engineering and Research, Narhe, Pune, Maharashtra, India

Keywords:

Big Data, Color-based partition, Canny edge detector, Hough transform, Text line grouping.

Abstract

Big Data refers to collections of data sets so large and complex that they become difficult to process with on-hand database management tools or traditional data processing applications. Text in the images of big data provides important clues for many image-based applications. However, locating text against a complex, multicolored background is a difficult task. The framework proposed in this paper consists of two steps: (1) a color-based partition method and (2) a text line grouping method. The first step uses the Canny edge detector, and trained classifiers are applied after it. The text line grouping step makes use of the Hough transform.
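
To illustrate how a Hough transform can group text candidates into lines, the following is a minimal sketch, not the authors' implementation. It assumes the first step has already produced one centroid per candidate character; the function name `hough_group`, its parameters, and the sample coordinates are hypothetical. Each centroid votes for every line rho = x*cos(theta) + y*sin(theta) passing through it, and accumulator bins with enough votes are treated as text lines.

```python
import math
from collections import defaultdict


def hough_group(points, theta_steps=180, rho_res=2.0, min_votes=3):
    """Group 2-D points into straight lines by Hough voting.

    Returns a list of (normal_angle_in_degrees, member_points) tuples.
    All parameter names and defaults are illustrative assumptions.
    """
    votes = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = x * math.cos(theta) + y * math.sin(theta)
            # Quantise rho so nearly collinear points share a bin.
            votes[(t, round(rho / rho_res))].append(idx)

    lines, used = [], set()
    # Visit the strongest bins first; each point joins at most one line.
    for (t, _), members in sorted(votes.items(), key=lambda kv: -len(kv[1])):
        free = [i for i in members if i not in used]
        if len(free) < min_votes:
            continue
        used.update(free)
        lines.append((t * 180 / theta_steps, [points[i] for i in free]))
    return lines


# Five centroids on the horizontal line y = 50, plus two stray candidates.
row = [(10, 50), (30, 50), (50, 50), (70, 50), (90, 50)]
strays = [(20, 120), (80, 5)]
lines = hough_group(row + strays)
for angle, members in lines:
    print(f"normal angle {angle:.0f} deg, {len(members)} candidates")
```

In this sketch the angle is that of the line's normal, so a horizontal text line is reported at 90 degrees. The stray candidates never gather enough votes and are left ungrouped.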

Published

2022-04-30

Section

Research Articles

How to Cite

[1]
Swati Keshav Meshram, Prof. B. A. Chaugule, "Extraction of Text from Images of Big Data," International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN: 2395-602X, Print ISSN: 2395-6011, Volume 9, Issue 2, pp. 535-540, March-April 2022.