Deep Learning-Based Cursive Text Detection and Recognition in Natural Scene Images
DOI:
https://doi.org/10.32628/IJSRST523103161
Keywords:
Cursive text detection, Cursive text recognition, Deep learning, Natural scene images, Convolutional neural networks, Recurrent neural networks, Attention mechanisms.
Abstract
Detecting and recognizing cursive text in natural scene images is difficult because handwriting styles are highly variable and intricate. This research develops a deep learning-based approach to address these challenges, with the goal of improving the accuracy and robustness of both detection and recognition. The work involves collecting and annotating a diverse dataset of natural scene images containing cursive text. A deep learning model for cursive text detection is trained on the annotated dataset, and text line segmentation techniques are investigated to further improve accuracy. A deep learning-based recognition system is then designed and implemented to transcribe cursive text into machine-readable text. The proposed approach is evaluated against existing methods using appropriate metrics and benchmark datasets. The research aims to provide insight into the challenges and opportunities of cursive text analysis in real-world scenarios and to contribute to advances in document digitization, handwriting analysis, and information retrieval.
License
Copyright (c) IJSRST

This work is licensed under a Creative Commons Attribution 4.0 International License.