Hybrid Modeling for Effective Text Spotting in Gujarati Language
DOI: https://doi.org/10.32628/IJSRST2512356

Keywords: text spotting, deep learning, scene text detection, optical character recognition (OCR), end-to-end frameworks, machine learning, computer vision

Abstract
This paper presents a hybrid modeling approach to text spotting tailored to the Gujarati language, achieving 91.8% accuracy with a training time of only 18 minutes. The proposed model combines convolutional neural networks (CNNs) for spatial feature extraction with a transformer-based architecture for contextual understanding, optimizing both recognition accuracy and computational efficiency. Gujarati's complex script, with its conjunct and ligature forms, poses challenges that the hybrid framework addresses by modeling spatial and sequential information jointly. Unlike single-method models that focus solely on spatial features or on sequential patterns, the hybrid approach integrates both, improving robustness to noise, background clutter, and varying text orientations in natural scene images. Experiments on a custom-compiled dataset of Gujarati text in diverse scenes show superior performance over baseline models, with notably fewer false positives and recognition errors. The short training time also makes the method viable for real-world applications requiring quick model updates or deployment on resource-constrained devices. This work advances Indic-script OCR research and opens pathways for extending hybrid frameworks to other low-resource languages with complex scripts. The system's effectiveness on multi-style text and mixed backgrounds suggests strong potential for integration into mobile text-reading applications and automated document processing for Gujarati.
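The script challenge mentioned above comes from how Gujarati encodes conjuncts and vowel signs in Unicode: a virama (halant, U+0ACD) fuses consonants, and dependent vowel signs attach to the preceding consonant, so recognition labels are typically defined over grapheme-like clusters rather than individual code points. The following minimal sketch (a hypothetical helper, not code from the paper) illustrates this clustering:

```python
# Sketch: group Gujarati code points into grapheme-like clusters.
# Conjuncts form via the virama (halant, U+0ACD); dependent vowel
# signs and modifiers attach to the preceding consonant.

VIRAMA = "\u0acd"  # Gujarati halant

# Dependent vowel signs (matras) and the anusvara/visarga/candrabindu
# modifiers, all of which combine with the previous base consonant.
COMBINING = set(
    "\u0abe\u0abf\u0ac0\u0ac1\u0ac2\u0ac3\u0ac4\u0ac5"
    "\u0ac7\u0ac8\u0ac9\u0acb\u0acc\u0a81\u0a82\u0a83"
)

def gujarati_clusters(text):
    """Split text into clusters that a recognizer would label as units."""
    clusters = []
    for ch in text:
        if clusters and (ch in COMBINING or clusters[-1].endswith(VIRAMA)):
            clusters[-1] += ch   # attach matra/modifier, or finish a conjunct
        elif ch == VIRAMA and clusters:
            clusters[-1] += ch   # virama joins the preceding consonant
        else:
            clusters.append(ch)  # start a new cluster
    return clusters

# "ગુજરાતી" is seven code points but only four visual clusters:
print(gujarati_clusters("ગુજરાતી"))   # ['ગુ', 'જ', 'રા', 'તી']
print(gujarati_clusters("સ્ત"))        # ['સ્ત']  (conjunct kept whole)
```

Defining the output vocabulary over such clusters, rather than raw code points, is one common way a sequence model's label space can reflect the ligature behavior the abstract describes.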
License

Copyright (c) 2025 International Journal of Scientific Research in Science and Technology. This work is licensed under a Creative Commons Attribution 4.0 International License.