Image Caption Generation using Natural Language Processing
Keywords:
Deep Learning, part of speech, image captioning, multi-task learning
Abstract
An image-based web crawler is a web crawler that searches for data using visually similar photographs. The web hosts a vast collection of image assets, a significant proportion of which carry either inaccurate captions or no captions at all. Users must sort through these images to find the ones that meet their requirements, and many fail to retrieve the images they need because the captions attached to them are unsuitable. The goal of our project is to generate a caption automatically from the content of an image. To do so, the content of a picture must first be understood, and then a sentence must be produced that is consistent with both grammatical rules and the semantics of the image. Merging these two forms of information requires both computer vision and natural language processing techniques, which makes the task difficult. The goal of this paper is to generate captions mechanically by analysing the content of an image; at present, images must be captioned through human involvement, which is nearly impossible for large databases. As a contribution, the image database is fed to a deep neural network: a Convolutional Neural Network encoder extracts the salient features and objects of the image, and a Recurrent Neural Network decoder interprets those features to produce a fluent, intelligible description of the image.
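As a minimal sketch of the CNN-encoder/RNN-decoder pipeline described above, the PyTorch code below pairs a pretrained CNN with an LSTM decoder. The paper does not specify its exact architecture, so the ResNet-50 backbone and the names and values of `embed_dim`, `hidden_dim`, and `vocab_size` are illustrative assumptions, not details taken from the work itself.

```python
# Sketch of a CNN encoder + RNN decoder captioning model (assumed
# architecture; the paper does not give exact layers or sizes).
import torch
import torch.nn as nn
import torchvision.models as models


class EncoderCNN(nn.Module):
    """Encodes an image into a fixed-length feature vector with a pretrained CNN."""

    def __init__(self, embed_dim: int):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # Drop the final classification layer; keep the convolutional trunk.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.fc = nn.Linear(resnet.fc.in_features, embed_dim)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                 # keep the pretrained trunk frozen
            features = self.backbone(images)  # (B, 2048, 1, 1)
        return self.fc(features.flatten(1))   # (B, embed_dim)


class DecoderRNN(nn.Module):
    """Generates caption word logits from the encoded image features."""

    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, features: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        # Prepend the image feature as the first "token" of the input sequence.
        embeddings = self.embed(captions)                          # (B, T, E)
        inputs = torch.cat([features.unsqueeze(1), embeddings], 1)  # (B, T+1, E)
        hidden, _ = self.lstm(inputs)                               # (B, T+1, H)
        return self.fc(hidden)                                      # word logits


# Usage with dummy data (shapes only; a real run needs a tokenized corpus):
encoder = EncoderCNN(embed_dim=256)
decoder = DecoderRNN(vocab_size=10000, embed_dim=256, hidden_dim=512)
images = torch.randn(2, 3, 224, 224)            # batch of 2 RGB images
captions = torch.randint(0, 10000, (2, 15))     # batch of token-id sequences
logits = decoder(encoder(images), captions)     # (2, 16, 10000)
```

In this style of model, training typically minimizes cross-entropy between the logits and the next ground-truth word at each step, and inference feeds the decoder's own predictions back in (greedy or beam search) to produce the caption word by word.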