Automatic Image Caption Generation

Authors

  • A Hima Bindu  Assistant Professor, Department of CSE, Bhoj Reddy Engineering College for Women, Vinay Nagar, Hyderabad-59, Telangana, India
  • Marripelli Sharanya  B.Tech. Scholar, Department of CSE, Bhoj Reddy Engineering College for Women, Vinay Nagar, Hyderabad-59, Telangana, India
  • K Srinidhi  B.Tech. Scholar, Department of CSE, Bhoj Reddy Engineering College for Women, Vinay Nagar, Hyderabad-59, Telangana, India

Keywords:

Computer Vision, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Xception, Flickr 8k, LSTM, Preprocessing.

Abstract

Computer vision has become omnipresent in our society, with applications in many fields. In this project we focus on one visually demanding recognition task in computer vision: image captioning. Generating natural-language descriptions for images is still considered an open problem, and it has been studied even more rigorously in the field of videos. Over the past few years, greater emphasis has been placed on still images and on describing them in human-understandable natural language. The task of detecting scenes and objects has become easier thanks to studies carried out in recent years. The main motive of our project is to train convolutional neural networks with various hyperparameters on large image datasets such as Flickr 8k, using pretrained networks such as ResNet, and to combine the extracted image features with a recurrent neural network to obtain the desired caption for the image. In this paper we present the detailed architecture of the image captioning model.
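The CNN-encoder plus RNN-decoder pipeline described in the abstract can be sketched as follows. This is a minimal illustrative model, not the authors' implementation: the layer sizes, vocabulary size, and the use of PyTorch are assumptions, and the random tensors stand in for real Xception/ResNet features and tokenized captions.

```python
import torch
import torch.nn as nn

class CaptionModel(nn.Module):
    """Minimal CNN-feature -> LSTM caption decoder sketch (dimensions are illustrative)."""
    def __init__(self, feat_dim=2048, embed_dim=256, hidden_dim=256, vocab_size=8000):
        super().__init__()
        self.img_fc = nn.Linear(feat_dim, embed_dim)      # project CNN image features
        self.embed = nn.Embedding(vocab_size, embed_dim)  # word embeddings
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)      # per-step word scores

    def forward(self, feats, captions):
        # Prepend the projected image feature as the first "token" of the sequence,
        # so the LSTM conditions every generated word on the image.
        img = self.img_fc(feats).unsqueeze(1)             # (B, 1, E)
        words = self.embed(captions)                      # (B, T, E)
        seq = torch.cat([img, words], dim=1)              # (B, T+1, E)
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                           # (B, T+1, V) word logits

model = CaptionModel()
feats = torch.randn(2, 2048)            # stand-in for pretrained-CNN image features
caps = torch.randint(0, 8000, (2, 10))  # stand-in caption token ids
logits = model(feats, caps)
print(logits.shape)  # torch.Size([2, 11, 8000])
```

At training time the logits would be scored against the shifted ground-truth captions with cross-entropy; at inference the model is unrolled one word at a time, feeding each predicted word back in until an end-of-sentence token is produced.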


Published

2023-04-30

Section

Research Articles

How to Cite

[1] A Hima Bindu, Marripelli Sharanya, K Srinidhi, "Automatic Image Caption Generation", International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN: 2395-602X, Print ISSN: 2395-6011, Volume 10, Issue 2, pp. 415-419, March-April 2023.