Development of a Naïve Algorithm for Digital Image Generation from Text Using a Deep Convolutional Generative Adversarial Network
Keywords:
Text-to-Image Conversion, Deep Convolutional Generative Adversarial Networks, Virtual Reality, Augmented Reality, MS-COCO Dataset

Abstract
Text-to-image synthesis is a recent direction within image synthesis. Earlier work approached the task as retrieval, matching images to sentences or keywords. Advances in deep learning, particularly deep generative models, have substantially improved image synthesis. Generative adversarial networks (GANs) are highly influential generative models that have been applied effectively in computer vision, natural language processing, and other fields. This paper surveys and consolidates recent research on GAN-based text-to-image synthesis. The input to GAN-based text-to-image synthesis now extends beyond the conventional text description to scene layouts and dialogue text, and existing methods can be grouped into three classes according to how they use text information, their network topology, and their output control conditions. Deep convolutional GANs (DCGANs) can produce visually compelling images of specific categories, such as album covers, room interiors, and faces. In this study, we propose a deep architecture and GAN formulation that bridges recent progress in text and image modeling, translating visual concepts from characters to pixels. We demonstrate that our model can generate realistic images of birds and flowers from detailed textual descriptions.
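To make the text-conditioning idea concrete, the sketch below shows a minimal text-conditioned DCGAN-style generator in PyTorch. It is an illustrative assumption, not the paper's exact architecture: the class name TextConditionedGenerator, the 1024-dimensional sentence embedding, the 128-dimensional projection, and the 64x64 output resolution are placeholders chosen to mirror common conditional text-to-image GAN designs.

```python
# A minimal sketch of a text-conditioned DCGAN-style generator (assumed
# dimensions and module names; not the authors' exact architecture).
import torch
import torch.nn as nn

class TextConditionedGenerator(nn.Module):
    def __init__(self, noise_dim=100, text_dim=1024, proj_dim=128, ngf=64):
        super().__init__()
        # Compress the sentence embedding before concatenating it with noise,
        # as is common in conditional text-to-image GANs.
        self.project_text = nn.Sequential(
            nn.Linear(text_dim, proj_dim),
            nn.LeakyReLU(0.2, inplace=True),
        )
        self.net = nn.Sequential(
            # (noise_dim + proj_dim) x 1 x 1 -> (ngf*8) x 4 x 4
            nn.ConvTranspose2d(noise_dim + proj_dim, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # -> (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # -> (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # -> ngf x 32 x 32
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # -> 3 x 64 x 64; tanh maps pixel values to [-1, 1]
            nn.ConvTranspose2d(ngf, 3, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, noise, text_embedding):
        cond = self.project_text(text_embedding)        # (B, proj_dim)
        z = torch.cat([noise, cond], dim=1)             # (B, noise_dim + proj_dim)
        return self.net(z.unsqueeze(-1).unsqueeze(-1))  # (B, 3, 64, 64)

if __name__ == "__main__":
    G = TextConditionedGenerator()
    z = torch.randn(4, 100)    # random noise vectors
    t = torch.randn(4, 1024)   # placeholder sentence embeddings
    print(G(z, t).shape)       # torch.Size([4, 3, 64, 64])
```

In a full training setup this generator would be paired with a discriminator that also receives the projected sentence embedding, so that the adversarial game rewards images matching the description, not merely realistic ones.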
License
Copyright (c) 2024 International Journal of Scientific Research in Science and Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.