Text-to-Image Generation
282 papers with code • 11 benchmarks • 18 datasets
Text-to-Image Generation is a task in computer vision and natural language processing where the goal is to generate an image that corresponds to a given textual description. This involves converting the text input into a meaningful representation, such as a feature vector, and then using this representation to generate an image that matches the description.
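The two-stage idea in the description — encode the text into a feature vector, then decode that vector into an image — can be sketched with a deliberately tiny toy (real systems replace both steps with learned networks such as a text encoder plus a GAN or diffusion decoder; the vocabulary, sizes, and fixed random "decoder weights" below are illustrative assumptions, not any real model):

```python
import numpy as np

VOCAB = ["a", "red", "bird", "blue", "sky", "on", "the"]

def encode_text(caption: str) -> np.ndarray:
    """Toy text encoder: bag-of-words counts over a tiny fixed vocabulary."""
    tokens = caption.lower().split()
    return np.array([tokens.count(w) for w in VOCAB], dtype=np.float32)

def decode_to_image(features: np.ndarray, size: int = 8) -> np.ndarray:
    """Toy image 'generator': a fixed random linear projection standing in
    for a learned decoder network, squashed into [0, 1] pixel values."""
    rng = np.random.default_rng(0)  # fixed seed = fixed 'weights'
    weights = rng.standard_normal((size * size * 3, features.shape[0]))
    pixels = weights @ features          # linear 'decoding'
    pixels = 1.0 / (1.0 + np.exp(-pixels))  # sigmoid into valid pixel range
    return pixels.reshape(size, size, 3)

img = decode_to_image(encode_text("a red bird on the blue sky"))
print(img.shape)  # (8, 8, 3)
```

The sketch only demonstrates the data flow (string → vector → H×W×3 array); it generates noise, not a matching image — producing semantically faithful pixels is exactly what the models listed below learn to do.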
Most implemented papers
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Inspired by Consistency Models (Song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs, including Stable Diffusion (Rombach et al.).
Navigating the Synthetic Realm: Harnessing Diffusion-based Models for Laparoscopic Text-to-Image Generation
We demonstrate the use of state-of-the-art text-to-image architectures in the context of laparoscopic imaging, using surgical removal of the gallbladder as an example.
Generating Images from Captions with Attention
Motivated by the recent progress in generative models, we introduce a model that generates images from natural language descriptions.
MC-GAN: Multi-conditional Generative Adversarial Network for Image Synthesis
This block enables MC-GAN to generate a realistic object image with the desired background by using the foreground information from the text attributes to control how much background information is taken from the given base image.
MirrorGAN: Learning Text-to-image Generation by Redescription
Generating an image from a given text description has two goals: visual realism and semantic consistency.
Controllable Text-to-Image Generation
In this paper, we propose a novel controllable text-to-image generative adversarial network (ControlGAN), which can effectively synthesise high-quality images and also control parts of the image generation according to natural language descriptions.
Semantic Object Accuracy for Generative Text-to-Image Synthesis
To address these challenges we introduce a new model that explicitly models individual objects within an image and a new evaluation metric called Semantic Object Accuracy (SOA) that specifically evaluates images given an image caption.
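The core idea behind an SOA-style metric — check whether the objects a caption mentions actually appear in the generated image — can be reduced to a simple ratio. The actual SOA metric runs a pre-trained object detector over generated images; in this simplified sketch both object lists are given directly, and the function name is illustrative, not the paper's API:

```python
def object_accuracy(caption_objects, detected_objects):
    """Toy SOA-style score: fraction of objects mentioned in the caption
    that a detector reported in the generated image."""
    if not caption_objects:
        return 1.0  # caption names no objects, nothing to verify
    detected = set(detected_objects)
    found = sum(1 for obj in caption_objects if obj in detected)
    return found / len(caption_objects)

# Caption mentions a dog and a frisbee; the detector found only the dog.
score = object_accuracy(["dog", "frisbee"], ["dog", "grass"])
print(score)  # 0.5
```

Averaging such per-image scores over many captions yields a caption-aware evaluation, in contrast to distribution-level metrics like FID that ignore what each individual caption asked for.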
Towards Open-World Text-Guided Face Image Generation and Manipulation
To be specific, we propose a brand-new paradigm of text-guided image generation and manipulation based on the superior characteristics of a pretrained GAN model.
LAFITE: Towards Language-Free Training for Text-to-Image Generation
One of the major challenges in training text-to-image generation models is the need for a large number of high-quality image-text pairs.
Vector Quantized Diffusion Model for Text-to-Image Synthesis
Our experiments indicate that the VQ-Diffusion model with the reparameterization is fifteen times faster than traditional AR methods while achieving a better image quality.
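The source of that speedup can be seen from a step-count comparison: an autoregressive (AR) model needs one network pass per image token, while a discrete diffusion model refines all tokens jointly over a fixed number of denoising steps. The numbers below are illustrative assumptions for the sketch, not the paper's measured 15x figure:

```python
# Back-of-the-envelope comparison of network passes per image.
tokens_per_image = 32 * 32    # e.g. a 32x32 grid of VQ codes (assumed size)
ar_passes = tokens_per_image  # AR decoding: one pass per token, sequential
diffusion_steps = 100         # diffusion: joint refinement steps (assumed)

speedup = ar_passes / diffusion_steps
print(f"{speedup:.1f}x fewer network passes")  # 10.2x fewer network passes
```

The wall-clock advantage also depends on per-pass cost and hardware utilization, which is why the measured speedup differs from this raw pass count.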