Text-to-Image Generation

282 papers with code • 11 benchmarks • 18 datasets

Text-to-Image Generation is a task at the intersection of computer vision and natural language processing: given a textual description, generate an image that corresponds to it. This involves encoding the text into a meaningful representation, such as a feature vector, and then conditioning an image generator on that representation to produce an image that matches the description.
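
The encode-then-generate pipeline is easiest to see in code. A minimal sketch using Hugging Face diffusers, where the pipeline tokenizes and encodes the prompt, then conditions an image generator on the result; the checkpoint name is an assumption, not something this page prescribes:

```python
# A minimal text-to-image sketch with Hugging Face diffusers.
# The checkpoint name is an assumption; any Stable Diffusion
# checkpoint with the same pipeline class works.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Internally the pipeline tokenizes the prompt, encodes it with a
# CLIP text encoder, and conditions the denoising UNet on the result.
image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
image.save("lighthouse.png")
```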

Most implemented papers

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

luosiallen/latent-consistency-model 6 Oct 2023

Inspired by Consistency Models (Song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDM, including Stable Diffusion (Rombach et al.).
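
Few-step sampling is the practical payoff. A hedged sketch of LCM inference through diffusers, assuming a recent version with native LCM support; the checkpoint and step count are illustrative:

```python
# Few-step LCM inference via diffusers (assumes a recent version
# with native LCM support). Checkpoint and step count are
# illustrative, not prescribed by the paper summary above.
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7",  # assumed LCM-distilled checkpoint
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# LCMs are trained for very few sampling steps (typically 2-8),
# versus the 25-50 steps common for standard latent diffusion.
image = pipe(
    "an astronaut riding a horse, photorealistic",
    num_inference_steps=4,
    guidance_scale=8.0,
).images[0]
```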

Navigating the Synthetic Realm: Harnessing Diffusion-based Models for Laparoscopic Text-to-Image Generation

simeonallmendinger/syntheticimagegeneration 5 Dec 2023

We demonstrate the use of state-of-the-art text-to-image architectures in the context of laparoscopic imaging, taking the surgical removal of the gallbladder as an example.

Generating Images from Captions with Attention

emansim/text2image 9 Nov 2015

Motivated by the recent progress in generative models, we introduce a model that generates images from natural language descriptions.

MC-GAN: Multi-conditional Generative Adversarial Network for Image Synthesis

HYOJINPARK/MC_GAN 3 May 2018

This block enables MC-GAN to generate a realistic object on the desired background by controlling how much background information from the given base image is kept, guided by the foreground information in the text attributes.
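
In spirit, that block is a learned gate that decides, per spatial location, how much of the base image's background features to pass through. A hedged PyTorch sketch; the name SwitchBlock and the layer sizes are illustrative assumptions, not the paper's exact architecture:

```python
# Illustrative gating block in the spirit of MC-GAN's description:
# text-derived foreground features control how much background
# information from the base image is kept. Names and sizes are
# assumptions, not the paper's exact layers.
import torch
import torch.nn as nn

class SwitchBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # A 1x1 conv produces a per-pixel gate from both feature maps.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, bg_feat: torch.Tensor, fg_feat: torch.Tensor):
        # g close to 1 keeps background; close to 0 lets the
        # text-conditioned foreground features dominate.
        g = self.gate(torch.cat([bg_feat, fg_feat], dim=1))
        return g * bg_feat + (1.0 - g) * fg_feat

block = SwitchBlock(channels=64)
out = block(torch.randn(1, 64, 16, 16), torch.randn(1, 64, 16, 16))
```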

MirrorGAN: Learning Text-to-image Generation by Redescription

komiya-m/MirrorGAN CVPR 2019

Generating an image from a given text description has two goals: visual realism and semantic consistency.
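MirrorGAN pursues the second goal by redescription: the generated image is captioned again, and the redescription is compared with the input text. A minimal sketch of that text-reconstruction loss, where generator and captioner are assumed stand-in modules:

```python
# Sketch of MirrorGAN's redescription idea: caption the generated
# image and penalize divergence from the input text. `generator`
# and `captioner` are assumed stand-in modules, not the paper's.
import torch.nn.functional as F

def redescription_loss(generator, captioner, text_tokens):
    image = generator(text_tokens)   # text -> image
    logits = captioner(image)        # image -> token logits (B, T, V)
    # Low loss means the redescribed caption matches the input text,
    # i.e. the image preserved the semantics of the description.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        text_tokens.reshape(-1),
    )
```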

Controllable Text-to-Image Generation

mrlibw/ControlGAN NeurIPS 2019

In this paper, we propose a novel controllable text-to-image generative adversarial network (ControlGAN), which can effectively synthesise high-quality images and also control parts of the image generation according to natural language descriptions.
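
The controllability rests on word-level attention: image subregions attend over individual word embeddings, so changing one word mainly affects the regions attending to it. A hedged sketch with illustrative shapes and names:

```python
# Word-level spatial attention in the spirit of ControlGAN: each
# image subregion attends over individual word embeddings. Shapes
# and names are illustrative assumptions.
import torch
import torch.nn.functional as F

def word_attention(img_feat, word_emb):
    # img_feat: (B, C, H, W) image features
    # word_emb: (B, T, C) per-word embeddings
    B, C, H, W = img_feat.shape
    q = img_feat.flatten(2).transpose(1, 2)        # (B, HW, C) queries
    attn = torch.bmm(q, word_emb.transpose(1, 2))  # (B, HW, T) scores
    attn = F.softmax(attn / C ** 0.5, dim=-1)
    ctx = torch.bmm(attn, word_emb)                # (B, HW, C) word context
    return ctx.transpose(1, 2).reshape(B, C, H, W)

ctx = word_attention(torch.randn(2, 32, 8, 8), torch.randn(2, 12, 32))
```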

Semantic Object Accuracy for Generative Text-to-Image Synthesis

tohinz/semantic-object-accuracy-for-generative-text-to-image-synthesis 29 Oct 2019

To address these challenges, we introduce a new model that explicitly models individual objects within an image, and a new evaluation metric, Semantic Object Accuracy (SOA), that specifically evaluates generated images given an image caption.
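
SOA is simple to compute once a detector is fixed: for every caption that names an object class, check whether a pre-trained detector (YOLOv3 in the paper) finds that class in the generated image. A sketch, with detect_labels as an assumed wrapper around any off-the-shelf detector:

```python
# Sketch of the Semantic Object Accuracy (SOA) idea. `detect_labels`
# is an assumed wrapper around a pre-trained object detector
# returning the set of class names it finds in an image.
def semantic_object_accuracy(samples, detect_labels):
    # samples: list of (generated_image, set of classes in the caption)
    hits, total = 0, 0
    for image, caption_classes in samples:
        detected = detect_labels(image)
        for cls in caption_classes:
            total += 1
            if cls in detected:
                hits += 1
    return hits / max(total, 1)
```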

Towards Open-World Text-Guided Face Image Generation and Manipulation

weihaox/TediGAN 18 Apr 2021

To be specific, we propose a brand new paradigm of text-guided image generation and manipulation based on the superior characteristics of a pretrained GAN model.
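
Concretely, such methods search the latent space of a pretrained GAN for a code whose image matches the text. A hedged sketch using CLIP similarity as the text-image score; CLIP here stands in for the paper's own visual-linguistic similarity module, and gan is an assumed StyleGAN-like generator:

```python
# Sketch of text-guided latent optimization over a pretrained GAN.
# `gan` is an assumed StyleGAN-like generator with a `latent_dim`
# attribute; CLIP stands in for TediGAN's own visual-linguistic
# similarity module.
import torch

def optimize_latent(gan, clip_model, text_features, steps=200, lr=0.02):
    w = torch.randn(1, gan.latent_dim, requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        image = gan(w)                               # latent -> image
        img_features = clip_model.encode_image(image)
        # Push the latent code toward images that CLIP scores as
        # similar to the target text.
        loss = -torch.cosine_similarity(img_features, text_features).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```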

LAFITE: Towards Language-Free Training for Text-to-Image Generation

drboog/Lafite 27 Nov 2021

One of the major challenges in training text-to-image generation models is the need for a large number of high-quality image-text pairs.
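
LAFITE sidesteps this by exploiting CLIP's aligned image-text embedding space: a perturbed CLIP image embedding can serve as a pseudo text condition during training, so no captions are needed. A sketch of that perturbation, with the noise scale as an assumed hyperparameter:

```python
# Sketch of LAFITE's pseudo text feature: perturb a CLIP *image*
# embedding and use it where a text embedding would normally go.
# The noise scale `xi` is an assumed hyperparameter.
import torch
import torch.nn.functional as F

def pseudo_text_feature(img_feat, xi=0.1):
    # img_feat: (B, D) CLIP image embeddings
    noise = F.normalize(torch.randn_like(img_feat), dim=-1)
    perturbed = img_feat + xi * img_feat.norm(dim=-1, keepdim=True) * noise
    return F.normalize(perturbed, dim=-1)
```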

Vector Quantized Diffusion Model for Text-to-Image Synthesis

cientgu/vq-diffusion CVPR 2022

Our experiments indicate that the VQ-Diffusion model with the reparameterization is fifteen times faster than traditional AR methods while achieving a better image quality.
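
VQ-Diffusion is a two-stage model: a VQ-VAE turns the image into a grid of discrete codebook indices, and a diffusion model learns to denoise corrupted index grids conditioned on the text. A sketch of the mask-and-replace forward corruption, with illustrative probabilities:

```python
# Sketch of VQ-Diffusion's discrete forward process: an image is a
# grid of VQ-VAE codebook indices, and each step randomly masks or
# resamples tokens; the model learns to reverse this conditioned on
# text. The probabilities here are illustrative assumptions.
import torch

def corrupt_tokens(tokens, mask_id, vocab_size, p_mask=0.3, p_replace=0.1):
    # tokens: (B, N) integer codebook indices from a VQ-VAE encoder
    u = torch.rand_like(tokens, dtype=torch.float)
    random_tokens = torch.randint_like(tokens, vocab_size)
    out = torch.where(u < p_mask, torch.full_like(tokens, mask_id), tokens)
    out = torch.where((u >= p_mask) & (u < p_mask + p_replace),
                      random_tokens, out)
    return out
```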