Text-to-Image Generation
282 papers with code • 11 benchmarks • 18 datasets
Text-to-Image Generation is a task in computer vision and natural language processing where the goal is to generate an image that corresponds to a given textual description. This involves converting the text input into a meaningful representation, such as a feature vector, and then using this representation to generate an image that matches the description.
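The two-stage idea in the description — encode the text into a feature vector, then decode that vector into an image — can be sketched with a deliberately tiny toy (real systems replace both steps with learned networks such as a text encoder plus a GAN or diffusion decoder; the vocabulary, sizes, and fixed random "decoder weights" below are illustrative assumptions, not any real model):

```python
import numpy as np

VOCAB = ["a", "red", "bird", "blue", "sky", "on", "the"]

def encode_text(caption: str) -> np.ndarray:
    """Toy text encoder: bag-of-words counts over a tiny fixed vocabulary."""
    tokens = caption.lower().split()
    return np.array([tokens.count(w) for w in VOCAB], dtype=np.float32)

def decode_to_image(features: np.ndarray, size: int = 8) -> np.ndarray:
    """Toy image 'generator': a fixed random linear projection standing in
    for a learned decoder network, squashed into [0, 1] pixel values."""
    rng = np.random.default_rng(0)  # fixed seed = fixed 'weights'
    weights = rng.standard_normal((size * size * 3, features.shape[0]))
    pixels = weights @ features          # linear 'decoding'
    pixels = 1.0 / (1.0 + np.exp(-pixels))  # sigmoid into valid pixel range
    return pixels.reshape(size, size, 3)

img = decode_to_image(encode_text("a red bird on the blue sky"))
print(img.shape)  # (8, 8, 3)
```

The sketch only demonstrates the data flow (string → vector → H×W×3 array); it generates noise, not a matching image — producing semantically faithful pixels is exactly what the models listed below learn to do.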
Most implemented papers
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Inspired by Consistency Models (Song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs, including Stable Diffusion (Rombach et al.).
Navigating the Synthetic Realm: Harnessing Diffusion-based Models for Laparoscopic Text-to-Image Generation
We demonstrate the use of state-of-the-art text-to-image architectures in the context of laparoscopic imaging, using surgical removal of the gallbladder as an example.
Generating Images from Captions with Attention
Motivated by the recent progress in generative models, we introduce a model that generates images from natural language descriptions.
MC-GAN: Multi-conditional Generative Adversarial Network for Image Synthesis
This block enables MC-GAN to generate a realistic object image with the desired background by using the foreground information from the text attributes to control how much background information is taken from the given base image.
MirrorGAN: Learning Text-to-image Generation by Redescription
Generating an image from a given text description has two goals: visual realism and semantic consistency.
Controllable Text-to-Image Generation
In this paper, we propose a novel controllable text-to-image generative adversarial network (ControlGAN), which can effectively synthesise high-quality images and also control parts of the image generation according to natural language descriptions.
Semantic Object Accuracy for Generative Text-to-Image Synthesis
To address these challenges we introduce a new model that explicitly models individual objects within an image and a new evaluation metric called Semantic Object Accuracy (SOA) that specifically evaluates images given an image caption.
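The core idea behind an SOA-style metric — check whether the objects a caption mentions actually appear in the generated image — can be reduced to a simple ratio. The actual SOA metric runs a pre-trained object detector over generated images; in this simplified sketch both object lists are given directly, and the function name is illustrative, not the paper's API:

```python
def object_accuracy(caption_objects, detected_objects):
    """Toy SOA-style score: fraction of objects mentioned in the caption
    that a detector reported in the generated image."""
    if not caption_objects:
        return 1.0  # caption names no objects, nothing to verify
    detected = set(detected_objects)
    found = sum(1 for obj in caption_objects if obj in detected)
    return found / len(caption_objects)

# Caption mentions a dog and a frisbee; the detector found only the dog.
score = object_accuracy(["dog", "frisbee"], ["dog", "grass"])
print(score)  # 0.5
```

Averaging such per-image scores over many captions yields a caption-aware evaluation, in contrast to distribution-level metrics like FID that ignore what each individual caption asked for.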
Towards Open-World Text-Guided Face Image Generation and Manipulation
To be specific, we propose a brand-new paradigm of text-guided image generation and manipulation based on the superior characteristics of a pretrained GAN model.
LAFITE: Towards Language-Free Training for Text-to-Image Generation
One of the major challenges in training text-to-image generation models is the need for a large number of high-quality image-text pairs.
Vector Quantized Diffusion Model for Text-to-Image Synthesis
Our experiments indicate that the VQ-Diffusion model with the reparameterization is fifteen times faster than traditional AR methods while achieving a better image quality.
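The source of that speedup can be seen from a step-count comparison: an autoregressive (AR) model needs one network pass per image token, while a discrete diffusion model refines all tokens jointly over a fixed number of denoising steps. The numbers below are illustrative assumptions for the sketch, not the paper's measured 15x figure:

```python
# Back-of-the-envelope comparison of network passes per image.
tokens_per_image = 32 * 32    # e.g. a 32x32 grid of VQ codes (assumed size)
ar_passes = tokens_per_image  # AR decoding: one pass per token, sequential
diffusion_steps = 100         # diffusion: joint refinement steps (assumed)

speedup = ar_passes / diffusion_steps
print(f"{speedup:.1f}x fewer network passes")  # 10.2x fewer network passes
```

The wall-clock advantage also depends on per-pass cost and hardware utilization, which is why the measured speedup differs from this raw pass count.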