Text-to-Image Generation
300 papers with code • 11 benchmarks • 19 datasets
Text-to-Image Generation is a task spanning computer vision and natural language processing in which the goal is to generate an image that corresponds to a given textual description. This typically involves encoding the input text into a meaningful representation, such as an embedding vector, and then conditioning an image generator on that representation so the output matches the description.
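The two-stage structure described above (text → embedding → conditioned image synthesis) can be illustrated with a deliberately minimal toy sketch. Nothing here is a real model: `embed_text` is a hypothetical hash-based token embedding and `generate_image` is just a fixed random projection into pixel space, standing in for a learned generator such as a GAN or diffusion decoder.

```python
import numpy as np

def embed_text(text, dim=64):
    """Toy text encoder: hash each token to seed a random vector, then average.
    A real system would use a learned encoder (e.g. a transformer or CLIP)."""
    vec = np.zeros(dim)
    tokens = text.lower().split()
    for token in tokens:
        rng = np.random.default_rng(abs(hash(token)) % (2**32))
        vec += rng.standard_normal(dim)
    return vec / max(len(tokens), 1)

def generate_image(embedding, height=32, width=32):
    """Toy conditional generator: a fixed random linear map from the text
    embedding to pixel space, normalized to [0, 1]. A real generator would be
    a trained network conditioned on the embedding."""
    rng = np.random.default_rng(0)
    W = rng.standard_normal((height * width * 3, embedding.size))
    img = W @ embedding
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)
    return img.reshape(height, width, 3)

img = generate_image(embed_text("a red bird on a branch"))
print(img.shape)  # (32, 32, 3)
```

The point of the sketch is only the data flow: the description is reduced to a fixed-size vector, and the image is produced as a function of that vector, which is the shared skeleton behind the autoregressive, GAN-based, and diffusion-based approaches listed below.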
Most implemented papers
LAFITE: Towards Language-Free Training for Text-to-Image Generation
One of the major challenges in training text-to-image generation models is the need for a large number of high-quality image-text pairs.
Vector Quantized Diffusion Model for Text-to-Image Synthesis
Our experiments indicate that the VQ-Diffusion model with the reparameterization is fifteen times faster than traditional AR methods while achieving a better image quality.
Exploration into Translation-Equivariant Image Quantization
This is an exploratory study showing that current image quantization (vector quantization) methods do not satisfy translation equivariance in the quantized space due to aliasing.
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation
To explore the landscape of large-scale pre-training for bidirectional text-image generation, we train a 10-billion-parameter ERNIE-ViLG model on a large-scale dataset of 145 million (Chinese) image-text pairs. The model achieves state-of-the-art performance on both text-to-image and image-to-text tasks, obtaining an FID of 7.9 on MS-COCO for text-to-image synthesis and the best results on COCO-CN and AIC-ICC for image captioning.
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models
In this work, we investigate the visual reasoning capabilities and social biases of different text-to-image models, covering both multimodal transformer language models and diffusion models.
CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP
Once trained, the transformer can generate coherent image tokens conditioned on the text embedding extracted from CLIP's text encoder for a given input text.
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge.
Diffusion Models: A Comprehensive Survey of Methods and Applications
This survey aims to provide a contextualized, in-depth look at the state of diffusion models, identifying the key areas of focus and pointing to potential areas for further exploration.
Character-Centric Story Visualization via Visual Planning and Token Alignment
This task requires machines to 1) understand long text inputs and 2) produce a globally consistent image sequence that illustrates the contents of the story.
ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts
Recent progress in diffusion models has revolutionized the popular technology of text-to-image generation.