Text-to-Image Generation

300 papers with code • 11 benchmarks • 19 datasets

Text-to-Image Generation is a task at the intersection of computer vision and natural language processing where the goal is to generate an image that corresponds to a given textual description. A typical system first encodes the text into a meaningful representation, such as an embedding vector, and then conditions an image generator on that representation to produce an image that matches the description.
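
As a concrete starting point, here is a minimal sketch of this encode-then-generate flow using the Hugging Face diffusers library; the checkpoint name is one common choice among many, and any compatible Stable Diffusion pipeline would work the same way.

```python
# Minimal text-to-image sketch with Hugging Face diffusers.
import torch
from diffusers import StableDiffusionPipeline

# Assumed checkpoint; any compatible Stable Diffusion weights work.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# The pipeline encodes the prompt into text embeddings (the
# "meaningful representation" above), then iteratively denoises
# a latent conditioned on those embeddings into an image.
image = pipe("a watercolor painting of a fox in the snow").images[0]
image.save("fox.png")
```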

Most implemented papers

LAFITE: Towards Language-Free Training for Text-to-Image Generation

drboog/Lafite 27 Nov 2021

One of the major challenges in training text-to-image generation models is the need for a large number of high-quality image-text pairs.

Vector Quantized Diffusion Model for Text-to-Image Synthesis

cientgu/vq-diffusion CVPR 2022

Our experiments indicate that the VQ-Diffusion model with the reparameterization is fifteen times faster than traditional AR methods while achieving better image quality.
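
VQ-Diffusion's reparameterization is defined over discrete image tokens; for intuition only, the sketch below shows the analogous reparameterization in continuous diffusion models, where the network is trained to predict the added noise rather than the clean image. All names here are illustrative and not taken from the VQ-Diffusion codebase.

```python
# Illustrative epsilon-prediction (reparameterized) diffusion loss,
# continuous case; VQ-Diffusion applies the same idea to discrete tokens.
import torch
import torch.nn.functional as F

def diffusion_loss(model, x0, alphas_cumprod):
    """model: network predicting noise; x0: clean image batch (B, C, H, W);
    alphas_cumprod: precomputed cumulative noise schedule, shape (T,)."""
    T = alphas_cumprod.shape[0]
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    noise = torch.randn_like(x0)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    # Forward process: corrupt x0 to x_t in a single closed-form step.
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise
    # Reparameterization: the network predicts the noise, not x0.
    pred_noise = model(x_t, t)
    return F.mse_loss(pred_noise, noise)
```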

Exploration into Translation-Equivariant Image Quantization

wcshin-git/te-vqgan 1 Dec 2021

This exploratory study finds that current image quantization (vector quantization) methods do not satisfy translation equivariance in the quantized space, due to aliasing.
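
A minimal sketch of the vector-quantization step in question, assuming a learned codebook and PyTorch tensors (names are illustrative, not the TE-VQGAN code):

```python
# Minimal vector-quantization step: each encoder feature vector is
# snapped to its nearest codebook entry (illustrative sketch).
import torch

def quantize(z, codebook):
    """z: (N, D) encoder features; codebook: (K, D) learned codes.
    Returns the index of the nearest code for each feature."""
    d = torch.cdist(z, codebook)  # (N, K) pairwise distances
    return d.argmin(dim=1)

codebook = torch.randn(512, 64)
feat = torch.randn(16, 64)
codes = quantize(feat, codebook)

# The paper's point: shifting the input image by a pixel perturbs
# the encoder features (aliasing), so these argmin assignments can
# flip -- the quantized representation is not translation-equivariant.
```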

ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation

PaddlePaddle/PaddleNLP 31 Dec 2021

To explore the landscape of large-scale pre-training for bidirectional text-image generation, we train a 10-billion-parameter ERNIE-ViLG model on a large-scale dataset of 145 million (Chinese) image-text pairs; it achieves state-of-the-art performance for both text-to-image and image-to-text tasks, obtaining an FID of 7.9 on MS-COCO for text-to-image synthesis and the best results on COCO-CN and AIC-ICC for image captioning.

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models

j-min/dalleval ICCV 2023

In this work, we investigate the visual reasoning capabilities and social biases of different text-to-image models, covering both multimodal transformer language models and diffusion models.

CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP

HFAiLab/clip-gen 1 Mar 2022

Once trained, the transformer can generate coherent image tokens conditioned on the text embedding that CLIP's text encoder extracts from an input text.
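
Extracting such a CLIP text embedding is straightforward with the Hugging Face transformers CLIP wrappers; the final conditioning step is sketched as a hypothetical call, since CLIP-GEN's own API is not shown here.

```python
# Extract a CLIP text embedding; in CLIP-GEN this embedding conditions
# a transformer that emits image tokens.
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

inputs = tokenizer(["a photo of a red bicycle"],
                   padding=True, return_tensors="pt")
text_emb = model.get_text_features(**inputs)  # shape (1, 512)

# Hypothetical next step, not a real API:
# image_tokens = image_token_transformer.generate(condition=text_emb)
```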

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

lucidrains/parti-pytorch 22 Jun 2022

We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge.
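
A toy sketch of the autoregressive sampling loop such models use, with every name illustrative (see lucidrains/parti-pytorch for a real implementation): the transformer emits image tokens one at a time, conditioned on a text embedding, and a separate detokenizer (a ViT-VQGAN in Parti) maps the tokens back to pixels.

```python
# Toy autoregressive image-token sampler in the spirit of Parti.
import torch

@torch.no_grad()
def sample_image_tokens(transformer, text_emb, seq_len=256):
    tokens = torch.zeros(1, 1, dtype=torch.long)   # BOS placeholder
    for _ in range(seq_len):
        logits = transformer(tokens, text_emb)     # (1, t, vocab_size)
        probs = logits[:, -1].softmax(dim=-1)      # next-token distribution
        nxt = torch.multinomial(probs, num_samples=1)
        tokens = torch.cat([tokens, nxt], dim=1)
    # Drop BOS; a ViT-VQGAN decoder would map these tokens to pixels.
    return tokens[:, 1:]
```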

Diffusion Models: A Comprehensive Survey of Methods and Applications

YangLing0818/Diffusion-Models-Papers-Survey-Taxonomy 2 Sep 2022

This survey aims to provide a contextualized, in-depth look at the state of diffusion models, identifying the key areas of focus and pointing to potential areas for further exploration.

Character-Centric Story Visualization via Visual Planning and Token Alignment

sairin1202/vp-csv 16 Oct 2022

This task requires machines to 1) understand long text inputs and 2) produce a globally consistent image sequence that illustrates the contents of the story.

ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts

PaddlePaddle/ERNIE-ViLG CVPR 2023

Recent progress in diffusion models has revolutionized text-to-image generation.
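
Based on the paper's description, the mixture-of-denoising-experts idea can be sketched as timestep-based routing among several denoising networks; the class and routing rule below are assumptions for illustration, not ERNIE-ViLG 2.0's actual code.

```python
# Illustrative mixture-of-denoising-experts: each expert owns a
# contiguous range of denoising timesteps (an assumption based on
# the paper's description, not the released implementation).
import torch.nn as nn

class MixtureOfDenoisingExperts(nn.Module):
    def __init__(self, experts, total_steps=1000):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        self.total_steps = total_steps

    def forward(self, x_t, t, text_emb):
        # Route the current step t (an int in [0, total_steps)) to
        # the expert responsible for that stage of denoising.
        idx = min(t * len(self.experts) // self.total_steps,
                  len(self.experts) - 1)
        return self.experts[idx](x_t, t, text_emb)
```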