
331 papers with code • 1 benchmark • 1 dataset

Most implemented papers

Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books

soskek/homemade_bookcorpus ICCV 2015

Books are a rich source of both fine-grained information (what a character, an object or a scene looks like) and high-level semantics (what someone is thinking and feeling, and how these states evolve through a story).

Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text

TejInaco/multimodalML EMNLP 2016

This paper investigates how linguistic knowledge mined from large text corpora can aid the generation of natural language descriptions of videos.

A Hierarchical Approach for Generating Descriptive Image Paragraphs

chenxinpeng/im2p CVPR 2017

Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail.

PL-SLAM: a Stereo SLAM System through the Combination of Points and Line Segments

rubengooj/pl-slam 26 May 2017

This paper proposes PL-SLAM, a stereo visual SLAM system that combines both points and line segments to work robustly in a wider variety of scenarios, particularly in those where point features are scarce or not well-distributed in the image.

CLEVRER: CoLlision Events for Video REpresentation and Reasoning

chuangg/CLEVRER ICLR 2020

While these models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations.

Uninformed Students: Student-Teacher Anomaly Detection with Discriminative Latent Embeddings

denguir/student-teacher-anomaly-detection CVPR 2020

Our experiments demonstrate improvements over state-of-the-art methods on a number of real-world datasets, including the recently introduced MVTec Anomaly Detection dataset that was specifically designed to benchmark anomaly segmentation algorithms.

Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search

galatolofederico/clip-glass 2 Feb 2021

In this research work we present CLIP-GLaSS, a novel zero-shot framework to generate an image (or a caption) corresponding to a given caption (or image).

Visual Classification via Description from Large Language Models

sachit-menon/classify_by_description_release 13 Oct 2022

By basing decisions on these descriptors, we can provide additional cues that encourage using the features we want to be used.

Music transcription modelling and composition using deep learning

IraKorshunova/folk-rnn 29 Apr 2016

We apply deep learning methods, specifically long short-term memory (LSTM) networks, to music transcription modelling and composition.

Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions

AlexMoreo/tensorflow-Tex2Vis 23 Jun 2016

We choose to implement the actual search process as a similarity search in a visual feature space, by learning to translate a textual query into a visual representation.