Descriptive
329 papers with code • 1 benchmark • 1 dataset
Most implemented papers
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
Books are a rich source of both fine-grained information (what a character, an object, or a scene looks like) and high-level semantics (what someone is thinking and feeling, and how these states evolve through a story).
Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text
This paper investigates how linguistic knowledge mined from large text corpora can aid the generation of natural language descriptions of videos.
A Hierarchical Approach for Generating Descriptive Image Paragraphs
Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail.
PL-SLAM: a Stereo SLAM System through the Combination of Points and Line Segments
This paper proposes PL-SLAM, a stereo visual SLAM system that combines both points and line segments to work robustly in a wider variety of scenarios, particularly in those where point features are scarce or not well-distributed in the image.
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
While state-of-the-art video-reasoning models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive, and counterfactual), suggesting that a principled approach to causal reasoning should incorporate the capability of both perceiving complex visual and language inputs and understanding the underlying dynamics and causal relations.
Uninformed Students: Student-Teacher Anomaly Detection with Discriminative Latent Embeddings
Our experiments demonstrate improvements over state-of-the-art methods on a number of real-world datasets, including the recently introduced MVTec Anomaly Detection dataset that was specifically designed to benchmark anomaly segmentation algorithms.
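The scoring idea is compact enough to sketch: students are trained to regress the teacher's dense features on anomaly-free data, so at test time anomalies show up both as large regression errors and as disagreement within the student ensemble. A minimal sketch, assuming pre-extracted dense feature maps; all names and shapes are illustrative, not the paper's API:

```python
import numpy as np

# Per-pixel anomaly score from a teacher feature map and an ensemble of
# student feature maps, each of shape (H, W, D). The students were trained
# to mimic the teacher on anomaly-free data only.
def anomaly_map(teacher_feats, student_feats_list):
    students = np.stack(student_feats_list)      # (S, H, W, D)
    mean_student = students.mean(axis=0)         # (H, W, D)
    # Regression error: students fail to reproduce the teacher on anomalies.
    err = ((mean_student - teacher_feats) ** 2).sum(axis=-1)
    # Predictive variance: students disagree on inputs unseen during training.
    var = students.var(axis=0).sum(axis=-1)
    return err + var                             # (H, W) anomaly scores
```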
Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search
In this work we present CLIP-GLaSS, a novel zero-shot framework that generates an image (or a caption) corresponding to a given caption (or image).
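The mechanism is easy to sketch: keep CLIP fixed and search the latent space of a pretrained generator for a latent whose decoded image matches the caption. CLIP-GLaSS itself performs this search with a genetic algorithm; the sketch below substitutes plain gradient ascent for brevity, and `generator`, `clip_model`, and `z_dim` are illustrative stand-ins, not the paper's API:

```python
import torch
import torch.nn.functional as F

def search_latent(generator, clip_model, text_tokens, steps=200, lr=0.05):
    # Latent code to optimize; the generator decodes it into an image.
    z = torch.randn(1, generator.z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    text_emb = clip_model.encode_text(text_tokens).detach()
    for _ in range(steps):
        image = generator(z)                         # (1, 3, H, W)
        img_emb = clip_model.encode_image(image)
        # Maximize cosine similarity between image and caption embeddings.
        loss = -F.cosine_similarity(img_emb, text_emb).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```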
Visual Classification via Description from Large Language Models
By basing classification decisions on these LLM-generated descriptors, we can provide additional cues that encourage the model to rely on the features we want it to use.
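Concretely, the decision rule can be sketched in a few lines: each class is scored by the mean similarity between the image embedding and the embeddings of that class's LLM-generated descriptors ("has wings", "has a long tail", ...). The function name and the assumption of L2-normalized CLIP-style embeddings are illustrative:

```python
import numpy as np

# image_emb: (D,) embedding of the query image.
# descriptor_embs_per_class: {class_name: (K, D) descriptor embeddings}.
# All vectors are assumed L2-normalized, so dot products are cosine sims.
def classify_by_description(image_emb, descriptor_embs_per_class):
    scores = {
        cls: float(np.mean(embs @ image_emb))    # mean descriptor similarity
        for cls, embs in descriptor_embs_per_class.items()
    }
    return max(scores, key=scores.get), scores
```

Because each descriptor contributes its own similarity term, the per-descriptor scores also serve as an explanation of why a class was chosen.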
Music transcription modelling and composition using deep learning
We apply deep learning methods, specifically long short-term memory (LSTM) networks, to music transcription modelling and composition.
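The modelling side reduces to next-token prediction over a symbolic encoding (the paper works with ABC-notation transcriptions); composition is then just sampling from the trained model one token at a time. A minimal sketch, with vocabulary size and dimensions chosen for illustration:

```python
import torch
import torch.nn as nn

class MusicLSTM(nn.Module):
    def __init__(self, vocab_size=100, embed_dim=64, hidden_dim=512, layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        x = self.embed(tokens)             # (B, T, E)
        out, state = self.lstm(x, state)   # (B, T, H)
        return self.head(out), state       # next-token logits

# Composition: seed the model with a few tokens, then repeatedly sample
# from the predicted distribution and feed the sample back in.
```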
Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions
We implement the search process as a similarity search in a visual feature space, learning to translate a textual query into a visual representation.
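A minimal sketch of this retrieval loop, where `project_text` stands in for the learned text-to-visual mapping and `image_feats` is a precomputed (N, D) matrix of image features; both names are illustrative:

```python
import numpy as np

def search(query_text, project_text, image_feats, k=5):
    # Map the textual query into the visual feature space.
    q = project_text(query_text)                    # (D,)
    q = q / np.linalg.norm(q)
    feats = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    sims = feats @ q                                # cosine similarities
    return np.argsort(-sims)[:k]                    # indices of top-k images
```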