Scene Graph Generation
113 papers with code • 5 benchmarks • 7 datasets
A scene graph is a structured representation of an image in which nodes correspond to object bounding boxes labeled with their object categories, and edges correspond to pairwise relationships between those objects. The task of Scene Graph Generation is to generate a visually grounded scene graph that describes an image as accurately as possible.
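The definition above can be made concrete with a small data structure. The following is a minimal sketch, not the format of any particular dataset or library: the names `ObjectNode`, `SceneGraph`, `add_node`, `add_edge`, and `triplets` are hypothetical, and the box convention (x_min, y_min, x_max, y_max in pixels) is an assumption chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectNode:
    """A node: an object category plus its bounding box (hypothetical layout)."""
    label: str
    # Assumed convention: (x_min, y_min, x_max, y_max) in pixels.
    box: tuple[float, float, float, float]


@dataclass
class SceneGraph:
    """Nodes are objects; edges are directed, labeled pairwise relationships."""
    nodes: list[ObjectNode] = field(default_factory=list)
    # Each edge is (subject_index, predicate, object_index) into `nodes`.
    edges: list[tuple[int, str, int]] = field(default_factory=list)

    def add_node(self, label: str, box: tuple[float, float, float, float]) -> int:
        """Append an object and return its index for use in edges."""
        self.nodes.append(ObjectNode(label, box))
        return len(self.nodes) - 1

    def add_edge(self, subj: int, predicate: str, obj: int) -> None:
        """Add a relationship between two existing nodes."""
        n = len(self.nodes)
        if not (0 <= subj < n and 0 <= obj < n):
            raise IndexError("edge endpoints must refer to existing nodes")
        self.edges.append((subj, predicate, obj))

    def triplets(self) -> list[tuple[str, str, str]]:
        """Return edges as (subject label, predicate, object label) triplets."""
        return [(self.nodes[s].label, p, self.nodes[o].label)
                for s, p, o in self.edges]


# Illustrative values only; not taken from any dataset.
graph = SceneGraph()
person = graph.add_node("person", (40.0, 30.0, 180.0, 400.0))
horse = graph.add_node("horse", (20.0, 150.0, 360.0, 460.0))
graph.add_edge(person, "riding", horse)
print(graph.triplets())
```

Storing edges as index triples keeps each relationship tied to a specific box, so two objects of the same category remain distinguishable, while `triplets()` gives the label-level (subject, predicate, object) view.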
Libraries
Use these libraries to find Scene Graph Generation models and implementations
Most implemented papers
RLIPv2: Fast Scaling of Relational Language-Image Pre-training
In this paper, we propose RLIPv2, a fast converging model that enables the scaling of relational pre-training to large-scale pseudo-labelled scene graph data.
Panoptic Video Scene Graph Generation
PVSG relates to the existing video scene graph generation (VidSGG) problem, which focuses on temporal interactions between humans and objects grounded with bounding boxes in videos.
4D Panoptic Scene Graph Generation
To facilitate research in this new area, we build a richly annotated PSG-4D dataset consisting of 3K RGB-D videos with a total of 1M frames, each of which is labeled with 4D panoptic segmentation masks as well as fine-grained, dynamic scene graphs.
Visual Graphs from Motion (VGfM): Scene understanding with object geometry reasoning
Recent approaches on visual scene understanding attempt to build a scene graph -- a computational representation of objects and their pairwise relationships.
Relation Transformer Network
In this work, we propose a novel transformer formulation for scene graph generation and relation prediction.
Learning Visual Commonsense for Robust Scene Graph Generation
Scene graph generation models understand the scene through object and predicate recognition, but are prone to mistakes due to the challenges of perception in the wild.
Learning and Reasoning with the Graph Structure Representation in Robotic Surgery
Learning to infer graph representations and performing spatial reasoning in a complex surgical environment can play a vital role in surgical scene understanding in robotic surgery.
SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences
Scene graphs are a compact and explicit representation successfully used in a variety of 2D scene understanding tasks.
Fine-Grained Scene Graph Generation with Data Transfer
Scene graph generation (SGG) is designed to extract (subject, predicate, object) triplets in images.
Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection
For further understanding of comics, an automated approach is needed to link text in comics to characters speaking the words.