BioDrone is the first bionic drone-based single object tracking benchmark. It features videos captured from a flapping-wing UAV system, whose aerodynamics cause severe camera shake. BioDrone highlights the tracking of tiny targets with drastic changes between consecutive frames, providing a new robust vision benchmark for SOT. Its main features are:
1. A large-scale, high-quality benchmark with robust vision challenges
2. Rich challenging-factor annotations
3. Videos from a bionic UAV
4. Tracking baselines with comprehensive experimental analyses
DIVOTrack is a cross-view multi-object tracking dataset for DIVerse Open scenes with densely tracked pedestrians in realistic, non-experimental environments. It contains ten distinct scenarios and 550 cross-view tracks.
Human fibrosarcoma HT1080WT (ATCC) cells at low cell densities were embedded in 3D collagen type I matrices [1]. The time-lapse videos were recorded every 2 minutes for 16.7 hours and covered a field of view of 1002 × 1004 pixels with a pixel size of 0.802 μm/pixel. The videos were pre-processed to correct frame-to-frame drift artifacts, resulting in a final size of 983 × 985 pixels.
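As a quick sanity check on these numbers, the sketch below converts the drift-corrected frame size into physical units using the stated pixel size; the constant names are illustrative, not taken from the dataset itself.

```python
# Minimal sketch: convert the reported frame dimensions to physical units,
# assuming the stated pixel size of 0.802 µm/pixel applies along both axes.
PIXEL_SIZE_UM = 0.802           # µm per pixel (from the dataset description)
WIDTH_PX, HEIGHT_PX = 983, 985  # final size after drift correction

width_um = WIDTH_PX * PIXEL_SIZE_UM    # ≈ 788.4 µm
height_um = HEIGHT_PX * PIXEL_SIZE_UM  # ≈ 790.0 µm
print(f"Field of view: {width_um:.1f} x {height_um:.1f} µm")
```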
This data set contains 775 video sequences, captured in the wildlife park Lindenthal (Cologne, Germany) as part of the AMMOD project, using an Intel RealSense D435 stereo camera. In addition to color and infrared images, the D435 is able to infer the distance (or “depth”) to objects in the scene using stereo vision. Observed animals include various birds (at daytime) and mammals such as deer, goats, sheep, donkeys, and foxes (primarily at nighttime). A subset of 412 images is annotated with a total of 1038 individual animal annotations, including instance masks, bounding boxes, class labels, and corresponding track IDs to identify the same individual over the entire video.
MPHOI-72 is a multi-person human-object interaction dataset that can be used for a wide variety of HOI/activity recognition and pose estimation/object tracking tasks. The dataset is challenging due to the many body occlusions among humans and objects. It consists of 72 videos captured from 3 different angles at 30 fps, totalling 26,383 frames with an average length of 12 seconds. It involves 5 humans performing in pairs, 6 object types, 3 activities and 13 sub-activities. The dataset includes color video, depth video, human skeletons, and human and object bounding boxes.
MobiFace is the first dataset for single face tracking in mobile situations. It consists of 80 unedited live-streaming mobile videos captured by 70 different smartphone users in fully unconstrained environments, with over 95K manually labelled bounding boxes. The videos are carefully selected to cover typical smartphone usage and are annotated with 14 attributes, including 6 newly proposed attributes and 8 commonly seen in object tracking.
The PESMOD (PExels Small Moving Object Detection) dataset consists of high-resolution aerial images in which moving objects are labelled manually. It was created from videos selected from the Pexels website and aims to provide a different and challenging benchmark for evaluating moving object detection methods. Each moving object is labelled in each frame in PASCAL VOC format in an XML file. The dataset consists of 8 different video sequences.
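Since the annotations follow the standard PASCAL VOC schema, a per-frame XML file can be read with the standard library alone. A minimal sketch is below; the field names (`object/name`, `bndbox/xmin`, ...) are the usual VOC ones, and the example file path is hypothetical rather than an actual PESMOD path.

```python
import xml.etree.ElementTree as ET

def load_voc_boxes(xml_path):
    """Return a list of (label, xmin, ymin, xmax, ymax) tuples from one VOC file."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((
            label,
            int(float(bb.findtext("xmin"))),
            int(float(bb.findtext("ymin"))),
            int(float(bb.findtext("xmax"))),
            int(float(bb.findtext("ymax"))),
        ))
    return boxes

# Example (hypothetical filename):
# boxes = load_voc_boxes("sequence1/frame_000001.xml")
```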
PersonPath22 is a large-scale multi-person tracking dataset containing 236 videos, captured mostly from static-mounted cameras and collected from sources where we were given the rights to redistribute the content and participants had given explicit consent. Each video has ground-truth annotations including both bounding boxes and tracklet IDs for all persons in each frame.
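Per-frame boxes plus tracklet IDs are the usual raw material for multi-object tracking; the sketch below groups such records into per-identity tracklets. The record layout `(frame_idx, track_id, box)` is an assumption for illustration, not PersonPath22's actual file format.

```python
from collections import defaultdict

def build_tracklets(records):
    """Group records of (frame_idx, track_id, (x, y, w, h)) into tracklets."""
    tracklets = defaultdict(list)
    for frame_idx, track_id, box in records:
        tracklets[track_id].append((frame_idx, box))
    # Sort each tracklet by frame index so boxes appear in temporal order.
    return {tid: sorted(obs, key=lambda o: o[0]) for tid, obs in tracklets.items()}
```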
In this dataset, various objects are arranged on a white table. A UR5e robot picks and places a target object specified in the title of the video/image sequence. Videos under the auto- folder were collected with the robot operating automatically; videos under the human- folders were collected with the robot tele-operated. Ground-truth tracking bounding boxes are generated with STARK; when the target exits the camera frame, the bounding box is set to [-1, -1, -1, -1], indicating that the target is not visible.
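A consumer of these annotations needs to honor that sentinel value. Below is a minimal sketch; only the [-1, -1, -1, -1] convention comes from the description above, while the (x, y, w, h) box layout is an assumption for illustration.

```python
OUT_OF_VIEW = [-1, -1, -1, -1]  # sentinel for frames where the target left the view

def visible_boxes(annotations):
    """Yield (frame_idx, box) only for frames where the target is in view."""
    for frame_idx, box in enumerate(annotations):
        if list(box) == OUT_OF_VIEW:
            continue  # target not visible in this frame
        yield frame_idx, box
```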
The SeaDronesSee Object Detection v2 (S-ODv2) dataset contains 14,227 RGB images (training: 8,930; validation: 1,547; testing: 3,750). The images are captured from altitudes of 5 to 260 meters and gimbal pitch angles of 0° to 90°, with the respective altitude, viewing angle, and other metadata provided for almost all frames.
SFU-HW-Tracks is a dataset for object tracking on raw video sequences that contains object annotations with unique object identities (IDs) for the High Efficiency Video Coding (HEVC) v1 Common Test Conditions (CTC) sequences. Ground-truth annotations for 13 sequences were prepared and released as the SFU-HW-Tracks dataset.
SOTVerse is a user-defined task space for single object tracking. It allows users to customize SOT tasks according to their research purposes, making research more targeted and significantly improving research efficiency.
The dataset is composed of 100 video sequences densely annotated with 60K bounding boxes, 17 sequence attributes, 13 action verb attributes and 29 target object attributes.
The aiMotive dataset is a multimodal dataset for robust autonomous driving with long-range perception. It consists of 176 scenes with synchronized and calibrated LiDAR, camera, and radar sensors covering a 360-degree field of view. The data was captured in highway, urban, and suburban areas during daytime, at night, and in rain, and is annotated with 3D bounding boxes whose identifiers are consistent across frames.