The Visual Task Adaptation Benchmark (VTAB) is a benchmark designed to evaluate general visual representations². It consists of a diverse and challenging suite of tasks². The benchmark defines a good general visual representation as one that yields good performance on unseen tasks, when trained on limited task-specific data².
The VTAB benchmark contains the following 19 tasks, all derived from public datasets¹:

- Caltech101
- CIFAR-100
- CLEVR distance prediction
- CLEVR counting
- Diabetic Retinopathy
- DMLab Frames
- dSprites orientation prediction
- dSprites location prediction
- Describable Textures Dataset (DTD)
- EuroSAT
- KITTI distance prediction
- 102 Category Flower Dataset
- Oxford-IIIT Pet Dataset
- PatchCamelyon
- Resisc45
- Small NORB azimuth prediction
- Small NORB elevation prediction
- SUN397
- SVHN

The model under evaluation is fine-tuned independently on each of the above tasks¹. Its performance is measured as the average accuracy across all tasks¹. A detailed description of the tasks, the evaluation protocol, and other details can be found in the VTAB paper¹.
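
The aggregation step described above can be illustrated with a minimal sketch. It assumes that "average accuracy" means an unweighted mean over tasks and that each accuracy is a fraction in [0, 1]. The function name `vtab_score` and the accuracy values are hypothetical placeholders, not part of the official VTAB code or any reported result.

```python
from statistics import mean
from typing import Mapping


def vtab_score(per_task_accuracy: Mapping[str, float]) -> float:
    """Return the unweighted mean accuracy over all supplied tasks.

    Hypothetical helper: the official benchmark code may aggregate differently.
    """
    if not per_task_accuracy:
        raise ValueError("at least one task result is required")
    for task, acc in per_task_accuracy.items():
        # Assumption: accuracies are fractions, not percentages.
        if not 0.0 <= acc <= 1.0:
            raise ValueError(f"accuracy for {task!r} must be in [0, 1], got {acc}")
    return mean(per_task_accuracy.values())


# Placeholder values for illustration only; these are NOT real results.
example_results = {
    "caltech101": 0.90,
    "cifar100": 0.70,
    "svhn": 0.80,
}

score = vtab_score(example_results)
print(f"mean accuracy over {len(example_results)} tasks: {score:.3f}")
```

In a full evaluation the mapping would hold one entry for each of the 19 tasks, each obtained from a separate fine-tuning run.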
(1) Visual Task Adaptation Benchmark. https://google-research.github.io/task_adaptation/

(2) GitHub - google-research/task_adaptation. https://github.com/google-research/task_adaptation

(3) GitHub - KMnP/vpt: Visual Prompt Tuning [ECCV 2022] https://arxiv .... https://github.com/KMnP/vpt