PVT v2: Improved Baselines with Pyramid Vision Transformer

25 Jun 2021  ·  Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

Transformers have recently shown encouraging progress in computer vision. In this work, we present new baselines by improving the original Pyramid Vision Transformer (PVT v1) with three designs: (1) a linear-complexity attention layer, (2) overlapping patch embedding, and (3) a convolutional feed-forward network. With these modifications, PVT v2 reduces the computational complexity of PVT v1 to linear and achieves significant improvements on fundamental vision tasks such as classification, detection, and segmentation. Notably, PVT v2 achieves performance comparable to or better than recent works such as the Swin Transformer. We hope this work will facilitate state-of-the-art Transformer research in computer vision. Code is available at https://github.com/whai362/PVT.
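To make the first two designs concrete, the sketch below does the back-of-envelope arithmetic behind them: an overlapping patch embedding is just a strided convolution whose kernel is larger than its stride, and the linear-complexity attention pools the keys/values down to a fixed spatial size so attention cost grows linearly in the number of tokens. This is not the authors' code; the specific kernel/stride/pool values (7×7 kernel, stride 4, 7×7 pooled size) are assumptions based on the paper's typical stage-1 settings.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1

# Overlapping patch embedding: a stride-4 conv with a 7x7 kernel (padding 3)
# gives the same 4x downsampling as a non-overlapping 4x4 patchify, but
# neighboring patches now share pixels, so local continuity is preserved.
H = 224
h_overlap = conv_out(H, kernel=7, stride=4, padding=3)   # 56
h_patchify = conv_out(H, kernel=4, stride=4, padding=0)  # 56
assert h_overlap == h_patchify == 56

# Attention cost (entries in the QK^T matrix, up to constants):
n = h_overlap * h_overlap      # 3136 tokens at stage 1
full_attention = n * n         # vanilla self-attention: O(n^2)
pooled_attention = n * (7 * 7) # keys/values pooled to 7x7 tokens: O(n)
print(n, full_attention, pooled_attention)
```

Because the pooled key/value length (49 here) is a constant independent of the input resolution, the second cost scales linearly with the token count, which is the sense in which PVT v2's attention is linear.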


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Object Detection | COCO minival | Sparse R-CNN (PVTv2-B2) | box AP | 50.1 | # 76 |
| | | | AP50 | 69.5 | # 23 |
| | | | AP75 | 54.9 | # 18 |
| Object Detection | COCO-O | PVTv2-B5 (Mask R-CNN) | Average mAP | 28.2 | # 23 |
| | | | Effective Robustness | 6.85 | # 17 |
| Image Classification | ImageNet | PVTv2-B0 | Top 1 Accuracy | 70.5% | # 947 |
| | | | Number of params | 3.4M | # 375 |
| | | | GFLOPs | 0.6 | # 65 |
| Image Classification | ImageNet | PVTv2-B1 | Top 1 Accuracy | 78.7% | # 745 |
| | | | Number of params | 13.1M | # 508 |
| | | | GFLOPs | 2.1 | # 151 |
| Image Classification | ImageNet | PVTv2-B2 | Top 1 Accuracy | 82.0% | # 529 |
| | | | Number of params | 25.4M | # 597 |
| | | | GFLOPs | 4.0 | # 191 |
| Image Classification | ImageNet | PVTv2-B3 | Top 1 Accuracy | 83.2% | # 412 |
| | | | Number of params | 45.2M | # 710 |
| | | | GFLOPs | 6.9 | # 248 |
| Image Classification | ImageNet | PVTv2-B4 | Top 1 Accuracy | 83.8% | # 357 |
| | | | Number of params | 82M | # 810 |
| | | | GFLOPs | 11.8 | # 313 |
