SAIL-VOS 3D, a dataset with frame-by-frame mesh annotations that extends SAIL-VOS, is now available! See the website [here].

SAIL-VOS (Semantic Amodal Instance Level Video Object Segmentation) is a dataset aimed at stimulating research on semantic amodal segmentation. Humans can effortlessly recognize partially occluded objects and reliably estimate their spatial extent beyond the visible parts. However, few modern computer vision techniques are capable of reasoning about the occluded parts of an object. This is partly because very few image datasets, and no video dataset, exist that permit the development of such methods. To address this issue, we present the SAIL-VOS dataset, a synthetic dataset extracted from the photo-realistic game GTA-V.


Video


Dataset Statistics

In total, the SAIL-VOS dataset contains 201 video sequences and 111,654 frames. The training set contains 160 video sequences (84,781 images, 1,388,389 objects), while the validation set contains 41 video sequences (26,873 images, 507,906 objects). In addition to the training and validation sets, we retain a test-dev set and a test-challenge set for future use.

Please see the following table for a comparison with the existing amodal datasets COCOA, COCOA-cls, D2S, and DYCE. A sketch illustrating how an occlusion rate can be computed from amodal and visible masks follows the table.

                               SAIL-VOS (Ours)   COCOA    COCOA-cls   D2S      DYCE
Image or Video                 Video             Image    Image       Image    Image
Synthetic or Real              Synthetic         Real     Real        Real     Synthetic
Number of Images               111,654           5,073    3,499       5,600    5,500
Number of Classes              162               -        80          60       79
Number of Instances            1,896,296         46,314   10,562      28,720   85,975
Number of Occluded Instances   1,653,980         28,106   5,175       16,337   70,766
Average Occlusion Rate         56.3%             18.8%    10.7%       15.0%    27.7%
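
The sketch below shows one way the per-instance occlusion rate in the table can be interpreted: the fraction of an instance's full (amodal) extent that is not visible. This is an illustrative example, not the dataset's official evaluation code; the mask shapes and the assumption that the visible mask is a subset of the amodal mask are ours.

```python
import numpy as np

def occlusion_rate(amodal_mask: np.ndarray, visible_mask: np.ndarray) -> float:
    """Fraction of the amodal (full) extent of an instance that is occluded.

    Both inputs are boolean masks of the same shape. The visible mask is
    assumed to cover a subset of the amodal mask.
    """
    amodal_area = amodal_mask.sum()
    if amodal_area == 0:
        return 0.0
    occluded_area = np.logical_and(amodal_mask, ~visible_mask).sum()
    return occluded_area / amodal_area

# Toy example: an object whose right half is hidden behind an occluder.
amodal = np.zeros((4, 8), dtype=bool)
amodal[:, 2:6] = True           # full (amodal) extent of the object
visible = amodal.copy()
visible[:, 4:6] = False         # right half is occluded
print(occlusion_rate(amodal, visible))  # 0.5
```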

Download Dataset

By downloading the data you agree to the following terms:
1. You will use the data only for non-commercial research and educational purposes. Commercial use is prohibited.
2. You will NOT distribute the data.
3. You have purchased Grand Theft Auto V.


Publication

SAIL-VOS: Semantic Amodal Instance Level Video Object Segmentation -- A Synthetic Dataset and Baselines, Yuan-Ting Hu, Hong-Shuo Chen, Kexin Hui, Jia-Bin Huang, Alexander G. Schwing, Computer Vision and Pattern Recognition (CVPR), 2019.
[BibTeX] [PDF]
@inproceedings{HuCVPR2019,
  author = {Y.-T. Hu and H.-S. Chen and K. Hui and J.-B. Huang and A.~G. Schwing},
  title = { {SAIL-VOS: Semantic Amodal Instance Level Video Object Segmentation -- A Synthetic Dataset and Baselines} },
  booktitle = {Proc. CVPR},
  year = {2019},
}

People


Acknowledgement

This work is supported by the National Science Foundation and by the Global Research Outreach program of Samsung Advanced Institute of Technology.