SAIL-VOS 3D, a dataset with frame-by-frame mesh annotations that extends SAIL-VOS, is now available! See the website [here].

SAIL-VOS (Semantic Amodal Instance Level Video Object Segmentation) is a dataset intended to stimulate research on semantic amodal segmentation. Humans can effortlessly recognize partially occluded objects and reliably estimate their spatial extent beyond the visible region. However, few modern computer vision techniques can reason about the occluded parts of an object. This is partly because very few image datasets, and no video datasets, exist that permit the development of such methods. To address this issue, we present SAIL-VOS, a synthetic dataset extracted from the photo-realistic game GTA-V.


Dataset Statistics

The SAIL-VOS dataset contains 201 video sequences and 111,654 frames in total. The training set contains 160 video sequences (84,781 images, 1,388,389 objects), while the validation set contains 41 video sequences (26,873 images, 507,906 objects). In addition to the training and validation sets, we retain a test-dev set and a test-challenge set for future use.
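The split figures above can be cross-checked with a few lines of Python. This is only an arithmetic sketch over the numbers quoted in the paragraph; `SPLITS` and `summarize` are hypothetical names for illustration, not part of any SAIL-VOS tooling. It confirms that the two public splits add up to the stated totals and derives average sequence length and object density per split.

```python
# Split statistics as quoted above (test-dev and test-challenge are withheld,
# so they are not included). Names here are illustrative only.
SPLITS = {
    "train": {"videos": 160, "frames": 84_781, "objects": 1_388_389},
    "val": {"videos": 41, "frames": 26_873, "objects": 507_906},
}


def summarize(splits):
    """Return total videos, total frames, and per-split averages."""
    total_videos = sum(s["videos"] for s in splits.values())
    total_frames = sum(s["frames"] for s in splits.values())
    per_split = {
        name: {
            "frames_per_video": s["frames"] / s["videos"],
            "objects_per_frame": s["objects"] / s["frames"],
        }
        for name, s in splits.items()
    }
    return total_videos, total_frames, per_split


total_videos, total_frames, per_split = summarize(SPLITS)
print(total_videos, total_frames)  # 201 111654
for name, stats in per_split.items():
    print(name, round(stats["frames_per_video"], 1), round(stats["objects_per_frame"], 1))
# train 529.9 16.4
# val 655.4 18.9
```

On average, validation sequences are somewhat longer and more densely populated with objects than training sequences.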

The following table compares the statistics of SAIL-VOS with those of existing amodal segmentation datasets: COCOA, COCOA-cls, D2S, and DYCE.

Dataset                       SAIL-VOS    COCOA       COCOA-cls   D2S         DYCE
Image or Video                Video       Image       Image       Image       Image
Synthetic or Real             Synthetic   Real        Real        Real        Synthetic
Number of Images              111,654     5,073       3,499       5,600       5,500
Number of Classes             162         -           80          60          79
Number of Instances           1,896,296   46,314      10,562      28,720      85,975
Number of Occluded Instances  1,653,980   28,106      5,175       16,337      70,766
Average Occlusion Rate        56.3%       18.8%       10.7%       15.0%       27.7%
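The exact definition behind the "Average Occlusion Rate" and "Number of Occluded Instances" rows is not spelled out here. A common convention, which this sketch assumes, takes an instance's occlusion rate as the fraction of its amodal (full-extent) mask that is not visible, i.e. 1 - |visible| / |amodal|, averages it over all instances, and counts an instance as occluded when that rate is above zero. The helper names (`area`, `occlusion_rate`, `is_occluded`, `average_occlusion_rate`) are hypothetical, and masks are plain nested lists to keep the example dependency-free.

```python
from typing import Iterable, Sequence, Tuple

# A binary mask as a 2-D grid: 1 = pixel covered by the object, 0 = background.
Mask = Sequence[Sequence[int]]


def area(mask: Mask) -> int:
    """Count foreground pixels in a binary mask."""
    return sum(1 for row in mask for v in row if v)


def occlusion_rate(amodal: Mask, visible: Mask) -> float:
    """Fraction of the amodal (full-extent) mask that is hidden.

    Assumes the visible mask is a subset of the amodal mask.
    """
    amodal_area = area(amodal)
    if amodal_area == 0:
        raise ValueError("amodal mask is empty")
    return 1.0 - area(visible) / amodal_area


def is_occluded(amodal: Mask, visible: Mask) -> bool:
    """Assumed criterion: an instance counts as occluded if any part is hidden."""
    return occlusion_rate(amodal, visible) > 0.0


def average_occlusion_rate(instances: Iterable[Tuple[Mask, Mask]]) -> float:
    """Mean occlusion rate over (amodal, visible) pairs; assumes all instances count."""
    rates = [occlusion_rate(a, v) for a, v in instances]
    if not rates:
        raise ValueError("no instances given")
    return sum(rates) / len(rates)


# Toy example: a 2x2 object whose bottom row is hidden, and a fully visible one.
half_hidden = ([[1, 1], [1, 1]], [[1, 1], [0, 0]])
fully_visible = ([[1, 1], [1, 1]], [[1, 1], [1, 1]])
print(occlusion_rate(*half_hidden))                          # 0.5
print(is_occluded(*fully_visible))                           # False
print(average_occlusion_rate([half_hidden, fully_visible]))  # 0.25
```

With real annotations the masks would typically be NumPy arrays, and whether fully visible instances enter the average changes the reported number considerably, so this choice should be checked against the paper before comparing figures.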

Download Dataset

By downloading the data you agree to the following terms:
1. You will use the data only for non-commercial research and educational purposes. Commercial use is prohibited.
2. You will NOT distribute the data.
3. You have purchased a copy of Grand Theft Auto V.


SAIL-VOS: Semantic Amodal Instance Level Video Object Segmentation -- A Synthetic Dataset and Baselines, Yuan-Ting Hu, Hong-Shuo Chen, Kexin Hui, Jia-Bin Huang, Alexander G. Schwing, Computer Vision and Pattern Recognition (CVPR), 2019.
[BibTeX] [PDF]
  author = {Y.-T. Hu and H.-S. Chen and K. Hui and J.-B. Huang and A.~G. Schwing},
  title = { {SAIL-VOS: Semantic Amodal Instance Level Video Object Segmentation -- A Synthetic Dataset and Baselines} },
  booktitle = {Proc. CVPR},
  year = {2019},



This work is supported by the National Science Foundation and by the Global Research Outreach program of Samsung Advanced Institute of Technology.