SAIL-VOS 3D Dataset

Download

[Recommended] Use our download script to fetch the dataset, including the RGB frames, the meshes, the camera matrices, the segmentation masks and the MSCOCO-style JSON annotation files. Email us to request the script.

Dataset Overview

The SAIL-VOS 3D dataset contains the RGB frames, the object meshes (.obj files), camera matrices (Rt and K), the visible masks (modal masks) and the amodal masks.

After you unzip the files, you will see folders that correspond to video sequences.

.
├── ah_1_mcs_1               
├── ah_3a_ext   
├── ah_3a_mcs_3                 
└── ...

Under each video sequence, you will see three folders, images/, visible/ and camera/, which store the RGB frames, the instance-level visible segmentation masks and the camera matrices, respectively. Each remaining folder with a 4-digit prefix stores the meshes and amodal segmentation masks of one object. The 4-digit prefix is the object ID, which is consistent with the IDs used in the visible masks stored in visible/.

ah_1_mcs_1  
├── images          # RGB frames      
├── visible         # visible masks
├── camera          # camera matrices
├── depth           # depth data, available upon request
├── 0001_*          # meshes and amodal masks of the object with object id = 1            
└── ...
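The folder naming described above can be turned into a lookup from object ID to folder name. The following is a minimal sketch, not part of the dataset tooling: the helper names `object_id_from_name` and `list_object_dirs` are hypothetical, and the only assumption about the layout is the 4-digit prefix followed by an underscore.

```python
import os
import re

# Object folders start with a 4-digit ID followed by an underscore,
# e.g. "0001_Ped_...". Folders such as images/ or visible/ do not match.
OBJECT_DIR_PATTERN = re.compile(r"^(\d{4})_")

def object_id_from_name(name):
    """Return the integer object ID encoded in a folder name, or None."""
    match = OBJECT_DIR_PATTERN.match(name)
    return int(match.group(1)) if match else None

def list_object_dirs(sequence_dir):
    """Map object ID -> folder name for every object folder in a sequence."""
    result = {}
    for name in sorted(os.listdir(sequence_dir)):
        obj_id = object_id_from_name(name)
        if obj_id is not None and os.path.isdir(os.path.join(sequence_dir, name)):
            result[obj_id] = name
    return result
```

For example, list_object_dirs('ah_1_mcs_1') would return a dictionary whose keys are the object IDs that also appear as pixel values in the visible masks.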

The folder images stores the RGB frames.

images  
├── 000000.bmp      # the first frame
├── 000001.bmp      # the second frame       
├── 000002.bmp      # the third frame 
└── ...

The folder visible stores the visible masks.

visible  
├── 000000.npy      # the visible segmentation of the first frame
├── 000001.npy      # the visible segmentation of the second frame
├── 000002.npy      # the visible segmentation of the third frame
└── ...

To load the visible mask in Python:

import numpy as np
m = np.load('ah_1_mcs_1/visible/000000.npy')

To find the visible mask of the object with object ID = 1:

import numpy as np
m = np.load('ah_1_mcs_1/visible/000000.npy')
m1 = (m==1)
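Because each pixel of the visible mask holds an object ID, the same comparison generalizes to every object in a frame. The sketch below counts visible pixels per object. The function name `visible_object_areas` is hypothetical, and treating ID 0 as background is an assumption that should be checked against the data.

```python
import numpy as np

def visible_object_areas(visible_mask, background_id=0):
    """Return {object_id: visible pixel count} for one frame.

    background_id=0 is an assumption, not something the dataset
    description states; pass a different value if needed.
    """
    ids, counts = np.unique(visible_mask, return_counts=True)
    return {int(i): int(c) for i, c in zip(ids, counts) if i != background_id}
```

With the dataset on disk, visible_object_areas(np.load('ah_1_mcs_1/visible/000000.npy')) lists the objects visible in the first frame together with their visible areas in pixels.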

The folder camera stores the camera matrices.

camera  
├── 000000.yaml      # the camera matrices (Rt and K) of the first frame
├── 000001.yaml      # the camera matrices (Rt and K) of the second frame
├── 000002.yaml      # the camera matrices (Rt and K) of the third frame
└── ...

To load the camera matrices in Python:

import yaml
with open('ah_1_mcs_1/camera/000000.yaml') as f:
    # PyYAML 6 and later require an explicit Loader argument
    m = yaml.load(f, Loader=yaml.FullLoader)
print(m['K'])
print(m['Rt'])

The meshes of the object with object ID = 1 are in the object folder 0001_*. Note that all mesh.obj files are expressed in the camera coordinate system.

ah_1_mcs_1  
├── 0001_Ped_000000225514697_000000000702722_00   # meshes/amodal masks for object with ID=1
│   ├── 000000_mesh/mesh.obj   # the mesh for object with ID=1 in the first frame
│   ├── 000001_mesh/mesh.obj   # the mesh for object with ID=1 in the second frame
│   ├── 000002_mesh/mesh.obj   # the mesh for object with ID=1 in the third frame
│   └── ...
├── 0007_Ped_000002602752943_000000000000002_00   # meshes/amodal masks for object with ID=7
├── 0008_Ped_000003254803008_000000000702978_00   # meshes/amodal masks for object with ID=8
└── ...
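Since the meshes are already in camera coordinates, projecting their vertices into the image should need only the intrinsic matrix K from the matching camera/*.yaml file. The sketch below reads the vertex records of an .obj file and applies a standard pinhole projection. The function names are hypothetical, and the axis convention (camera looking along +z, with K a 3x3 matrix or containing one in its top-left block) is an assumption; if the projected points land outside the image or appear mirrored, the dataset likely uses a different convention.

```python
import numpy as np

def read_obj_vertices(lines):
    """Collect the 'v x y z' vertex records from the lines of a Wavefront .obj file."""
    verts = []
    for line in lines:
        parts = line.split()
        # Only geometric vertices; 'vn', 'vt', 'f', comments etc. are skipped.
        if len(parts) >= 4 and parts[0] == "v":
            verts.append([float(p) for p in parts[1:4]])
    return np.asarray(verts, dtype=np.float64).reshape(-1, 3)

def project_points(verts_cam, K):
    """Pinhole projection of camera-space points: [u, v, 1]^T ~ K [x, y, z]^T.

    Assumes +z points away from the camera (an assumption, see above).
    """
    K = np.asarray(K, dtype=np.float64)[:3, :3]
    uvw = verts_cam @ K.T              # one row of (u*z, v*z, z) per vertex
    return uvw[:, :2] / uvw[:, 2:3]    # divide by depth to get pixel coordinates

# Usage with the dataset on disk (paths follow the layout shown above):
# with open('ah_1_mcs_1/0001_Ped_000000225514697_000000000702722_00/000000_mesh/mesh.obj') as f:
#     verts = read_obj_vertices(f)
# uv = project_points(verts, m['K'])   # m loaded from camera/000000.yaml
```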

The amodal segmentation annotations of the object with object ID = 1 are in the folder 0001_*.

ah_1_mcs_1  
├── 0001_Ped_000000225514697_000000000702722_00   # amodal masks for object with ID=1
│   ├── 000000.png   # amodal masks for object with ID=1 in the first frame
│   ├── 000001.png   # amodal masks for object with ID=1 in the second frame
│   ├── 000002.png   # amodal masks for object with ID=1 in the third frame
│   └── ...
├── 0007_Ped_000002602752943_000000000000002_00   # amodal masks for object with ID=7
├── 0008_Ped_000003254803008_000000000702978_00   # amodal masks for object with ID=8
└── ...

Example Python code to read an amodal segmentation mask:

from PIL import Image
import numpy as np
am = Image.open('ah_1_mcs_1/0001_Ped_000000225514697_000000000702722_00/000000.png')
am = np.asarray(am)
print(np.unique(am)) # [0, 255]
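Combining an amodal mask with the visible mask of the same frame gives the occluded part of an object. The sketch below computes the fraction of the amodal mask that is hidden. The function name `occlusion_rate` is hypothetical, and the code assumes the PNG is single-channel with nonzero pixels marking the object, and that it has the same resolution as the visible mask.

```python
import numpy as np

def occlusion_rate(amodal_png, visible_ids, object_id):
    """Fraction of an object's amodal mask that is not visible.

    amodal_png : array from the object's NNNNNN.png (0 or 255 values)
    visible_ids: array from visible/NNNNNN.npy for the same frame
    object_id  : integer ID from the object folder's 4-digit prefix
    Returns None when the amodal mask is empty.
    """
    amodal = np.asarray(amodal_png) > 0
    # Visible pixels of this object, restricted to its amodal extent.
    visible = (np.asarray(visible_ids) == object_id) & amodal
    amodal_area = int(amodal.sum())
    if amodal_area == 0:
        return None
    return 1.0 - visible.sum() / amodal_area
```

A value of 0 means the object is fully visible in that frame, and a value of 1 means it is fully occluded.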

The folder depth stores the depth data. Depth data is available upon request.

depth  
├── 000000.npy      # depth of the first frame
├── 000001.npy      # depth of the second frame
├── 000002.npy      # depth of the third frame
└── ...

To load the depth data in Python:

import numpy as np
depth = np.load('ah_1_mcs_1/depth/000000.npy')