TUB Multi-Object and Multi-Camera Tracking Dataset

The TU Berlin Multi-Object and Multi-Camera Tracking Dataset (MOCAT) is a synthetic dataset for training and testing tracking and detection systems in a virtual world. One of its key advantages is a complete and accurate ground truth, including pixel-accurate object masks. All sequences are rendered three times, each with a different illumination setting, which makes it possible to measure directly how illumination affects the algorithm under test. Each sequence provides 8 to 10 camera views with partly overlapping fields of view (FOVs), together with camera calibration information. The ground truth contains the world position of each object, so multi-camera tracking performance can be evaluated as well. All sequences contain vehicles, animals and pedestrians as objects to detect and track.


Related Publications


The data is divided into three separate sets called evo_1, evo_3 and ineu_1. They can be downloaded separately as 7z-compressed archives. Extracted, they are roughly 80 GB in size. The sequences of each set are recorded from different camera views with dawn, day and dusk illumination settings. All sequences are recorded at 30 fps and are fully annotated. They include three types of NPCs: vehicles, pedestrians and animals. The table below lists key information for each set; sample images are given in the gallery below.

Dataset Statistics
Name | Type | Number of NPCs | Number of Cameras | Number of Frames (per Camera)

Ground Truth

Two versions of the ground truth are available. The first contains, for each visible object, the information listed in the table below. The second is compatible with the MOTChallenge devkit and is intended for reproducing the results of the respective publication and for easily evaluating your own detection and tracking results. Both can be downloaded separately.
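As a rough illustration of working with the MOTChallenge-compatible version, the sketch below groups bounding boxes by frame. It assumes the files follow the usual MOTChallenge comma-separated layout, in which the first six columns are frame, id, bb_left, bb_top, bb_width and bb_height; this layout is not confirmed here for MOCAT. The names `Box` and `read_mot_boxes` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import NamedTuple


class Box(NamedTuple):
    """One annotation row (hypothetical helper type)."""
    frame: int
    track_id: int
    left: float
    top: float
    width: float
    height: float


def read_mot_boxes(stream):
    """Group boxes by frame number from a MOTChallenge-style CSV stream.

    Assumption: columns 0-5 are frame, id, bb_left, bb_top, bb_width,
    bb_height. Any further columns are ignored.
    """
    boxes_by_frame = defaultdict(list)
    for row in csv.reader(stream):
        if not row:  # skip empty lines
            continue
        frame, track_id = int(row[0]), int(row[1])
        left, top, width, height = (float(v) for v in row[2:6])
        boxes_by_frame[frame].append(
            Box(frame, track_id, left, top, width, height)
        )
    return dict(boxes_by_frame)


# Made-up sample rows, for illustration only (not taken from the dataset).
sample = io.StringIO(
    "1,1,100,50,40,80,1,-1,-1,-1\n"
    "1,2,300,60,35,90,1,-1,-1,-1\n"
    "2,1,104,50,40,80,1,-1,-1,-1\n"
)
boxes = read_mot_boxes(sample)
```

To read a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open(path, newline="")`.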

Ground Truth Information
frame | frame number of the corresponding frame and mask files
id | unique object id
class | class name of the object: "npc_car", "npc_animal" or "npc_pedestrian"
bbox | approximate bounding box of the object (including occluded area)
pos | position of the object in world coordinates
obb | volumetric bounding box of the object
cam pos | camera position in world coordinates
cam angle | yaw, pitch and roll of the camera
visible bbox | bounding box of the biggest visible part of the object
mask | color of the object in the mask frame (RGB)
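
The mask color can be used to recover an object's visible pixels from a mask frame. The following is a minimal sketch under stated assumptions: the mask frame is already decoded into rows of (R, G, B) tuples (in practice an image library would do this), and the colors, the tiny sample frame and the helper names `object_pixels` and `tight_bbox` are hypothetical. The box computed here encloses all visible pixels of the object, which may differ from the "visible bbox" field, since that field covers only the biggest visible part.

```python
def object_pixels(mask_frame, color):
    """Return (x, y) coordinates of all pixels whose RGB value equals color."""
    return [
        (x, y)
        for y, row in enumerate(mask_frame)
        for x, pixel in enumerate(row)
        if tuple(pixel) == tuple(color)
    ]


def tight_bbox(pixels):
    """Return (left, top, width, height) enclosing the pixels, or None if empty."""
    if not pixels:
        return None
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1


# Made-up 5x4 mask frame: background plus one object (illustration only).
BG = (0, 0, 0)
RED = (255, 0, 0)  # hypothetical mask color of one object
mask_frame = [
    [BG, BG, BG, BG, BG],
    [BG, RED, RED, BG, BG],
    [BG, RED, RED, RED, BG],
    [BG, BG, BG, BG, BG],
]

pixels = object_pixels(mask_frame, RED)  # visible pixels of the object
bbox = tight_bbox(pixels)                # box around all visible pixels
```

The number of returned pixels gives the visible area of the object in that camera view, which can serve as a simple per-frame visibility measure.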

Terms of Use

All data is provided for research purposes only and without any warranty. Any commercial use is prohibited. If you use the dataset in your research work, please cite the respective paper.