The TU Berlin Multi-Object and Multi-Camera Tracking Dataset (MOCAT) is a synthetic dataset for training and testing tracking and detection systems in a virtual world. One of its key advantages is that complete and accurate ground truth is available, including pixel-accurate object masks. All sequences are rendered three times, each with different illumination settings, which makes it possible to directly measure the influence of illumination on the algorithm under test. Each sequence is available from 8 to 10 different camera views (including camera calibration information) with partly overlapping fields of view. The ground truth contains the world position of each object, so multi-camera tracking performance can be evaluated as well. All sequences contain vehicles, animals and pedestrians as objects to detect and track.
The data is divided into three separate sets named evo_1, evo_3 and ineu_1. They can be downloaded separately as 7z-compressed archives and are roughly 80 GB in size when extracted. The sequences of each set are recorded from different camera views with dawn, day and dusk illumination settings. All sequences are recorded at 30 fps and are fully annotated. They include three types of NPCs: vehicles, pedestrians and animals. The table below lists key figures for each set; sample images are given in the gallery below.
| | evo_1 | evo_3 | ineu_1 |
|---|---|---|---|
| Number of NPCs | | | |
| Number of Cameras | | | |
| Number of Frames (per Camera) | | | |
Two versions of the ground truth are available. The first one includes, for each visible object, the information listed in the table below. The second version is compatible with the devkit of the MOTChallenge and is intended to reproduce the results of the respective publication and to make it easy to evaluate your own detection and tracking results. Both can be downloaded separately.
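Since the second ground-truth version targets the MOTChallenge devkit, it can presumably be read like a standard MOTChallenge annotation file, where each line holds the comma-separated values `frame, id, bb_left, bb_top, bb_width, bb_height, conf, x, y, z`. The sketch below parses such a file into a per-frame dictionary; the exact file name and column layout are assumptions based on the usual MOTChallenge convention, not details confirmed by this page.

```python
import csv
from collections import defaultdict

def load_mot_ground_truth(path):
    """Parse a MOTChallenge-style annotation file.

    Assumed line layout (standard MOTChallenge convention):
        frame, id, bb_left, bb_top, bb_width, bb_height, conf, x, y, z

    Returns a dict mapping frame number -> list of (object_id, bbox),
    where bbox is (left, top, width, height).
    """
    per_frame = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            frame = int(row[0])
            obj_id = int(row[1])
            bbox = tuple(float(v) for v in row[2:6])
            per_frame[frame].append((obj_id, bbox))
    return dict(per_frame)
```

With this structure, the detections of any frame can be looked up directly, e.g. `load_mot_ground_truth("gt.txt")[1]` for the first frame.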
| Ground Truth Information |
|---|
| the respective frame number of the frame and mask files |
| unique object ID |
| class name of the object: "npc_car", "npc_animal" or "npc_pedestrian" |
| approximate bounding box of the object (including occluded areas) |
| position of the object in world coordinates |
| volumetric bounding box of the object |
| camera position in world coordinates |
| yaw, pitch and roll of the camera |
| bounding box of the biggest visible part of the object |
| color of the object in the mask frame (RGB) |
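Because the ground truth provides both the camera pose (position plus yaw, pitch and roll) and each object's world position, world coordinates can be expressed in a camera frame for multi-camera evaluation. The sketch below assumes a Z-yaw/Y-pitch/X-roll rotation order with angles in radians; the dataset's actual angle convention should be verified against its calibration files before use.

```python
import math

def rotation_from_ypr(yaw, pitch, roll):
    """Build a 3x3 rotation matrix R = Rz(yaw) @ Ry(pitch) @ Rx(roll).

    The ZYX composition and radian units are assumptions about the
    dataset's convention, not confirmed by the description.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp,     cp * sr,                cp * cr],
    ]

def world_to_camera(point, cam_pos, yaw, pitch, roll):
    """Express a world-coordinate point in the camera frame: R^T @ (p - c)."""
    R = rotation_from_ypr(yaw, pitch, roll)
    d = [point[i] - cam_pos[i] for i in range(3)]
    # Multiplying by R's transpose inverts the camera-to-world rotation.
    return [sum(R[j][i] * d[j] for j in range(3)) for i in range(3)]
```

With per-camera poses loaded from the calibration data, the same world-position ground truth can be compared against detections from every overlapping view.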
All data is provided for research purposes only and without any warranty. Any commercial use is prohibited. If you use the dataset in your research work, please cite the respective paper.